reprospect.test.sass.matchers.add_int128 module
- class reprospect.test.sass.matchers.add_int128.AddInt128Matcher(start: str | None = None)
Bases: SequenceMatcher
Match SASS instructions originating from the addition of two __int128 values.
It may use one of several instruction patterns; see pattern_3IADD() and pattern_4IADD3().
Note
For a simple “load, add, store” sequence, the PTX may look as follows:
    ld.global.nc.v2.u64 {%rd7, %rd8}, [%rd6];
    add.s64 %rd9, %rd3, %rd5;
    ld.global.v2.u64 {%rd10, %rd11}, [%rd9];
    add.cc.s64 %rd12, %rd10, %rd7;
    addc.cc.s64 %rd13, %rd11, %rd8;
    st.global.v2.u64 [%rd9], {%rd12, %rd13};
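The add.cc.s64 / addc.cc.s64 pair above adds the low 64-bit halves first and feeds the resulting carry into the addition of the high halves. A minimal Python sketch of that carry chain follows. The helper names add_with_carry and add_int128 are hypothetical and are not part of reprospect; Python integers stand in for registers.

```python
def add_with_carry(a: int, b: int, carry_in: int, bits: int) -> tuple[int, int]:
    """Add two `bits`-wide limbs plus a carry-in; return (sum, carry_out)."""
    total = a + b + carry_in
    return total & ((1 << bits) - 1), total >> bits


def add_int128(a: int, b: int, bits: int = 64) -> int:
    """Add two 128-bit values limb by limb, wrapping modulo 2**128.

    The result is returned as an unsigned 128-bit pattern.
    """
    mask = (1 << bits) - 1
    result, carry = 0, 0
    for shift in range(0, 128, bits):
        # Extract the matching limb of each operand and chain the carry.
        limb, carry = add_with_carry(
            (a >> shift) & mask, (b >> shift) & mask, carry, bits
        )
        result |= limb << shift
    # The carry out of the most significant limb is not used.
    return result
```

With bits=32 the same loop performs four limb additions instead of two, which is the arithmetic a chain of four 32-bit additions would carry out.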
According to https://docs.nvidia.com/cuda/parallel-thread-execution/#extended-precision-arithmetic-instructions-add-cc, the carry-out value is written in the condition code register, usually P0.
Note
It is not decorated with dataclasses.dataclass() because of https://github.com/mypyc/mypyc/issues/1061.
- __init__(start: str | None = None) -> None
- match(instructions: Sequence[Instruction | str]) -> list[InstructionMatch] | None
- property next_index: int
- pattern_3IADD(instructions: Sequence[Instruction | str]) -> list[InstructionMatch] | None
Typically:
    IADD.64 RZ, P0, R4, UR12
    IADD.64 R4, R4, UR12
    IADD.64.X R6, R6, UR14, P0
- pattern_4IADD3(instructions: Sequence[Instruction | str]) -> list[InstructionMatch] | None
Typically:
    IADD3 R20, P0, R8, c[0x0][0x180], RZ
    IADD3.X R21, P0, R9, c[0x0][0x184], RZ, P0, !PT
    IADD3.X R22, P0, R10, c[0x0][0x188], RZ, P0, !PT
    IADD3.X R23, R11, c[0x0][0x18c], RZ, P0, !PT
or:
    IADD3 R4, P0, PT, R4, R12, RZ
    IADD3.X R5, P0, PT, R5, R13, RZ, P0, !PT
    IADD3.X R6, P0, PT, R6, R14, RZ, P0, !PT
    IADD3.X R7, PT, PT, R7, R15, RZ, P0, !PT
That is, there may be one additional argument.
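To illustrate how both operand layouts can be treated uniformly, here is a rough, standalone sketch that splits each instruction string into operand groups and checks that the carry predicate flows from one instruction to the next. It is not reprospect's implementation: parse_iadd3, is_carry_chain and the Iadd3 tuple are hypothetical names, and the operand roles (predicate outputs right after the destination, three sources, then carry-in predicates) are an assumption read off the two examples above.

```python
import re
from typing import NamedTuple

# Assumption: predicate operands look like P0, P1, ... or PT, optionally negated.
PREDICATE = re.compile(r"!?(?:P\d+|PT)")


class Iadd3(NamedTuple):
    opcode: str
    dest: str
    carry_out: list[str]  # predicate outputs listed right after the destination
    sources: list[str]    # the three addends
    carry_in: list[str]   # trailing predicates, present on IADD3.X only


def parse_iadd3(line: str) -> Iadd3:
    """Split one IADD3 / IADD3.X line into operand groups (hypothetical helper)."""
    opcode, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",")]
    dest, operands = operands[0], operands[1:]
    carry_out = []
    # Zero, one, or two predicate outputs may precede the sources.
    while operands and PREDICATE.fullmatch(operands[0]):
        carry_out.append(operands.pop(0))
    return Iadd3(opcode, dest, carry_out, operands[:3], operands[3:])


def is_carry_chain(lines: list[str]) -> bool:
    """Check that each instruction consumes the carry produced by the previous one."""
    parsed = [parse_iadd3(line) for line in lines]
    if [p.opcode for p in parsed] != ["IADD3"] + ["IADD3.X"] * (len(parsed) - 1):
        return False
    return all(
        bool(prev.carry_out) and bool(cur.carry_in)
        and prev.carry_out[0] == cur.carry_in[0]
        for prev, cur in zip(parsed, parsed[1:])
    )


FORM_A = [
    "IADD3 R20, P0, R8, c[0x0][0x180], RZ",
    "IADD3.X R21, P0, R9, c[0x0][0x184], RZ, P0, !PT",
    "IADD3.X R22, P0, R10, c[0x0][0x188], RZ, P0, !PT",
    "IADD3.X R23, R11, c[0x0][0x18c], RZ, P0, !PT",
]
FORM_B = [
    "IADD3 R4, P0, PT, R4, R12, RZ",
    "IADD3.X R5, P0, PT, R5, R13, RZ, P0, !PT",
    "IADD3.X R6, P0, PT, R6, R14, RZ, P0, !PT",
    "IADD3.X R7, PT, PT, R7, R15, RZ, P0, !PT",
]
```

Calling is_carry_chain(FORM_A) and is_carry_chain(FORM_B) accepts both listings, because the extra PT operand is absorbed into the carry_out group rather than shifting the sources.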