reprospect.test.sass.matchers.add_int128 module

class reprospect.test.sass.matchers.add_int128.AddInt128Matcher(start: str | None = None)

Bases: SequenceMatcher

Match SASS instructions originating from the addition of two __int128 values.

The addition may compile to one of several instruction patterns; see pattern_3IADD() and pattern_4IADD3().

Note

For a simple “load, add, store” sequence, the PTX may look as follows:

ld.global.nc.v2.u64 {%rd7, %rd8}, [%rd6];
add.s64 %rd9, %rd3, %rd5;
ld.global.v2.u64 {%rd10, %rd11}, [%rd9];
add.cc.s64 %rd12, %rd10, %rd7;
addc.cc.s64 %rd13, %rd11, %rd8;
st.global.v2.u64 [%rd9], {%rd12, %rd13};

According to https://docs.nvidia.com/cuda/parallel-thread-execution/#extended-precision-arithmetic-instructions-add-cc, add.cc writes the carry-out value to the condition code register. In the generated SASS, the carry is usually held in the predicate register P0.
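The carry chain in the PTX above can be sketched in plain Python. This is only an illustration of the arithmetic, assuming 128-bit values are handled as unsigned bit patterns split into two 64-bit halves; the helper names add_cc_u64, addc_u64 and add_int128 are hypothetical and not part of reprospect.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def add_cc_u64(a: int, b: int) -> tuple[int, int]:
    """Emulate add.cc: return (sum modulo 2**64, carry-out)."""
    total = a + b
    return total & MASK64, total >> 64


def addc_u64(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Emulate addc: add two 64-bit values plus the incoming carry."""
    total = a + b + carry_in
    return total & MASK64, total >> 64


def add_int128(x: int, y: int) -> int:
    """Add two 128-bit bit patterns using two 64-bit additions."""
    x_lo, x_hi = x & MASK64, (x >> 64) & MASK64
    y_lo, y_hi = y & MASK64, (y >> 64) & MASK64
    lo, carry = add_cc_u64(x_lo, y_lo)   # low halves, produces the carry
    hi, _ = addc_u64(x_hi, y_hi, carry)  # high halves, consumes the carry
    return (hi << 64) | lo               # result wraps modulo 2**128
```

The low-half addition produces the carry that the high-half addition consumes, which is why the matcher looks for a chain of dependent instructions rather than independent additions.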

Note

This class is not decorated with dataclasses.dataclass() because of https://github.com/mypyc/mypyc/issues/1061.

__init__(start: str | None = None) → None
match(instructions: Sequence[Instruction | str]) → list[InstructionMatch] | None
property next_index: int
pattern_3IADD(instructions: Sequence[Instruction | str]) → list[InstructionMatch] | None

Typically:

IADD.64 RZ, P0, R4, UR12
IADD.64 R4, R4, UR12
IADD.64.X R6, R6, UR14, P0
pattern_4IADD3(instructions: Sequence[Instruction | str]) → list[InstructionMatch] | None

Typically:

IADD3 R20, P0, R8, c[0x0][0x180], RZ
IADD3.X R21, P0, R9, c[0x0][0x184], RZ, P0, !PT
IADD3.X R22, P0, R10, c[0x0][0x188], RZ, P0, !PT
IADD3.X R23, R11, c[0x0][0x18c], RZ, P0, !PT

or:

IADD3 R4, P0, PT, R4, R12, RZ
IADD3.X R5, P0, PT, R5, R13, RZ, P0, !PT
IADD3.X R6, P0, PT, R6, R14, RZ, P0, !PT
IADD3.X R7, PT, PT, R7, R15, RZ, P0, !PT

That is, the instructions may carry an additional predicate operand (PT in the second listing) after the destination register.
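As a rough illustration of what recognizing this pattern involves, the following standalone sketch checks four instruction strings against the shape of the two listings above. It is not the implementation used by AddInt128Matcher: the helper looks_like_4iadd3 is hypothetical, and it assumes that the carry always travels through P0 and that every IADD3.X ends with the operands P0, !PT, as in the listings.

```python
import re

# Hypothetical helper, not part of reprospect.
_IADD3 = re.compile(r"^IADD3(?P<x>\.X)?\s+(?P<operands>.+)$")


def looks_like_4iadd3(lines: list[str]) -> bool:
    """Check for the shape IADD3, IADD3.X, IADD3.X, IADD3.X with a P0 carry chain."""
    if len(lines) != 4:
        return False
    for index, line in enumerate(lines):
        found = _IADD3.match(line.strip())
        if found is None:
            return False
        extended = found.group("x") is not None
        # Only the first instruction lacks the .X modifier.
        if extended != (index > 0):
            return False
        operands = [op.strip() for op in found.group("operands").split(",")]
        # Assumption: carry-in appears as the trailing "P0, !PT" operands.
        if extended and operands[-2:] != ["P0", "!PT"]:
            return False
        # Assumption: all but the last instruction write the carry-out to P0.
        if index < 3 and (len(operands) < 2 or operands[1] != "P0"):
            return False
    return True
```

Because the check only inspects the second operand and the trailing two operands, it accepts both listings regardless of whether the additional predicate operand is present.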

start: Final[str]