reprospect.test.sass.matchers.add_int128 module

class reprospect.test.sass.matchers.add_int128.AddInt128Matcher(start: str | None = None)

Bases: SequenceMatcher

Match SASS instructions originating from the addition of two __int128 values.

It may use one of several instruction patterns:

pattern_3IADD()
pattern_4IADD3()

Note

For a simple “load, add, store” sequence, the PTX may look as follows:

ld.global.nc.v2.u64 {%rd7, %rd8}, [%rd6];
add.s64 %rd9, %rd3, %rd5;
ld.global.v2.u64 {%rd10, %rd11}, [%rd9];
add.cc.s64 %rd12, %rd10, %rd7;
addc.cc.s64 %rd13, %rd11, %rd8;
st.global.v2.u64 [%rd9], {%rd12, %rd13};

According to https://docs.nvidia.com/cuda/parallel-thread-execution/#extended-precision-arithmetic-instructions-add-cc, add.cc writes the carry-out value to the condition code register. In the SASS patterns documented for this class, the carry is held in a predicate register, usually P0.
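The carry chain in the PTX above can be modeled in plain Python. The following is a minimal sketch, assuming each 128-bit value is stored as two unsigned 64-bit limbs with the low limb first. The name `add_u128_limbs` is hypothetical and is not part of reprospect.

```python
MASK64 = (1 << 64) - 1


def add_u128_limbs(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add two 128-bit values given as (low, high) 64-bit limbs, wrapping modulo 2**128."""
    # Counterpart of add.cc.s64: add the low limbs and keep the carry-out.
    low = a[0] + b[0]
    carry = low >> 64
    # Counterpart of addc.cc.s64: add the high limbs plus the carry-in.
    high = a[1] + b[1] + carry
    # Truncate both limbs to 64 bits; the final carry-out is discarded.
    return low & MASK64, high & MASK64
```

For instance, adding 1 to a value whose low limb is all ones propagates a carry into the high limb: `add_u128_limbs((MASK64, 0), (1, 0))` returns `(0, 1)`.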

Note

This class is not decorated with dataclasses.dataclass() because of https://github.com/mypyc/mypyc/issues/1061.

__init__(start: str | None = None) → None

match(instructions: Sequence[Instruction | str]) → list[InstructionMatch] | None

property next_index: int

pattern_3IADD(instructions: Sequence[Instruction | str]) → list[InstructionMatch] | None

Typically:

IADD.64 RZ, P0, R4, UR12
IADD.64 R4, R4, UR12
IADD.64.X R6, R6, UR14, P0
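To make the shape of this pattern concrete, here is a toy check that only compares the opcode of each line against the typical sequence above. It is a sketch under stated assumptions, not how AddInt128Matcher works: the helpers `opcodes` and `looks_like_3iadd` are hypothetical, and operands (registers, the P0 carry) are ignored entirely.

```python
def opcodes(lines: list[str]) -> list[str]:
    """Return the first whitespace-separated token (opcode plus modifiers) of each line."""
    # Blank lines yield an empty opcode instead of raising IndexError.
    return [line.split()[0] if line.strip() else "" for line in lines]


def looks_like_3iadd(lines: list[str]) -> bool:
    """Toy check: carry-only add, low-half add, then high-half add with carry-in."""
    return opcodes(lines) == ["IADD.64", "IADD.64", "IADD.64.X"]


sample = [
    "IADD.64 RZ, P0, R4, UR12",
    "IADD.64 R4, R4, UR12",
    "IADD.64.X R6, R6, UR14, P0",
]
```

Here `looks_like_3iadd(sample)` is true, while the same lines in reverse order are rejected because the opcode with the `.X` modifier no longer comes last.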

pattern_4IADD3(instructions: Sequence[Instruction | str]) → list[InstructionMatch] | None

Typically:

IADD3 R20, P0, R8, c[0x0][0x180], RZ
IADD3.X R21, P0, R9, c[0x0][0x184], RZ, P0, !PT
IADD3.X R22, P0, R10, c[0x0][0x188], RZ, P0, !PT
IADD3.X R23, R11, c[0x0][0x18c], RZ, P0, !PT

or:

IADD3 R4, P0, PT, R4, R12, RZ
IADD3.X R5, P0, PT, R5, R13, RZ, P0, !PT
IADD3.X R6, P0, PT, R6, R14, RZ, P0, !PT
IADD3.X R7, PT, PT, R7, R15, RZ, P0, !PT

That is, the second form may carry one additional predicate argument (here PT) after the carry-out predicate.
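The four-instruction sequences above can be read as a ripple-carry addition over four 32-bit limbs. The sketch below models that reading in Python, assuming consecutive registers hold the limbs from least to most significant and that P0 carries between steps. The names `to_limbs` and `add_4_limbs` are hypothetical and not part of reprospect.

```python
MASK32 = (1 << 32) - 1


def to_limbs(value: int) -> list[int]:
    """Split a 128-bit value into four 32-bit limbs, least significant first."""
    return [(value >> (32 * i)) & MASK32 for i in range(4)]


def add_4_limbs(a: list[int], b: list[int]) -> list[int]:
    """Add two values given as four 32-bit limbs, wrapping modulo 2**128."""
    carry = 0  # plays the role of P0; the first step has no carry-in
    out = []
    for x, y in zip(a, b):
        total = x + y + carry
        out.append(total & MASK32)  # result limb written to the destination register
        carry = total >> 32         # carry-out consumed by the next step
    return out  # the carry out of the last limb is dropped
```

With these helpers, `add_4_limbs(to_limbs(0xFFFFFFFF), to_limbs(1))` returns `[0, 1, 0, 0]`: the carry out of the lowest limb lands in the second one.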

start: Final[str]