SASS
- class tests.test.sass.test_any.TestAnyMatcherView on GitHub
Bases:
objectTests for
reprospect.test.sass.instruction.AnyMatcher.- INSTRUCTIONS: Final[dict[str, InstructionMatch]] = {'@!P3 LDG.E.64.CONSTANT R6, desc[UR16][R26.64]': InstructionMatch(opcode='LDG', modifiers=('E', '64', 'CONSTANT'), operands=('R6', 'desc[UR16][R26.64]'), predicate='@!P3', additional=None), '@!PT LDS RZ, [RZ]': InstructionMatch(opcode='LDS', modifiers=(), operands=('RZ', '[RZ]'), predicate='@!PT', additional=None), '@!UP0 UIMAD UR7, UR10, 0xc, URZ': InstructionMatch(opcode='UIMAD', modifiers=(), operands=('UR7', 'UR10', '0xc', 'URZ'), predicate='@!UP0', additional=None), '@UP0 LDCU.64 UR12, c[0x0][0x388]': InstructionMatch(opcode='LDCU', modifiers=('64',), operands=('UR12', 'c[0x0][0x388]'), predicate='@UP0', additional=None), 'ATOMG.E.ADD.STRONG.GPU PT, R4, desc[UR16][R4.64], R7': InstructionMatch(opcode='ATOMG', modifiers=('E', 'ADD', 'STRONG', 'GPU'), operands=('PT', 'R4', 'desc[UR16][R4.64]', 'R7'), predicate=None, additional=None), 'BAR.SYNC.DEFER_BLOCKING 0x0': InstructionMatch(opcode='BAR', modifiers=('SYNC', 'DEFER_BLOCKING'), operands=('0x0',), predicate=None, additional=None), 'BRA 0x240': InstructionMatch(opcode='BRA', modifiers=(), operands=('0x240',), predicate=None, additional=None), 'DADD R12, |R12|, |R14|': InstructionMatch(opcode='DADD', modifiers=(), operands=('R12', '|R12|', '|R14|'), predicate=None, additional=None), 'EXIT': InstructionMatch(opcode='EXIT', modifiers=(), operands=(), predicate=None, additional=None), 'F2FP.BF16.PACK_AB R27, R6, R27': InstructionMatch(opcode='F2FP', modifiers=('BF16', 'PACK_AB'), operands=('R27', 'R6', 'R27'), predicate=None, additional=None), 'FADD.FTZ.RN R0, R1, R2': InstructionMatch(opcode='FADD', modifiers=('FTZ', 'RN'), operands=('R0', 'R1', 'R2'), predicate=None, additional=None), 'FMUL R6, R27.reuse, 0.044714998453855514526': InstructionMatch(opcode='FMUL', modifiers=(), operands=('R6', 'R27.reuse', '0.044714998453855514526'), predicate=None, additional=None), 'FMUL.FTZ R2, R2, R3': InstructionMatch(opcode='FMUL', modifiers=('FTZ',), operands=('R2', 'R2', 'R3'), predicate=None, additional=None), 'HFMA2 R7, -RZ, RZ, 0, 5.9604644775390625e-08': InstructionMatch(opcode='HFMA2', modifiers=(), operands=('R7', '-RZ', 'RZ', '0', '5.9604644775390625e-08'), predicate=None, additional=None), 'HMUL2 R0, R2.H0_H0, R3.H0_H0': InstructionMatch(opcode='HMUL2', modifiers=(), operands=('R0', 'R2.H0_H0', 'R3.H0_H0'), predicate=None, additional=None), 'IADD3 R2, R2, 0x4, RZ': InstructionMatch(opcode='IADD3', modifiers=(), operands=('R2', 'R2', '0x4', 'RZ'), predicate=None, additional=None), 'IMAD.MOV.U32 R4, R4, R5, R6': InstructionMatch(opcode='IMAD', modifiers=('MOV', 'U32'), operands=('R4', 'R4', 'R5', 'R6'), predicate=None, additional=None), 'ISETP.NE.AND P0, PT, R1, RZ, PT': InstructionMatch(opcode='ISETP', modifiers=('NE', 'AND'), operands=('P0', 'PT', 'R1', 'RZ', 'PT'), predicate=None, additional=None), 'LDG.E R0, desc[UR12][R18.64]': InstructionMatch(opcode='LDG', modifiers=('E',), operands=('R0', 'desc[UR12][R18.64]'), predicate=None, additional=None), 'LDG.E.SYS R4, [R2]': InstructionMatch(opcode='LDG', modifiers=('E', 'SYS'), operands=('R4', '[R2]'), predicate=None, additional=None), 'LDGSTS.E.LTC128B [R211+-0x4000], desc[UR24][R182.64+0x20]': InstructionMatch(opcode='LDGSTS', modifiers=('E', 'LTC128B'), operands=('[R211+-0x4000]', 'desc[UR24][R182.64+0x20]'), predicate=None, additional=None), 'LDS.64 R12, [UR7+0x8]': InstructionMatch(opcode='LDS', modifiers=('64',), operands=('R12', '[UR7+0x8]'), predicate=None, additional=None), 'LEA.HI.X R145, R140, R134.reuse, R135, 0x2, P3': InstructionMatch(opcode='LEA', modifiers=('HI', 'X'), operands=('R145', 'R140', 'R134.reuse', 'R135', '0x2', 'P3'), predicate=None, additional=None), 'LOP3.LUT P0, RZ, R8, R94, RZ, 0xfc, !PT': InstructionMatch(opcode='LOP3', modifiers=('LUT',), operands=('P0', 'RZ', 'R8', 'R94', 'RZ', '0xfc', '!PT'), predicate=None, additional=None), 'MEMBAR.SC.GPU': InstructionMatch(opcode='MEMBAR', modifiers=('SC', 'GPU'), operands=(), predicate=None, additional=None), 'MOV R8, c[0x0][0x140]': InstructionMatch(opcode='MOV', modifiers=(), operands=('R8', 'c[0x0][0x140]'), predicate=None, additional=None), 'NOP': InstructionMatch(opcode='NOP', modifiers=(), operands=(), predicate=None, additional=None), 'RET.REL.NODEC R4 0x0': InstructionMatch(opcode='RET', modifiers=('REL', 'NODEC'), operands=('R4', '0x0'), predicate=None, additional=None), 'S2R R0, SR_CTAID.X': InstructionMatch(opcode='S2R', modifiers=(), operands=('R0', 'SR_CTAID.X'), predicate=None, additional=None), 'STG.E [R2], R4': InstructionMatch(opcode='STG', modifiers=('E',), operands=('[R2]', 'R4'), predicate=None, additional=None), 'STG.E.64.SYS [R2.64+UR4], R4': InstructionMatch(opcode='STG', modifiers=('E', '64', 'SYS'), operands=('[R2.64+UR4]', 'R4'), predicate=None, additional=None), 'STL [R1], R2': InstructionMatch(opcode='STL', modifiers=(), operands=('[R1]', 'R2'), predicate=None, additional=None), 'STS.64 [R9.X8], R2': InstructionMatch(opcode='STS', modifiers=('64',), operands=('[R9.X8]', 'R2'), predicate=None, additional=None), 'UIADD3 UR5, UPT, UPT, -UR4, UR9, URZ': InstructionMatch(opcode='UIADD3', modifiers=(), operands=('UR5', 'UPT', 'UPT', '-UR4', 'UR9', 'URZ'), predicate=None, additional=None)}
Zoo of real SASS instructions.
- MATCHER: Final[AnyMatcher] = AnyMatcher(pattern=regex.Regex('(?:(?P<predicate>@!?U?P(?:T|[0-9]+)))?\\s*(?P<opcode>[A-Z0-9]+)(?:\\.(?P<modifiers>[A-Z0-9_]+))*\\s*(?:(?P<operands>[\\w!\\.\\[\\]\\+\\-\\|~]+)(?:,?\\s*(?P<operands>[\\w!\\.\\[\\]\\+\\-\\|~]+))*)?', flags=regex.V0))
- test(instruction: str, expected: InstructionMatch) NoneView on GitHub
- test_no_match() NoneView on GitHub
- test_pattern() NoneView on GitHub
- class tests.test.sass.test_atomic.TestAtomicMatcherView on GitHub
Bases:
objectTests for
reprospect.test.sass.instruction.atomic.AtomicMatcher.- CODE_ADD_BASED_ON_CAS = '__global__ void cas({type}* __restrict__ const dst, const {type}* __restrict__ const src)\n{{\n static_assert(sizeof({type}) == {size});\n\n const auto index = blockIdx.x * blockDim.x + threadIdx.x;\n\n {integer}* const dest = reinterpret_cast<{integer}*>(dst + index);\n\n {integer} old = *dest;\n {integer} assumed;\n\n do {{\n assumed = old;\n {type} new_val = assumed + src[index];\n old = atomicCAS(\n dest,\n assumed,\n reinterpret_cast<{integer}&>(new_val)\n );\n }} while (old != assumed);\n}}\n'
- CODE_ADD_BASED_ON_CAS_128 = 'struct alignas(2 * sizeof(double)) My128Struct\n{\n double x, y;\n\n __host__ __device__ friend My128Struct operator+(const My128Struct& a, const My128Struct& b)\n {\n return My128Struct{\n .x = a.x + b.x,\n .y = a.y + b.y\n };\n }\n\n auto operator<=>(const My128Struct&) const = default;\n};\n\n__global__ void cas(My128Struct* __restrict__ const dst, const My128Struct* __restrict__ const src)\n{\n static_assert(sizeof (My128Struct) == 16);\n static_assert(alignof(My128Struct) == 16);\n\n static_assert(std::is_trivially_copyable_v<My128Struct>);\n\n const auto index = blockIdx.x * blockDim.x + threadIdx.x;\n\n auto* const dest = dst + index;\n\n My128Struct old = *dest;\n My128Struct assumed;\n\n do {\n assumed = old;\n My128Struct new_val = assumed + src[index];\n old = atomicCAS(\n dest,\n assumed,\n new_val\n );\n } while (old != assumed);\n}\n'
- CODE_ADD_RELAXED_BLOCK = '#include "cuda/atomic"\n\n__global__ void add({type}* __restrict__ const dst, const {type}* __restrict__ const src)\n{{\n const auto index = blockIdx.x * blockDim.x + threadIdx.x;\n\n cuda::atomic_ref<{type}, cuda::thread_scope_block> ref(dst[index]);\n ref.fetch_add(src[index], cuda::memory_order_relaxed);\n}}\n'
- CODE_EXCH = '__global__ void exch({type}* __restrict__ const dst, const {type}* __restrict__ const src)\n{{\n const auto index = blockIdx.x * blockDim.x + threadIdx.x;\n atomicExch(&dst[index], src[index]);\n}}\n'
- CODE_EXCH_DEVICE_PTR: Final[str] = '__device__ __constant__ int32_t* ptr;\n\n__global__ void atomic_exch_kernel() {\n atomicExch(ptr, 0);\n}\n'
- CODE_MIN = '#include "cuda/atomic"\n\n__global__ void add({type}* __restrict__ const dst, const {type}* __restrict__ const src)\n{{\n const auto index = blockIdx.x * blockDim.x + threadIdx.x;\n cuda::atomic_ref<{type}, cuda::thread_scope_device> ref(dst[index]);\n ref.fetch_min(src[index], cuda::memory_order_relaxed);\n}}\n'
- static match_one(*, decoder, **kwargs) tuple[AtomicMatcher, Instruction, InstructionMatch]View on GitHub
Match exactly one instruction.
- pytestmark = [Mark(name='parametrize', args=('parameters', (Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.VOLTA: 'VOLTA'>, compute_capability=ComputeCapability(major=7, minor=0))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.TURING: 'TURING'>, compute_capability=ComputeCapability(major=7, minor=5))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.AMPERE: 'AMPERE'>, compute_capability=ComputeCapability(major=8, minor=0))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.AMPERE: 'AMPERE'>, compute_capability=ComputeCapability(major=8, minor=6))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.ADA: 'ADA'>, compute_capability=ComputeCapability(major=8, minor=9))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.HOPPER: 'HOPPER'>, compute_capability=ComputeCapability(major=9, minor=0))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.BLACKWELL: 'BLACKWELL'>, compute_capability=ComputeCapability(major=10, minor=0))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.BLACKWELL: 'BLACKWELL'>, compute_capability=ComputeCapability(major=12, minor=0))))), kwargs={'ids': <class 'str'>})]
- test_add_relaxed_block_double(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Similar to
test_add_relaxed_block_int()for double.
- test_add_relaxed_block_float(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Similar to
test_add_relaxed_block_int()for float.
- test_add_relaxed_block_int(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
As of CUDA 13.0.0, the generated code still applies the
.STRONGmodifier, regardless of the.relaxedqualifier shown in the PTX.
- test_add_relaxed_block_unsigned_long_long_int(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Similar to
test_add_relaxed_block_int()for unsigned long long int.
- test_atomicCAS(request, workdir, word, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Test with
CODE_ADD_BASED_ON_CAS.
- test_atomicCAS_128(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Supported from compute capability 9.x.
- test_exch_device_ptr(request, workdir: Path, parameters: Parameters, cmake_file_api: FileAPI) NoneView on GitHub
This test demonstrates that while
nvccemits anATOMGinstruction for an atomic exchange using a device pointer marked with__constant__,clang(as of 21.1.5) is not able to infer that the referenced memory resides in global memory and therefore falls back emitting a genericATOMinstruction.nvccappears to generate better code in this case: because the device pointer is declared as__constant__, the compiler can reasonably assume that it cannot point to local or shared memory, and thus must refer to global memory. This allowsnvccto use the more specific global-memory atomic instruction.
- test_exch_strong_device_float(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Test with
CODE_EXCHfor float.
- test_exch_strong_device_int(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Test with
CODE_EXCHfor int.
- test_exch_strong_device_unsigned_long_long_int(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Test with
CODE_EXCHfor unsigned long long int.
- test_min_relaxed_device_int(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Test with
CODE_MINfor int.
- test_min_relaxed_device_long_long_int(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Test with
CODE_MINfor long long int.
- test_min_relaxed_device_unsigned_long_long_int(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Test with
CODE_MINfor unsigned long long int.
- class tests.test.sass.test_branch.TestBranchMatcherView on GitHub
Bases:
objectTests for
reprospect.test.sass.instruction.branch.BranchMatcher.- INSTRUCTIONS: Final[dict[str, InstructionMatch]] = {'@!UP0 BRA 0xc': InstructionMatch(opcode='BRA', modifiers=(), operands=('0xc',), predicate='@!UP0', additional=None), '@PT BRA 0xf0': InstructionMatch(opcode='BRA', modifiers=(), operands=('0xf0',), predicate='@PT', additional=None), '@UPT BRA 0x170': InstructionMatch(opcode='BRA', modifiers=(), operands=('0x170',), predicate='@UPT', additional=None), 'BRA 0x240': InstructionMatch(opcode='BRA', modifiers=(), operands=('0x240',), predicate=None, additional=None)}
Zoo of real
BRAinstructions.
- MATCHER: Final[BranchMatcher] = BranchMatcher(pattern=regex.Regex('(?:(?P<predicate>@!?U?P(?:T|[0-9]+)))?\\s*(?P<opcode>BRA)\\s*(?P<operands>0x[0-9A-Fa-f]+)', flags=regex.V0))
- test(instruction: str, expected: InstructionMatch) NoneView on GitHub
- test_no_match() NoneView on GitHub
- test_pattern() NoneView on GitHub
- test_with_predicate() NoneView on GitHub
Verify that the matcher only matches instructions containing the exact predicate. Instructions without the predicate or with a different predicate should not match.
- class tests.test.sass.test_composite.TestAnyOfView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite.any_of().- test() NoneView on GitHub
- class tests.test.sass.test_composite.TestFindallView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite.findall().- test() NoneView on GitHub
- class tests.test.sass.test_composite.TestFinduniqueView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite.findunique().- test() NoneView on GitHub
- class tests.test.sass.test_composite.TestInstructionCountIsView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite.instruction_count_is().- test() NoneView on GitHub
- class tests.test.sass.test_composite.TestInstructionIsView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite.instruction_is().- test() NoneView on GitHub
- test_one_or_more_times() NoneView on GitHub
- test_times_1() NoneView on GitHub
- test_times_2() NoneView on GitHub
- test_with_modifier() NoneView on GitHub
Test
reprospect.test.sass.composite.Fluentizer.with_modifier().
- test_with_operand() NoneView on GitHub
Test
reprospect.test.sass.composite.Fluentizer.with_operand().
- test_with_operand_composed() NoneView on GitHub
Similar to
test_with_operands()but callsreprospect.test.sass.composite.Fluentizer.with_operand()many times.
- test_with_operands() NoneView on GitHub
Test
reprospect.test.sass.composite.Fluentizer.with_operands().
- test_zero_or_more_time() NoneView on GitHub
- class tests.test.sass.test_composite.TestInstructionsAreView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite.instructions_are().- test() NoneView on GitHub
- test_mix() NoneView on GitHub
- class tests.test.sass.test_composite.TestInstructionsContainView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite.instructions_contain().- test() NoneView on GitHub
- class tests.test.sass.test_composite.TestInterleavedInstructionsAreView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite.interleaved_instructions_are().- test() NoneView on GitHub
- class tests.test.sass.test_composite.TestUnorderedInstructionsAreView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite.unordered_instructions_are().- test() NoneView on GitHub
- class tests.test.sass.test_composite.TestUnorderedInterleavedInstructionsAreView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite.unordered_interleaved_instructions_are().- test() NoneView on GitHub
- tests.test.sass.test_composite_impl.DADD_DMUL = (Instruction(offset=0, instruction='DADD R4, R4, c[0x0][0x180]', hex='0x0', control=ControlCode(stall_count=1, yield_flag=True, read=7, write=0, wait=[False, False, False, False, False, False], reuse={'A': False, 'B': False, 'C': False, 'D': False})), Instruction(offset=0, instruction='DMUL R6, R6, c[0x0][0x188]', hex='0x1', control=ControlCode(stall_count=1, yield_flag=True, read=7, write=0, wait=[False, False, False, False, False, False], reuse={'A': False, 'B': False, 'C': False, 'D': False})))
One DADD followed by a DMUL.
- tests.test.sass.test_composite_impl.DADD_NOP_DMUL = (Instruction(offset=0, instruction='DADD R4, R4, c[0x0][0x180]', hex='0x0', control=ControlCode(stall_count=1, yield_flag=True, read=7, write=0, wait=[False, False, False, False, False, False], reuse={'A': False, 'B': False, 'C': False, 'D': False})), Instruction(offset=0, instruction='NOP', hex='0x2', control=ControlCode(stall_count=1, yield_flag=True, read=7, write=0, wait=[False, False, False, False, False, False], reuse={'A': False, 'B': False, 'C': False, 'D': False})), Instruction(offset=0, instruction='NOP', hex='0x2', control=ControlCode(stall_count=1, yield_flag=True, read=7, write=0, wait=[False, False, False, False, False, False], reuse={'A': False, 'B': False, 'C': False, 'D': False})), Instruction(offset=0, instruction='NOP', hex='0x2', control=ControlCode(stall_count=1, yield_flag=True, read=7, write=0, wait=[False, False, False, False, False, False], reuse={'A': False, 'B': False, 'C': False, 'D': False})), Instruction(offset=0, instruction='DMUL R6, R6, c[0x0][0x188]', hex='0x1', control=ControlCode(stall_count=1, yield_flag=True, read=7, write=0, wait=[False, False, False, False, False, False], reuse={'A': False, 'B': False, 'C': False, 'D': False})))
One DADD instruction followed by a few NOP, and a DMUL.
- tests.test.sass.test_composite_impl.NOP_DMUL_NOP_DADD = (Instruction(offset=0, instruction='NOP', hex='0x2', control=ControlCode(stall_count=1, yield_flag=True, read=7, write=0, wait=[False, False, False, False, False, False], reuse={'A': False, 'B': False, 'C': False, 'D': False})), Instruction(offset=0, instruction='DMUL R6, R6, c[0x0][0x188]', hex='0x1', control=ControlCode(stall_count=1, yield_flag=True, read=7, write=0, wait=[False, False, False, False, False, False], reuse={'A': False, 'B': False, 'C': False, 'D': False})), Instruction(offset=0, instruction='NOP', hex='0x2', control=ControlCode(stall_count=1, yield_flag=True, read=7, write=0, wait=[False, False, False, False, False, False], reuse={'A': False, 'B': False, 'C': False, 'D': False})), Instruction(offset=0, instruction='NOP', hex='0x2', control=ControlCode(stall_count=1, yield_flag=True, read=7, write=0, wait=[False, False, False, False, False, False], reuse={'A': False, 'B': False, 'C': False, 'D': False})), Instruction(offset=0, instruction='DADD R4, R4, c[0x0][0x180]', hex='0x0', control=ControlCode(stall_count=1, yield_flag=True, read=7, write=0, wait=[False, False, False, False, False, False], reuse={'A': False, 'B': False, 'C': False, 'D': False})))
NOP instructions with one DADD and one DMUL.
- class tests.test.sass.test_composite_impl.TestAllInSequenceMatcherView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite_impl.AllInSequenceMatcher.- test_no_match() NoneView on GitHub
- test_sequence() NoneView on GitHub
The inner matcher is a
reprospect.test.sass.composite_impl.SequenceMatcher.
- test_single() NoneView on GitHub
The inner matcher is a
reprospect.test.sass.instruction.InstructionMatcher.
- class tests.test.sass.test_composite_impl.TestAnyOfMatcherView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite_impl.AnyOfMatcher.- test_explain() NoneView on GitHub
- test_matches() NoneView on GitHub
- test_no_match() NoneView on GitHub
- class tests.test.sass.test_composite_impl.TestCountInSequenceMatcherView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite_impl.CountInSequenceMatcher.- INNER: Final[OpcodeModsMatcher] = OpcodeModsMatcher(pattern=regex.Regex('(?P<opcode>YIELD)(?:\\.(?P<modifiers>[A-Z0-9_]+))*', flags=regex.V0))
- test_explain() NoneView on GitHub
- test_match() NoneView on GitHub
- test_no_match() NoneView on GitHub
- class tests.test.sass.test_composite_impl.TestInSequenceAtMatcherView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite_impl.InSequenceAtMatcher.- MATCHER: Final[InSequenceAtMatcher] = <reprospect.test.sass.composite_impl.InSequenceAtMatcher object>
- test_explain() NoneView on GitHub
- test_match_first_element() NoneView on GitHub
Matches the first element.
- test_match_with_start() NoneView on GitHub
Matches the element pointed to by start.
- test_no_match() NoneView on GitHub
Raises if the first element does not match.
- class tests.test.sass.test_composite_impl.TestInSequenceMatcherView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite_impl.InSequenceMatcher.- test_explain() NoneView on GitHub
- test_matches() NoneView on GitHub
- test_no_match() NoneView on GitHub
- class tests.test.sass.test_composite_impl.TestOneOrMoreInSequenceMatcherView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite_impl.OneOrMoreInSequenceMatcher.- test_explain() NoneView on GitHub
- test_matches_more() NoneView on GitHub
Matches the sequence
DADD_NOP_DMULmany times starting at 1.
- test_matches_one() NoneView on GitHub
Matches the sequence
DADD_NOP_DMUL.
- test_matches_zero() NoneView on GitHub
Does not match sequence
DADD_NOP_DMUL.
- class tests.test.sass.test_composite_impl.TestOrderedInSequenceMatcherView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite_impl.OrderedInSequenceMatcher.- MATCHER: Final[OrderedInSequenceMatcher] = <reprospect.test.sass.composite_impl.OrderedInSequenceMatcher object>
- test_explain() NoneView on GitHub
- test_match() NoneView on GitHub
Matches the sequence
DADD_DMUL.
- test_match_with_nop() NoneView on GitHub
Matches the sequence
DADD_NOP_DMUL.
- test_no_match() NoneView on GitHub
Does not match reversed sequence
DADD_DMUL.
- class tests.test.sass.test_composite_impl.TestOrderedInterleavedInSequenceMatcherView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite_impl.OrderedInterleavedInSequenceMatcher.- INSTRUCTIONS_DADD: Final[tuple[str, ...]] = ('LDG.E.ENL2.256 R8, R4, desc[UR6][R2.64]', 'DADD R4, R4, UR12', 'NOP', 'NOP', 'NOP', 'NOP', 'DADD R6, R6, UR14', 'NOP', 'NOP', 'NOP', 'NOP', 'DADD R8, R8, UR16', 'NOP', 'NOP', 'NOP', 'NOP', 'DADD R10, R10, UR18', 'STG.E.ENL2.256 desc[UR6][R2.64], R4, R8')
- MATCHERS_DADD: Final[tuple[OpcodeModsWithOperandsMatcher, ...]] = (OpcodeModsWithOperandsMatcher(pattern=regex.Regex('(?P<opcode>DADD)\\s*(?P<operands>R4),?\\s*(?P<operands>R4),?\\s*(?P<operands>UR[0-9]+)', flags=regex.V0)), OpcodeModsWithOperandsMatcher(pattern=regex.Regex('(?P<opcode>DADD)\\s*(?P<operands>R6),?\\s*(?P<operands>R6),?\\s*(?P<operands>UR[0-9]+)', flags=regex.V0)), OpcodeModsWithOperandsMatcher(pattern=regex.Regex('(?P<opcode>DADD)\\s*(?P<operands>R8),?\\s*(?P<operands>R8),?\\s*(?P<operands>UR[0-9]+)', flags=regex.V0)), OpcodeModsWithOperandsMatcher(pattern=regex.Regex('(?P<opcode>DADD)\\s*(?P<operands>R10),?\\s*(?P<operands>R10),?\\s*(?P<operands>UR[0-9]+)', flags=regex.V0)))
- MATCHERS_LDG: Final[tuple[LoadGlobalMatcher, ...]] = (LoadGlobalMatcher(pattern=regex.Regex('(?P<opcode>LDG)\\.(?P<modifiers>E)\\.(?P<modifiers>U16)\\.(?P<modifiers>SYS) (?P<operands>R[0-9]+), (?P<address>(?P<operands>\\[R[0-9]+(?:\\+-?0x[0-9A-Fa-f]+)?\\]))', flags=regex.V0)), LoadGlobalMatcher(pattern=regex.Regex('(?P<opcode>LDG)\\.(?P<modifiers>E)\\.(?P<modifiers>U16)\\.(?P<modifiers>SYS) (?P<operands>R[0-9]+), (?P<address>(?P<operands>\\[R[0-9]+(?:\\+-?0x[0-9A-Fa-f]+)?\\]))', flags=regex.V0)))
- test_dadd() NoneView on GitHub
- test_ldg() NoneView on GitHub
- test_no_match() NoneView on GitHub
- class tests.test.sass.test_composite_impl.TestUnorderedInSequenceMatcherView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite_impl.UnorderedInSequenceMatcher.- test_explain() NoneView on GitHub
- test_match_with_nop() NoneView on GitHub
Matches the sequence
DADD_NOP_DMUL.
- test_no_match() NoneView on GitHub
All permutations fail on sequence
NOP_DMUL_NOP_DADD.
- test_with_split_nop() NoneView on GitHub
Matches the sequence
NOP_DMUL_NOP_DADD.
- test_without_nop() NoneView on GitHub
Matches the sequence
DADD_DMUL.
- class tests.test.sass.test_composite_impl.TestUnorderedInterleavedInSequenceMatcherView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite_impl.UnorderedInterleavedInSequenceMatcher.- test_dadd() NoneView on GitHub
- class tests.test.sass.test_composite_impl.TestZeroOrMoreInSequenceMatcherView on GitHub
Bases:
objectTests for
reprospect.test.sass.composite_impl.ZeroOrMoreInSequenceMatcher.- test_assert_matches_always_true() NoneView on GitHub
Matching zero time is fine.
- test_explain() NoneView on GitHub
- test_matches_with_start() NoneView on GitHub
Matches many times, with a start and sequence
DADD_NOP_DMUL.
- test_matches_zero() NoneView on GitHub
Matches zero time with sequence
DADD_NOP_DMUL.
- tests.test.sass.test_instruction.CODE_ELEMENTWISE_ADD_RESTRICT = '__global__ void elementwise_add_restrict(int* __restrict__ const dst, const int* __restrict__ const src) {\n const auto index = blockIdx.x * blockDim.x + threadIdx.x;\n dst[index] += src[index];\n}\n'
Element-wise add with 32-bit
int.
- tests.test.sass.test_instruction.CODE_ELEMENTWISE_ADD_RESTRICT_128_WIDE = '__global__ void elementwise_add_restrict_128_wide(float4* __restrict__ const dst, const float4* __restrict__ const src) {\n const auto index = blockIdx.x * blockDim.x + threadIdx.x;\n const float4& a = src[index];\n const float4& b = dst[index];\n dst[index] = make_float4(a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w);\n}\n'
Element-wise add with 128-bit
float4.
- tests.test.sass.test_instruction.CODE_ELEMENTWISE_ADD_RESTRICT_256_WIDE = 'struct alignas(4 * sizeof(double)) Tester {\n double x, y, z, w;\n};\n\n__global__ void elementwise_add_restrict_256_wide(Tester* __restrict__ const dst, const Tester* __restrict__ const src)\n{\n const auto index = blockIdx.x * blockDim.x + threadIdx.x;\n const Tester& a = src[index];\n const Tester& b = dst[index];\n dst[index] = Tester{.x = a.x + b.x, .y = a.y + b.y, .z = a.z + b.z, .w = a.w + b.w};\n}\n'
Element-wise add with 256-bit aligned elements.
- class tests.test.sass.test_instruction.TestOpcodeModsMatcherView on GitHub
Bases:
objectTests for
reprospect.test.sass.instruction.OpcodeModsMatcher.- test_with_descr()View on GitHub
- test_with_minus_sign()View on GitHub
- test_with_reuse()View on GitHub
- test_with_square_brackets()View on GitHub
- class tests.test.sass.test_instruction.TestOpcodeModsWithOperandsMatcherView on GitHub
Bases:
objectTests for
reprospect.test.sass.instruction.OpcodeModsWithOperandsMatcher.- test()View on GitHub
- class tests.test.sass.test_instruction.TestReductionMatcherView on GitHub
Bases:
objectTests for
reprospect.test.sass.instruction.atomic.ReductionMatcher.- CODE_ADD = '__global__ void add({type}* __restrict__ const dst, const {type}* __restrict__ const src)\n{{\n const auto index = blockIdx.x * blockDim.x + threadIdx.x;\n atomicAdd(&dst[index], src[index]);\n}}\n'
- CODE_MAX = '__global__ void max({type}* __restrict__ const dst, const {type}* __restrict__ const src)\n{{\n const auto index = blockIdx.x * blockDim.x + threadIdx.x;\n atomicMax(&dst[index], src[index]);\n}}\n'
- CODE_SUB = '__global__ void sub(int* __restrict__ const dst, const int* __restrict__ const src)\n{\n const auto index = blockIdx.x * blockDim.x + threadIdx.x;\n atomicSub(&dst[index], src[index]);\n}\n'
- pytestmark = [Mark(name='parametrize', args=('parameters', (Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.VOLTA: 'VOLTA'>, compute_capability=ComputeCapability(major=7, minor=0))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.TURING: 'TURING'>, compute_capability=ComputeCapability(major=7, minor=5))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.AMPERE: 'AMPERE'>, compute_capability=ComputeCapability(major=8, minor=0))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.AMPERE: 'AMPERE'>, compute_capability=ComputeCapability(major=8, minor=6))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.ADA: 'ADA'>, compute_capability=ComputeCapability(major=8, minor=9))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.HOPPER: 'HOPPER'>, compute_capability=ComputeCapability(major=9, minor=0))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.BLACKWELL: 'BLACKWELL'>, compute_capability=ComputeCapability(major=10, minor=0))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.BLACKWELL: 'BLACKWELL'>, compute_capability=ComputeCapability(major=12, minor=0))))), kwargs={'ids': <class 'str'>})]
- test_add_strong_device_double(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Test with
CODE_ADDfor double.
- test_add_strong_device_float(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Test with
CODE_ADDfor float.
- test_add_strong_device_int(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Test with
CODE_ADDfor int.
- test_add_strong_device_unsigned_int(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Test with
CODE_ADDfor unsigned int.
- test_add_strong_device_unsigned_long_long_int(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Test with
CODE_ADDfor unsigned long long int.
- test_max_strong_device_int(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Test with
CODE_MAXfor int. The modifier isMAX.S32.
- test_max_strong_device_long_long_int(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Test with
CODE_MAXfor long long int. The modifier isMAX.S64.
- test_max_strong_device_unsigned_int(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Test with
CODE_MAXfor unsigned int. The modifier isMAX.
- test_max_strong_device_unsigned_long_long_int(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Test with
CODE_MAXfor unsigned long long int. The modifier isMAX.64.
- test_sub_strong_device(request, workdir, parameters: Parameters, cmake_file_api: FileAPI)View on GitHub
Test with
CODE_SUB.
- tests.test.sass.test_instruction.get_decoder(*, cwd: Path, arch: NVIDIAArch, file: Path, cmake_file_api: FileAPI, **kwargs) tuple[Decoder, Path]View on GitHub
Compile the code in file for arch and return a
reprospect.tools.sass.Decoder.
- class tests.test.sass.test_load.TestLoadConstantMatcherView on GitHub
Bases:
objectTests for
reprospect.test.sass.instruction.load.LoadConstantMatcher.- CODE_CONSTANT_ARRAY: Final[str] = '__constant__ {type} data[128];\n__global__ __launch_bounds__(128, 1) void ldc({type}* __restrict__ const out)\n{{\n const auto index = blockIdx.x * blockDim.x + threadIdx.x;\n out[index] = data[index];\n}}\n'
- INSTRUCTIONS: Final[dict[str, tuple[LoadConstantMatcher, InstructionMatch]]] = {'LDC R1, c[0x0][0x37c]': (LoadConstantMatcher(pattern=regex.Regex('(?P<opcode>(?:LDC|LDCU)) (?P<operands>(?:R[0-9]+|UR[0-9]+)), (?P<operands>(?:(?:!|\\-\\||\\-|\\~|\\|))?c\\[(?P<bank>0x[0-9]+)\\]\\[(?P<offset>(?:0x[0-9A-Fa-f]+|R(?:Z|\\d+)|UR[0-9]+))\\])', flags=regex.V0)), InstructionMatch(opcode='LDC', modifiers=(), operands=('R1', 'c[0x0][0x37c]'), predicate=None, additional={'bank': ['0x0'], 'offset': ['0x37c']})), 'LDC.64 R6, c[0x1][0x398]': (LoadConstantMatcher(pattern=regex.Regex('(?P<opcode>(?:LDC|LDCU))\\.(?P<modifiers>64) (?P<operands>(?:R[0-9]+|UR[0-9]+)), (?P<operands>(?:(?:!|\\-\\||\\-|\\~|\\|))?c\\[(?P<bank>0x[0-9]+)\\]\\[(?P<offset>(?:0x[0-9A-Fa-f]+|R(?:Z|\\d+)|UR[0-9]+))\\])', flags=regex.V0)), InstructionMatch(opcode='LDC', modifiers=('64',), operands=('R6', 'c[0x1][0x398]'), predicate=None, additional={'bank': ['0x1'], 'offset': ['0x398']})), 'LDCU UR4, c[0x2][0x364]': (LoadConstantMatcher(pattern=regex.Regex('(?P<opcode>LDCU) (?P<operands>UR[0-9]+), (?P<operands>(?:(?:!|\\-\\||\\-|\\~|\\|))?c\\[(?P<bank>0x[0-9]+)\\]\\[(?P<offset>(?:0x[0-9A-Fa-f]+|R(?:Z|\\d+)|UR[0-9]+))\\])', flags=regex.V0)), InstructionMatch(opcode='LDCU', modifiers=(), operands=('UR4', 'c[0x2][0x364]'), predicate=None, additional={'bank': ['0x2'], 'offset': ['0x364']})), 'LDCU UR4, c[0x3][UR0]': (LoadConstantMatcher(pattern=regex.Regex('(?P<opcode>LDCU) (?P<operands>UR[0-9]+), (?P<operands>(?:(?:!|\\-\\||\\-|\\~|\\|))?c\\[(?P<bank>0x[0-9]+)\\]\\[(?P<offset>(?:0x[0-9A-Fa-f]+|R(?:Z|\\d+)|UR[0-9]+))\\])', flags=regex.V0)), InstructionMatch(opcode='LDCU', modifiers=(), operands=('UR4', 'c[0x3][UR0]'), predicate=None, additional={'bank': ['0x3'], 'offset': ['UR0']})), 'LDCU.64 UR6, c[0x3][0x358]': (LoadConstantMatcher(pattern=regex.Regex('(?P<opcode>LDCU)\\.(?P<modifiers>64) (?P<operands>UR[0-9]+), (?P<operands>(?:(?:!|\\-\\||\\-|\\~|\\|))?c\\[(?P<bank>0x[0-9]+)\\]\\[(?P<offset>(?:0x[0-9A-Fa-f]+|R(?:Z|\\d+)|UR[0-9]+))\\])', flags=regex.V0)), InstructionMatch(opcode='LDCU', modifiers=('64',), operands=('UR6', 'c[0x3][0x358]'), predicate=None, additional={'bank': ['0x3'], 'offset': ['0x358']}))}
Zoo of real SASS instructions.
- test(instruction: str, matcher: LoadConstantMatcher, expected: InstructionMatch) NoneView on GitHub
- test_array_of_64bit_elements(request, workdir: Path, parameters: Parameters, cmake_file_api: FileAPI) NoneView on GitHub
Loads of size 64 with
CODE_CONSTANT_ARRAY.
- class tests.test.sass.test_load.TestLoadMatcherView on GitHub
Bases:
objectTests for
reprospect.test.sass.instruction.load.LoadMatcherandreprospect.test.sass.instruction.load.LoadGlobalMatcher.- CODE_ELEMENTWISE_ADD = '__global__ void elementwise_add(int* const dst, const int* const src) {\n const auto index = blockIdx.x * blockDim.x + threadIdx.x;\n dst[index] += src[index];\n}\n'
- CODE_ELEMENTWISE_ADD_LDG = '__global__ void elementwise_add_ldg(int* const dst, const int* const src) {\n const auto index = blockIdx.x * blockDim.x + threadIdx.x;\n dst[index] += __ldg(&src[index]);\n}\n'
- CODE_EXTEND = '#include <cstdint>\n\n__global__ void extend({dst}* {restrict} const dst, {src}* {restrict} const src, const unsigned int size)\n{{\n const auto index = blockIdx.x * blockDim.x + threadIdx.x;\n if (index < size) dst[index] = src[index];\n}}\n'
- test() NoneView on GitHub
- test_constant(request, workdir, parameters: Parameters, cmake_file_api: FileAPI) NoneView on GitHub
If src is declared
const __restrict__, the compiler is able to use the.CONSTANTmodifier. Otherwise, we need to explicitly use__ldgto end up using.CONSTANT.
- test_elementwise_add_restrict(request, workdir, parameters: Parameters, cmake_file_api: FileAPI) NoneView on GitHub
Test loads with
tests.test.sass.test_instruction.CODE_ELEMENTWISE_ADD_RESTRICT.
- test_elementwise_add_restrict_128_wide(request, workdir, parameters: Parameters, cmake_file_api: FileAPI) NoneView on GitHub
Test 128-bit wide loads with
tests.test.sass.test_instruction.CODE_ELEMENTWISE_ADD_RESTRICT_128_WIDE.
- test_elementwise_add_restrict_256_wide(request, workdir, parameters: Parameters, cmake_file_api: FileAPI) NoneView on GitHub
Test 256-bit wide loads with
tests.test.sass.test_instruction.CODE_ELEMENTWISE_ADD_RESTRICT_256_WIDE.
- test_sign_extend_s16(request, workdir: Path, parameters: Parameters, cmake_file_api: FileAPI) NoneView on GitHub
Check when
CODE_EXTENDleads to sign extension.
- test_zero_extend_u16(request, workdir: Path, parameters: Parameters, cmake_file_api: FileAPI) NoneView on GitHub
Use
CODE_EXTENDto enforce zero extension.
- class tests.test.sass.test_store.TestStoreMatcherView on GitHub
Bases:
objectTests for
reprospect.test.sass.instruction.store.StoreMatcherandreprospect.test.sass.instruction.store.StoreGlobalMatcher.- test() NoneView on GitHub
- test_elementwise_add_restrict(request, workdir, parameters: Parameters, cmake_file_api: FileAPI) NoneView on GitHub
Test store with
tests.test.sass.test_instruction.CODE_ELEMENTWISE_ADD_RESTRICT.
- test_elementwise_add_restrict_128_wide(request, workdir, parameters: Parameters, cmake_file_api: FileAPI) NoneView on GitHub
Test 128-bit wide store with
tests.test.sass.test_instruction.CODE_ELEMENTWISE_ADD_RESTRICT_128_WIDE.
- test_elementwise_add_restrict_256_wide(request, workdir, parameters: Parameters, cmake_file_api: FileAPI) NoneView on GitHub
Test 256-bit wide store with
tests.test.sass.test_instruction.CODE_ELEMENTWISE_ADD_RESTRICT_256_WIDE.