Test
Environment
- class tests.test.test_environment.TestEnvironmentField
Bases: object
Tests for reprospect.test.environment.EnvironmentField.
- test_converter_from_default_type(monkeypatch) -> None
If no converter was provided, infer it from the type of the default value.
- test_in_environment_converted_no_env_key(monkeypatch) -> None
The attribute is correctly initialized from the environment (no key given), and converted.
- test_in_environment_converted_with_env_key(monkeypatch) -> None
The value is correctly initialized from the environment (given a key), and converted.
- test_no_attribute_name_or_env_key() -> None
Raises if neither an attribute name nor an environment key is given.
- test_not_in_environment_no_default() -> None
The attribute cannot be initialized.
- test_not_in_environment_use_default() -> None
The value is initialized to the given default value.
- test_read_int_converter(monkeypatch) -> None
Environment variable read as int.
- test_read_str_converter(monkeypatch) -> None
Environment variable read as str.
- test_reset(monkeypatch) -> None
- test_value_cached_at_class_level(monkeypatch) -> None
The value is shared among all instances.
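The behaviours exercised by these tests can be illustrated with a small descriptor. This is only a sketch under stated assumptions: `EnvField`, `Settings`, the `DEMO_*` keys, and the rule that the attribute name is upper-cased to form the environment key are all hypothetical, and none of it is taken from the actual `reprospect.test.environment.EnvironmentField` implementation.

```python
import os

_MISSING = object()  # sentinel meaning "no default" / "nothing cached yet"


class EnvField:
    """Hypothetical environment-backed descriptor (NOT the reprospect class)."""

    def __init__(self, env=None, default=_MISSING, converter=None):
        self.env = env
        self.default = default
        # Without an explicit converter, infer one from the type of the default.
        if converter is None and default is not _MISSING:
            converter = type(default)
        self.converter = converter or str
        self._cached = _MISSING

    def __set_name__(self, owner, name):
        # Assumption: with no explicit key, use the upper-cased attribute name.
        if self.env is None:
            self.env = name.upper()

    def __get__(self, instance, owner=None):
        if self._cached is _MISSING:
            if self.env is None:
                raise ValueError("no attribute name and no environment key")
            raw = os.environ.get(self.env)
            if raw is not None:
                self._cached = self.converter(raw)
            elif self.default is not _MISSING:
                self._cached = self.default
            else:
                raise KeyError(self.env)
        # The cache lives on the descriptor, so every instance sees one value.
        return self._cached

    def reset(self):
        """Forget the cached value so the next access re-reads the environment."""
        self._cached = _MISSING


class Settings:
    demo_jobs = EnvField(default=4)                    # key inferred: "DEMO_JOBS"
    arch = EnvField(env="DEMO_ARCH", default="sm_80")  # explicit key


os.environ["DEMO_JOBS"] = "8"
os.environ.pop("DEMO_ARCH", None)
first, second = Settings(), Settings()
print(first.demo_jobs, second.demo_jobs, first.arch)
```

Because the cache is stored on the descriptor rather than on each instance, a later change to `DEMO_JOBS` is not observed until `reset()` is called, which is the kind of class-level sharing the last test above describes.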
SASS
Others
- class tests.test.test_graph.TestGraph
Bases: CMakeAwareTestCase
General test class.
- DEMANGLED_NODE_A: Final[dict[str, str]] = {'Clang': 'void add_and_increment_kernel<0u>(unsigned int*)', 'NVIDIA': 'void add_and_increment_kernel<(unsigned int)0, >(unsigned int *)'}
- classmethod get_target_name() -> str
- class tests.test.test_graph.TestNCU
Bases: TestGraph
ncu-focused analysis.
- METRICS: Final[tuple[Metric]] = (Metric(name='launch__registers_per_thread_allocated', pretty_name='launch__registers_per_thread_allocated', subs=None),)
- pytestmark = [Mark(name='skipif', args=(True,), kwargs={'reason': 'needs a GPU'})]
- report(workdir: Path) -> Report
- results(report: Report) -> ProfilingResults
- test_launch_registers_per_thread_allocated_node_A(results: ProfilingResults) -> None
Check metric launch__registers_per_thread_allocated for graph node A.
- test_result_count(report: Report, results: ProfilingResults) -> None
Check how many ranges and results there are in the report.
- class tests.test.test_graph.TestSASS
Bases: TestGraph
SASS-focused analysis.
- property cubin: str
- cuobjdump() -> CuObjDump
- test_instruction_count(cuobjdump: CuObjDump) -> None
Check how many instructions there are in the first graph node kernel.
- test_kernel_count(cuobjdump: CuObjDump) -> None
Count how many kernels there are (1 per graph node).
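To make the two counts above concrete, here is a minimal sketch that counts kernels and instructions in a SASS text listing. It assumes the listing resembles `cuobjdump -sass` output, with a `Function : <name>` line per kernel and an `/*offset*/ INSTRUCTION ;` line per instruction. The helper `count_instructions` and the sample text are hypothetical, and this is not how the `CuObjDump` class used by the tests works internally.

```python
import re

# Assumed line shapes, modelled on `cuobjdump -sass` text output.
FUNCTION = re.compile(r"^\s*Function\s*:\s*(\S+)")
INSTRUCTION = re.compile(r"^\s*/\*[0-9a-f]{4,}\*/\s+(.+?)\s*;")


def count_instructions(listing: str) -> dict[str, int]:
    """Map each kernel name to the number of instruction lines that follow it."""
    counts: dict[str, int] = {}
    current = None
    for line in listing.splitlines():
        match = FUNCTION.match(line)
        if match:
            current = match.group(1)
            counts[current] = 0
        elif current is not None and INSTRUCTION.match(line):
            counts[current] += 1
    return counts


# Hypothetical two-kernel listing, e.g. one kernel per graph node.
SAMPLE = """\
    Function : kernel_a
        /*0000*/                   MOV R1, c[0x0][0x28] ;
        /*0010*/                   EXIT ;
    Function : kernel_b
        /*0000*/                   EXIT ;
"""

counts = count_instructions(SAMPLE)
print(len(counts), counts)
```

The number of keys gives the kernel count, and each value gives the instruction count for that kernel.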
- class tests.test.test_half.TestNCU
Bases: object
ncu-based analysis of the individual vs packed implementation.
- METRICS: Final[tuple[Metric | MetricCorrelation | MetricDeviceAttribute, ...]] = (MetricDeviceAttribute(name='display_name'), MetricCounter(name='smsp__sass_inst_executed_op_global_ld', pretty_name='L1/TEX cache global load instructions sass', subs=(<MetricCounterRollUp.SUM: 'sum'>,)), MetricCounter(name='l1tex__t_requests_pipe_lsu_mem_global_op_ld', pretty_name='L1/TEX cache global load requests', subs=(<MetricCounterRollUp.SUM: 'sum'>,)), MetricCounter(name='l1tex__t_sectors_pipe_lsu_mem_global_op_ld', pretty_name='L1/TEX cache global load sectors', subs=(<MetricCounterRollUp.SUM: 'sum'>,)), Metric(name='launch__grid_dim_x', pretty_name='launch__grid_dim_x', subs=None), Metric(name='launch__grid_dim_y', pretty_name='launch__grid_dim_y', subs=None), Metric(name='launch__grid_dim_z', pretty_name='launch__grid_dim_z', subs=None), Metric(name='launch__block_dim_x', pretty_name='launch__block_dim_x', subs=None), Metric(name='launch__block_dim_y', pretty_name='launch__block_dim_y', subs=None), Metric(name='launch__block_dim_z', pretty_name='launch__block_dim_z', subs=None))
- pytestmark = [Mark(name='skipif', args=(True,), kwargs={'reason': 'needs a GPU'})]
- results(workdir: Path, bindir: Path) -> ProfilingResults
- test_memory(results: ProfilingResults) -> None
Compare the memory traffic.
- class tests.test.test_half.TestSASS
Bases: object
Tests that combine different half-precision SASS instructions.
- cuobjdump(workdir: Path, parameters: Parameters, cmake_file_api: FileAPI) -> CuObjDump
- pytestmark = [Mark(name='parametrize', args=('parameters', (Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.VOLTA: 'VOLTA'>, compute_capability=ComputeCapability(major=7, minor=0))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.TURING: 'TURING'>, compute_capability=ComputeCapability(major=7, minor=5))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.AMPERE: 'AMPERE'>, compute_capability=ComputeCapability(major=8, minor=0))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.AMPERE: 'AMPERE'>, compute_capability=ComputeCapability(major=8, minor=6))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.ADA: 'ADA'>, compute_capability=ComputeCapability(major=8, minor=9))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.HOPPER: 'HOPPER'>, compute_capability=ComputeCapability(major=9, minor=0))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.BLACKWELL: 'BLACKWELL'>, compute_capability=ComputeCapability(major=10, minor=0))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.BLACKWELL: 'BLACKWELL'>, compute_capability=ComputeCapability(major=12, minor=0))))), kwargs={'ids': <class 'str'>, 'scope': 'class'})]
- test_individual(parameters: Parameters, cuobjdump: CuObjDump) -> None
Analyse the individual implementation.
It loads only 16 bits at once and does a “broadcast” of the lower lane (H0_H0) because HMUL2 works with 2 lanes (packed instruction). Typically:
    LDG.E.U16.CONSTANT.SYS R2, [R2]
    HMUL2 R0, R2.H0_H0, R2.H0_H0
    STG.E.U16.SYS [R4], R0
- test_packed(parameters: Parameters, cuobjdump: CuObjDump) -> None
Analyse the packed implementation.
First, there is a block that performs the “odd” element and therefore looks like the individual implementation:
    LDG.E.U16.CONSTANT.SYS R2, [R2]
    HMUL2 R0, R2.H0_H0, R2.H0_H0
Then, there is another block that performs the packed multiplication. It loads 32 bits at once. Typically:
    LDG.E.CONSTANT.SYS R2, [R2]
    HMUL2 R7, R2, R2
Note that, even though the PTX always reads:
    mul.f16x2 %r8,%r9,%r9
ptxas may choose to use HFMA2:
    HFMA2 R7, R2, R2, -RZ
or even HFMA2.MMA:
    HFMA2.MMA R7, R2, R2, -RZ
instead of HMUL2, depending on the targeted architecture.
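The difference between the individual and packed code paths can be modelled on the host with two binary16 lanes stored in one 32-bit word. The following is a rough sketch only: the helper names are hypothetical, lane H0 is assumed to occupy the low 16 bits, and the arithmetic goes through Python floats rather than reproducing the hardware bit for bit.

```python
import struct


def pack_half2(lo: float, hi: float) -> int:
    """Pack two binary16 values into a 32-bit word (lo in bits 0-15)."""
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]


def unpack_half2(word: int) -> tuple[float, float]:
    """Split a 32-bit word into its (lower, upper) binary16 lanes."""
    return struct.unpack("<2e", struct.pack("<I", word))


def broadcast_h0(word: int) -> int:
    """Model of an H0_H0 operand: copy the lower lane into both lanes."""
    lo = word & 0xFFFF
    return lo | (lo << 16)


def hmul2(a: int, b: int) -> int:
    """Lane-wise product of two packed pairs, rounded back to binary16.

    struct raises OverflowError if a product does not fit in binary16.
    """
    (a0, a1), (b0, b1) = unpack_half2(a), unpack_half2(b)
    return pack_half2(a0 * b0, a1 * b1)


word = pack_half2(1.5, 2.0)

# Packed path: one 32-bit load feeds both lanes of the multiply.
packed = unpack_half2(hmul2(word, word))

# Individual path: only the lower lane is meaningful, so it is broadcast.
single = broadcast_h0(word)
individual = unpack_half2(hmul2(single, single))

print(hex(word), packed, individual)
```

In this model the packed path produces two useful results per multiply, while the individual path computes the same product in both lanes and only the lower 16 bits would be stored.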
- class tests.test.test_saxpy.TestNCU
Bases: TestSaxpy
ncu-focused analysis.
- METRICS: Final[tuple[Metric]] = (Metric(name='launch__registers_per_thread_allocated', pretty_name='launch__registers_per_thread_allocated', subs=None),)
- NVTX_INCLUDES: Final[tuple[str, ...]] = ('application_domain@launch_saxpy_kernel_first_time/', 'application_domain@launch_saxpy_kernel_second_time/')
- pytestmark = [Mark(name='skipif', args=(True,), kwargs={'reason': 'needs a GPU'})]
- report(workdir: Path) -> Report
- results(report: Report) -> ProfilingResults
- test_result_count(report: Report, results: ProfilingResults) -> None
Check how many ranges and results there are in the report.
- class tests.test.test_saxpy.TestSASS
Bases: TestSaxpy
SASS-focused analysis.
- property cubin: Path
- cuobjdump() -> CuObjDump
- decoder(cuobjdump: CuObjDump) -> Decoder
- test_instruction_count(decoder: Decoder) -> None
- class tests.test.test_saxpy.TestSaxpy
Bases: CMakeAwareTestCase
General test class.
- classmethod get_target_name() -> str