Test

Environment

class tests.test.test_environment.TestEnvironmentField

Bases: object

Tests for reprospect.test.environment.EnvironmentField.

test_converter_from_default_type(monkeypatch) → None

If no converter was provided, infer it from the type of the default value.
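
The inference rule described above can be sketched as follows. This is a minimal illustration, not the code of reprospect.test.environment.EnvironmentField: the function name `infer_converter` and its parameters are hypothetical.

```python
def infer_converter(default=None, converter=None):
    """Return the explicit converter if given, else the type of the default.

    Hypothetical helper, for illustration only.
    """
    if converter is not None:
        return converter
    if default is None:
        # Nothing to infer from.
        raise TypeError("cannot infer a converter without a default value")
    return type(default)


# The default is an int, so the raw environment string is converted with int().
to_int = infer_converter(default=4)
jobs = to_int("8")

# An explicit converter takes precedence over the type of the default.
to_float = infer_converter(default=4, converter=float)
ratio = to_float("0.5")
```

In this sketch a bool default would need an explicit converter, because bool("0") is True in Python.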

test_in_environment_converted_no_env_key(monkeypatch) → None

The attribute is correctly initialized from the environment (no key given), and converted.

test_in_environment_converted_with_env_key(monkeypatch) → None

The value is correctly initialized from the environment (given a key), and converted.

test_no_attribute_name_or_env_key() → None

Raises if neither an attribute name nor an environment key is given.

test_not_in_environment_no_default() → None

The attribute cannot be initialized.

test_not_in_environment_use_default() → None

The value is initialized to the given default value.

test_read_int_converter(monkeypatch) → None

Environment variable read as int.

test_read_str_converter(monkeypatch) → None

Environment variable read as str.

test_reset(monkeypatch) → None

test_value_cached_at_class_level(monkeypatch) → None

The value is shared among all instances.
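
One way to obtain a value shared among all instances is to cache it on a descriptor stored in the class body. The sketch below assumes that approach. The class `CachedEnvField`, its `reset` method and the variable `DEMO_JOBS` are hypothetical and do not reproduce the actual EnvironmentField implementation.

```python
import os


class CachedEnvField:
    """Hypothetical descriptor that reads an environment variable once."""

    _UNSET = object()

    def __init__(self, env, default=None, converter=str):
        self.env = env
        self.default = default
        self.converter = converter
        self._value = self._UNSET

    def __get__(self, instance, owner=None):
        # The cache lives on the descriptor, which belongs to the class,
        # so every instance sees the same value.
        if self._value is self._UNSET:
            raw = os.environ.get(self.env)
            if raw is not None:
                self._value = self.converter(raw)
            elif self.default is not None:
                self._value = self.default
            else:
                raise RuntimeError(f"{self.env} is not set and has no default")
        return self._value

    def reset(self):
        """Forget the cached value so the next access reads the environment again."""
        self._value = self._UNSET


class Config:
    jobs = CachedEnvField("DEMO_JOBS", default=1, converter=int)


os.environ["DEMO_JOBS"] = "4"
first = Config().jobs

# Changing the environment afterwards does not affect the cached value.
os.environ["DEMO_JOBS"] = "16"
second = Config().jobs

# After a reset, the new environment value is picked up.
Config.__dict__["jobs"].reset()
third = Config().jobs
```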

SASS

Others

class tests.test.test_graph.TestGraph

Bases: CMakeAwareTestCase

General test class.

DEMANGLED_NODE_A: Final[dict[str, str]] = {'Clang': 'void add_and_increment_kernel<0u>(unsigned int*)', 'NVIDIA': 'void add_and_increment_kernel<(unsigned int)0, >(unsigned int *)'}
classmethod get_target_name() → str

class tests.test.test_graph.TestNCU

Bases: TestGraph

ncu-focused analysis.

METRICS: Final[tuple[Metric]] = (Metric(name='launch__registers_per_thread_allocated', pretty_name='launch__registers_per_thread_allocated', subs=None),)
NVTX_INCLUDES: Final[tuple[str]] = ('application_domain@outer_useless_range',)
pytestmark = [Mark(name='skipif', args=(True,), kwargs={'reason': 'needs a GPU'})]
report(workdir: Path) → Report
results(report: Report) → ProfilingResults
test_launch_registers_per_thread_allocated_node_A(results: ProfilingResults) → None

Check metric launch__registers_per_thread_allocated for graph node A.

test_result_count(report: Report, results: ProfilingResults) → None

Check how many ranges and results there are in the report.

class tests.test.test_graph.TestSASS

Bases: TestGraph

SASS-focused analysis.

property cubin: str
cuobjdump() → CuObjDump
test_instruction_count(cuobjdump: CuObjDump) → None

Check how many instructions there are in the first graph node kernel.

test_kernel_count(cuobjdump: CuObjDump) → None

Count how many kernels there are (1 per graph node).

class tests.test.test_half.TestNCU

Bases: object

ncu-based analysis of the individual vs packed implementation.

BLOCK_DIM_X: Final[dict[str, int]] = {'individual': 129, 'packed': 65}
HALF: Final[Path] = PosixPath('tests/assets/tests_assets_half')
METRICS: Final[tuple[Metric | MetricCorrelation | MetricDeviceAttribute, ...]] = (MetricDeviceAttribute(name='display_name'), MetricCounter(name='smsp__sass_inst_executed_op_global_ld', pretty_name='L1/TEX cache global load instructions sass', subs=(<MetricCounterRollUp.SUM: 'sum'>,)), MetricCounter(name='l1tex__t_requests_pipe_lsu_mem_global_op_ld', pretty_name='L1/TEX cache global load requests', subs=(<MetricCounterRollUp.SUM: 'sum'>,)), MetricCounter(name='l1tex__t_sectors_pipe_lsu_mem_global_op_ld', pretty_name='L1/TEX cache global load sectors', subs=(<MetricCounterRollUp.SUM: 'sum'>,)), Metric(name='launch__grid_dim_x', pretty_name='launch__grid_dim_x', subs=None), Metric(name='launch__grid_dim_y', pretty_name='launch__grid_dim_y', subs=None), Metric(name='launch__grid_dim_z', pretty_name='launch__grid_dim_z', subs=None), Metric(name='launch__block_dim_x', pretty_name='launch__block_dim_x', subs=None), Metric(name='launch__block_dim_y', pretty_name='launch__block_dim_y', subs=None), Metric(name='launch__block_dim_z', pretty_name='launch__block_dim_z', subs=None))
SIZE: Final[int] = 129

Buffer size.

SIZEOF: Final[int] = 2

Size of __half in bytes.

WARP_SIZE: Final[int] = 32
pytestmark = [Mark(name='skipif', args=(True,), kwargs={'reason': 'needs a GPU'})]
results(workdir: Path, bindir: Path) → ProfilingResults
test_memory(results: ProfilingResults) → None

Compare the memory traffic.
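
The constants above are related by simple arithmetic, sketched below. The assertions made by the test itself are not shown on this page, so this only works out the quantities the constants imply. The 32-byte sector size and the aligned buffer are assumptions, and the variable names are illustrative.

```python
import math

SIZE = 129      # buffer size, in elements
SIZEOF = 2      # size of __half in bytes
WARP_SIZE = 32
SECTOR_BYTES = 32  # assumption: 32-byte memory sectors

# The individual implementation uses one thread per element, whereas the
# packed implementation handles two elements per thread (rounded up).
threads = {"individual": SIZE, "packed": math.ceil(SIZE / 2)}

# Number of warps needed to cover each block.
warps = {name: math.ceil(count / WARP_SIZE) for name, count in threads.items()}

# Both implementations read the same buffer, so the byte count is identical.
total_bytes = SIZE * SIZEOF

# Lower bound on the number of sectors, assuming a sector-aligned buffer.
min_sectors = math.ceil(total_bytes / SECTOR_BYTES)
```

The thread counts computed here (129 and 65) agree with BLOCK_DIM_X.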

class tests.test.test_half.TestSASS

Bases: object

Tests that combine different half-precision SASS instructions.

FILE: Final[Path] = PosixPath('/__w/reprospect/reprospect/tests/assets/test_half.cu')
cuobjdump(workdir: Path, parameters: Parameters, cmake_file_api: FileAPI) → CuObjDump
pytestmark = [Mark(name='parametrize', args=('parameters', (Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.VOLTA: 'VOLTA'>, compute_capability=ComputeCapability(major=7, minor=0))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.TURING: 'TURING'>, compute_capability=ComputeCapability(major=7, minor=5))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.AMPERE: 'AMPERE'>, compute_capability=ComputeCapability(major=8, minor=0))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.AMPERE: 'AMPERE'>, compute_capability=ComputeCapability(major=8, minor=6))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.ADA: 'ADA'>, compute_capability=ComputeCapability(major=8, minor=9))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.HOPPER: 'HOPPER'>, compute_capability=ComputeCapability(major=9, minor=0))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.BLACKWELL: 'BLACKWELL'>, compute_capability=ComputeCapability(major=10, minor=0))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.BLACKWELL: 'BLACKWELL'>, compute_capability=ComputeCapability(major=12, minor=0))))), kwargs={'ids': <class 'str'>, 'scope': 'class'})]
test_individual(parameters: Parameters, cuobjdump: CuObjDump) → None

Analyse the individual implementation.

It loads only 16 bits at once and does a “broadcast” of the lower lane (H0_H0) because HMUL2 works with 2 lanes (packed instruction).

Typically:

LDG.E.U16.CONSTANT.SYS R2, [R2]
HMUL2 R0, R2.H0_H0, R2.H0_H0
STG.E.U16.SYS [R4], R0
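
As an illustration, the typical sequence above can be checked with a few lines of plain Python. This sketch does not use the reprospect SASS decoder: the `parse` helper is hypothetical and only splits an instruction into opcode, modifiers and operands.

```python
SASS = """\
LDG.E.U16.CONSTANT.SYS R2, [R2]
HMUL2 R0, R2.H0_H0, R2.H0_H0
STG.E.U16.SYS [R4], R0
"""


def parse(line):
    """Split one SASS line into (opcode, modifiers, operands)."""
    mnemonic, _, rest = line.strip().partition(" ")
    opcode, *modifiers = mnemonic.split(".")
    operands = [op.strip() for op in rest.split(",")] if rest else []
    return opcode, modifiers, operands


load, multiply, store = (parse(line) for line in SASS.splitlines())

# The load and the store are 16 bits wide.
load_is_16_bits = load[0] == "LDG" and "U16" in load[1]
store_is_16_bits = store[0] == "STG" and "U16" in store[1]

# Both sources of the multiplication broadcast the lower lane.
sources = multiply[2][1:]
broadcasts_lower_lane = multiply[0] == "HMUL2" and all(
    src.endswith(".H0_H0") for src in sources
)
```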
test_packed(parameters: Parameters, cuobjdump: CuObjDump) → None

Analyse the packed implementation.

First, there is a block that handles the “odd” element and therefore looks like the individual implementation:

LDG.E.U16.CONSTANT.SYS R2, [R2]
HMUL2 R0, R2.H0_H0, R2.H0_H0

Then, there is another block that performs the packed multiplication. It loads 32 bits at once. Typically:

LDG.E.CONSTANT.SYS R2, [R2]
HMUL2 R7, R2, R2

Note that, even though the PTX always reads:

mul.f16x2 %r8,%r9,%r9

ptxas may choose to use HFMA2:

HFMA2 R7, R2, R2, -RZ

or even HFMA2.MMA:

HFMA2.MMA R7, R2, R2, -RZ

instead of HMUL2, depending on the targeted architecture.
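
Because the opcode varies with the architecture, a check on the packed multiplication has to accept all three forms. A minimal sketch using a regular expression is given below. The pattern and the `is_packed_multiply` helper are illustrative assumptions, not the matcher used by the test.

```python
import re

# HMUL2 takes two sources, whereas HFMA2 and HFMA2.MMA take a third
# operand, which is -RZ when the instruction acts as a plain multiplication.
PACKED_MUL = re.compile(
    r"^(HMUL2|HFMA2(?:\.MMA)?)\s+(R\d+),\s*(R\d+),\s*(R\d+)(?:,\s*(-RZ))?$"
)


def is_packed_multiply(line):
    """Return True if the line is a packed half-precision multiplication."""
    matched = PACKED_MUL.match(line.strip())
    if matched is None:
        return False
    opcode, _dst, _lhs, _rhs, addend = matched.groups()
    if opcode == "HMUL2":
        return addend is None
    return addend == "-RZ"


candidates = [
    "HMUL2 R7, R2, R2",
    "HFMA2 R7, R2, R2, -RZ",
    "HFMA2.MMA R7, R2, R2, -RZ",
    "HMUL2 R0, R2.H0_H0, R2.H0_H0",  # broadcast form: not a packed multiply
]
flags = [is_packed_multiply(line) for line in candidates]
```

The last candidate is rejected because its sources carry the H0_H0 selector, which is how the pattern distinguishes the packed block from the “odd” element block.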

class tests.test.test_saxpy.TestNCU

Bases: TestSaxpy

ncu-focused analysis.

METRICS: Final[tuple[Metric]] = (Metric(name='launch__registers_per_thread_allocated', pretty_name='launch__registers_per_thread_allocated', subs=None),)
NVTX_INCLUDES: Final[tuple[str, ...]] = ('application_domain@launch_saxpy_kernel_first_time/', 'application_domain@launch_saxpy_kernel_second_time/')
pytestmark = [Mark(name='skipif', args=(True,), kwargs={'reason': 'needs a GPU'})]
report(workdir: Path) → Report
results(report: Report) → ProfilingResults
test_result_count(report: Report, results: ProfilingResults) → None

Check how many ranges and results there are in the report.

class tests.test.test_saxpy.TestSASS

Bases: TestSaxpy

SASS-focused analysis.

property cubin: Path
cuobjdump() → CuObjDump
decoder(cuobjdump: CuObjDump) → Decoder
test_instruction_count(decoder: Decoder) → None

class tests.test.test_saxpy.TestSaxpy

Bases: CMakeAwareTestCase

General test class.

NAME: Final[str] = 'tests_assets_saxpy'
TARGET_SOURCE: Final[Path] = PosixPath('tests/assets/test_saxpy.cpp')
classmethod get_target_name() → str