Test

Environment

class tests.test.test_environment.TestEnvironmentField

Bases: object

Tests for reprospect.test.environment.EnvironmentField.

test_converter_from_default_type(monkeypatch) → None: If no converter was provided, infer it from the type of the default value.

test_in_environment_converted_no_env_key(monkeypatch) → None: The attribute is correctly initialized from the environment (no key given), and converted.

test_in_environment_converted_with_env_key(monkeypatch) → None: The value is correctly initialized from the environment (given a key), and converted.

test_no_attribute_name_or_env_key() → None: Raises if neither an attribute name nor an environment key is given.

test_not_in_environment_no_default() → None: The attribute cannot be initialized.

test_not_in_environment_use_default() → None: The value is initialized to the given default value.

test_read_int_converter(monkeypatch) → None: Environment variable read as int.

test_read_str_converter(monkeypatch) → None: Environment variable read as str.

test_reset(monkeypatch) → None

test_value_cached_at_class_level(monkeypatch) → None: The value is shared among all instances.
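
The behaviours listed above (converter inferred from the default, fallback from environment key to attribute name, default value, class-level caching, reset) can be illustrated with a small descriptor. This is only a sketch under stated assumptions: EnvField, its parameters and its reset() method are hypothetical names chosen for illustration, and this is not the implementation of reprospect.test.environment.EnvironmentField.

```python
import os

_MISSING = object()  # sentinel: "no default given" / "nothing cached yet"


class EnvField:
    """Hypothetical descriptor; NOT reprospect's EnvironmentField."""

    def __init__(self, env=None, default=_MISSING, converter=None):
        self.env = env
        self.default = default
        # No converter given: infer it from the type of the default value.
        if converter is None and default is not _MISSING and default is not None:
            converter = type(default)
        self.converter = converter or str
        self._cached = _MISSING

    def __set_name__(self, owner, name):
        # No explicit environment key: fall back to the attribute name.
        if self.env is None:
            self.env = name

    def __get__(self, instance, owner=None):
        # The cache lives on the descriptor, which is stored on the class,
        # so the value is shared among all instances.
        if self._cached is _MISSING:
            self._cached = self._read()
        return self._cached

    def _read(self):
        if self.env is None:
            raise RuntimeError("neither an attribute name nor an environment key is known")
        raw = os.environ.get(self.env)
        if raw is not None:
            return self.converter(raw)
        if self.default is _MISSING:
            raise RuntimeError(f"{self.env} is not set and no default was given")
        return self.default

    def reset(self):
        # Forget the cached value so that the next access reads the environment again.
        self._cached = _MISSING


class Config:
    JOBS = EnvField(default=4)                       # key "JOBS", converted with int
    NAME = EnvField(env="APP_NAME", default="demo")  # explicit key, converted with str


os.environ["JOBS"] = "8"
os.environ.pop("APP_NAME", None)
print(Config().JOBS, Config().NAME)
```

In a pytest test, the environment would more typically be set with monkeypatch.setenv and cleared with monkeypatch.delenv rather than by mutating os.environ directly, which is presumably why most of the tests above take a monkeypatch argument.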

SASS

Others

class tests.test.test_graph.TestGraph

Bases: CMakeAwareTestCase

General test class.

DEMANGLED_NODE_A: Final[dict[str, str]] = {'Clang': 'void add_and_increment_kernel<0u>(unsigned int*)', 'NVIDIA': 'void add_and_increment_kernel<(unsigned int)0, >(unsigned int *)'}

classmethod get_target_name() → str

class tests.test.test_graph.TestNCU

Bases: TestGraph

ncu-focused analysis.

METRICS: Final[tuple[Metric]] = (Metric(name='launch__registers_per_thread_allocated', pretty_name='launch__registers_per_thread_allocated', subs=None),)

NVTX_INCLUDES: Final[tuple[str]] = ('application_domain@outer_useless_range',)

pytestmark = [Mark(name='skipif', args=(True,), kwargs={'reason': 'needs a GPU'})]

report(workdir: Path) → Report

results(report: Report) → ProfilingResults

test_launch_registers_per_thread_allocated_node_A(results: ProfilingResults) → None: Check metric launch__registers_per_thread_allocated for graph node A.

test_result_count(report: Report, results: ProfilingResults) → None: Check how many ranges and results there are in the report.

class tests.test.test_graph.TestSASS

Bases: TestGraph

SASS-focused analysis.

property cubin: str

cuobjdump() → CuObjDump

test_instruction_count(cuobjdump: CuObjDump) → None: Check how many instructions there are in the first graph node kernel.

test_kernel_count(cuobjdump: CuObjDump) → None: Count how many kernels there are (1 per graph node).

class tests.test.test_half.TestNCU

Bases: object

ncu-based analysis of the individual vs packed implementation.

BLOCK_DIM_X: Final[dict[str, int]] = {'individual': 129, 'packed': 65}

HALF: Final[Path] = PosixPath('tests/assets/tests_assets_half')

METRICS: Final[tuple[Metric | MetricCorrelation | MetricDeviceAttribute, ...]] = (MetricDeviceAttribute(name='display_name'), MetricCounter(name='smsp__sass_inst_executed_op_global_ld', pretty_name='L1/TEX cache global load instructions sass', subs=(<MetricCounterRollUp.SUM: 'sum'>,)), MetricCounter(name='l1tex__t_requests_pipe_lsu_mem_global_op_ld', pretty_name='L1/TEX cache global load requests', subs=(<MetricCounterRollUp.SUM: 'sum'>,)), MetricCounter(name='l1tex__t_sectors_pipe_lsu_mem_global_op_ld', pretty_name='L1/TEX cache global load sectors', subs=(<MetricCounterRollUp.SUM: 'sum'>,)), Metric(name='launch__grid_dim_x', pretty_name='launch__grid_dim_x', subs=None), Metric(name='launch__grid_dim_y', pretty_name='launch__grid_dim_y', subs=None), Metric(name='launch__grid_dim_z', pretty_name='launch__grid_dim_z', subs=None), Metric(name='launch__block_dim_x', pretty_name='launch__block_dim_x', subs=None), Metric(name='launch__block_dim_y', pretty_name='launch__block_dim_y', subs=None), Metric(name='launch__block_dim_z', pretty_name='launch__block_dim_z', subs=None))

SIZE: Final[int] = 129: Buffer size.

SIZEOF: Final[int] = 2: Size of __half in bytes.

WARP_SIZE: Final[int] = 32

pytestmark = [Mark(name='skipif', args=(True,), kwargs={'reason': 'needs a GPU'})]

results(workdir: Path, bindir: Path) → ProfilingResults

test_memory(results: ProfilingResults) → None: Compare the memory traffic.
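
The constants of this class can be related to each other with a little arithmetic. The sketch below is a back-of-the-envelope calculation, not the assertions made by test_memory. It assumes that the individual implementation uses one thread per element, that the packed implementation uses one thread per pair of elements plus one thread for the odd element, and that an L1/TEX sector is 32 bytes (SECTOR_BYTES is a name introduced here). The sector count is a lower bound that holds only for a sector-aligned buffer.

```python
import math

SIZE = 129         # buffer size, in elements (class constant above)
SIZEOF = 2         # size of __half in bytes (class constant above)
WARP_SIZE = 32     # class constant above
SECTOR_BYTES = 32  # assumption: size of one L1/TEX sector

# Assumed thread mapping; it reproduces BLOCK_DIM_X = {'individual': 129, 'packed': 65}.
block_dim_x = {
    "individual": SIZE,             # one thread per element
    "packed": math.ceil(SIZE / 2),  # 64 pairs + 1 odd element
}

# Number of warps needed to host that many threads in one block.
warps = {name: math.ceil(dim / WARP_SIZE) for name, dim in block_dim_x.items()}

# Bytes read from the buffer, and the fewest sectors that can cover them.
total_bytes = SIZE * SIZEOF
min_sectors = math.ceil(total_bytes / SECTOR_BYTES)

print(block_dim_x, warps, total_bytes, min_sectors)
```

Under these assumptions both implementations read the same 258 bytes, while the packed version needs fewer warps (3 instead of 5).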

class tests.test.test_half.TestSASS

Bases: object

Tests that combine different half-precision SASS instructions.

FILE: Final[Path] = PosixPath('/__w/reprospect/reprospect/tests/assets/test_half.cu')

cuobjdump(workdir: Path, parameters: Parameters, cmake_file_api: FileAPI) → CuObjDump

pytestmark = [Mark(name='parametrize', args=('parameters', (Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.VOLTA: 'VOLTA'>, compute_capability=ComputeCapability(major=7, minor=0))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.TURING: 'TURING'>, compute_capability=ComputeCapability(major=7, minor=5))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.AMPERE: 'AMPERE'>, compute_capability=ComputeCapability(major=8, minor=0))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.AMPERE: 'AMPERE'>, compute_capability=ComputeCapability(major=8, minor=6))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.ADA: 'ADA'>, compute_capability=ComputeCapability(major=8, minor=9))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.HOPPER: 'HOPPER'>, compute_capability=ComputeCapability(major=9, minor=0))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.BLACKWELL: 'BLACKWELL'>, compute_capability=ComputeCapability(major=10, minor=0))), Parameters(arch=NVIDIAArch(family=<NVIDIAFamily.BLACKWELL: 'BLACKWELL'>, compute_capability=ComputeCapability(major=12, minor=0))))), kwargs={'ids': <class 'str'>, 'scope': 'class'})]

test_individual(parameters: Parameters, cuobjdump: CuObjDump) → None

Analyse the individual implementation.

Each thread loads only 16 bits at once and “broadcasts” the lower lane (H0_H0) to both lanes, because HMUL2 is a packed instruction that operates on 2 lanes.

Typically:

LDG.E.U16.CONSTANT.SYS R2, [R2]
HMUL2 R0, R2.H0_H0, R2.H0_H0
STG.E.U16.SYS [R4], R0

test_packed(parameters: Parameters, cuobjdump: CuObjDump) → None

Analyse the packed implementation.

First, there is a block that handles the “odd” element and therefore looks like the individual implementation:

LDG.E.U16.CONSTANT.SYS R2, [R2]
HMUL2 R0, R2.H0_H0, R2.H0_H0

Then, there is another block that performs the packed multiplication. It loads 32 bits at once. Typically:

LDG.E.CONSTANT.SYS R2, [R2]
HMUL2 R7, R2, R2

Note that, even though the PTX always reads:

mul.f16x2 %r8,%r9,%r9

ptxas may choose to use HFMA2:

HFMA2 R7, R2, R2, -RZ

or even HFMA2.MMA:

HFMA2.MMA R7, R2, R2, -RZ

instead of HMUL2, depending on the targeted architecture.
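
Because the multiplication may be emitted as HMUL2, HFMA2 or HFMA2.MMA, a check on the SASS has to accept all three forms. The sketch below shows one way to do that with a plain regular expression over the instruction strings quoted above. PACKED_MUL and classify are hypothetical names introduced for illustration; the actual tests may rely on a different matching mechanism.

```python
import re

# Accept HMUL2, HFMA2 and HFMA2.MMA, with optional .H0_H0-style lane
# selectors on the sources and an optional trailing -RZ addend.
PACKED_MUL = re.compile(
    r"^(?P<opcode>HMUL2|HFMA2(?:\.MMA)?)\s+"
    r"(?P<dst>R\d+),\s*"
    r"(?P<a>R\d+)(?P<a_sel>\.H[01]_H[01])?,\s*"
    r"(?P<b>R\d+)(?P<b_sel>\.H[01]_H[01])?"
    r"(?:,\s*-RZ)?$"
)


def classify(instruction: str):
    """Return 'individual', 'packed', or None for a single SASS instruction string."""
    match = PACKED_MUL.match(instruction.strip())
    if match is None:
        return None
    # A broadcast of the lower lane on both sources marks the 16-bit path.
    if match["a_sel"] == ".H0_H0" and match["b_sel"] == ".H0_H0":
        return "individual"
    # No lane selector on either source: both lanes are multiplied.
    if match["a_sel"] is None and match["b_sel"] is None:
        return "packed"
    return None


for line in (
    "HMUL2 R0, R2.H0_H0, R2.H0_H0",
    "HMUL2 R7, R2, R2",
    "HFMA2 R7, R2, R2, -RZ",
    "HFMA2.MMA R7, R2, R2, -RZ",
    "LDG.E.CONSTANT.SYS R2, [R2]",
):
    print(f"{line!r}: {classify(line)}")
```

The register numbers are captured rather than hard-coded because they can differ from one compilation to the next.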

class tests.test.test_saxpy.TestNCU

Bases: TestSaxpy

ncu-focused analysis.

METRICS: Final[tuple[Metric]] = (Metric(name='launch__registers_per_thread_allocated', pretty_name='launch__registers_per_thread_allocated', subs=None),)

NVTX_INCLUDES: Final[tuple[str, ...]] = ('application_domain@launch_saxpy_kernel_first_time/', 'application_domain@launch_saxpy_kernel_second_time/')

pytestmark = [Mark(name='skipif', args=(True,), kwargs={'reason': 'needs a GPU'})]

report(workdir: Path) → Report

results(report: Report) → ProfilingResults

test_result_count(report: Report, results: ProfilingResults) → None: Check how many ranges and results there are in the report.

class tests.test.test_saxpy.TestSASS

Bases: TestSaxpy

SASS-focused analysis.

property cubin: Path

cuobjdump() → CuObjDump

decoder(cuobjdump: CuObjDump) → Decoder

test_instruction_count(decoder: Decoder) → None

class tests.test.test_saxpy.TestSaxpy

Bases: CMakeAwareTestCase

General test class.

NAME: Final[str] = 'tests_assets_saxpy'

TARGET_SOURCE: Final[Path] = PosixPath('tests/assets/test_saxpy.cpp')

classmethod get_target_name() → str