reprospect.test.features module

This module provides helpers for architecture and CUDA version-dependent features to enable feature-driven testing. It covers features that are:

  • Well-documented by NVIDIA

    Official features with clear documentation, provided here for convenience.

  • Under-documented

    Features mentioned in release notes or vendor communication but lacking comprehensive documentation.

  • Undocumented

    Features discovered through empirical testing, profiling, or community knowledge.

class reprospect.test.features.Memory(*, arch: reprospect.tools.architecture.NVIDIAArch, version: semantic_version.Version = <factory>)

Bases: object

__init__(*, arch: NVIDIAArch, version: Version = <factory>) -> None
arch: NVIDIAArch
property max_transaction_size: int

Maximum memory transaction size in bytes for load/store operations.

Prior to reprospect.tools.architecture.NVIDIAFamily.BLACKWELL and CUDA 13, a load/store of 32-byte aligned data requires two 16-byte transactions/instructions.

Starting from reprospect.tools.architecture.NVIDIAFamily.BLACKWELL and CUDA 13, 32-byte aligned data can be loaded/stored in a single transaction/instruction.

>>> from semantic_version import Version
>>> from reprospect.test.features import Memory
>>> from reprospect.tools.architecture import NVIDIAArch
>>> Memory(arch=NVIDIAArch.from_compute_capability(100), version=Version('13.0.0')).max_transaction_size
32
>>> Memory(arch=NVIDIAArch.from_compute_capability(90), version=Version('13.0.0')).max_transaction_size
16
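
As a rough illustration of how this property can drive a test expectation, the sketch below derives how many load/store instructions a test might expect for a 32-byte aligned access. Both helper functions are hypothetical and are not part of reprospect. The stand-in rule assumes that "Blackwell" can be approximated by a compute capability of at least 100 and that both conditions (architecture and CUDA version) must hold.

```python
import math


def max_transaction_size(compute_capability: int, cuda_version: tuple[int, int, int]) -> int:
    """Hypothetical stand-in for the documented rule (not the library's implementation)."""
    # Assumption: Blackwell is approximated as compute capability >= 100,
    # and the 32-byte transaction also requires CUDA 13 or newer.
    if compute_capability >= 100 and cuda_version >= (13, 0, 0):
        return 32
    return 16


def expected_access_count(nbytes: int, transaction_size: int) -> int:
    """Number of load/store instructions needed to move `nbytes` of aligned data."""
    return math.ceil(nbytes / transaction_size)


# A 32-byte aligned value needs one instruction on Blackwell with CUDA 13,
# and two 16-byte instructions otherwise.
blackwell = expected_access_count(32, max_transaction_size(100, (13, 0, 0)))
hopper = expected_access_count(32, max_transaction_size(90, (13, 0, 0)))
```

A test written this way states its expectation in terms of the feature (the transaction size) rather than hard-coding an instruction count per architecture.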

sign_extension(compiler_id: str) -> bool

When loading a 16-bit signed value into a 32-bit register, compilers may use either sign-extending or zero-extending loads:

  • nvcc may use either approach.

  • clang always uses sign extension.

Sign extension can be performed by the load instruction itself:

LDG.E.S16.CONSTANT R3, desc[UR4][R2.64]
...
STG.E desc[UR4][R4.64], R3

or by a subsequent PRMT instruction after a zero-extending load:

LDG.E.U16.CONSTANT.SYS R2, [R2]
PRMT R7, R2, 0x9910, RZ
STG.E.SYS [R4], R7

Returns:

True if the load instruction uses sign extension.
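
To see why the selector 0x9910 in the PRMT listing above amounts to a 16-bit sign extension, the following sketch emulates a byte permute in plain Python. It assumes that the SASS PRMT instruction follows the default mode of the PTX `prmt` instruction: each selector nibble picks one of eight source bytes with its low three bits, and its high bit replaces the picked byte with a replication of that byte's sign bit. The function names are illustrative only.

```python
def prmt(a: int, b: int, selector: int) -> int:
    """Emulate a byte permute, assuming PTX `prmt` default-mode semantics."""
    # Bytes 0-3 come from `a`, bytes 4-7 from `b`.
    source = (a & 0xFFFFFFFF) | ((b & 0xFFFFFFFF) << 32)
    result = 0
    for i in range(4):
        nibble = (selector >> (4 * i)) & 0xF
        byte = (source >> (8 * (nibble & 0x7))) & 0xFF
        if nibble & 0x8:
            # High bit set: replicate the sign bit of the selected byte.
            byte = 0xFF if byte & 0x80 else 0x00
        result |= byte << (8 * i)
    return result


def sign_extend_16(value: int) -> int:
    """Reference 16-bit to 32-bit sign extension, as an unsigned 32-bit pattern."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if (value & 0x8000) else value


# Selector 0x9910: bytes 0 and 1 are copied, bytes 2 and 3 replicate the sign of byte 1.
negative = prmt(0x0000FFFE, 0, 0x9910)  # zero-extended load of the 16-bit value -2
positive = prmt(0x00007FFF, 0, 0x9910)  # zero-extended load of 32767
```

Under this assumption, a zero-extending load followed by `PRMT ..., 0x9910, RZ` yields the same register contents as a sign-extending load.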

version: Version
class reprospect.test.features.PTX(*, arch: reprospect.tools.architecture.NVIDIAArch)

Bases: object

__init__(*, arch: NVIDIAArch) -> None
arch: NVIDIAArch
property min_isa_version: Version

Minimum PTX ISA version that supports arch.

References: