reprospect.test.features module
This module provides helpers for architecture and CUDA version-dependent features to enable feature-driven testing. It covers features that are:
Well-documented by NVIDIA
Official features with clear documentation, provided here for convenience.
Under-documented
Features mentioned in release notes or vendor communication but lacking comprehensive documentation.
Undocumented
Features discovered through empirical testing, profiling, or community knowledge.
- class reprospect.test.features.Memory(*, arch: reprospect.tools.architecture.NVIDIAArch, version: semantic_version.Version = <factory>)
Bases:
object
- __init__(*, arch: reprospect.tools.architecture.NVIDIAArch, version: semantic_version.Version = <factory>) -> None
- arch: NVIDIAArch
- property max_transaction_size: int
Maximum memory transaction size in bytes for load/store operations.
Prior to reprospect.tools.architecture.NVIDIAFamily.BLACKWELL and CUDA 13, a load/store of 32-byte aligned data requires two 16-byte transactions/instructions.
Starting from reprospect.tools.architecture.NVIDIAFamily.BLACKWELL and CUDA 13, 32-byte aligned data can be loaded/stored in a single transaction/instruction.
>>> from semantic_version import Version
>>> from reprospect.test.features import Memory
>>> from reprospect.tools.architecture import NVIDIAArch
>>> Memory(arch=NVIDIAArch.from_compute_capability(100), version=Version('13.0.0')).max_transaction_size
32
>>> Memory(arch=NVIDIAArch.from_compute_capability(90), version=Version('13.0.0')).max_transaction_size
16
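A feature-driven test can turn this property into an expected instruction count. The following is a minimal sketch, not part of reprospect: expected_transactions is a hypothetical helper, and the two sizes passed to it are the values shown in the doctest above.

```python
def expected_transactions(size_bytes: int, max_transaction_size: int) -> int:
    """Hypothetical helper: number of load/store transactions needed to move
    size_bytes of suitably aligned data (ceiling division)."""
    if size_bytes <= 0 or max_transaction_size <= 0:
        raise ValueError("sizes must be positive")
    return -(-size_bytes // max_transaction_size)


# 16 bytes: value reported before Blackwell and CUDA 13.
# 32 bytes: value reported from Blackwell and CUDA 13 onwards.
for max_size in (16, 32):
    count = expected_transactions(32, max_size)
    print(f"max_transaction_size={max_size}: {count} transaction(s) for 32 bytes")
```

In a real test, the second argument would presumably come from Memory(...).max_transaction_size rather than being hard-coded.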
References:
- sign_extension(compiler_id: str) -> bool
When loading a 16-bit signed value into a 32-bit register, compilers may use either sign-extending or zero-extending loads:
nvcc may use either approach.
clang always uses sign extension.
Sign extension can be performed by the load instruction itself:
LDG.E.S16.CONSTANT R3, desc[UR4][R2.64]
...
STG.E desc[UR4][R4.64], R3
or by a subsequent PRMT instruction after a zero-extending load:
LDG.E.U16.CONSTANT.SYS R2, [R2]
PRMT R7, R2, 0x9910, RZ
STG.E.SYS [R4], R7
- Returns:
True if the load instruction uses sign extension.
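To see why the PRMT with selector 0x9910 amounts to a sign extension, the byte permutation can be emulated in plain Python. This is a sketch under one stated assumption: that the SASS PRMT instruction follows the generic-mode semantics documented for the PTX prmt instruction (each selector nibble picks a source byte, and its top bit requests replication of that byte's sign bit). The function name prmt and its argument order are illustrative only.

```python
def prmt(a: int, b: int, selector: int) -> int:
    """Emulate a generic-mode byte permute on two 32-bit values.

    Assumption: semantics as documented for the PTX prmt instruction.
    Source bytes 0-3 come from a, bytes 4-7 from b.
    """
    source = ((b & 0xFFFFFFFF) << 32) | (a & 0xFFFFFFFF)
    result = 0
    for position in range(4):
        nibble = (selector >> (4 * position)) & 0xF
        byte = (source >> (8 * (nibble & 0x7))) & 0xFF
        if nibble & 0x8:
            # Replicate the sign bit of the selected byte over all 8 bits.
            byte = 0xFF if byte & 0x80 else 0x00
        result |= byte << (8 * position)
    return result


# Zero-extended 16-bit load of -2 (bit pattern 0xFFFE), then PRMT 0x9910 with RZ (zero).
zero_extended = 0xFFFE
sign_extended = prmt(zero_extended, 0, 0x9910)
print(hex(sign_extended))
```

The selector keeps bytes 0 and 1 unchanged and fills bytes 2 and 3 with the sign bit of byte 1, so the result matches what a sign-extending load would have produced directly.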
- class reprospect.test.features.PTX(*, arch: reprospect.tools.architecture.NVIDIAArch)
Bases:
object
- __init__(*, arch: NVIDIAArch) -> None
- arch: NVIDIAArch
- property min_isa_version: Version
Minimum PTX ISA version that supports arch.
References: