reprospect.tools.binaries package
- class reprospect.tools.binaries.CuObjDump(file: ~pathlib.Path, arch: ~reprospect.tools.architecture.NVIDIAArch, *, sass: bool = True, demangler: type[~reprospect.tools.binaries.demangle.CuppFilt | ~reprospect.tools.binaries.demangle.LlvmCppFilt] | None = <class 'reprospect.tools.binaries.demangle.CuppFilt'>, keep: ~typing.Iterable[str] | None = None)
Bases: object
Use cuobjdump for extracting SASS, symbol table, and so on.
References: [NVId]
- __init__(file: ~pathlib.Path, arch: ~reprospect.tools.architecture.NVIDIAArch, *, sass: bool = True, demangler: type[~reprospect.tools.binaries.demangle.CuppFilt | ~reprospect.tools.binaries.demangle.LlvmCppFilt] | None = <class 'reprospect.tools.binaries.demangle.CuppFilt'>, keep: ~typing.Iterable[str] | None = None) None
- Parameters:
file – Either a host binary file containing one or more embedded CUDA binary files, or itself a CUDA binary file.
keep – Optionally filter the functions to be kept.
- __str__() str
Rich representation.
- arch: Final[NVIDIAArch]
The NVIDIA architecture.
- property embedded_cubins: tuple[str, ...]
Get the names of the embedded CUDA binary files contained in file.
- classmethod extract(*, file: Path, arch: NVIDIAArch, cwd: Path, cubin: str, **kwargs) tuple[CuObjDump, Path]
Extract the embedded CUDA binary file whose name contains cubin, from file, for the given arch.
The file can be inspected with the following command to list all ELF files:
cuobjdump --list-elf <file>
Note that first extracting a single embedded CUDA binary, and then dumping only the SASS of interest from it, can be significantly faster than dumping all the SASS from the whole file.
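The listing step can be scripted. The following is a minimal sketch, not the package's implementation: `list_elf_command` and `parse_elf_listing` are hypothetical helpers, and the `ELF file    N: <name>` line format is an assumption about what `cuobjdump --list-elf` prints.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed line format of `cuobjdump --list-elf`,
# e.g. "ELF file    1: app.1.sm_80.cubin".
ELF_LINE = re.compile(r"^ELF file\s+\d+:\s+(?P<name>\S+)$")


def list_elf_command(file: Path) -> list[str]:
    """Build the command line that lists the ELF files embedded in `file`."""
    return ["cuobjdump", "--list-elf", str(file)]


def parse_elf_listing(output: str, cubin: str | None = None) -> list[str]:
    """Return the listed ELF names, optionally keeping those that contain `cubin`."""
    names = []
    for line in output.splitlines():
        match = ELF_LINE.match(line.strip())
        if match is not None:
            names.append(match["name"])
    return [name for name in names if cubin is None or cubin in name]


# Hypothetical listing, used here instead of running the tool.
sample = "ELF file    1: app.1.sm_80.cubin\nELF file    2: app.2.sm_90.cubin\n"
print(list_elf_command(Path("app")))
print(parse_elf_listing(sample, cubin="sm_90"))
```

In practice the command would be run with `subprocess.run(..., capture_output=True, text=True)` and its standard output passed to the parser.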
- static extract_elf(*, file: Path, cwd: Path | None = None, arch: NVIDIAArch | None = None, name: str | None = None) Generator[str, None, None]
Extract ELF files from file.
- Parameters:
arch – Optionally filter for a given architecture.
name – Optionally filter by name.
- property file_is_cubin: bool
Whether file is a CUDA binary file.
- static list_elf(*, file: Path, arch: NVIDIAArch | None = None) Generator[str, None, None]
List ELF files in file.
- Parameters:
arch – Optionally filter for a given architecture.
- class reprospect.tools.binaries.CuppFilt
Bases: DemanglerMixin
Convenient wrapper for cu++filt.
- classmethod get_executable() str
- class reprospect.tools.binaries.ELF(*, file: Path)
Bases: object
Helper for reading ELF files and retrieving CUDA-specific information.
- EF_CUDA_SM_OFFSET_POST_BLACKWELL: Final[int] = 8
Offset for compute capability field post BLACKWELL.
- EF_CUDA_SM_PRE_BLACKWELL: Final[int] = 255
Mask for compute capability field pre BLACKWELL.
References:
- __enter__() Self
- __exit__(*args, **kwargs) None
- __init__(*, file: Path) None
- property arch: NVIDIAArch
Get compute capability encoded in header as NVIDIA architecture.
- classmethod compute_capability(value) ComputeCapability
Return compute capability encoded in e_flags.
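As an illustration of how the two constants above could be applied to `e_flags`, here is a hedged sketch. The `decode_sm` helper, the stand-in `ComputeCapability` tuple, the `post_blackwell` switch, the 8-bit width of the shifted field, and the `major * 10 + minor` encoding are all assumptions made for this example, not the package's documented behavior.

```python
from typing import NamedTuple

EF_CUDA_SM_PRE_BLACKWELL = 0xFF       # mask (255), as documented above
EF_CUDA_SM_OFFSET_POST_BLACKWELL = 8  # bit offset, as documented above


class ComputeCapability(NamedTuple):
    """Stand-in for the package's own compute capability type."""
    major: int
    minor: int


def decode_sm(e_flags: int, *, post_blackwell: bool) -> ComputeCapability:
    """Decode the compute capability field from ELF `e_flags`."""
    if post_blackwell:
        # Assumption: the field is 8 bits wide once shifted down.
        sm = (e_flags >> EF_CUDA_SM_OFFSET_POST_BLACKWELL) & 0xFF
    else:
        sm = e_flags & EF_CUDA_SM_PRE_BLACKWELL
    # Assumption: the field stores major * 10 + minor (e.g. 80 for sm_80).
    return ComputeCapability(*divmod(sm, 10))


print(decode_sm(0x00500550, post_blackwell=False))
print(decode_sm(100 << 8, post_blackwell=True))
```

How the real class decides between the two layouts is not stated here, so the sketch leaves that choice to the caller.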
- property header: Container
- property is_cuda: bool
- classmethod is_cuda_impl(*, header: Container) bool
- nvinfo(mangled: str) NvInfo
Extract and parse the .nv.info.<mangled> section.
- class reprospect.tools.binaries.Function(symbol: str, code: str, ru: ResourceUsage)
Bases: object
Data structure holding the SASS code and resource usage of a kernel, as extracted from a binary file.
- __init__(symbol: str, code: str, ru: ResourceUsage) None
- __str__() str
Rich representation with to_table().
- ru: ResourceUsage
The resource usage.
- to_table(*, max_code_length: int = 130, descriptors: dict[str, str] | None = None) Table
Convert to a rich.table.Table.
- Parameters:
descriptors – Key-value pairs added as descriptor rows at the top of the table, optional.
- class reprospect.tools.binaries.LlvmCppFilt
Bases: DemanglerMixin
Convenient wrapper for llvm-cxxfilt.
- classmethod get_executable() str
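Both demangler wrappers ultimately rely on an external executable. The sketch below shows one way such a tool could be located and invoked. `find_demangler` and `demangle` are hypothetical helpers, not the package's API, and passing the mangled name as a positional argument is an assumption about how the tools are called.

```python
from __future__ import annotations

import shutil
import subprocess


def find_demangler(candidates: tuple[str, ...] = ("cu++filt", "llvm-cxxfilt")) -> str | None:
    """Return the path of the first demangler found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


def demangle(symbol: str, executable: str) -> str:
    """Demangle one symbol by passing it as an argument to the demangler."""
    return subprocess.check_output([executable, symbol], text=True).strip()


executable = find_demangler()
if executable is None:
    print("no demangler found on PATH")
else:
    print(demangle("_Z3fooi", executable))
```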
- class reprospect.tools.binaries.NVDisasm(file: ~pathlib.Path, arch: ~reprospect.tools.architecture.NVIDIAArch | None = None, demangler: type[~reprospect.tools.binaries.demangle.CuppFilt | ~reprospect.tools.binaries.demangle.LlvmCppFilt] = <class 'reprospect.tools.binaries.demangle.CuppFilt'>)
Bases: object
Extract information from CUDA binaries using nvdisasm.
The main purpose of nvdisasm is to disassemble CUDA binary files. Beyond the raw disassembly, it can also annotate the disassembled SASS with information, such as register liveness range information. nvdisasm provides liveness ranges for all register types: GPR, PRED, UGPR, UPRED; see also reprospect.tools.sass.decode.RegisterType.
This class provides functionalities to parse this register liveness range information to deduce how many registers each kernel uses.
Note that the register use information extracted by reprospect.tools.binaries.CuObjDump concerns only the reprospect.tools.sass.decode.RegisterType.GPR register type. As compared with reprospect.tools.binaries.CuObjDump, this class provides details for all register types.
Note that register liveness range information can also be obtained by parsing the SASS code extracted by reprospect.tools.binaries.CuObjDump. However, to implement such a parser, it is not sufficient to simply track the registers that appear in the SASS code. In particular, for certain instructions, operands span multiple consecutive registers, but only the first register index appears in the instruction string. For instance:
  - In STG.E desc[UR6][R6.64], R15, the memory address operand [R6.64] uses two consecutive registers, namely, R6-R7, but only R6 appears explicitly.
  - In LDCU.64 UR8, c[0x0][0x3d8], the modifier 64 indicates that the destination is the two consecutive registers UR8-UR9, but only UR8 appears explicitly.
  - In IMAD.WIDE.U32 R2, R0, 0x4, R8, the modifier WIDE indicates that R2 and R8 are twice as wide as R0 and 0x4. Hence, the destination and the addend use R2-R3 and R8-R9, but only R2 and R8 appear explicitly.
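To make the implicit register spans concrete, the following sketch expands a register name into the consecutive 32-bit registers it covers. `expand_register` is a hypothetical helper written for this illustration. It only handles `R` and `UR` names, and the caller has to supply the operand width, which is exactly the per-instruction knowledge a SASS-based parser would need.

```python
import re

# Matches general-purpose (R) and uniform (UR) register names such as R6 or UR8.
REGISTER = re.compile(r"^(?P<bank>U?R)(?P<index>\d+)$")


def expand_register(register: str, width_bits: int = 32) -> list[str]:
    """Expand a register into the consecutive 32-bit registers it spans."""
    match = REGISTER.match(register)
    if match is None:
        raise ValueError(f"unsupported register name: {register!r}")
    if width_bits <= 0 or width_bits % 32:
        raise ValueError(f"width must be a positive multiple of 32, got {width_bits}")
    first = int(match["index"])
    return [f"{match['bank']}{first + i}" for i in range(width_bits // 32)]


print(expand_register("R6", 64))   # address operand [R6.64]
print(expand_register("UR8", 64))  # destination of LDCU.64
print(expand_register("R0"))       # plain 32-bit operand
```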
There are also complexities such as tracking register usage across function calls. Consequently, to deduce the register usage, this class relies on parsing the register annotations provided by nvdisasm, rather than on implementing its own logic to infer register usage from dumped SASS code.
References:
- TABLE_BEGIN_END: Final[Pattern[str]] = re.compile('(?:\\.[A-Za-z0-9_]+:)?[ ]+\\/\\/ \\+[\\-\\+]+\\+$')
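The pattern can be exercised on its own. In this sketch, the sample lines are made up for illustration, and applying the pattern with `match` is an assumption; how the class itself applies it is not stated here.

```python
import re

# Same pattern as TABLE_BEGIN_END, written as a raw string.
TABLE_BEGIN_END = re.compile(r"(?:\.[A-Za-z0-9_]+:)?[ ]+\/\/ \+[\-\+]+\+$")

border = "        // +--------------------+--------+----------------+"
labelled = ".L_x_0:    // +--------------------+--------+----------------+"
row = "        // |        GPR         |  PRED  |      UGPR      |"

for line in (border, labelled, row):
    # Only the horizontal borders of the liveness table match;
    # an optional leading label such as ".L_x_0:" is tolerated.
    print(bool(TABLE_BEGIN_END.match(line)), line.strip())
```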
- __init__(file: ~pathlib.Path, arch: ~reprospect.tools.architecture.NVIDIAArch | None = None, demangler: type[~reprospect.tools.binaries.demangle.CuppFilt | ~reprospect.tools.binaries.demangle.LlvmCppFilt] = <class 'reprospect.tools.binaries.demangle.CuppFilt'>) None
- Parameters:
arch – Optionally check that file is a CUDA binary file for that arch.
- __str__() str
Rich representation.
- extract_register_usage_from_liveness_range_info(mangled: Iterable[str]) None
Extract register usage from liveness range information.
- classmethod parse_sass_with_liveness_range_info(function_mangled: str, sass: Iterator[str]) Function
Parse the SASS with the liveness range information to extract the resource usage.
It typically looks like:
// +--------------------+--------+----------------+
// |        GPR         |  PRED  |      UGPR      |
// | # 0 1 2 3 4 5 6 7  | # 0    | # 0 1 2 3 4 5  |
// +--------------------+--------+----------------+
// |                    |        |                |
// | 1   ^              |        |                |
// | 2 ^ :              |        |                |
// | 2 : :              |        | 1 ^            |
// | 2 v :              |        | 1 :            |
// +--------------------+--------+----------------+
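A table of this shape can be reduced to one number per register type. The sketch below takes the largest value in each `#` column, which is read here as the count of registers live at that instruction. That reading and the `peak_live_registers` helper are assumptions for illustration, and the result may differ from how this class derives its register usage.

```python
from __future__ import annotations

from typing import Iterable


def peak_live_registers(lines: Iterable[str]) -> dict[str, int]:
    """Return, per register type, the largest value seen in its `#` column."""
    names: list[str] = []
    peaks: dict[str, int] = {}
    for line in lines:
        _, found, annotation = line.partition("//")
        body = annotation.strip()
        if not found or not (body.startswith("|") and body.endswith("|")):
            continue  # instruction-only line or a +---+ border
        cells = [cell.strip() for cell in body[1:-1].split("|")]
        if not names:
            # The first all-alphabetic row names the register types.
            if all(cell.isalpha() for cell in cells):
                names = cells
                peaks = dict.fromkeys(names, 0)
            continue
        for name, cell in zip(names, cells):
            tokens = cell.split()
            # Data rows start with the count; the index row starts with "#".
            if tokens and tokens[0].isdigit():
                peaks[name] = max(peaks[name], int(tokens[0]))
    return peaks


table = [
    "// +-------+------+------+",
    "// | GPR | PRED | UGPR |",
    "// | # 0 1 2 3 4 5 6 7 | # 0 | # 0 1 2 3 4 5 |",
    "// +-------+------+------+",
    "// | | | |",
    "// | 1 ^ | | |",
    "// | 2 ^ : | | |",
    "// | 2 : : | | 1 ^ |",
    "// | 2 v : | | 1 : |",
    "// +-------+------+------+",
]
print(peak_live_registers(table))
```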
- class reprospect.tools.binaries.ResourceUsage(register: int = 0, constant: dict[int, int] = <factory>, shared: int = 0, local: int = 0, sampler: int = 0, stack: int = 0, surface: int = 0, texture: int = 0)
Bases: object
Resource usage.
References:
- __init__(register: int = 0, constant: dict[int, int] = <factory>, shared: int = 0, local: int = 0, sampler: int = 0, stack: int = 0, surface: int = 0, texture: int = 0) None
- __str__() str
- classmethod parse(line: str) ResourceUsage
Parse a resource usage line, such as produced by cuobjdump with --dump-resource-usage.
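The following sketch shows what parsing such a line could look like. `ResourceUsageSketch` mirrors the documented fields but is a stand-in, `parse_resource_usage` is a hypothetical helper, and the `KEY:value` / `CONSTANT[bank]:value` token format of the sample line is an assumption about the tool's output.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class ResourceUsageSketch:
    """Stand-in with the same fields as the documented ResourceUsage."""
    register: int = 0
    constant: dict[int, int] = field(default_factory=dict)
    shared: int = 0
    local: int = 0
    sampler: int = 0
    stack: int = 0
    surface: int = 0
    texture: int = 0


# Assumed token format: "REG:10" or "CONSTANT[0]:360".
TOKEN = re.compile(r"(?P<key>[A-Z]+)(?:\[(?P<bank>\d+)\])?:(?P<value>\d+)")
FIELDS = {
    "REG": "register", "SHARED": "shared", "LOCAL": "local", "SAMPLER": "sampler",
    "STACK": "stack", "SURFACE": "surface", "TEXTURE": "texture",
}


def parse_resource_usage(line: str) -> ResourceUsageSketch:
    """Parse one resource usage line into a ResourceUsageSketch."""
    usage = ResourceUsageSketch()
    for match in TOKEN.finditer(line):
        key, bank, value = match["key"], match["bank"], int(match["value"])
        if key == "CONSTANT" and bank is not None:
            usage.constant[int(bank)] = value  # one entry per constant bank
        elif key in FIELDS:
            setattr(usage, FIELDS[key], value)
    return usage


line = "REG:10 STACK:0 SHARED:0 LOCAL:0 CONSTANT[0]:360 TEXTURE:0 SURFACE:0 SAMPLER:0"
print(parse_resource_usage(line))
```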
Submodules
- reprospect.tools.binaries.cuobjdump module
- reprospect.tools.binaries.demangle module
- reprospect.tools.binaries.elf module
  - CuInfo
  - ELF
  - NvInfo
  - NvInfoEIATTR
    - NvInfoEIATTR.CBANK_PARAM_SIZE
    - NvInfoEIATTR.CRS_STACK_SIZE
    - NvInfoEIATTR.CUDA_API_VERSION
    - NvInfoEIATTR.EXIT_INSTR_OFFSETS
    - NvInfoEIATTR.KPARAM_INFO
    - NvInfoEIATTR.MAXREG_COUNT
    - NvInfoEIATTR.MAX_THREADS
    - NvInfoEIATTR.MERCURY_ISA_VERSION
    - NvInfoEIATTR.PARAM_CBANK
    - NvInfoEIATTR.SPARSE_MMA_MASK
    - NvInfoEIATTR.SW2861232_WAR
    - NvInfoEIATTR.SW_WAR
    - NvInfoEIATTR.VRC_CTA_INIT_COUNT
    - NvInfoEIATTR.WARP_WIDE_INSTR_OFFSETS
  - NvInfoEIFMT
  - NvInfoEntry
  - TkInfo
- reprospect.tools.binaries.nvdisasm module
- reprospect.tools.binaries.symtab module