reprospect.tools.binaries package

class reprospect.tools.binaries.CuObjDump(file: ~pathlib.Path, arch: ~reprospect.tools.architecture.NVIDIAArch, *, sass: bool = True, demangler: type[~reprospect.tools.binaries.demangle.CuppFilt | ~reprospect.tools.binaries.demangle.LlvmCppFilt] | None = <class 'reprospect.tools.binaries.demangle.CuppFilt'>, keep: ~typing.Iterable[str] | None = None)

Bases: object

Use cuobjdump to extract SASS, the symbol table, and other information from a binary file.

__init__(file: ~pathlib.Path, arch: ~reprospect.tools.architecture.NVIDIAArch, *, sass: bool = True, demangler: type[~reprospect.tools.binaries.demangle.CuppFilt | ~reprospect.tools.binaries.demangle.LlvmCppFilt] | None = <class 'reprospect.tools.binaries.demangle.CuppFilt'>, keep: ~typing.Iterable[str] | None = None) None
Parameters:
  • file – Either a host binary file containing one or more embedded CUDA binary files, or itself a CUDA binary file.

  • keep – Optionally filter the functions to be kept.

__str__() str

Rich representation.

arch: Final[NVIDIAArch]

The NVIDIA architecture.

property embedded_cubins: tuple[str, ...]

Get the names of the embedded CUDA binary files contained in file.

classmethod extract(*, file: Path, arch: NVIDIAArch, cwd: Path, cubin: str, **kwargs) tuple[CuObjDump, Path]

From file, extract the embedded CUDA binary file for the given arch whose name contains cubin.

To list all ELF files embedded in file, run:

cuobjdump --list-elf <file>

Extracting the relevant embedded CUDA binary first and then dumping the SASS from it alone can be significantly faster than extracting all the SASS from the whole file.
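As a rough illustration of this two-step workflow, the sketch below only builds the two cuobjdump command lines. It assumes that --extract-elf saves each embedded cubin whose name contains the given partial name into the working directory, under its listed ELF name (presumably why extract takes a cwd argument), and that --dump-sass is then run on that cubin alone. The helper extract_then_dump_commands is hypothetical and not part of reprospect.

```python
from pathlib import Path


def extract_then_dump_commands(file: Path, cubin_name: str) -> list[list[str]]:
    """Hypothetical helper: cuobjdump command lines for extract-then-disassemble.

    Assumes cubin_name is a full ELF name as printed by `cuobjdump --list-elf`
    and that --extract-elf saves it under that name in the working directory.
    """
    return [
        # Step 1: extract only the matching embedded cubin from the host binary.
        ["cuobjdump", "--extract-elf", cubin_name, str(file)],
        # Step 2: disassemble just that cubin instead of every cubin in `file`.
        ["cuobjdump", "--dump-sass", cubin_name],
    ]


# Usage (requires cuobjdump on PATH; workdir is a hypothetical scratch directory):
# import subprocess
# for cmd in extract_then_dump_commands(Path("app"), "app.1.sm_86.cubin"):
#     subprocess.run(cmd, cwd=workdir, check=True)
```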

static extract_elf(*, file: Path, cwd: Path | None = None, arch: NVIDIAArch | None = None, name: str | None = None) Generator[str, None, None]

Extract ELF files from file.

Parameters:
  • arch – Optionally filter for a given architecture.

  • name – Optionally filter by name.

file: Final[Path]

The binary file.

property file_is_cubin: bool

Whether file is a CUDA binary file.

static list_elf(*, file: Path, arch: NVIDIAArch | None = None) Generator[str, None, None]

List ELF files in file.

Parameters:
  • arch – Optionally filter for a given architecture.
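The optional arch filter can be pictured as a small parser over the output of cuobjdump --list-elf. This sketch assumes that each listed line looks like "ELF file    1: app.1.sm_80.cubin" and that the architecture appears as a ".sm_XY." component of the name; both the line format and the helper list_elf_names are assumptions, not reprospect's implementation.

```python
import re
from typing import Iterator, Optional

# Assumed shape of one `cuobjdump --list-elf` line.
ELF_LINE = re.compile(r"^ELF file\s+\d+:\s+(\S+)$")


def list_elf_names(output: str, arch: Optional[str] = None) -> Iterator[str]:
    """Hypothetical helper: yield ELF names, optionally keeping one architecture."""
    for line in output.splitlines():
        match = ELF_LINE.match(line.strip())
        if match is None:
            continue  # not an ELF listing line
        name = match.group(1)
        # Assumed naming: the architecture appears as '.sm_XY.' inside the name.
        if arch is None or f".{arch}." in name:
            yield name


sample = """\
ELF file    1: app.1.sm_80.cubin
ELF file    2: app.2.sm_86.cubin
"""
print(list(list_elf_names(sample)))           # ['app.1.sm_80.cubin', 'app.2.sm_86.cubin']
print(list(list_elf_names(sample, "sm_86")))  # ['app.2.sm_86.cubin']
```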

parse_sass(demangler: type[CuppFilt | LlvmCppFilt] | None = None, keep: Iterable[str] | None = None) None

Parse SASS from file.

property symtab: DataFrame

Extract the symbol table from file for arch.

This property requires that file be either a host binary file containing exactly one embedded CUDA binary file or a CUDA binary file itself.

class reprospect.tools.binaries.CuppFilt

Bases: DemanglerMixin

Convenience wrapper around cu++filt.

classmethod get_executable() str
class reprospect.tools.binaries.ELF(*, file: Path)

Bases: object

Helper for reading ELF files and retrieving CUDA-specific information.

EF_CUDA_SM_OFFSET_POST_BLACKWELL: Final[int] = 8

Bit offset of the compute capability field in e_flags, post-Blackwell.

EF_CUDA_SM_POST_BLACKWELL: Final[int] = 65280

Mask for the compute capability field in e_flags, post-Blackwell.

EF_CUDA_SM_PRE_BLACKWELL: Final[int] = 255

Mask for the compute capability field in e_flags, pre-Blackwell.

ELFABIVERSION_CUDA: Final[tuple[int, int]] = (7, 8)

Values of the ELF ABI version field (e_ident[EI_ABIVERSION]) used by CUDA binary files.

ELFOSABI_CUDA: Final[tuple[int, ...]] = (41, 51, 65)

Values of the ELF OS/ABI field (e_ident[EI_OSABI]) used by CUDA binary files.

__enter__() Self
__exit__(*args, **kwargs) None
__init__(*, file: Path) None
property arch: NVIDIAArch

Get the compute capability encoded in the header, as an NVIDIA architecture.

classmethod compute_capability(value) ComputeCapability

Return the compute capability encoded in e_flags.
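The masks listed for this class describe two layouts of the SM field in e_flags: the low byte pre-Blackwell, and bits 8 to 15 post-Blackwell (65280 is 0xFF00). The sketch below decodes both and splits the SM number into a (major, minor) pair. How the layout is chosen is not described here, so the caller passes it in; the e_flags values in the usage lines are made up, and reading the last decimal digit as the minor version is an assumption. The helper names are hypothetical.

```python
# Constants as listed for the ELF class.
EF_CUDA_SM_PRE_BLACKWELL = 255          # 0x00FF
EF_CUDA_SM_POST_BLACKWELL = 65280       # 0xFF00
EF_CUDA_SM_OFFSET_POST_BLACKWELL = 8


def sm_from_e_flags(e_flags: int, post_blackwell: bool) -> int:
    """Hypothetical helper: return the SM number stored in e_flags, e.g. 86 for sm_86.

    The caller decides which layout applies.
    """
    if post_blackwell:
        # Field lives in bits 8-15.
        return (e_flags & EF_CUDA_SM_POST_BLACKWELL) >> EF_CUDA_SM_OFFSET_POST_BLACKWELL
    # Field lives in bits 0-7.
    return e_flags & EF_CUDA_SM_PRE_BLACKWELL


def to_compute_capability(sm: int) -> tuple[int, int]:
    """Split an SM number into (major, minor), e.g. 86 -> (8, 6), 100 -> (10, 0)."""
    return divmod(sm, 10)


print(sm_from_e_flags(0x0056, post_blackwell=False))  # 86
print(sm_from_e_flags(0x6400, post_blackwell=True))   # 100
print(to_compute_capability(100))                     # (10, 0)
```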

file: Path

Path to the ELF file.

property header: Container
property is_cuda: bool

Return True if file is a valid CUDA binary file.
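Given the ELFOSABI_CUDA and ELFABIVERSION_CUDA constants, one plausible form of this check inspects two bytes of the ELF identification array. The sketch assumes exactly that: e_ident[EI_OSABI] (byte 7 in the ELF specification) must be one of ELFOSABI_CUDA and e_ident[EI_ABIVERSION] (byte 8) one of ELFABIVERSION_CUDA. The actual is_cuda_impl works on a parsed header Container and may check more; looks_like_cuda_elf is a hypothetical name.

```python
EI_OSABI = 7        # index of the OS/ABI byte in e_ident (ELF specification)
EI_ABIVERSION = 8   # index of the ABI version byte in e_ident
ELFOSABI_CUDA = (41, 51, 65)
ELFABIVERSION_CUDA = (7, 8)


def looks_like_cuda_elf(e_ident: bytes) -> bool:
    """Hypothetical check on the first 16 bytes of an ELF file."""
    if len(e_ident) < 16 or e_ident[:4] != b"\x7fELF":
        return False  # not an ELF file at all
    return e_ident[EI_OSABI] in ELFOSABI_CUDA and e_ident[EI_ABIVERSION] in ELFABIVERSION_CUDA


# Synthetic header: magic, 64-bit, little-endian, version 1, OS/ABI 51, ABI version 7.
header = b"\x7fELF" + bytes([2, 1, 1, 51, 7]) + bytes(7)
print(looks_like_cuda_elf(header))                              # True
print(looks_like_cuda_elf(header[:7] + b"\x00" + header[8:]))  # False: OS/ABI 0
```

On a real file, the same function would be applied to the first 16 bytes, for example open(path, "rb").read(16).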

classmethod is_cuda_impl(*, header: Container) bool
nvinfo(mangled: str) NvInfo

Extract and parse the .nv.info.<mangled> section.

class reprospect.tools.binaries.Function(symbol: str, code: str, ru: ResourceUsage)

Bases: object

Data structure holding the SASS code and resource usage of a kernel, as extracted from a binary file.

__init__(symbol: str, code: str, ru: ResourceUsage) None
__str__() str

Rich representation with to_table().

code: str

The SASS code.

ru: ResourceUsage

The resource usage.

symbol: str

The symbol name.

to_table(*, max_code_length: int = 130, descriptors: dict[str, str] | None = None) Table

Convert to a rich.table.Table.

Parameters:
  • descriptors – Optional key-value pairs added as descriptor rows at the top of the table.

class reprospect.tools.binaries.LlvmCppFilt

Bases: DemanglerMixin

Convenience wrapper around llvm-cxxfilt.

classmethod get_executable() str
class reprospect.tools.binaries.NVDisasm(file: ~pathlib.Path, arch: ~reprospect.tools.architecture.NVIDIAArch | None = None, demangler: type[~reprospect.tools.binaries.demangle.CuppFilt | ~reprospect.tools.binaries.demangle.LlvmCppFilt] = <class 'reprospect.tools.binaries.demangle.CuppFilt'>)

Bases: object

Extract information from CUDA binaries using nvdisasm.

The main purpose of nvdisasm is to disassemble CUDA binary files. Beyond the raw disassembly, it can also annotate the disassembled SASS with additional information, such as register liveness ranges. nvdisasm provides liveness ranges for all register types (GPR, PRED, UGPR, UPRED); see also reprospect.tools.sass.decode.RegisterType.

This class parses that register liveness range information to deduce how many registers of each type each kernel uses.

The register usage extracted by reprospect.tools.binaries.CuObjDump covers only the reprospect.tools.sass.decode.RegisterType.GPR register type. By contrast, this class provides details for all register types.

Register liveness range information could also be obtained by parsing the SASS code extracted by reprospect.tools.binaries.CuObjDump. However, such a parser cannot simply track the registers that appear in the SASS code: for certain instructions, operands span multiple consecutive registers, but only the first register index appears in the instruction string. For instance:

  • In STG.E desc[UR6][R6.64], R15, the memory address operand [R6.64] uses two consecutive registers, namely, R6-R7, but only R6 appears explicitly.

  • In LDCU.64 UR8, c[0x0][0x3d8], the modifier 64 indicates that the destination spans the two consecutive registers UR8-UR9, but only UR8 appears explicitly.

  • In IMAD.WIDE.U32 R2, R0, 0x4, R8, the modifier WIDE indicates that R2 and R8 are twice as wide as R0 and 0x4. Hence, the destination and the addend use R2-R3 and R8-R9, but only R2 and R8 appear explicitly.
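To make the first bullet concrete, a naive scan of STG.E desc[UR6][R6.64], R15 only sees UR6, R6 and R15. The sketch below uses a simplified, hypothetical register grammar and handles only the explicit .64 operand suffix; instruction-level modifiers such as LDCU.64 or IMAD.WIDE would each need further rules, which is the kind of growing logic that relying on nvdisasm avoids.

```python
import re

# Simplified, hypothetical grammar: R<n> or UR<n>, optionally with a ".64" suffix.
REGISTER = re.compile(r"\b(U?R)(\d+)(\.64)?\b")


def naive_registers(instruction: str) -> set[str]:
    """Registers that appear literally in the instruction string."""
    return {prefix + index for prefix, index, _ in REGISTER.findall(instruction)}


def widened_registers(instruction: str) -> set[str]:
    """Same, but a '.64' operand also occupies the next register."""
    registers = set()
    for prefix, index, wide in REGISTER.findall(instruction):
        registers.add(f"{prefix}{index}")
        if wide:
            registers.add(f"{prefix}{int(index) + 1}")
    return registers


sass = "STG.E desc[UR6][R6.64], R15"
print(sorted(naive_registers(sass)))    # ['R15', 'R6', 'UR6']
print(sorted(widened_registers(sass)))  # ['R15', 'R6', 'R7', 'UR6']
```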

Further complications include tracking register usage across function calls. Consequently, to deduce the register usage, this class relies on parsing the register annotations provided by nvdisasm rather than on implementing its own logic to infer register usage from the dumped SASS code.

HEADER_COLS: Final[Pattern[str]] = re.compile('^[ ]+\\/\\/ \\|(?:([A-Z ]+\\|)+),?$')
HEADER_SEP: Final[Pattern[str]] = re.compile('^[ ]+\\/\\/ \\+[\\-\\+]+\\+$')
REGISTER_INDEX: Final[Pattern[str]] = re.compile('^\\/\\/ \\| (?:[0-9]+)?')
SECTION_SKIP: Final[Pattern[str]] = re.compile('^\\t\\.(section|sectionflags|sectioninfo|align)')
TABLE_BEGIN_END: Final[Pattern[str]] = re.compile('(?:\\.[A-Za-z0-9_]+:)?[ ]+\\/\\/ \\+[\\-\\+]+\\+$')
TABLE_CUT: Final[Pattern[str]] = re.compile('(?:\\.[A-Za-z0-9_]+:)?[ ]+\\/\\/ \\+[\\.]+')
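To see what the table patterns accept, the sketch below compiles three of them, rewritten as raw strings, and matches sample lines modeled on the liveness table excerpt shown for parse_sass_with_liveness_range_info. The 8-space indentation and the .L_x_0: label are assumptions about nvdisasm output, as is the use of re.match rather than re.search.

```python
import re

# Three NVDisasm patterns, copied as raw strings.
HEADER_COLS = re.compile(r"^[ ]+\/\/ \|(?:([A-Z ]+\|)+),?$")
HEADER_SEP = re.compile(r"^[ ]+\/\/ \+[\-\+]+\+$")
TABLE_BEGIN_END = re.compile(r"(?:\.[A-Za-z0-9_]+:)?[ ]+\/\/ \+[\-\+]+\+$")

# Sample lines (assumed indentation and label).
separator = "        // +--------------------+--------+----------------+"
type_names = "        // |    GPR             | PRED   |   UGPR         |"
indices = "        // | # 0 1 2 3 4 5 6 7  |  # 0   | # 0 1 2 3 4 5  |"
labelled = ".L_x_0:        // +--------------------+--------+----------------+"

print(bool(HEADER_SEP.match(separator)))      # True
print(bool(HEADER_COLS.match(type_names)))    # True: only capitals, spaces and '|'
print(bool(HEADER_COLS.match(indices)))       # False: '#' and digits
print(bool(HEADER_SEP.match(labelled)))       # False: must start with spaces
print(bool(TABLE_BEGIN_END.match(labelled)))  # True: an optional label is allowed
```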
__init__(file: ~pathlib.Path, arch: ~reprospect.tools.architecture.NVIDIAArch | None = None, demangler: type[~reprospect.tools.binaries.demangle.CuppFilt | ~reprospect.tools.binaries.demangle.LlvmCppFilt] = <class 'reprospect.tools.binaries.demangle.CuppFilt'>) None
Parameters:
  • arch – Optionally check that file is a CUDA binary file for that arch.

__str__() str

Rich representation.

extract_register_usage_from_liveness_range_info(mangled: Iterable[str]) None

Extract register usage from liveness range information.

classmethod parse_sass_with_liveness_range_info(function_mangled: str, sass: Iterator[str]) Function

Parse the SASS with the liveness range information to extract the resource usage.

The liveness range information typically looks like:

// +--------------------+--------+----------------+
// |    GPR             | PRED   |   UGPR         |
// | # 0 1 2 3 4 5 6 7  |  # 0   | # 0 1 2 3 4 5  |
// +--------------------+--------+----------------+
// |                    |        |                |
// | 1   ^              |        |                |
// | 2 ^ :              |        |                |
// | 2 : :              |        | 1 ^            |
// | 2 v :              |        | 1 :            |
// +--------------------+--------+----------------+
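The sketch below reads a table laid out like the excerpt above and reports, for each register type, the highest register index that carries any mark (^, :, v) plus one. It is a hypothetical illustration, not reprospect's parser: it assumes each register's marks sit in the character column where that register's index is printed in the "# 0 1 ..." row, ignores the SASS lines that surround the table in real nvdisasm output, and does not handle multi-digit index headers.

```python
import re

# The excerpt above, as plain table lines (no surrounding SASS).
EXCERPT = """\
// +--------------------+--------+----------------+
// |    GPR             | PRED   |   UGPR         |
// | # 0 1 2 3 4 5 6 7  |  # 0   | # 0 1 2 3 4 5  |
// +--------------------+--------+----------------+
// |                    |        |                |
// | 1   ^              |        |                |
// | 2 ^ :              |        |                |
// | 2 : :              |        | 1 ^            |
// | 2 v :              |        | 1 :            |
// +--------------------+--------+----------------+
"""


def cells(row: str) -> list[str]:
    """Split '// | a | b |' into its cells, keeping the spacing inside each cell."""
    return row.split("|")[1:-1]


def registers_touched(table_lines: list[str]) -> dict[str, int]:
    """Hypothetical sketch: highest marked register index + 1, per register type."""
    # Keep only '|' rows; '+----+' separator lines carry no data.
    rows = [line.strip() for line in table_lines if line.strip().startswith("// |")]
    names = [cell.strip() for cell in cells(rows[0])]
    # Per register type: character position -> register index, from the '# 0 1 ...' row.
    columns = [
        {m.start(): int(m.group()) for m in re.finditer(r"\d+", cell)}
        for cell in cells(rows[1])
    ]
    used = dict.fromkeys(names, 0)
    for row in rows[2:]:
        for name, cell, cols in zip(names, cells(row), columns):
            for pos, index in cols.items():
                # Any mark (^, :, v, ...) under a register's column means it is used.
                if pos < len(cell) and cell[pos] != " ":
                    used[name] = max(used[name], index + 1)
    return used


print(registers_touched(EXCERPT.splitlines()))  # {'GPR': 2, 'PRED': 0, 'UGPR': 1}
```

Since the excerpt is truncated, these counts describe only the rows shown, not a complete kernel.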
class reprospect.tools.binaries.ResourceUsage(register: int = 0, constant: dict[int, int] = <factory>, shared: int = 0, local: int = 0, sampler: int = 0, stack: int = 0, surface: int = 0, texture: int = 0)

Bases: object

Resource usage.

__init__(register: int = 0, constant: dict[int, int] = <factory>, shared: int = 0, local: int = 0, sampler: int = 0, stack: int = 0, surface: int = 0, texture: int = 0) None
__str__() str
constant: dict[int, int]
local: int
classmethod parse(line: str) ResourceUsage

Parse a resource usage line, such as those produced by cuobjdump --dump-resource-usage.

register: int
sampler: int
shared: int
stack: int
surface: int
texture: int
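A minimal parser for such a line could look as follows. It assumes a line made of KEY:value tokens, for example REG:28 STACK:16 SHARED:1024 LOCAL:8 CONSTANT[0]:380, where CONSTANT[bank] is keyed by constant bank. The token layout, the REG-to-register mapping, the made-up sample line, and the class ResourceUsageSketch (mirroring the fields of ResourceUsage) are assumptions, not the library's parse.

```python
import dataclasses
import re


@dataclasses.dataclass
class ResourceUsageSketch:
    """Hypothetical stand-in mirroring the fields of ResourceUsage."""
    register: int = 0
    constant: dict[int, int] = dataclasses.field(default_factory=dict)
    shared: int = 0
    local: int = 0
    sampler: int = 0
    stack: int = 0
    surface: int = 0
    texture: int = 0


# KEY:value, or CONSTANT[bank]:value.
TOKEN = re.compile(r"([A-Z]+)(?:\[(\d+)\])?:(\d+)")


def parse_resource_usage(line: str) -> ResourceUsageSketch:
    ru = ResourceUsageSketch()
    scalar_fields = {f.name for f in dataclasses.fields(ru)} - {"constant"}
    for key, bank, value in TOKEN.findall(line):
        if key == "CONSTANT":
            if bank:
                # One entry per constant bank.
                ru.constant[int(bank)] = int(value)
            continue
        name = "register" if key == "REG" else key.lower()
        # Ignore keys without a matching field.
        if name in scalar_fields:
            setattr(ru, name, int(value))
    return ru


ru = parse_resource_usage(
    "REG:28 STACK:16 SHARED:1024 LOCAL:8 CONSTANT[0]:380 CONSTANT[2]:16 "
    "TEXTURE:0 SURFACE:0 SAMPLER:0"
)
print(ru.register, ru.shared, ru.constant)  # 28 1024 {0: 380, 2: 16}
```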

Submodules