reprospect.tools.binaries.nvdisasm module

class reprospect.tools.binaries.nvdisasm.Function(registers: dict[RegisterType, tuple[int, int]] | None = None)

Bases: TableMixin

Data structure holding resource usage information of a kernel, as extracted from a binary.

registers holds detailed register usage information per register type. Each entry is a tuple holding:

  • the length of the span of used registers, i.e., the maximum register index + 1

  • the number of registers actually used within that span

For instance, if a kernel uses registers R0, R1, and R3, then the entry for reprospect.tools.sass.decode.RegisterType.GPR will be (4, 3), because the span R0-R3 contains 4 registers, of which 3 are actually used.
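The tuple convention above can be illustrated with a minimal sketch. The helper name span_and_count is hypothetical and not part of reprospect, and returning (0, 0) for an empty set is an assumption made for this example only.

```python
def span_and_count(used_indices: set[int]) -> tuple[int, int]:
    """Return (span length, number of registers used) for a set of register indices."""
    if not used_indices:
        # Assumption: no register used is reported as an empty span.
        return (0, 0)
    # The span runs from index 0 up to the highest index that is used.
    return (max(used_indices) + 1, len(used_indices))


# Registers R0, R1, and R3 are used; R2 is an unused gap inside the span.
print(span_and_count({0, 1, 3}))  # (4, 3)
```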

Note

It is not decorated with dataclasses.dataclass() because of https://github.com/mypyc/mypyc/issues/1061.

__init__(registers: dict[RegisterType, tuple[int, int]] | None = None) → None
registers: dict[RegisterType, tuple[int, int]] | None
to_table() → Table

Convert the register usage to a rich.table.Table.

class reprospect.tools.binaries.nvdisasm.NVDisasm(file: Path, arch: NVIDIAArch | None = None, demangler: type[CuppFilt | LlvmCppFilt] = <class 'reprospect.tools.binaries.demangle.CuppFilt'>)

Bases: object

Extract information from CUDA binaries using nvdisasm.

The main purpose of nvdisasm is to disassemble CUDA binary files. Beyond the raw disassembly, it can also annotate the disassembled SASS with additional information, such as register liveness ranges. nvdisasm provides liveness ranges for all register types: GPR, PRED, UGPR, UPRED; see also reprospect.tools.sass.decode.RegisterType.

This class parses this register liveness range information to deduce how many registers each kernel uses.

The register usage information extracted by reprospect.tools.binaries.CuObjDump covers only the reprospect.tools.sass.decode.RegisterType.GPR register type, whereas this class provides details for all register types.

Register liveness range information could also be obtained by parsing the SASS code extracted by reprospect.tools.binaries.CuObjDump. However, such a parser cannot simply track the registers that appear in the SASS code: for certain instructions, operands span multiple consecutive registers, but only the first register index appears in the instruction string. For instance,

  • In STG.E desc[UR6][R6.64], R15, the memory address operand [R6.64] uses two consecutive registers, namely, R6-R7, but only R6 appears explicitly.

  • In LDCU.64 UR8, c[0x0][0x3d8], the modifier 64 indicates that the destination is the two consecutive registers UR8-UR9, but only UR8 appears explicitly.

  • In IMAD.WIDE.U32 R2, R0, 0x4, R8, the modifier WIDE indicates that R2 and R8 are twice as wide as R0 and 0x4. Hence, the destination and the addend use R2-R3 and R8-R9, but only R2 and R8 appear explicitly.

There are also complexities such as tracking register usage across function calls. Consequently, to deduce the register usage, this class relies on parsing the register annotations provided by nvdisasm, rather than on implementing its own logic to infer register usage from dumped SASS code.
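To make the first bullet above concrete, the following sketch compares a naive collector of explicitly named registers with one that also accounts for .64 operand suffixes. The regular expression and both helper functions are hypothetical and written only for this illustration; they are not how this class works. The sketch deliberately handles only the .64 operand suffix, so opcode modifiers such as WIDE or LDCU.64 would still be missed, which is the kind of gap that motivates relying on the nvdisasm annotations instead.

```python
import re

# Hypothetical pattern: a general-purpose register operand such as R6 or R6.64.
# The lookbehind rejects uniform registers such as UR6.
GPR = re.compile(r'(?<![A-Z])R(\d+)(\.64)?')


def naive_gprs(instruction: str) -> set[int]:
    """Collect only the register indices that appear explicitly."""
    return {int(m.group(1)) for m in GPR.finditer(instruction)}


def gprs_with_64bit_operands(instruction: str) -> set[int]:
    """Also count the implicit second register of operands suffixed with .64."""
    used: set[int] = set()
    for m in GPR.finditer(instruction):
        index = int(m.group(1))
        used.add(index)
        if m.group(2):
            # A 64-bit operand occupies two consecutive registers.
            used.add(index + 1)
    return used


instruction = 'STG.E desc[UR6][R6.64], R15'
print(sorted(naive_gprs(instruction)))                # [6, 15]
print(sorted(gprs_with_64bit_operands(instruction)))  # [6, 7, 15]
```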

HEADER_COLS: Final[Pattern[str]] = re.compile('^[ ]+\\/\\/ \\|(?:([A-Z ]+\\|)+),?$')
HEADER_SEP: Final[Pattern[str]] = re.compile('^[ ]+\\/\\/ \\+[\\-\\+]+\\+$')
REGISTER_INDEX: Final[Pattern[str]] = re.compile('^\\/\\/ \\| (?:[0-9]+)?')
SECTION_SKIP: Final[Pattern[str]] = re.compile('^\\t\\.(section|sectionflags|sectioninfo|align)')
TABLE_BEGIN_END: Final[Pattern[str]] = re.compile('(?:\\.[A-Za-z0-9_]+:)?[ ]+\\/\\/ \\+[\\-\\+]+\\+$')
TABLE_CUT: Final[Pattern[str]] = re.compile('(?:\\.[A-Za-z0-9_]+:)?[ ]+\\/\\/ \\+[\\.]+')
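As an illustration of how two of these patterns behave, the sketch below recompiles HEADER_SEP and HEADER_COLS from the pattern strings shown above and applies them to hand-written annotation lines. The sample lines and the two helper functions are hypothetical; the leading indentation is included because both patterns require at least one space before the comment marker.

```python
from __future__ import annotations

import re

# Patterns copied from the HEADER_SEP and HEADER_COLS attributes above.
HEADER_SEP = re.compile(r'^[ ]+\/\/ \+[\-\+]+\+$')
HEADER_COLS = re.compile(r'^[ ]+\/\/ \|(?:([A-Z ]+\|)+),?$')


def is_separator(line: str) -> bool:
    """Whether the line is a '+----+----+' table separator."""
    return HEADER_SEP.match(line) is not None


def column_names(line: str) -> list[str] | None:
    """Return the register type names if the line is a column header, else None."""
    if HEADER_COLS.match(line) is None:
        return None
    # Cells sit between the '|' delimiters; the outer pieces are not cells.
    return [cell.strip() for cell in line.split('|')[1:-1]]


# Hypothetical annotation lines, indented as the patterns require.
separator = '        // +--------------------+--------+----------------+'
header = '        // |    GPR             | PRED   |   UGPR         |'
indices = '        // | # 0 1 2 3 4 5 6 7  |  # 0   | # 0 1 2 3 4 5  |'

print(is_separator(separator))  # True
print(column_names(header))     # ['GPR', 'PRED', 'UGPR']
print(column_names(indices))    # None
```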
__init__(file: Path, arch: NVIDIAArch | None = None, demangler: type[CuppFilt | LlvmCppFilt] = <class 'reprospect.tools.binaries.demangle.CuppFilt'>) → None
Parameters:

arch – Optionally check that file is a CUDA binary file for that arch.

__str__() → str

Rich representation.

extract_register_usage_from_liveness_range_info(mangled: Iterable[str]) → None

Extract register usage from liveness range information.

functions: dict[str, Function]
classmethod parse_sass_with_liveness_range_info(function_mangled: str, sass: Iterator[str]) → Function

Parse the SASS with the liveness range information to extract the resource usage.

The annotated output typically looks like:

// +--------------------+--------+----------------+
// |    GPR             | PRED   |   UGPR         |
// | # 0 1 2 3 4 5 6 7  |  # 0   | # 0 1 2 3 4 5  |
// +--------------------+--------+----------------+
// |                    |        |                |
// | 1   ^              |        |                |
// | 2 ^ :              |        |                |
// | 2 : :              |        | 1 ^            |
// | 2 v :              |        | 1 :            |
// +--------------------+--------+----------------+
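A table of this shape can be reduced to one (span, count) tuple per register type, following the convention documented for Function.registers. The sketch below is not the implementation of this method. It assumes single-digit register indices laid out on one line, as in the sample, and it reports (0, 0) for a register type with no live register; the function name parse_liveness_table is hypothetical.

```python
SAMPLE = [
    '// +--------------------+--------+----------------+',
    '// |    GPR             | PRED   |   UGPR         |',
    '// | # 0 1 2 3 4 5 6 7  |  # 0   | # 0 1 2 3 4 5  |',
    '// +--------------------+--------+----------------+',
    '// |                    |        |                |',
    '// | 1   ^              |        |                |',
    '// | 2 ^ :              |        |                |',
    '// | 2 : :              |        | 1 ^            |',
    '// | 2 v :              |        | 1 :            |',
    '// +--------------------+--------+----------------+',
]


def parse_liveness_table(lines: list[str]) -> dict[str, tuple[int, int]]:
    """Map each register type to (span length, number of registers used)."""
    # Keep the text after the comment marker and drop the '+---+' separators.
    rows = [line.split('//', 1)[1].strip() for line in lines if '//' in line]
    cells = [row.split('|')[1:-1] for row in rows if row.startswith('|')]
    names = [cell.strip() for cell in cells[0]]
    usage: dict[str, tuple[int, int]] = {}
    for col, name in enumerate(names):
        # Character offset within the cell -> register index (single digits only).
        offsets = {pos: int(ch) for pos, ch in enumerate(cells[1][col]) if ch.isdigit()}
        used: set[int] = set()
        for row in cells[2:]:
            cell = row[col]
            # Any non-blank marker under a register index means the register is live.
            used |= {reg for pos, reg in offsets.items() if pos < len(cell) and cell[pos] != ' '}
        usage[name] = (max(used) + 1, len(used)) if used else (0, 0)
    return usage


print(parse_liveness_table(SAMPLE))
```

For the sample table, this sketch reports (2, 2) for GPR, (0, 0) for PRED, and (1, 1) for UGPR. The leading number in each cell is not a register marker: it sits under the # column and is skipped because only digit positions of the index row are inspected.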
class reprospect.tools.binaries.nvdisasm.RegisterState(*values)

Bases: StrEnum

Register state, typically found in the output of nvdisasm.

ASSIGNMENT = '^'
IN_USE = ':'
NOT_IN_USE = ' '
USAGE = 'v'
USAGE_AND_REASSIGNMENT = 'x'
__str__()

Return str(self).

property used: bool

Whether the state corresponds to a state in which the register is in use.