reprospect.tools.sass.decode module

class reprospect.tools.sass.decode.ControlCode(stall_count: int, yield_flag: bool, read: int, write: int, wait: list[bool], reuse: dict[str, bool])

Bases: object

SASS control code decoding.

SASS instructions use a 128-bit encoding:

  • 64-bit instruction word

  • 64-bit word including the control code

The control code manages instruction dependencies, scheduling, and resource usage.

The control code is a section of 21 bits with the following structure:

| Number of bits | Meaning |
|---|---|
| 4 | stall count (0-15 cycles) |
| 1 | yield flag |
| 3 | write barrier |
| 3 | read barrier |
| 6 | wait barrier mask (one bit per barrier 0-5) |
| 4 | reuse flags mask (for operands A, B, C, D) |
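To make the layout concrete, here is a minimal sketch that extracts the raw fields with shifts and masks. It assumes the bit positions commonly reported for Volta and later GPUs: the control code occupies bits 41-61 of the second 64-bit word (bits 105-125 of the full 128-bit instruction), with the fields packed from the least significant bit in the order of the table. `split_control_word` and its constants are hypothetical names, not reprospect's API.

```python
# Hypothetical sketch, not reprospect's implementation.
# Assumption: the 21-bit control code occupies bits 41-61 of the second
# 64-bit word, with fields packed from the least significant bit in the
# order of the table above.
CONTROL_SHIFT = 41

# (field name, width in bits), in packing order starting at CONTROL_SHIFT.
CONTROL_LAYOUT = [
    ("stall_count", 4),
    ("yield_flag", 1),
    ("write", 3),
    ("read", 3),
    ("wait", 6),
    ("reuse", 4),
]


def split_control_word(word: int) -> dict[str, int]:
    """Return the raw integer value of each control-code field of `word`."""
    fields = {}
    shift = CONTROL_SHIFT
    for name, width in CONTROL_LAYOUT:
        # Move the field down to bit 0, then keep only its `width` bits.
        fields[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return fields


# Second word of an instruction, parsed from its "0x..." hex form.
print(split_control_word(int("0x000fe20000000f00", 16)))
```

Under this assumed layout, the example word decodes to a stall count of 1, a yield bit of 1 and the value 7 in both barrier fields, meaning that no read or write barrier is set.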

Barriers

Barriers coordinate dependencies between instructions executing on different functional units:

  • The write barrier is set by an instruction that produces a result in a register and is released once that result has been written, marking the register as ready. Later instructions reading that register must wait on it.

  • The read barrier is set by an instruction whose source registers must remain unchanged until it has read them. Later instructions writing to those registers must wait on it.

  • The wait barrier mask is a bit mask of the barriers that the instruction must wait on before it can issue (e.g. write and read barriers set by preceding instructions).

A write or read barrier value with all three bits set to 1, i.e. an integer value of 7, means that no such barrier is set.
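The "7 means none" convention and the wait mask can be turned into Python values close to the `read`, `write` and `wait` attributes of ControlCode. This is a sketch with hypothetical helper names; unlike ControlCode, which keeps the value 7, it maps an unset barrier to None, and mapping bit i of the mask to barrier i is an assumption consistent with the table above.

```python
from typing import Optional

NO_BARRIER = 0b111  # all three bits set: no barrier
NUM_BARRIERS = 6    # barriers 0-5


def barrier_index(field: int) -> Optional[int]:
    """Map a 3-bit read or write barrier field to a barrier index, or None if unset."""
    return None if field == NO_BARRIER else field


def wait_mask_to_list(mask: int) -> list[bool]:
    """Expand the 6-bit wait mask into one flag per barrier (bit i -> barrier i)."""
    return [bool((mask >> i) & 1) for i in range(NUM_BARRIERS)]


# Wait on barriers 0 and 2; write barrier field 1; read barrier field 7 (unset).
print(wait_mask_to_list(0b000101), barrier_index(1), barrier_index(7))
```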

Note that barriers are related to [scoreboarding](https://en.wikipedia.org/wiki/Scoreboarding).


Stall count

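The stall count explicitly inserts delay cycles to account for instruction latency: the scheduler waits this many cycles before issuing the next instruction of the warp.

It can be set within the 0-15 cycle range.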

Reuse flags

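Operand reuse optimization: an operand value is kept in a reuse cache so that the next instruction can read it without accessing the register file again. There are 4 reuse flags (A, B, C and D), one per operand slot.

If a subsequent instruction uses the same register in the same operand slot, for instance operand slot B, the compiler may set reuse_B on the current instruction to reduce register file traffic.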
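The 4-bit reuse field can be expanded into one flag per operand slot, mirroring the `reuse: dict[str, bool]` attribute of ControlCode. This is a sketch: the helper name, the single-letter keys and the bit-to-slot order (bit 0 for A through bit 3 for D) are assumptions, not taken from reprospect.

```python
# Hypothetical sketch, not reprospect's implementation.
# Assumption: bit 0 of the 4-bit reuse field maps to operand slot A,
# bit 1 to B, bit 2 to C and bit 3 to D.
REUSE_SLOTS = ("A", "B", "C", "D")


def reuse_flags(mask: int) -> dict[str, bool]:
    """Expand the 4-bit reuse field into one flag per operand slot."""
    return {slot: bool((mask >> bit) & 1) for bit, slot in enumerate(REUSE_SLOTS)}


print(reuse_flags(0b0010))
```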

Yield flag

Hint to the scheduler to yield to other warps. It is mainly used around long-latency operations and allows better interleaving of warps, thus improving throughput.

Notes

  • There are 6 barriers, i.e. 6 scoreboard entries used for dependency tracking.

  • Each warp has its own set of barriers.

  • The GPU hardware automatically enforces barrier dependencies.

  • The control code is generated by the CUDA ptxas tool.


__init__(stall_count: int, yield_flag: bool, read: int, write: int, wait: list[bool], reuse: dict[str, bool]) → None
static decode(*, code: str) → ControlCode

Decode the 64-bit word that contains the control code.

read: int

Index of the read barrier that is set (7 if no read barrier is set).

reuse: dict[str, bool]

Reuse flags.

stall_count: int

Stall count.

wait: list[bool]

Wait barrier mask (one bit per barrier 0-5).

write: int

Index of the write barrier that is set (7 if no write barrier is set).

yield_flag: bool

Yield flag.

class reprospect.tools.sass.decode.Decoder(*, source: Path | None = None, code: str | None = None, skip_until_headerflags: bool = True)

Bases: TableMixin

Parse the SASS assembly code extracted from a binary.

The disassembled instructions are collected, and the associated control codes are decoded.

HEXADECIMAL: Final[str] = '0x[a-f0-9]+'

Matcher for a hex-like string, such as 0x00000a0000017a02.

MATCHER: Final[Pattern[str]] = re.compile('/\\*([a-f0-9]+)\\*/\\s+(.*?)(?=\\s{2,}|[;?&]).*?/\\*\\s*(0x[a-f0-9]+)\\s*\\*/')

Matcher for the full SASS line. It focuses on the offset, instruction and trailing hex encoding.

For instance, it matches:

/*0070*/                   UIMAD UR4, UR4, UR5, URZ ;                       /* 0x00000005040472a4 */

Note that additional strings sometimes appear between the instruction and the hex encoding:

/*0090*/ ISETP.GE.U32.AND P0, PT, R0, UR9, PT            ?WAIT13_END_GROUP;  /* 0x0000000900007c0c */
/*00e0*/ LDC.64 R10, c[0x0][0x3a0]  &wr=0x2  ?trans8;    /* 0x0000e800ff0a7b82 */

These strings are ignored.
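The documented pattern can be exercised directly on the example lines above. This sketch copies the MATCHER pattern and applies it with `re.search`, which is an assumption about how the Decoder uses it; `parse_line` is a hypothetical helper. The offset is parsed as hexadecimal, and the instruction group is stripped because, on lines such as the UIMAD one, the lazy group stops just before `;` and keeps the preceding space.

```python
import re
from typing import Optional

# Copy of the Decoder.MATCHER pattern documented above.
MATCHER = re.compile(r"/\*([a-f0-9]+)\*/\s+(.*?)(?=\s{2,}|[;?&]).*?/\*\s*(0x[a-f0-9]+)\s*\*/")

LINES = [
    "/*0070*/                   UIMAD UR4, UR4, UR5, URZ ;                       /* 0x00000005040472a4 */",
    "/*0090*/ ISETP.GE.U32.AND P0, PT, R0, UR9, PT            ?WAIT13_END_GROUP;  /* 0x0000000900007c0c */",
    "/*00e0*/ LDC.64 R10, c[0x0][0x3a0]  &wr=0x2  ?trans8;    /* 0x0000e800ff0a7b82 */",
]


def parse_line(line: str) -> Optional[tuple[int, str, str]]:
    """Return (offset, instruction, hex word) for a SASS line, or None."""
    match = MATCHER.search(line)
    if match is None:
        return None
    offset_hex, instruction, word = match.groups()
    # Offsets are hexadecimal; trailing strings such as ?trans8 are dropped.
    return int(offset_hex, 16), instruction.strip(), word


for line in LINES:
    print(parse_line(line))
```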

MATCHER_CONTROL: Final[Pattern[str]] = re.compile('\\/\\* (0x[a-f0-9]+) \\*\\/')
OFFSET: Final[str] = '[a-f0-9]+'
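In cuobjdump and nvdisasm listings of 128-bit instructions, the second 64-bit word (the one carrying the control code) is printed alone on the following line, and MATCHER_CONTROL appears to target such lines. The sketch below pairs each instruction line with the control line that follows it. `pair_words` is a hypothetical helper, not the Decoder's code, and the two-line listing is hand-written for illustration rather than real compiler output.

```python
import re
from typing import Iterable, Iterator, Optional

# Copies of the Decoder.MATCHER and Decoder.MATCHER_CONTROL patterns documented above.
MATCHER = re.compile(r"/\*([a-f0-9]+)\*/\s+(.*?)(?=\s{2,}|[;?&]).*?/\*\s*(0x[a-f0-9]+)\s*\*/")
MATCHER_CONTROL = re.compile(r"\/\* (0x[a-f0-9]+) \*\/")

# Hand-written, illustrative two-line listing (not real compiler output).
LISTING = """\
        /*0000*/                   MOV R1, c[0x0][0x28] ;                       /* 0x00000a0000017a02 */
                                                                                /* 0x000fe40000000f00 */
"""


def pair_words(lines: Iterable[str]) -> Iterator[tuple[int, str, str, str]]:
    """Yield (offset, instruction, first word, second word) per instruction.

    Hypothetical helper: assumes the second 64-bit word is printed alone
    on the line that follows the instruction line.
    """
    pending: Optional[tuple[int, str, str]] = None
    for line in lines:
        match = MATCHER.search(line)
        if match is not None:
            # Instruction line: remember it until its second word shows up.
            offset_hex, instruction, first = match.groups()
            pending = (int(offset_hex, 16), instruction.strip(), first)
            continue
        control = MATCHER_CONTROL.search(line)
        if control is not None and pending is not None:
            # Control line: emit the completed 128-bit instruction.
            yield (*pending, control.group(1))
            pending = None


for entry in pair_words(LISTING.splitlines()):
    print(entry)
```

MATCHER is tried first on purpose: MATCHER_CONTROL would also match the trailing hex comment of an instruction line.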
__init__(*, source: Path | None = None, code: str | None = None, skip_until_headerflags: bool = True) → None

Initialize the decoder with the SASS contained in source or code.

code: Final[str | None]
instructions: Final[list[Instruction]]

The parsed instructions.

source: Final[Path | None]
to_df() → DataFrame

Convert the decoded SASS to a pandas.DataFrame.

to_html() → str

Render the decoded SASS as an HTML table.

to_table() → Table

Convert the decoded SASS to a rich.table.Table.

class reprospect.tools.sass.decode.Instruction(offset: int, instruction: str, hex: str, control: ControlCode)

Bases: object

Represents a single SASS instruction with its components.

__init__(offset: int, instruction: str, hex: str, control: ControlCode) → None
control: ControlCode

The decoded control code associated with the instruction.

hex: str

The hexadecimal representation of the instruction.

instruction: str

The disassembled SASS instruction including opcode, modifiers and operands.

offset: int

Offset of the instruction in the SASS code.

class reprospect.tools.sass.decode.RegisterType(*values)

Bases: StrEnum

Register types:

  • GPR: General Purpose Registers.

  • PRED: Predicate Registers.

  • UGPR: Uniform General Purpose Registers.

  • UPRED: Uniform Predicate Registers.

GPR = 'R'
PRED = 'P'
UGPR = 'UR'
UPRED = 'UP'
__str__()

Return str(self).

property is_predicate: bool
property special: str
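Register operands can be classified by their prefix, as the RegisterType values suggest. The sketch below uses a hypothetical `RegisterKind` enum (str + Enum, so it also runs on Python versions without StrEnum) and a regex that accepts the zero registers RZ/URZ and the always-true predicates PT/UPT. Treating both predicate kinds as predicates in `is_predicate` is an assumption about that property's meaning, and operands carrying modifiers such as `R2.reuse` are not handled.

```python
import re
from enum import Enum
from typing import Optional


class RegisterKind(str, Enum):
    """Hypothetical stand-in for RegisterType (str + Enum instead of StrEnum)."""

    GPR = "R"
    PRED = "P"
    UGPR = "UR"
    UPRED = "UP"

    @property
    def is_predicate(self) -> bool:
        # Assumption: both the per-thread and the uniform predicates count.
        return self in (RegisterKind.PRED, RegisterKind.UPRED)


# General purpose registers: a number or Z (RZ and URZ are zero registers).
# Predicates: a number or T (PT and UPT are always-true predicates).
REGISTER = re.compile(r"(UR|R)(?:\d+|Z)|(UP|P)(?:\d+|T)")


def register_kind(operand: str) -> Optional[RegisterKind]:
    """Classify a bare register operand such as 'UR4' or 'PT', else return None."""
    match = REGISTER.fullmatch(operand)
    if match is None:
        return None
    # Exactly one of the two prefix groups participated in the match.
    return RegisterKind(match.group(1) or match.group(2))


print([register_kind(op) for op in ("R0", "UR4", "P0", "UPT", "RZ", "c[0x0][0x28]")])
```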