reprospect.tools.sass.decode module
- class reprospect.tools.sass.decode.ControlCode(stall_count: int, yield_flag: bool, read: int, write: int, wait: list[bool], reuse: dict[str, bool])
Bases: object
SASS control code decoding.
SASS instructions use a 128-bit encoding, made of a 64-bit instruction word and a 64-bit word that includes the control code.
The control code manages instruction dependencies, scheduling, and resource usage.
The control code is a section of 21 bits with the following structure:
| number of bits | meaning |
| --- | --- |
| 4 | stall count (0-15 cycles) |
| 1 | yield flag |
| 3 | write barrier |
| 3 | read barrier |
| 6 | wait barrier mask (one bit per barrier 0-5) |
| 4 | reuse flags mask (for operands A, B, C, D) |
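The field layout above can be illustrated with a minimal sketch. It assumes that the 21 bits have already been extracted from the 64-bit word (their position in that word is not stated here) and that the fields are packed in the listed order starting from the least significant bit. Both points, as well as the names `ControlFields` and `unpack_control_bits`, are assumptions made for illustration and are not taken from the library.

```python
from typing import NamedTuple


class ControlFields(NamedTuple):
    """Raw integer fields of a 21-bit control code (hypothetical helper type)."""
    stall_count: int
    yield_flag: int
    write: int
    read: int
    wait_mask: int
    reuse_mask: int


def unpack_control_bits(bits: int) -> ControlFields:
    """Split a 21-bit control code into its six fields.

    Assumption: fields are packed in the documented order,
    starting from the least significant bit.
    """
    if not 0 <= bits < (1 << 21):
        raise ValueError("expected a 21-bit value")
    return ControlFields(
        stall_count=bits & 0xF,          # 4 bits
        yield_flag=(bits >> 4) & 0x1,    # 1 bit
        write=(bits >> 5) & 0x7,         # 3 bits
        read=(bits >> 8) & 0x7,          # 3 bits
        wait_mask=(bits >> 11) & 0x3F,   # 6 bits
        reuse_mask=(bits >> 17) & 0xF,   # 4 bits
    )
```

The field widths sum to 4 + 1 + 3 + 3 + 6 + 4 = 21 bits, which matches the size given above.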
Barriers
Coordinate dependencies between instructions executing on different functional units:
Write barrier is set when the instruction completes and produces a result. It marks that the register is ready. Later instructions reading that register must wait.
Read barrier indicates that the instruction needs a register holding the result of a previous instruction to remain unchanged. Later instructions writing to that register must wait.
The wait barrier mask is a bit mask with the barriers that the instruction must wait for (e.g. write and read barriers set by preceding instructions).
A write or read barrier value with the three bits set to 1, i.e., an integer value of 7, means that no such barrier is set.
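As a small sketch of how these raw values might be interpreted: a 3-bit barrier value of 7 is mapped to `None`, and the 6-bit wait mask is expanded into one boolean per barrier. The helper names are hypothetical, and the mapping of bit `i` to barrier `i` is an assumption.

```python
from typing import Optional

NO_BARRIER = 7  # all three bits set: no barrier is set


def barrier_or_none(value: int) -> Optional[int]:
    """Return the barrier index, or None when no barrier is set."""
    if not 0 <= value <= 7:
        raise ValueError("expected a 3-bit value")
    return None if value == NO_BARRIER else value


def wait_mask_to_list(mask: int) -> list[bool]:
    """Expand the 6-bit wait mask into one flag per barrier.

    Assumption: bit i of the mask corresponds to barrier i.
    """
    if not 0 <= mask < (1 << 6):
        raise ValueError("expected a 6-bit value")
    return [bool((mask >> i) & 1) for i in range(6)]
```

For example, under this assumption a mask of `0b000101` means that the instruction waits on barriers 0 and 2.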
Note that barriers are related to [scoreboarding](https://en.wikipedia.org/wiki/Scoreboarding).
For further description of barriers and scoreboard dependencies, the reader may refer to the references listed at the end of this description.
Stall count
Explicitly insert delay cycles to account for latency.
It can be set within the 0-15 cycles range.
Reuse flags
Operand forwarding optimization - reuse data from previous instruction. There are 4 reuse flags (A, B, C and D).
If a subsequent instruction uses the same register in the same operand slot, for instance operand slot B, the compiler may set reuse_B to avoid register traffic.
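The 4-bit reuse mask can be expanded into one flag per operand slot, in the spirit of the `reuse: dict[str, bool]` argument of the constructor. This is only a sketch: the function name, the dictionary keys and the assumption that the lowest bit corresponds to slot A are illustrative choices, not the library's documented behavior.

```python
def reuse_mask_to_flags(mask: int) -> dict[str, bool]:
    """Expand the 4-bit reuse mask into one flag per operand slot.

    Assumption: bit 0 is slot A, bit 1 is slot B, and so on.
    """
    if not 0 <= mask < (1 << 4):
        raise ValueError("expected a 4-bit value")
    return {slot: bool((mask >> i) & 1) for i, slot in enumerate("ABCD")}
```

Under this assumption, a mask of `0b0010` marks only operand slot B as reused.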
Yield flag
Hint to scheduler to yield to other warps. It is mainly used for long-latency operations and allows better multi-warp interleaving, thus improving throughput.
Notes
There are 6 scoreboard barriers for dependency tracking.
Each warp has its own set of barriers.
The GPU hardware automatically enforces barrier dependencies.
The control code is generated by the CUDA ptxas tool.
References:
[max]
- __init__(stall_count: int, yield_flag: bool, read: int, write: int, wait: list[bool], reuse: dict[str, bool]) -> None
- static decode(*, code: str) -> ControlCode
Decode 64-bit word including a control code.
- class reprospect.tools.sass.decode.Decoder(*, source: Path | None = None, code: str | None = None, skip_until_headerflags: bool = True)
Bases: TableMixin
Parse the SASS assembly code extracted from a binary.
The disassembled instructions are collected, and the associated control codes are decoded.
- HEXADECIMAL: Final[str] = '0x[a-f0-9]+'
Matcher for a hex-like string, such as 0x00000a0000017a02.
- MATCHER: Final[Pattern[str]] = re.compile('/\\*([a-f0-9]+)\\*/\\s+(.*?)(?=\\s{2,}|[;?&]).*?/\\*\\s*(0x[a-f0-9]+)\\s*\\*/')
Matcher for the full SASS line. It focuses on the offset, instruction and trailing hex encoding.
For instance, it matches:
/*0070*/ UIMAD UR4, UR4, UR5, URZ ; /* 0x00000005040472a4 */
Note that sometimes, additional strings appear after the instruction and before the hex:
/*0090*/ ISETP.GE.U32.AND P0, PT, R0, UR9, PT ?WAIT13_END_GROUP; /* 0x0000000900007c0c */
/*00e0*/ LDC.64 R10, c[0x0][0x3a0] &wr=0x2 ?trans8; /* 0x0000e800ff0a7b82 */
These strings are ignored.
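The behavior of the pattern can be checked on the example lines above with a short standalone sketch. The regular expression is copied from MATCHER. The `parse_line` helper is hypothetical, and so are two choices it makes: reading the offset as a hexadecimal integer, and stripping trailing whitespace from the captured instruction (the lazy group can capture a trailing space when a single space precedes the `;`).

```python
import re
from typing import Optional

# Same pattern as MATCHER, written as a raw string.
MATCHER = re.compile(
    r'/\*([a-f0-9]+)\*/\s+(.*?)(?=\s{2,}|[;?&]).*?/\*\s*(0x[a-f0-9]+)\s*\*/'
)


def parse_line(line: str) -> Optional[tuple[int, str, str]]:
    """Return (offset, instruction, hex encoding), or None if the line does not match."""
    match = MATCHER.search(line)
    if match is None:
        return None
    offset, instruction, encoding = match.groups()
    # Assumption: the offset is hexadecimal; strip() drops any trailing space.
    return int(offset, 16), instruction.strip(), encoding


print(parse_line("/*00e0*/ LDC.64 R10, c[0x0][0x3a0] &wr=0x2 ?trans8; /* 0x0000e800ff0a7b82 */"))
```

The lookahead `(?=\s{2,}|[;?&])` stops the instruction capture at the first `;`, `?` or `&`, or at a run of two or more whitespace characters, which is why annotations such as `&wr=0x2` and `?trans8` do not end up in the instruction string.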
- __init__(*, source: Path | None = None, code: str | None = None, skip_until_headerflags: bool = True) -> None
Initialize the decoder with the SASS contained in source or code.
- instructions: Final[list[Instruction]]
The parsed instructions.
- to_df() -> DataFrame
Convert the decoded SASS to a pandas.DataFrame.
- to_html() -> str
Visualize the decoded SASS as an HTML table.
- to_table() -> Table
Convert the decoded SASS to a rich.table.Table.
- class reprospect.tools.sass.decode.Instruction(offset: int, instruction: str, hex: str, control: ControlCode)
Bases: object
Represents a single SASS instruction with its components.
- control: ControlCode
The decoded control code associated with the instruction.
- class reprospect.tools.sass.decode.RegisterType(*values)
Bases: StrEnum
Register types:
GPR: General Purpose Registers.
PRED: Predicate Registers.
UGPR: Uniform General Purpose Registers.
UPRED: Uniform Predicate Registers.
- GPR = 'R'
- PRED = 'P'
- UGPR = 'UR'
- UPRED = 'UP'
- __str__()
Return str(self).
- property is_predicate: bool
- property special: str
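To illustrate how the four prefixes map onto register names, here is a standalone sketch that mirrors the enumeration values listed above. It is not the library's implementation: it uses a `str`/`Enum` mix-in instead of `StrEnum` so that it also runs on Python versions before 3.11, and the `classify_register` helper, together with its rule for what counts as a register name, is hypothetical.

```python
import re
from enum import Enum
from typing import Optional


class RegisterType(str, Enum):
    """Standalone mirror of the documented register type values."""
    GPR = "R"
    PRED = "P"
    UGPR = "UR"
    UPRED = "UP"

    def __str__(self) -> str:
        return self.value


# Assumption: a register name is a prefix followed by an index, Z or T.
# Two-letter prefixes come first so that "UR4" is not read as "R".
_REGISTER = re.compile(r"(UR|UP|R|P)(\d+|Z|T)")


def classify_register(name: str) -> Optional[RegisterType]:
    """Return the register type of a name such as 'R0' or 'UR4', else None."""
    match = _REGISTER.fullmatch(name)
    return RegisterType(match.group(1)) if match else None
```

With these definitions, `classify_register("UR4")` gives `RegisterType.UGPR` and `str(RegisterType.GPR)` gives `"R"`.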