reprospect.tools.ncu package

class reprospect.tools.ncu.Cacher(*, directory: str | Path | None = None)

Bases: Cacher

Cacher tailored to ncu results.

ncu requires considerable time to acquire results, especially when there are many kernels to profile or many metrics to collect.

On a cache hit, the cacher will serve:

<cache_key>.ncu-rep file
.log file

On a cache miss, ncu is launched and the cache entry populated accordingly.

Note

It is assumed that hashing is faster than running ncu itself.

Warning

The cache should not be shared between machines, since machine-specific differences may influence the results without being included in the hash.

TABLE: ClassVar[str] = 'ncu': Name of the table.

__init__(*, directory: str | Path | None = None)

hash(**kwargs) → blake3

hash_impl(*, command: Command) → blake3

Hash based on:

ncu version
ncu options (but not the output and log files)
executable content
executable arguments
linked libraries
environment
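As a rough illustration of how such a key might be assembled, the sketch below feeds each of the listed ingredients into a single digest. It uses hashlib.blake2b from the standard library as a stand-in for blake3, and the helper cache_key with its parameters is hypothetical; the actual hash_impl may collect and order its inputs differently.

```python
import hashlib
from pathlib import Path
from typing import Mapping, Sequence


def cache_key(
    *,
    ncu_version: str,
    opts: Sequence[str],
    executable: Path,
    args: Sequence[str],
    libraries: Sequence[Path],
    env: Mapping[str, str],
) -> str:
    """Digest every input that may influence the results (hypothetical helper)."""
    hasher = hashlib.blake2b()  # stand-in for blake3, which is a third-party package

    def feed(data: bytes) -> None:
        # A separator keeps ("ab",) and ("a", "b") from producing the same digest.
        hasher.update(data + b"\0")

    feed(ncu_version.encode())
    for opt in opts:  # the output and log paths are deliberately left out
        feed(opt.encode())
    feed(executable.read_bytes())  # executable content, not its path
    for arg in args:
        feed(arg.encode())
    for library in sorted(libraries):  # content of each linked library
        feed(library.read_bytes())
    for key in sorted(env):  # sorted so that the key is order-independent
        feed(f"{key}={env[key]}".encode())
    return hasher.hexdigest()
```

Leaving the output and log paths out of the digest means that two runs differing only in where they write their files map to the same cache entry.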

populate(directory: Path, **kwargs) → None: On a cache miss, call reprospect.tools.ncu.Session.run() and fill the directory with the artifacts.

run(command: Command, **kwargs) → Entry: On a cache hit, copy files from the cache entry.
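The hit-or-miss flow described above can be sketched with a plain directory-backed cache. This is a simplified illustration, not the class's implementation: the helper serve and its parameters are hypothetical, and the real Cacher may track its entries differently (it exposes a TABLE name, for instance).

```python
from pathlib import Path
from typing import Callable, Tuple


def serve(cache_root: Path, key: str, populate: Callable[[Path], None]) -> Tuple[Path, bool]:
    """Return the entry directory for key and whether it was a cache hit (hypothetical helper)."""
    entry = cache_root / key
    if entry.is_dir():
        return entry, True  # cache hit: the artifacts are already there
    entry.mkdir(parents=True)
    populate(entry)  # cache miss: this is where ncu would be launched
    return entry, False
```

On a hit, populate is never called, which is where the time saving comes from.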

class reprospect.tools.ncu.Command

Bases: object

Run an ncu command line.

__init__(*, executable: str | Path, output: Path, opts: tuple[str, ...] = (), metrics: tuple[Metric | MetricCorrelation | MetricDeviceAttribute, ...] | None = None, nvtx_includes: tuple[str, ...] | None = None, args: tuple[str | Path, ...] | None = None, env: Mapping[str, str] | None = None) → None: Method generated by attrs for class Command.

args: tuple[str | Path, ...] | None: Arguments to pass to the executable.

cmd: tuple[str | Path, ...]

env: Mapping[str, str] | None: Mapping used to update the environment before running, see run().

executable: str | Path: Executable to run.

log: Path: Log file.

metrics: tuple[Metric | MetricCorrelation | MetricDeviceAttribute, ...] | None: Metrics.

nvtx_includes: tuple[str, ...] | None: NVTX include expressions. Refer to https://docs.nvidia.com/nsight-compute/2023.3/NsightComputeCli/index.html#nvtx-filtering.

opts: tuple[str, ...]: Options that do not involve paths.

output: Path: Report file.

run(*, cwd: Path | None = None, env: MutableMapping[str, str] | None = None) → int
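The env attribute is described as a mapping used to update the environment before running. A minimal sketch of that pattern follows, assuming the update is applied on top of a copy of the current process environment; run_with_env is a hypothetical helper and not part of the package.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence


def run_with_env(cmd: Sequence[str], env: Optional[Mapping[str, str]] = None) -> int:
    """Run cmd with the current environment updated by env and return the exit code."""
    full_env = os.environ.copy()  # never mutate the environment of the calling process
    if env is not None:
        full_env.update(env)
    return subprocess.run(cmd, env=full_env, check=False).returncode
```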

class reprospect.tools.ncu.L1TEXCache

Bases: object

A selection of metrics related to L1/TEX cache.

See [Str21].

GlobalLoad: alias of L1TEXCacheGlobalLoad

GlobalStore: alias of L1TEXCacheGlobalStore

LocalStore: alias of L1TEXCacheLocalStore

NAME: Final[str] = 'L1/TEX cache'

class reprospect.tools.ncu.L1TEXCacheGlobalLoad

Bases: object

Instructions: alias of L1TEXCacheGlobalLoadInstructions

NAME: Final[str] = 'global load'

Requests: alias of L1TEXCacheGlobalLoadRequests

SectorHits: alias of L1TEXCacheGlobalLoadSectorHits

SectorMisses: alias of L1TEXCacheGlobalLoadSectorMisses

Sectors: alias of L1TEXCacheGlobalLoadSectors

Wavefronts: alias of L1TEXCacheGlobalLoadWavefronts

class reprospect.tools.ncu.L1TEXCacheGlobalLoadInstructions

Bases: object

Factory of counter metric (unit)__(sass?)_inst_executed_op_global_ld.

static create(*, unit: Unit = Unit.SMSP, mode: Literal['sass'] | None = 'sass', subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) → MetricCounter
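To make the pattern (unit)__(sass?)_inst_executed_op_global_ld concrete, the sketch below expands it for a given unit and mode. The Unit enumeration here is a hypothetical stand-in with only two members, and instruction_metric_name is not part of the package; it only shows how the placeholders are presumably substituted.

```python
from enum import Enum
from typing import Optional


class Unit(str, Enum):
    """Hypothetical stand-in for the unit enumeration; only two units are listed."""

    SM = "sm"
    SMSP = "smsp"


def instruction_metric_name(unit: Unit = Unit.SMSP, mode: Optional[str] = "sass") -> str:
    """Expand (unit)__(sass?)_inst_executed_op_global_ld into a concrete metric name."""
    middle = f"{mode}_" if mode else ""  # the 'sass_' part is optional
    return f"{unit.value}__{middle}inst_executed_op_global_ld"


print(instruction_metric_name())                         # smsp__sass_inst_executed_op_global_ld
print(instruction_metric_name(unit=Unit.SM, mode=None))  # sm__inst_executed_op_global_ld
```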

class reprospect.tools.ncu.L1TEXCacheGlobalLoadRequests

Bases: object

Factory of counter metric l1tex__t_requests_pipe_lsu_mem_global_op_ld.

static create(*, subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) → MetricCounter

class reprospect.tools.ncu.L1TEXCacheGlobalLoadSectorHits

Bases: object

Factory of counter metric l1tex__t_sectors_pipe_lsu_mem_global_op_ld_lookup_hit.

static create(*, subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) → MetricCounter

class reprospect.tools.ncu.L1TEXCacheGlobalLoadSectorMisses

Bases: object

Factory of counter metric l1tex__t_sectors_pipe_lsu_mem_global_op_ld_lookup_miss.

static create(*, subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) → MetricCounter

class reprospect.tools.ncu.L1TEXCacheGlobalLoadSectors

Bases: object

Factory of counter metric l1tex__t_sectors_pipe_lsu_mem_global_op_ld.

static create(*, subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,), suffix: Literal['hit', 'miss'] | None = None) → MetricCounter
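The suffix parameter presumably selects between the plain sector counter and its lookup hit/miss variants documented for the neighbouring factories. A small sketch of that name construction follows; sectors_metric_name is a hypothetical helper, and the mapping from suffix to _lookup_hit and _lookup_miss is inferred from the metric names listed on this page.

```python
from typing import Optional

BASE = "l1tex__t_sectors_pipe_lsu_mem_global_op_ld"


def sectors_metric_name(suffix: Optional[str] = None) -> str:
    """Return the base sector metric, or its lookup hit/miss variant (hypothetical helper)."""
    if suffix is None:
        return BASE
    if suffix not in ("hit", "miss"):
        raise ValueError(f"unsupported suffix: {suffix!r}")
    return f"{BASE}_lookup_{suffix}"
```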

class reprospect.tools.ncu.L1TEXCacheGlobalLoadWavefronts

Bases: object

Factory of counter metric l1tex__t_wavefronts_pipe_lsu_mem_global_op_ld.

static create(*, subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) → MetricCounter

class reprospect.tools.ncu.L1TEXCacheGlobalStore

Bases: object

Instructions: alias of L1TEXCacheGlobalStoreInstructions

NAME: Final[str] = 'global store'

Sectors: alias of L1TEXCacheGlobalStoreSectors

class reprospect.tools.ncu.L1TEXCacheGlobalStoreInstructions

Bases: object

Factory of counter metric (unit)__(sass?)_inst_executed_op_global_st.

static create(*, unit: Unit = Unit.SMSP, mode: Literal['sass'] | None = 'sass', subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) → MetricCounter

class reprospect.tools.ncu.L1TEXCacheGlobalStoreSectors

Bases: object

Factory of counter metric l1tex__t_sectors_pipe_lsu_mem_global_op_st.

static create(*, subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) → MetricCounter

class reprospect.tools.ncu.L1TEXCacheLocalStore

Bases: object

Instructions: alias of L1TEXCacheLocalStoreInstructions

NAME: Final[str] = 'local store'

class reprospect.tools.ncu.L1TEXCacheLocalStoreInstructions

Bases: object

Factory of counter metric (unit)__(sass?)_inst_executed_op_local_st.

static create(*, unit: Unit = Unit.SMSP, mode: Literal['sass'] | None = 'sass', subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) → MetricCounter

class reprospect.tools.ncu.LaunchBlock

Bases: XYZBase

Factory of metrics launch__block_dim_x, launch__block_dim_y and launch__block_dim_z.

prefix: ClassVar[str] = 'launch__block_dim_'

class reprospect.tools.ncu.LaunchGrid

Bases: XYZBase

Factory of metrics launch__grid_dim_x, launch__grid_dim_y and launch__grid_dim_z.

prefix: ClassVar[str] = 'launch__grid_dim_'
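Both launch factories differ only in their prefix, to which the three axis letters are appended. The following sketch shows that expansion; xyz_metric_names is a hypothetical helper and does not reproduce the interface of XYZBase.

```python
from typing import Tuple


def xyz_metric_names(prefix: str) -> Tuple[str, str, str]:
    """Append the x, y and z axis letters to a metric prefix (hypothetical helper)."""
    return (f"{prefix}x", f"{prefix}y", f"{prefix}z")


print(xyz_metric_names("launch__block_dim_"))
# ('launch__block_dim_x', 'launch__block_dim_y', 'launch__block_dim_z')
```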

class reprospect.tools.ncu.Metric(name: str, pretty_name: str | None = None, subs: tuple[str, ...] | None = None)

Bases: object

Used to represent an ncu metric.

If subs is not given, it is assumed that name is a valid metric that can be directly evaluated by ncu.

References:

https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#metrics-structure

__init__(name: str, pretty_name: str | None = None, subs: tuple[str, ...] | None = None) → None

gather() → tuple[str, ...]: Get the list of sub-metric names, or the metric name itself if no sub-metrics are defined.

name: str: The base name of the metric.

pretty_name: str | None: Human readable name.

subs: tuple[str, ...] | None: Optional sub-metric names.

class reprospect.tools.ncu.MetricCorrelation(name: str)

Bases: object

A metric with correlations, like sass__inst_executed_per_opcode.

References:

https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#metrics-structure

__init__(name: str) → None

gather() → tuple[str]

name: str

class reprospect.tools.ncu.MetricCounter(name: str, pretty_name: str | None = None, subs: tuple[str, ...] | None = None)

Bases: Metric

A counter metric.

The sub-metric names are expected to be from MetricCounterRollUp.

References:

https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#metrics-structure

__init__(name: str, pretty_name: str | None = None, subs: tuple[str, ...] | None = None) → None

class reprospect.tools.ncu.MetricCounterRollUp(*values)

Bases: StrEnum

Available roll-ups for MetricCounter.

AVG = 'avg'

MAX = 'max'

MIN = 'min'

SUM = 'sum'

__str__(): Return str(self).

class reprospect.tools.ncu.MetricDeviceAttribute(name: str)

Bases: object

ncu device attribute metric, such as:

device__attribute_architecture

Note

Available device attribute metrics can be queried with:

ncu --query-metrics-collection=device

__init__(name: str) → None

property full_name: str

gather() → tuple[str]

name: str

class reprospect.tools.ncu.MetricRatio(name: str, pretty_name: str | None = None, subs: tuple[str, ...] | None = None)

Bases: Metric

A ratio metric.

The sub-metric names are expected to be from MetricRatioRollUp.

References:

https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#metrics-structure

__init__(name: str, pretty_name: str | None = None, subs: tuple[str, ...] | None = None) → None

class reprospect.tools.ncu.MetricRatioRollUp(*values)

Bases: StrEnum

Available roll-ups for MetricRatio.

MAX_RATE = 'max_rate'

PCT = 'pct'

RATIO = 'ratio'

__str__(): Return str(self).

Mapping of profiling metric keys to their values.

Note

It is not decorated with dataclasses.dataclass() because of https://github.com/mypyc/mypyc/issues/1061.

__init__(data: dict[str, int | float | dict[str, int | float] | MetricCorrelationData | str]) → None

data: Final[dict[str, int | float | dict[str, int | float] | MetricCorrelationData | str]]

class reprospect.tools.ncu.ProfilingResults(data: dict[str, ProfilingResults | ProfilingMetrics] | None = None)

Bases: TreeMixin

Nested tree data structure for storing profiling results.

The data structure consists of internal nodes and leaf nodes:

The internal nodes are themselves ProfilingResults instances. They organise results hierarchically by the profiling range (e.g. NVTX range) that they were obtained from, terminating with the kernel name.
The leaf nodes contain the actual profiling metric key-value pairs. Any type implementing the protocol ProfilingMetrics can be used as a profiling metrics entry.

This class provides convenient methods for hierarchical data access and manipulation.

Example structure:

Profiling results
└── 'nvtx range'
    ├── 'nvtx region'
    │   └── 'kernel'
    │       ├── 'metric i'  -> ProfilingMetricData
    │       └── 'metric ii' -> ProfilingMetricData
    └── 'other nvtx region'
        └── 'other kernel'
            ├── 'metric i'  -> ProfilingMetricData
            └── 'metric ii' -> ProfilingMetricData

Note

Using a hierarchical data structure from a package such as hatchet could be a direction to explore in the future.

Note

It is not decorated with dataclasses.dataclass() because of https://github.com/mypyc/mypyc/issues/1061.

__init__(data: dict[str, ProfilingResults | ProfilingMetrics] | None = None) → None

aggregate_metrics(accessors: Iterable[str], keys: Iterable[str] | None = None) → dict[str, int | float]

Aggregate metric values across multiple leaf nodes with profiling metrics at accessor path accessors.

Parameters: keys – Specific metric keys to aggregate. If None, uses all keys from the first leaf node.
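The aggregation over leaf nodes can be sketched on plain dictionaries. The example assumes that values are aggregated by summation, which the description above does not state explicitly; aggregate is a hypothetical helper operating on a list of leaves rather than on a ProfilingResults instance.

```python
from typing import Dict, Iterable, Optional


def aggregate(
    leaves: Iterable[Dict[str, float]],
    keys: Optional[Iterable[str]] = None,
) -> Dict[str, float]:
    """Sum the selected metric keys over all leaves (summation is an assumption)."""
    leaves = list(leaves)
    if not leaves:
        return {}
    # Without explicit keys, fall back to the keys of the first leaf node.
    selected = list(keys) if keys is not None else list(leaves[0])
    return {key: sum(leaf[key] for leaf in leaves) for key in selected}
```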

assign_metrics(accessors: Sequence[str], data: ProfilingMetrics) → None

Set the leaf node with profiling metrics data at accessor path accessors.

Creates the internal nodes in the hierarchy if needed.
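The way intermediate nodes are created on assignment can be illustrated with nested dictionaries. This is only a sketch of the idea under the assumption that every internal node behaves like a mapping; assign and query are hypothetical free functions, not the methods of ProfilingResults.

```python
from typing import Any, Dict, Sequence


def assign(tree: Dict[str, Any], accessors: Sequence[str], data: Any) -> None:
    """Store data at the accessor path, creating internal nodes as needed."""
    node = tree
    for key in accessors[:-1]:
        node = node.setdefault(key, {})  # create the internal node if it is missing
    node[accessors[-1]] = data


def query(tree: Dict[str, Any], accessors: Sequence[str]) -> Any:
    """Walk the accessor path and return the node found there."""
    node: Any = tree
    for key in accessors:
        node = node[key]
    return node
```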

data: Final[dict[str, ProfilingResults | ProfilingMetrics]]

iter_metrics(accessors: Iterable[str] = ()) → Generator[tuple[str, ProfilingMetrics], None, None]: Query the accessor path accessors, check that it leads to an internal node, check that all entries are leaf nodes with profiling metrics, and return an iterator over these leaf nodes with profiling metrics.

query(accessors: Iterable[str]) → ProfilingResults | ProfilingMetrics: Get the internal node in the hierarchy or the leaf node with profiling metrics at accessor path accessors.

query_filter(accessors: Iterable[str], predicate: Callable[[str], bool]) → ProfilingResults: Query the accessor path accessors, check that it leads to an internal node, and return a new ProfilingResults with only the entries whose key satisfies predicate.

query_metrics(accessors: Iterable[str]) → ProfilingMetrics: Query the accessor path accessors, check that it leads to a leaf node with profiling metrics, and return this leaf node with profiling metrics.

query_single_next(accessors: Iterable[str]) → tuple[str, ProfilingResults | ProfilingMetrics]

Query the accessor path accessors, check that it leads to an internal node with exactly one entry, and return this single entry.

This member function is useful, for instance, to access a single kernel within an NVTX range or region.

>>> from reprospect.tools.ncu import ProfilingResults
>>> results = ProfilingResults()
>>> results.assign_metrics(('my_nvtx_range', 'my_nvtx_region', 'my_kernel'), {'my_metric' : 42})
>>> results.query_single_next(('my_nvtx_range', 'my_nvtx_region'))
('my_kernel', {'my_metric': 42})

query_single_next_metrics(accessors: Iterable[str]) → tuple[str, ProfilingMetrics]: Query the accessor path accessors, check that it leads to an internal node with exactly one entry, check that this entry is a leaf node with profiling metrics, and return this leaf node with profiling metrics.

to_tree() → Tree: Convert to a rich.tree.Tree for nice printing.

class reprospect.tools.ncu.Report(*, path: Path | None = None, name: str | None = None, command: Command | None = None)

Bases: object

This class is a wrapper around the Python tool provided by NVIDIA Nsight Compute to parse its reports.

In particular, the NVIDIA Python tool (ncu_report) provides low-level access to the collected data by iterating over ranges and actions. This class uses these functionalities to extract all the collected data into a custom data structure of type ProfilingResults. This data structure is a nested tree that provides higher-level, direct access to the data of interest by NVTX range (if NVTX is used) and by demangled kernel name.

__init__(*, path: Path | None = None, name: str | None = None, command: Command | None = None) → None: Load the report <path>/<name>.ncu-rep or the report generated by reprospect.tools.ncu.session.Command.

Collect values of the metrics in the action.

Extract the metrics of the actions in the range with ID range_idx. Possibly filter by NVTX with includes and excludes.

Parameters: metrics – Must support repeated iteration from start to end.

classmethod fill_metric(action: Action, metric: Metric) → int | float | dict[str, int | float]: Loop over submetrics of metric.

classmethod get_metric_by_name(*, action: Action, metric: str): Read a metric in action.

get_metric_value(metric: Any, index: int | None = None) → int | float | str: Recent ncu versions (>= 2025.3.0.0) provide a value method.

class reprospect.tools.ncu.Session(command: Command)

Bases: object

Nsight Compute session interface.

__init__(command: Command) → None

command: Command

run(cwd: Path | None = None, env: MutableMapping | None = None, retries: int = 1, sleep: Callable[[int, int], float] = <function Session.<lambda>>) → None

Run ncu using command.

Parameters:

retries – ncu might fail acquiring some resources because other instances are running. Retry a few times. See https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#faq (Profiling failed because a driver resource was unavailable).
sleep – The time to sleep between successive retries. The callable is given the current retry index (descending) and the amount of allowed retries.
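The interplay of retries and sleep can be sketched with a generic retry loop. The sketch assumes that retries counts the total number of attempts and that the retry index runs from retries - 1 down to 0; run_with_retries, its attempt callable, its default back-off and the injectable wait parameter are all hypothetical and only illustrate the calling convention of sleep.

```python
import time
from typing import Callable


def run_with_retries(
    attempt: Callable[[], bool],
    retries: int = 1,
    sleep: Callable[[int, int], float] = lambda retry, retries: 3.0 * (retries - retry),
    wait: Callable[[float], None] = time.sleep,
) -> bool:
    """Call attempt until it succeeds or the allowed attempts are exhausted."""
    for retry in reversed(range(retries)):  # descending retry index
        if attempt():
            return True
        if retry > 0:  # no need to wait after the last attempt
            wait(sleep(retry, retries))
    return False
```

Passing both the current index and the total to sleep lets the caller implement a back-off that grows as the remaining retries shrink, as the hypothetical default does here.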

Warning

According to https://developer.nvidia.com/nvidia-development-tools-solutions-err_nvgpuctrperm-permission-issue-performance-counters#ElevPrivsTag, GPU performance counters are not available to all users by default.

Note

As of ncu 2025.1.1.0, a note tells us that specified NVTX include expressions match only start/end ranges.

References:

https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#nvtx-filtering
