reprospect.tools.ncu package

class reprospect.tools.ncu.Cacher(*, directory: str | Path | None = None)

Bases: Cacher

Cacher tailored to ncu results.

ncu requires quite some time to acquire results, especially when there are many kernels to profile and/or many metrics to collect.

On a cache hit, the cacher will serve:

  • <cache_key>.ncu-rep file

  • .log file

On a cache miss, ncu is launched and the cache entry populated accordingly.
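The hit/miss behaviour described above can be sketched as follows. This is only an illustration, not the reprospect implementation: `serve_or_populate`, `fake_ncu`, the file names and the one-directory-per-key layout are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable


def serve_or_populate(cache_dir: Path, key: str,
                      populate: Callable[[Path], None]) -> tuple[Path, bool]:
    """Return the entry directory for `key` and whether it was a cache hit."""
    entry = cache_dir / key
    if entry.is_dir():
        return entry, True
    entry.mkdir(parents=True)
    populate(entry)  # on a miss, the expensive tool runs exactly once
    return entry, False


calls: list[Path] = []


def fake_ncu(directory: Path) -> None:
    """Hypothetical stand-in for an ncu run: writes a report and a log file."""
    calls.append(directory)
    (directory / "report.ncu-rep").write_bytes(b"")
    (directory / "report.log").write_text("profiled\n")


with tempfile.TemporaryDirectory() as tmp:
    _, first_hit = serve_or_populate(Path(tmp), "abc123", fake_ncu)
    entry, second_hit = serve_or_populate(Path(tmp), "abc123", fake_ncu)
    cached_files = sorted(p.name for p in entry.iterdir())
```

The second call finds the entry and does not invoke the stand-in again, which is the saving the cacher aims for.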

Note

It is assumed that hashing is faster than running ncu itself.

Warning

The cache should not be shared between machines, since there may be differences between machines that influence the results but are not included in the hashing.

TABLE: ClassVar[str] = 'ncu'

Name of the table.

__init__(*, directory: str | Path | None = None)
hash(**kwargs) → blake3
hash_impl(*, command: Command) → blake3

Hash based on:

  • ncu version

  • ncu options (but not the output and log files)

  • executable content

  • executable arguments

  • linked libraries

  • environment
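A minimal sketch of how such inputs could be folded into one cache key is given below. It is an assumption-laden illustration: `hash_inputs` and `feed` are hypothetical names, `hashlib.blake2b` from the standard library stands in for blake3, and the linked libraries are left out for brevity.

```python
import hashlib


def feed(hasher, *parts: bytes) -> None:
    """Length-prefix each part so that adjacent fields cannot be confused."""
    for part in parts:
        hasher.update(len(part).to_bytes(8, "little"))
        hasher.update(part)


def hash_inputs(*, version: str, opts: tuple[str, ...], executable_content: bytes,
                args: tuple[str, ...], env: dict[str, str]) -> str:
    hasher = hashlib.blake2b()  # stand-in only: the documented return type is blake3
    feed(hasher, version.encode())
    feed(hasher, *(opt.encode() for opt in opts))
    feed(hasher, executable_content)
    feed(hasher, *(arg.encode() for arg in args))
    # Sort the environment so that insertion order does not change the key.
    feed(hasher, *(f"{key}={env[key]}".encode() for key in sorted(env)))
    return hasher.hexdigest()


base = dict(
    version="2025.1.1.0",
    opts=("--set", "full"),
    executable_content=b"\x7fELF",
    args=("--size", "1024"),
    env={"A": "1", "B": "2"},
)
digest = hash_inputs(**base)
```

Changing any single input, for instance one executable argument, yields a different key and therefore a cache miss.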

populate(directory: Path, **kwargs) → None

When there is a cache miss, call reprospect.tools.ncu.Session.run(). Fill the directory with the artifacts.

run(command: Command, **kwargs) → Entry

On a cache hit, copy files from the cache entry.

class reprospect.tools.ncu.Command(*, executable: str | Path, output: Path, opts: tuple[str, ...] = (), metrics: tuple[Metric | MetricCorrelation | MetricDeviceAttribute, ...] | None = None, nvtx_includes: tuple[str, ...] | None = None, args: tuple[str | Path, ...] | None = None, env: Mapping[str, str] | None = None)

Bases: object

Run an ncu command line.

__init__(*, executable: str | Path, output: Path, opts: tuple[str, ...] = (), metrics: tuple[Metric | MetricCorrelation | MetricDeviceAttribute, ...] | None = None, nvtx_includes: tuple[str, ...] | None = None, args: tuple[str | Path, ...] | None = None, env: Mapping[str, str] | None = None) → None

Method generated by attrs for class Command.

args: tuple[str | Path, ...] | None

Arguments to pass to the executable.

cmd: tuple[str | Path, ...]
env: Mapping[str, str] | None

Mapping used to update the environment before running, see run().
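As an illustration of updating the environment before running, the sketch below copies the current environment, applies the mapping on top, and launches a child process. `run_with_env` is a hypothetical helper and is not part of reprospect; the child process here is the Python interpreter rather than ncu.

```python
import os
import subprocess
import sys
from typing import Mapping, Optional


def run_with_env(cmd: list[str],
                 env_update: Optional[Mapping[str, str]] = None) -> subprocess.CompletedProcess:
    """Run `cmd` with the current environment updated by `env_update`."""
    env = os.environ.copy()  # leave the parent environment untouched
    if env_update is not None:
        env.update(env_update)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=False)


result = run_with_env(
    [sys.executable, "-c", "import os; print(os.environ['DEMO_VAR'])"],
    {"DEMO_VAR": "42"},
)
```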

executable: str | Path

Executable to run.

log: Path

Log file.

metrics: tuple[Metric | MetricCorrelation | MetricDeviceAttribute, ...] | None

Metrics.

nvtx_includes: tuple[str, ...] | None

NVTX include. Refer to https://docs.nvidia.com/nsight-compute/2023.3/NsightComputeCli/index.html#nvtx-filtering.

opts: tuple[str, ...]

Options that do not involve paths.

output: Path

Report file.
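To make the role of these attributes concrete, here is a sketch of how an ncu command line might be assembled from them. `build_ncu_cmd` is hypothetical and the ordering of the pieces is an assumption; only the ncu flags themselves (`--nvtx`, `--nvtx-include`, `--metrics`, `-o`, `--log-file`) are taken from the Nsight Compute CLI.

```python
from typing import Optional, Sequence


def build_ncu_cmd(*, executable: str, output: str, log: str,
                  opts: Sequence[str] = (),
                  metric_names: Optional[Sequence[str]] = None,
                  nvtx_includes: Optional[Sequence[str]] = None,
                  args: Optional[Sequence[str]] = None) -> tuple[str, ...]:
    cmd = ["ncu", *opts]
    if nvtx_includes:
        cmd.append("--nvtx")  # NVTX support must be enabled for filtering
        for expression in nvtx_includes:
            cmd += ["--nvtx-include", expression]
    if metric_names:
        cmd += ["--metrics", ",".join(metric_names)]
    cmd += ["-o", output, "--log-file", log]  # report file and log file
    cmd.append(executable)
    cmd += [str(arg) for arg in args or ()]
    return tuple(cmd)


cmd = build_ncu_cmd(
    executable="./my_app",
    output="report",
    log="report.log",
    opts=("--launch-count", "1"),
    metric_names=("launch__grid_dim_x", "launch__block_dim_x"),
    nvtx_includes=("my_range/",),
    args=("--size", "1024"),
)
```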

run(*, cwd: Path | None = None, env: MutableMapping[str, str] | None = None) → int
class reprospect.tools.ncu.L1TEXCache

Bases: object

A selection of metrics related to L1/TEX cache.

See [Str21].

GlobalLoad

alias of L1TEXCacheGlobalLoad

GlobalStore

alias of L1TEXCacheGlobalStore

LocalStore

alias of L1TEXCacheLocalStore

NAME: Final[str] = 'L1/TEX cache'
class reprospect.tools.ncu.L1TEXCacheGlobalLoad

Bases: object

Instructions

alias of L1TEXCacheGlobalLoadInstructions

NAME: Final[str] = 'global load'
Requests

alias of L1TEXCacheGlobalLoadRequests

SectorHits

alias of L1TEXCacheGlobalLoadSectorHits

SectorMisses

alias of L1TEXCacheGlobalLoadSectorMisses

Sectors

alias of L1TEXCacheGlobalLoadSectors

Wavefronts

alias of L1TEXCacheGlobalLoadWavefronts

class reprospect.tools.ncu.L1TEXCacheGlobalLoadInstructions

Bases: object

Factory of counter metric (unit)__(sass?)_inst_executed_op_global_ld.
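The name pattern `(unit)__(sass?)_inst_executed_op_global_ld` can be read as a template with a unit prefix and an optional `sass` segment. The sketch below expands it under the assumption that `Unit.SMSP` renders as `smsp`; `instructions_metric_name` is a hypothetical helper, not the factory itself.

```python
from typing import Optional


def instructions_metric_name(unit: str = "smsp", mode: Optional[str] = "sass",
                             op: str = "global_ld") -> str:
    """Expand (unit)__(sass?)_inst_executed_op_<op> into a concrete name."""
    middle = f"{mode}_" if mode is not None else ""
    return f"{unit}__{middle}inst_executed_op_{op}"


with_sass = instructions_metric_name()
without_sass = instructions_metric_name(mode=None)
store_name = instructions_metric_name(op="global_st")
```

The same template covers the global store and local store factories by changing the trailing operation segment.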

static create(*, unit: Unit = Unit.SMSP, mode: Literal['sass'] | None = 'sass', subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) → MetricCounter
class reprospect.tools.ncu.L1TEXCacheGlobalLoadRequests

Bases: object

Factory of counter metric l1tex__t_requests_pipe_lsu_mem_global_op_ld.

static create(*, subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) → MetricCounter
class reprospect.tools.ncu.L1TEXCacheGlobalLoadSectorHits

Bases: object

Factory of counter metric l1tex__t_sectors_pipe_lsu_mem_global_op_ld_lookup_hit.

static create(*, subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) → MetricCounter
class reprospect.tools.ncu.L1TEXCacheGlobalLoadSectorMisses

Bases: object

Factory of counter metric l1tex__t_sectors_pipe_lsu_mem_global_op_ld_lookup_miss.

static create(*, subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) → MetricCounter
class reprospect.tools.ncu.L1TEXCacheGlobalLoadSectors

Bases: object

Factory of counter metric l1tex__t_sectors_pipe_lsu_mem_global_op_ld.

static create(*, subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,), suffix: Literal['hit', 'miss'] | None = None) → MetricCounter
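The `suffix` parameter presumably selects between the plain sectors counter and its hit/miss variants. The sketch below assumes that the suffix is appended as `_lookup_<suffix>`, which is inferred from the hit and miss metric names listed for the sibling factories and not from the implementation; `sectors_metric_name` is hypothetical.

```python
from typing import Optional

BASE = "l1tex__t_sectors_pipe_lsu_mem_global_op_ld"


def sectors_metric_name(suffix: Optional[str] = None) -> str:
    """Return the base counter name, optionally narrowed to lookup hits or misses."""
    if suffix is None:
        return BASE
    if suffix not in ("hit", "miss"):
        raise ValueError(f"unsupported suffix: {suffix!r}")
    return f"{BASE}_lookup_{suffix}"


all_sectors = sectors_metric_name()
hit_sectors = sectors_metric_name("hit")
miss_sectors = sectors_metric_name("miss")
```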
class reprospect.tools.ncu.L1TEXCacheGlobalLoadWavefronts

Bases: object

Factory of counter metric l1tex__t_wavefronts_pipe_lsu_mem_global_op_ld.

static create(*, subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) → MetricCounter
class reprospect.tools.ncu.L1TEXCacheGlobalStore

Bases: object

Instructions

alias of L1TEXCacheGlobalStoreInstructions

NAME: Final[str] = 'global store'
Sectors

alias of L1TEXCacheGlobalStoreSectors

class reprospect.tools.ncu.L1TEXCacheGlobalStoreInstructions

Bases: object

Factory of counter metric (unit)__(sass?)_inst_executed_op_global_st.

static create(*, unit: Unit = Unit.SMSP, mode: Literal['sass'] | None = 'sass', subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) → MetricCounter
class reprospect.tools.ncu.L1TEXCacheGlobalStoreSectors

Bases: object

Factory of counter metric l1tex__t_sectors_pipe_lsu_mem_global_op_st.

static create(*, subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) → MetricCounter
class reprospect.tools.ncu.L1TEXCacheLocalStore

Bases: object

Instructions

alias of L1TEXCacheLocalStoreInstructions

NAME: Final[str] = 'local store'
class reprospect.tools.ncu.L1TEXCacheLocalStoreInstructions

Bases: object

Factory of counter metric (unit)__(sass?)_inst_executed_op_local_st.

static create(*, unit: Unit = Unit.SMSP, mode: Literal['sass'] | None = 'sass', subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) → MetricCounter
class reprospect.tools.ncu.LaunchBlock

Bases: XYZBase

Factory of metrics launch__block_dim_x, launch__block_dim_y and launch__block_dim_z.

prefix: ClassVar[str] = 'launch__block_dim_'
class reprospect.tools.ncu.LaunchGrid

Bases: XYZBase

Factory of metrics launch__grid_dim_x, launch__grid_dim_y and launch__grid_dim_z.

prefix: ClassVar[str] = 'launch__grid_dim_'
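Both launch factories follow the same idea: a class-level prefix is combined with the three axes. A minimal sketch of that expansion, using a hypothetical `xyz_metric_names` helper in place of `XYZBase`:

```python
def xyz_metric_names(prefix: str) -> tuple[str, str, str]:
    """Append each axis letter to the prefix."""
    return (f"{prefix}x", f"{prefix}y", f"{prefix}z")


block_names = xyz_metric_names("launch__block_dim_")
grid_names = xyz_metric_names("launch__grid_dim_")
```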
class reprospect.tools.ncu.Metric(name: str, pretty_name: str | None = None, subs: tuple[str, ...] | None = None)

Bases: object

Used to represent an ncu metric.

If subs is not given, it is assumed that name is a valid metric that can be directly evaluated by ncu.

References:

__init__(name: str, pretty_name: str | None = None, subs: tuple[str, ...] | None = None) → None
gather() → tuple[str, ...]

Get the list of sub-metric names or the metric name itself if no sub-metrics are defined.
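A sketch of what gathering could look like is given below. `SketchMetric` is a hypothetical stand-in for `Metric`, and joining the base name and each sub-metric with a dot (as in ncu names such as `<name>.sum`) is an assumption about the output format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchMetric:
    name: str
    subs: Optional[tuple[str, ...]] = None

    def gather(self) -> tuple[str, ...]:
        """Return the fully qualified sub-metric names, or the name itself."""
        if self.subs is None:
            return (self.name,)
        return tuple(f"{self.name}.{sub}" for sub in self.subs)


plain = SketchMetric("launch__grid_dim_x").gather()
rolled_up = SketchMetric(
    "l1tex__t_requests_pipe_lsu_mem_global_op_ld", subs=("sum", "avg")
).gather()
```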

name: str

The base name of the metric.

pretty_name: str | None

Human readable name.

subs: tuple[str, ...] | None

Optional sub-metric names.

class reprospect.tools.ncu.MetricCorrelation(name: str)

Bases: object

A metric with correlations, like sass__inst_executed_per_opcode.

References:

__init__(name: str) → None
gather() → tuple[str]
name: str
class reprospect.tools.ncu.MetricCounter(name: str, pretty_name: str | None = None, subs: tuple[str, ...] | None = None)

Bases: Metric

A counter metric.

The sub-metric names are expected to be from MetricCounterRollUp.

References:

__init__(name: str, pretty_name: str | None = None, subs: tuple[str, ...] | None = None) → None
class reprospect.tools.ncu.MetricCounterRollUp(*values)

Bases: StrEnum

Available roll-ups for MetricCounter.

AVG = 'avg'
MAX = 'max'
MIN = 'min'
SUM = 'sum'
__str__()

Return str(self).

class reprospect.tools.ncu.MetricDeviceAttribute(name: str)

Bases: object

ncu device attribute metric, such as:

device__attribute_architecture

Note

Available device attribute metrics can be queried with:

ncu --query-metrics-collection=device
__init__(name: str) → None
property full_name: str
gather() → tuple[str]
name: str
class reprospect.tools.ncu.MetricRatio(name: str, pretty_name: str | None = None, subs: tuple[str, ...] | None = None)

Bases: Metric

A ratio metric.

The sub-metric names are expected to be from MetricRatioRollUp.

References:

__init__(name: str, pretty_name: str | None = None, subs: tuple[str, ...] | None = None) → None
class reprospect.tools.ncu.MetricRatioRollUp(*values)

Bases: StrEnum

Available roll-ups for MetricRatio.

MAX_RATE = 'max_rate'
PCT = 'pct'
RATIO = 'ratio'
__str__()

Return str(self).

class reprospect.tools.ncu.ProfilingMetrics(data: dict[str, int | float | dict[str, int | float] | MetricCorrelationData | str])

Bases: Mapping[str, int | float | dict[str, int | float] | MetricCorrelationData | str]

Mapping of profiling metric keys to their values.

Note

It is not decorated with dataclasses.dataclass() because of https://github.com/mypyc/mypyc/issues/1061.

__init__(data: dict[str, int | float | dict[str, int | float] | MetricCorrelationData | str]) → None
data: Final[dict[str, int | float | dict[str, int | float] | MetricCorrelationData | str]]
class reprospect.tools.ncu.ProfilingResults(data: dict[str, ProfilingResults | ProfilingMetrics] | None = None)

Bases: TreeMixin

Nested tree data structure for storing profiling results.

The data structure consists of internal nodes and leaf nodes:

  • The internal nodes are themselves ProfilingResults instances. They organise results hierarchically by the profiling range (e.g. NVTX range) that they were obtained from, terminating by the kernel name.

  • The leaf nodes contain the actual profiling metric key-value pairs. Any type implementing the protocol ProfilingMetrics can be used as a profiling metrics entry.

This class provides convenient methods for hierarchical data access and manipulation.

Example structure:

Profiling results
└── 'nvtx range'
    ├── 'nvtx region'
    │   └── 'kernel'
    │       ├── 'metric i'  -> ProfilingMetricData
    │       └── 'metric ii' -> ProfilingMetricData
    └── 'other nvtx region'
        └── 'other kernel'
            ├── 'metric i'  -> ProfilingMetricData
            └── 'metric ii' -> ProfilingMetricData
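The tree above can be mimicked with plain nested dictionaries. The sketch below is not the ProfilingResults class; `assign` and `query` are hypothetical helpers that only illustrate how an accessor path walks from the root through NVTX ranges down to a kernel.

```python
from typing import Any, Sequence


def assign(tree: dict, accessors: Sequence[str], metrics: dict) -> None:
    """Store `metrics` at the accessor path, creating internal nodes as needed."""
    *parents, leaf = accessors
    node = tree
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = metrics


def query(tree: dict, accessors: Sequence[str]) -> Any:
    """Follow the accessor path and return whatever node it leads to."""
    node = tree
    for key in accessors:
        node = node[key]
    return node


tree: dict = {}
assign(tree, ("nvtx range", "nvtx region", "kernel"), {"metric i": 1, "metric ii": 2})
assign(tree, ("nvtx range", "other nvtx region", "other kernel"), {"metric i": 3})

regions = sorted(query(tree, ("nvtx range",)))
kernel_metrics = query(tree, ("nvtx range", "nvtx region", "kernel"))
```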

Note

Using a hierarchical data structure from a package such as hatchet could be a direction to explore in the future.

Note

It is not decorated with dataclasses.dataclass() because of https://github.com/mypyc/mypyc/issues/1061.

__init__(data: dict[str, ProfilingResults | ProfilingMetrics] | None = None) → None
aggregate_metrics(accessors: Iterable[str], keys: Iterable[str] | None = None) → dict[str, int | float]

Aggregate metric values across multiple leaf nodes with profiling metrics at accessor path accessors.

Parameters:

keys – Specific metric keys to aggregate. If None, uses all keys from the first leaf node.
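The aggregation can be pictured as follows. The sketch assumes that aggregating means summing each key over all leaf nodes, which the text above does not state explicitly; `aggregate` is a hypothetical helper working on plain dictionaries.

```python
from typing import Iterable, Optional


def aggregate(leaves: dict[str, dict[str, float]],
              keys: Optional[Iterable[str]] = None) -> dict[str, float]:
    """Sum each selected key over all leaves (summation is an assumption)."""
    leaf_list = list(leaves.values())
    if not leaf_list:
        return {}
    # Default to the keys of the first leaf, as described for `keys=None`.
    selected = tuple(keys) if keys is not None else tuple(leaf_list[0])
    return {key: sum(leaf[key] for leaf in leaf_list) for key in selected}


leaves = {
    "kernel_a": {"sectors": 10, "requests": 2},
    "kernel_b": {"sectors": 5, "requests": 1},
}
all_keys = aggregate(leaves)
only_sectors = aggregate(leaves, keys=["sectors"])
```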

assign_metrics(accessors: Sequence[str], data: ProfilingMetrics) → None

Set the leaf node with profiling metrics data at accessor path accessors.

Creates the internal nodes in the hierarchy if needed.

data: Final[dict[str, ProfilingResults | ProfilingMetrics]]
iter_metrics(accessors: Iterable[str] = ()) → Generator[tuple[str, ProfilingMetrics], None, None]

Query the accessor path accessors, check that it leads to an internal node, check that all entries are leaf nodes with profiling metrics, and return an iterator over these leaf nodes with profiling metrics.

query(accessors: Iterable[str]) → ProfilingResults | ProfilingMetrics

Get the internal node in the hierarchy or the leaf node with profiling metrics at accessor path accessors.

query_filter(accessors: Iterable[str], predicate: Callable[[str], bool]) → ProfilingResults

Query the accessor path accessors, check that it leads to an internal node, and return a new ProfilingResults with only the entries whose key satisfies predicate.
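Filtering by key can be illustrated on a plain dictionary that stands in for an internal node. `filter_entries` and the kernel names are hypothetical; the predicate receives only the entry key, as in the signature above.

```python
from typing import Callable


def filter_entries(node: dict, predicate: Callable[[str], bool]) -> dict:
    """Keep only the entries whose key satisfies the predicate."""
    return {key: value for key, value in node.items() if predicate(key)}


region = {
    "saxpy_kernel": {"sectors": 10},
    "reduce_kernel": {"sectors": 4},
    "saxpy_kernel_v2": {"sectors": 8},
}
saxpy_only = filter_entries(region, lambda key: key.startswith("saxpy"))
```

A typical predicate matches a substring of the demangled kernel name, which avoids spelling out full template signatures.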

query_metrics(accessors: Iterable[str]) → ProfilingMetrics

Query the accessor path accessors, check that it leads to a leaf node with profiling metrics, and return this leaf node with profiling metrics.

query_single_next(accessors: Iterable[str]) → tuple[str, ProfilingResults | ProfilingMetrics]

Query the accessor path accessors, check that it leads to an internal node with exactly one entry, and return this single entry.

This member function is useful for instance to access a single kernel within an NVTX range or region.

>>> from reprospect.tools.ncu import ProfilingResults
>>> results = ProfilingResults()
>>> results.assign_metrics(('my_nvtx_range', 'my_nvtx_region', 'my_kernel'), {'my_metric' : 42})
>>> results.query_single_next(('my_nvtx_range', 'my_nvtx_region'))
('my_kernel', {'my_metric': 42})
query_single_next_metrics(accessors: Iterable[str]) → tuple[str, ProfilingMetrics]

Query the accessor path accessors, check that it leads to an internal node with exactly one entry, check that this entry is a leaf node with profiling metrics, and return this leaf node with profiling metrics.

to_tree() → Tree

Convert to a rich.tree.Tree for nice printing.

class reprospect.tools.ncu.Report(*, path: Path | None = None, name: str | None = None, command: Command | None = None)

Bases: object

This class is a wrapper around the Python tool provided by NVIDIA Nsight Compute to parse its reports.

In particular, the NVIDIA Python tool (ncu_report) provides low-level access to the collected data by iterating over ranges and actions. This class uses this functionality to extract all the collected data into a custom data structure of type ProfilingResults. This data structure is a nested tree that provides higher-level, direct access to the data of interest by NVTX range (if NVTX is used) and by demangled kernel name.

References:

__init__(*, path: Path | None = None, name: str | None = None, command: Command | None = None) → None

Load the report <path>/<name>.ncu-rep or the report generated by reprospect.tools.ncu.session.Command.

collect_metrics_from_action(*, metrics: Iterable[Metric | MetricCorrelation | MetricDeviceAttribute], action: Action) → dict[str, int | float | dict[str, int | float] | MetricCorrelationData | str]

Collect values of the metrics in the action.

References:

extract_results_in_range(metrics: Collection[Metric | MetricCorrelation | MetricDeviceAttribute], range_idx: int = 0, includes: Iterable[str] | None = None, excludes: Iterable[str] | None = None, demangler: type[CuppFilt | LlvmCppFilt] | None = None) → ProfilingResults

Extract the metrics of the actions in the range with ID range_idx. Possibly filter by NVTX with includes and excludes.

Parameters:

metrics – Must be iterable from start to end many times.
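The requirement matters because the metrics are traversed once per action. The sketch below shows the difference between a tuple and a generator under repeated traversal; `names_per_action` is a hypothetical helper and the metric names are placeholders.

```python
def names_per_action(metrics, n_actions: int) -> list[list[str]]:
    """Traverse `metrics` once for each action, as a per-action extraction would."""
    return [list(metrics) for _ in range(n_actions)]


metrics_tuple = ("metric_a", "metric_b")
metrics_generator = (name for name in metrics_tuple)

from_tuple = names_per_action(metrics_tuple, 2)
from_generator = names_per_action(metrics_generator, 2)  # exhausted after the first pass
```

A tuple or list yields the full set for every action, whereas a generator is empty from the second action onward.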

classmethod fill_metric(action: Action, metric: Metric) → int | float | dict[str, int | float]

Loop over submetrics of metric.

classmethod get_metric_by_name(*, action: Action, metric: str)

Read a metric in action.

get_metric_value(metric: Any, index: int | None = None) → int | float | str

Recent ncu versions (>= 2025.3.0.0) provide a value method.

class reprospect.tools.ncu.Session(command: Command)

Bases: object

Nsight Compute session interface.

__init__(command: Command) → None
command: Command
run(cwd: Path | None = None, env: MutableMapping | None = None, retries: int = 1, sleep: Callable[[int, int], float] = <function Session.<lambda>>) → None

Run ncu using command.

Parameters:
  • retries – ncu might fail to acquire some resources because other instances are running, so it is retried a few times. See https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#faq (Profiling failed because a driver resource was unavailable).

  • sleep – The time to sleep between successive retries. The callable is given the current retry index (descending) and the amount of allowed retries.
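The interplay of retries and sleep can be sketched as a small loop. This is an illustration under assumptions, not the Session.run implementation: `run_with_retries` is hypothetical, `retries` is taken to be the total number of attempts, and the descending retry index is taken to end at 0 on the last attempt.

```python
import time
from typing import Callable


def run_with_retries(attempt: Callable[[], bool], retries: int = 1,
                     sleep: Callable[[int, int], float] = lambda retry, retries: 0.0) -> None:
    """Call `attempt` until it succeeds, sleeping between failed attempts."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for retry in reversed(range(retries)):  # descending retry index
        if attempt():
            return
        if retry == 0:
            raise RuntimeError(f"failed after {retries} attempts")
        time.sleep(sleep(retry, retries))


outcomes = iter([False, False, True])  # fails twice, then succeeds
sleep_calls: list[tuple[int, int]] = []


def no_wait(retry: int, retries: int) -> float:
    """Record the arguments and do not actually wait."""
    sleep_calls.append((retry, retries))
    return 0.0


run_with_retries(lambda: next(outcomes), retries=3, sleep=no_wait)
```

Passing the retry index and the allowed count to the callable lets a caller implement a backoff that grows as the remaining attempts shrink.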

Warning

According to https://developer.nvidia.com/nvidia-development-tools-solutions-err_nvgpuctrperm-permission-issue-performance-counters#ElevPrivsTag, GPU performance counters are not available to all users by default.

Note

As of ncu 2025.1.1.0, a note tells us that specified NVTX include expressions match only start/end ranges.

References:

Submodules