reprospect.tools.ncu package
- class reprospect.tools.ncu.Cacher(*, directory: str | Path | None = None)
Bases: Cacher
Cacher tailored to ncu results.
ncu can take a long time to acquire results, especially when there are many kernels to profile and/or many metrics to collect.
On a cache hit, the cacher will serve:
<cache_key>.ncu-rep file
.log file
On a cache miss, ncu is launched and the cache entry is populated accordingly.
Note
It is assumed that hashing is faster than running ncu itself.
Warning
The cache should not be shared between machines, since there may be differences between machines that influence the results but are not included in the hashing.
- __init__(*, directory: str | Path | None = None)
- hash(**kwargs) -> blake3
- hash_impl(*, command: Command) -> blake3
Hash based on:
ncu version
ncu options (but not the output and log files)
executable content
executable arguments
linked libraries
environment
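The hash inputs listed above can be illustrated with a minimal, standalone sketch. It is not the reprospect implementation: `cache_key`, `_feed` and their parameters are hypothetical names, and `hashlib.blake2b` from the standard library stands in for the blake3 hasher so that the example runs without third-party packages.

```python
import hashlib
from collections.abc import Iterable


def _feed(hasher, label: str, items: Iterable[bytes]) -> None:
    """Add a labelled group of items; length prefixes keep item boundaries unambiguous."""
    hasher.update(label.encode())
    for item in items:
        hasher.update(len(item).to_bytes(8, "little"))
        hasher.update(item)


def cache_key(*, ncu_version: str, opts: tuple[str, ...], executable_content: bytes,
              args: tuple[str, ...], libraries: tuple[str, ...], env: dict[str, str]) -> str:
    """Hypothetical helper: digest of the inputs that may influence profiling results."""
    hasher = hashlib.blake2b()  # assumption: stand-in for blake3
    _feed(hasher, "version", [ncu_version.encode()])
    _feed(hasher, "opts", [opt.encode() for opt in opts])
    _feed(hasher, "executable", [executable_content])
    _feed(hasher, "args", [arg.encode() for arg in args])
    _feed(hasher, "libraries", [lib.encode() for lib in libraries])
    # Sort the environment so that insertion order does not change the key.
    _feed(hasher, "env", [f"{k}={v}".encode() for k, v in sorted(env.items())])
    return hasher.hexdigest()


base = dict(
    ncu_version="2025.1.1.0",
    opts=("--nvtx",),
    executable_content=b"\x7fELF",
    args=("--size", "64"),
    libraries=("libcudart.so.12",),
    env={"A": "1", "B": "2"},
)
key = cache_key(**base)
key_reordered_env = cache_key(**{**base, "env": {"B": "2", "A": "1"}})
key_other_args = cache_key(**{**base, "args": ("--size", "128")})
```

In this sketch, reordering the environment leaves the key unchanged, while changing an executable argument produces a different key and therefore a cache miss.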
- populate(directory: Path, **kwargs) -> None
On a cache miss, call reprospect.tools.ncu.Session.run() and fill the directory with the artifacts.
- run(command: Command, **kwargs) -> Entry
On a cache hit, copy files from the cache entry.
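The hit/miss behaviour of populate() and run() can be pictured with a small, standalone sketch. It does not use reprospect: `run_cached`, the `populate` callback and the directory layout are hypothetical, and copying the artifacts to the destination after a miss as well is a choice made for this example only.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_cached(key: str, cache_root: Path, destination: Path,
               populate: Callable[[Path], None]) -> bool:
    """Serve the artifacts for `key` into `destination`; return True on a cache hit."""
    entry = cache_root / key
    hit = entry.is_dir()
    if not hit:
        # Cache miss: let the expensive tool fill a fresh cache entry.
        entry.mkdir(parents=True)
        try:
            populate(entry)
        except BaseException:
            shutil.rmtree(entry)  # do not leave a half-filled entry behind
            raise
    destination.mkdir(parents=True, exist_ok=True)
    for artifact in entry.iterdir():
        shutil.copy2(artifact, destination / artifact.name)
    return hit


calls = []


def fake_profiler(entry: Path) -> None:
    """Stand-in for launching the profiler: writes a fake report and a fake log."""
    calls.append(entry)
    (entry / "report.ncu-rep").write_bytes(b"fake report")
    (entry / "report.log").write_text("fake log")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = run_cached("abc123", root / "cache", root / "out1", fake_profiler)
    second = run_cached("abc123", root / "cache", root / "out2", fake_profiler)
    served = sorted(path.name for path in (root / "out2").iterdir())
```

The second call is a hit: the fake profiler is not launched again, and both files are served from the cache entry.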
- class reprospect.tools.ncu.Command(*, executable: str | Path, output: Path, opts: tuple[str, ...] = (), metrics: tuple[Metric | MetricCorrelation | MetricDeviceAttribute, ...] | None = None, nvtx_includes: tuple[str, ...] | None = None, args: tuple[str | Path, ...] | None = None, env: Mapping[str, str] | None = None)
Bases: object
Run an ncu command line.
- __init__(*, executable: str | Path, output: Path, opts: tuple[str, ...] = (), metrics: tuple[Metric | MetricCorrelation | MetricDeviceAttribute, ...] | None = None, nvtx_includes: tuple[str, ...] | None = None, args: tuple[str | Path, ...] | None = None, env: Mapping[str, str] | None = None) -> None
Method generated by attrs for class Command.
- metrics: tuple[Metric | MetricCorrelation | MetricDeviceAttribute, ...] | None
Metrics.
- nvtx_includes: tuple[str, ...] | None
NVTX include expressions. Refer to https://docs.nvidia.com/nsight-compute/2023.3/NsightComputeCli/index.html#nvtx-filtering.
- run(*, cwd: Path | None = None, env: MutableMapping[str, str] | None = None) -> int
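As a rough illustration of how the fields of Command could map to an ncu invocation, here is a standalone sketch. `build_ncu_argv` is a hypothetical helper and the ordering of the arguments is an assumption; only the ncu options `--nvtx`, `--nvtx-include`, `--metrics` and `--export` are taken from the Nsight Compute CLI. How Command actually assembles its command line is not stated on this page.

```python
from pathlib import Path
from typing import Optional, Sequence, Union


def build_ncu_argv(*, executable: Union[str, Path], output: Path,
                   opts: Sequence[str] = (),
                   metrics: Optional[Sequence[str]] = None,
                   nvtx_includes: Optional[Sequence[str]] = None,
                   args: Optional[Sequence[Union[str, Path]]] = None) -> list[str]:
    """Hypothetical helper: assemble an ncu argument list from Command-like fields."""
    argv = ["ncu", *opts]
    if nvtx_includes:
        # NVTX filtering requires NVTX support to be enabled.
        argv.append("--nvtx")
        for expression in nvtx_includes:
            argv += ["--nvtx-include", expression]
    if metrics:
        # ncu expects a comma-separated list of metric names.
        argv += ["--metrics", ",".join(metrics)]
    argv += ["--export", str(output), str(executable)]
    argv += [str(arg) for arg in args or ()]
    return argv


argv = build_ncu_argv(
    executable="./my_app",
    output=Path("report"),
    metrics=("smsp__inst_executed.sum",),
    nvtx_includes=("my_domain@my_range",),
    args=("--size", "64"),
)
```

Here `argv` holds the profiler options first, then the executable, then the executable's own arguments.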
- class reprospect.tools.ncu.L1TEXCache
Bases: object
A selection of metrics related to L1/TEX cache.
See [Str21].
- GlobalLoad
alias of L1TEXCacheGlobalLoad
- GlobalStore
alias of L1TEXCacheGlobalStore
- LocalStore
alias of L1TEXCacheLocalStore
- class reprospect.tools.ncu.L1TEXCacheGlobalLoad
Bases: object
- Instructions
alias of L1TEXCacheGlobalLoadInstructions
- Requests
alias of L1TEXCacheGlobalLoadRequests
- SectorHits
alias of L1TEXCacheGlobalLoadSectorHits
- SectorMisses
alias of L1TEXCacheGlobalLoadSectorMisses
- Sectors
alias of L1TEXCacheGlobalLoadSectors
- Wavefronts
alias of L1TEXCacheGlobalLoadWavefronts
- class reprospect.tools.ncu.L1TEXCacheGlobalLoadInstructions
Bases: object
Factory of counter metric (unit)__(sass?)_inst_executed_op_global_ld.
- static create(*, unit: Unit = Unit.SMSP, mode: Literal['sass'] | None = 'sass', subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) -> MetricCounter
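The pattern (unit)__(sass?)_inst_executed_op_global_ld can be read as a unit prefix, an optional `sass_` marker and a fixed quantity. The standalone sketch below spells this out. `sketch_counter_name` is a hypothetical function, not part of reprospect, and it assumes that Unit.SMSP renders as the string `smsp`.

```python
from typing import Optional


def sketch_counter_name(*, unit: str, quantity: str, mode: Optional[str] = None) -> str:
    """Hypothetical helper: build '<unit>__[<mode>_]<quantity>'."""
    prefix = f"{mode}_" if mode is not None else ""
    return f"{unit}__{prefix}{quantity}"


with_sass = sketch_counter_name(unit="smsp", mode="sass", quantity="inst_executed_op_global_ld")
without_sass = sketch_counter_name(unit="smsp", mode=None, quantity="inst_executed_op_global_ld")
```

With `mode='sass'` the name is `smsp__sass_inst_executed_op_global_ld`; with `mode=None` it is `smsp__inst_executed_op_global_ld`.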
- class reprospect.tools.ncu.L1TEXCacheGlobalLoadRequests
Bases: object
Factory of counter metric l1tex__t_requests_pipe_lsu_mem_global_op_ld.
- static create(*, subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) -> MetricCounter
- class reprospect.tools.ncu.L1TEXCacheGlobalLoadSectorHits
Bases: object
Factory of counter metric l1tex__t_sectors_pipe_lsu_mem_global_op_ld_lookup_hit.
- static create(*, subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) -> MetricCounter
- class reprospect.tools.ncu.L1TEXCacheGlobalLoadSectorMisses
Bases: object
Factory of counter metric l1tex__t_sectors_pipe_lsu_mem_global_op_ld_lookup_miss.
- static create(*, subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) -> MetricCounter
- class reprospect.tools.ncu.L1TEXCacheGlobalLoadSectors
Bases: object
Factory of counter metric l1tex__t_sectors_pipe_lsu_mem_global_op_ld.
- static create(*, subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,), suffix: Literal['hit', 'miss'] | None = None) -> MetricCounter
- class reprospect.tools.ncu.L1TEXCacheGlobalLoadWavefronts
Bases: object
Factory of counter metric l1tex__t_wavefronts_pipe_lsu_mem_global_op_ld.
- static create(*, subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) -> MetricCounter
- class reprospect.tools.ncu.L1TEXCacheGlobalStore
Bases: object
- Instructions
alias of L1TEXCacheGlobalStoreInstructions
- Sectors
alias of L1TEXCacheGlobalStoreSectors
- class reprospect.tools.ncu.L1TEXCacheGlobalStoreInstructions
Bases: object
Factory of counter metric (unit)__(sass?)_inst_executed_op_global_st.
- static create(*, unit: Unit = Unit.SMSP, mode: Literal['sass'] | None = 'sass', subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) -> MetricCounter
- class reprospect.tools.ncu.L1TEXCacheGlobalStoreSectors
Bases: object
Factory of counter metric l1tex__t_sectors_pipe_lsu_mem_global_op_st.
- static create(*, subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) -> MetricCounter
- class reprospect.tools.ncu.L1TEXCacheLocalStore
Bases: object
- Instructions
alias of L1TEXCacheLocalStoreInstructions
- class reprospect.tools.ncu.L1TEXCacheLocalStoreInstructions
Bases: object
Factory of counter metric (unit)__(sass?)_inst_executed_op_local_st.
- static create(*, unit: Unit = Unit.SMSP, mode: Literal['sass'] | None = 'sass', subs: tuple[MetricCounterRollUp, ...] = (MetricCounterRollUp.SUM,)) -> MetricCounter
- class reprospect.tools.ncu.LaunchBlock
Bases: XYZBase
Factory of metrics launch__block_dim_x, launch__block_dim_y and launch__block_dim_z.
- class reprospect.tools.ncu.LaunchGrid
Bases: XYZBase
Factory of metrics launch__grid_dim_x, launch__grid_dim_y and launch__grid_dim_z.
- class reprospect.tools.ncu.Metric(name: str, pretty_name: str | None = None, subs: tuple[str, ...] | None = None)
Bases: object
Used to represent an ncu metric.
If subs is not given, it is assumed that name is a valid metric that can be directly evaluated by ncu.
- gather() -> tuple[str, ...]
Get the list of sub-metric names or the metric name itself if no sub-metrics are defined.
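The behaviour described for gather() can be sketched with a standalone stand-in class. `SketchMetric` is hypothetical and is not the reprospect class. It assumes that a sub-metric name is the metric name followed by a dot and the sub-metric, which is how ncu spells roll-ups such as `.sum`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchMetric:
    """Hypothetical stand-in for a metric with optional sub-metrics."""
    name: str
    subs: Optional[tuple[str, ...]] = None

    def gather(self) -> tuple[str, ...]:
        # Without sub-metrics, the name itself is what gets requested.
        if self.subs is None:
            return (self.name,)
        # Assumption: '<name>.<sub>' is the full sub-metric name.
        return tuple(f"{self.name}.{sub}" for sub in self.subs)


plain = SketchMetric("launch__registers_per_thread").gather()
rolled_up = SketchMetric("l1tex__t_sectors_pipe_lsu_mem_global_op_ld", ("sum", "avg")).gather()
```

In this sketch, `plain` contains only the metric name, while `rolled_up` contains one entry per requested roll-up.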
- class reprospect.tools.ncu.MetricCorrelation(name: str)
Bases: object
A metric with correlations, like sass__inst_executed_per_opcode.
- gather() -> tuple[str]
- class reprospect.tools.ncu.MetricCounter(name: str, pretty_name: str | None = None, subs: tuple[str, ...] | None = None)
Bases: Metric
A counter metric.
The sub-metric names are expected to be from MetricCounterRollUp.
- class reprospect.tools.ncu.MetricCounterRollUp(*values)
Bases: StrEnum
Available roll-ups for MetricCounter.
- AVG = 'avg'
- MAX = 'max'
- MIN = 'min'
- SUM = 'sum'
- __str__()
Return str(self).
- class reprospect.tools.ncu.MetricDeviceAttribute(name: str)
Bases: object
ncu device attribute metric, such as device__attribute_architecture.
Note
Available device attribute metrics can be queried with:
ncu --query-metrics-collection=device
- property full_name: str
- gather() -> tuple[str]
- class reprospect.tools.ncu.MetricRatio(name: str, pretty_name: str | None = None, subs: tuple[str, ...] | None = None)
Bases: Metric
A ratio metric.
The sub-metric names are expected to be from MetricRatioRollUp.
- class reprospect.tools.ncu.MetricRatioRollUp(*values)
Bases: StrEnum
Available roll-ups for MetricRatio.
- MAX_RATE = 'max_rate'
- PCT = 'pct'
- RATIO = 'ratio'
- __str__()
Return str(self).
- class reprospect.tools.ncu.ProfilingMetrics(data: dict[str, int | float | dict[str, int | float] | MetricCorrelationData | str])
Bases: Mapping[str, int | float | dict[str, int | float] | MetricCorrelationData | str]
Mapping of profiling metric keys to their values.
Note
It is not decorated with dataclasses.dataclass() because of https://github.com/mypyc/mypyc/issues/1061.
- class reprospect.tools.ncu.ProfilingResults(data: dict[str, ProfilingResults | ProfilingMetrics] | None = None)
Bases: TreeMixin
Nested tree data structure for storing profiling results.
The data structure consists of internal nodes and leaf nodes:
The internal nodes are themselves ProfilingResults instances. They organise results hierarchically by the profiling range (e.g. NVTX range) that they were obtained from, terminating with the kernel name.
The leaf nodes contain the actual profiling metric key-value pairs. Any type implementing the protocol ProfilingMetrics can be used as a profiling metrics entry.
This class provides convenient methods for hierarchical data access and manipulation.
Example structure:
Profiling results
└── 'nvtx range'
    ├── 'nvtx region'
    │   └── 'kernel'
    │       ├── 'metric i' -> ProfilingMetricData
    │       └── 'metric ii' -> ProfilingMetricData
    └── 'other nvtx region'
        └── 'other kernel'
            ├── 'metric i' -> ProfilingMetricData
            └── 'metric ii' -> ProfilingMetricData
Note
Using a hierarchical data structure from a package such as hatchet could be a direction to explore in the future.
Note
It is not decorated with dataclasses.dataclass() because of https://github.com/mypyc/mypyc/issues/1061.
- __init__(data: dict[str, ProfilingResults | ProfilingMetrics] | None = None) -> None
- aggregate_metrics(accessors: Iterable[str], keys: Iterable[str] | None = None) -> dict[str, int | float]
Aggregate metric values across multiple leaf nodes with profiling metrics at accessor path accessors.
- Parameters:
keys – Specific metric keys to aggregate. If None, uses all keys from the first leaf node.
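The role of keys can be pictured with a standalone sketch over plain dictionaries. This is not the reprospect implementation: `aggregate` is a hypothetical function, and the page does not say which reduction is applied, so summation is assumed here purely for illustration.

```python
from typing import Iterable, Mapping, Optional, Union

Number = Union[int, float]


def aggregate(leaves: Iterable[Mapping[str, Number]],
              keys: Optional[Iterable[str]] = None) -> dict[str, Number]:
    """Hypothetical helper: reduce metric values over several leaf nodes (assumed: sum)."""
    leaves = list(leaves)
    if not leaves:
        return {}
    # Without explicit keys, fall back to the keys of the first leaf node.
    selected = list(keys) if keys is not None else list(leaves[0])
    return {key: sum(leaf[key] for leaf in leaves) for key in selected}


kernels = [
    {"metric_i.sum": 3, "metric_ii.sum": 1.5},
    {"metric_i.sum": 4, "metric_ii.sum": 2.5},
]
everything = aggregate(kernels)
only_first = aggregate(kernels, keys=["metric_i.sum"])
```

Passing `keys` restricts the result to the requested metrics; omitting it covers every key found in the first leaf.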
- assign_metrics(accessors: Sequence[str], data: ProfilingMetrics) -> None
Set the leaf node with profiling metrics data at accessor path accessors.
Creates the internal nodes in the hierarchy if needed.
- data: Final[dict[str, ProfilingResults | ProfilingMetrics]]
- iter_metrics(accessors: Iterable[str] = ()) -> Generator[tuple[str, ProfilingMetrics], None, None]
Query the accessor path accessors, check that it leads to an internal node, check that all entries are leaf nodes with profiling metrics, and return an iterator over these leaf nodes with profiling metrics.
- query(accessors: Iterable[str]) -> ProfilingResults | ProfilingMetrics
Get the internal node in the hierarchy or the leaf node with profiling metrics at accessor path accessors.
- query_filter(accessors: Iterable[str], predicate: Callable[[str], bool]) -> ProfilingResults
Query the accessor path accessors, check that it leads to an internal node, and return a new ProfilingResults with only the entries whose key satisfies predicate.
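The accessor-path idea behind query() and query_filter() can be sketched with nested dictionaries. The functions below are hypothetical stand-ins written for this example, not the reprospect methods. Unlike the documented methods, they do not check whether a node is an internal node or a leaf.

```python
from typing import Callable, Iterable


def query(tree: dict, accessors: Iterable[str]):
    """Walk down the nested dictionaries, one accessor per level."""
    node = tree
    for accessor in accessors:
        node = node[accessor]
    return node


def query_filter(tree: dict, accessors: Iterable[str],
                 predicate: Callable[[str], bool]) -> dict:
    """Keep only the entries below the accessor path whose key satisfies the predicate."""
    node = query(tree, accessors)
    return {key: value for key, value in node.items() if predicate(key)}


results = {
    "my_nvtx_range": {
        "copy_kernel<float>": {"smsp__inst_executed.sum": 128},
        "copy_kernel<double>": {"smsp__inst_executed.sum": 256},
        "scale_kernel<float>": {"smsp__inst_executed.sum": 64},
    }
}
copies = query_filter(results, ("my_nvtx_range",), lambda name: name.startswith("copy_kernel"))
```

A predicate on the key is handy for selecting all instantiations of a templated kernel within an NVTX range, as in the `copy_kernel` example.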
- query_metrics(accessors: Iterable[str]) -> ProfilingMetrics
Query the accessor path accessors, check that it leads to a leaf node with profiling metrics, and return this leaf node with profiling metrics.
- query_single_next(accessors: Iterable[str]) -> tuple[str, ProfilingResults | ProfilingMetrics]
Query the accessor path accessors, check that it leads to an internal node with exactly one entry, and return this single entry.
This member function is useful, for instance, to access a single kernel within an NVTX range or region.
>>> from reprospect.tools.ncu import ProfilingResults
>>> results = ProfilingResults()
>>> results.assign_metrics(('my_nvtx_range', 'my_nvtx_region', 'my_kernel'), {'my_metric': 42})
>>> results.query_single_next(('my_nvtx_range', 'my_nvtx_region'))
('my_kernel', {'my_metric': 42})
- query_single_next_metrics(accessors: Iterable[str]) -> tuple[str, ProfilingMetrics]
Query the accessor path accessors, check that it leads to an internal node with exactly one entry, check that this entry is a leaf node with profiling metrics, and return this leaf node with profiling metrics.
- to_tree() -> Tree
Convert to a rich.tree.Tree for nice printing.
- class reprospect.tools.ncu.Report(*, path: Path | None = None, name: str | None = None, command: Command | None = None)
Bases: object
This class is a wrapper around the Python tool provided by NVIDIA Nsight Compute to parse its reports.
In particular, the NVIDIA Python tool (ncu_report) provides low-level access to the collected data by iterating over ranges and actions. This class uses these functionalities to extract all the collected data into a custom data structure of type ProfilingResults. This data structure is a nested tree that provides higher-level, direct access to the data of interest by NVTX range (if NVTX is used) and by demangled kernel name.
References:
https://docs.nvidia.com/nsight-compute/CustomizationGuide/index.html#python-report-interface
https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly
- __init__(*, path: Path | None = None, name: str | None = None, command: Command | None = None) -> None
Load the report <path>/<name>.ncu-rep or the report generated by reprospect.tools.ncu.session.Command.
- collect_metrics_from_action(*, metrics: Iterable[Metric | MetricCorrelation | MetricDeviceAttribute], action: Action) -> dict[str, int | float | dict[str, int | float] | MetricCorrelationData | str]
Collect values of the metrics in the action.
- extract_results_in_range(metrics: Collection[Metric | MetricCorrelation | MetricDeviceAttribute], range_idx: int = 0, includes: Iterable[str] | None = None, excludes: Iterable[str] | None = None, demangler: type[CuppFilt | LlvmCppFilt] | None = None) -> ProfilingResults
Extract the metrics of the actions in the range with ID range_idx. Possibly filter by NVTX with includes and excludes.
- Parameters:
metrics – Must support being iterated from start to end multiple times.
- classmethod fill_metric(action: Action, metric: Metric) -> int | float | dict[str, int | float]
Loop over the sub-metrics of metric.
- classmethod get_metric_by_name(*, action: Action, metric: str)
Read a metric in action.
- class reprospect.tools.ncu.Session(command: Command)
Bases: object
Nsight Compute session interface.
- run(cwd: Path | None = None, env: MutableMapping | None = None, retries: int = 1, sleep: Callable[[int, int], float] = <function Session.<lambda>>) -> None
Run ncu using command.
- Parameters:
retries – ncu might fail to acquire some resources because other instances are running. Retry a few times. See https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#faq (Profiling failed because a driver resource was unavailable).
sleep – The time to sleep between successive retries. The callable is given the current retry index (descending) and the number of allowed retries.
Warning
According to https://developer.nvidia.com/nvidia-development-tools-solutions-err_nvgpuctrperm-permission-issue-performance-counters#ElevPrivsTag, GPU performance counters are not available to all users by default.
Note
As of ncu 2025.1.1.0, a note tells us that specified NVTX include expressions match only start/end ranges.
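The retries and sleep parameters of Session.run() can be pictured as a simple retry loop. The sketch below is standalone and hypothetical: `run_with_retries` takes a callable returning an exit code instead of launching ncu, and it assumes that retries counts the total number of allowed attempts, which this page does not specify.

```python
import time
from typing import Callable


def run_with_retries(attempt: Callable[[], int], *, retries: int = 1,
                     sleep: Callable[[int, int], float] = lambda retry, retries: 1.0) -> int:
    """Hypothetical helper: call `attempt` until it returns 0 or the attempts run out."""
    if retries < 1:
        raise ValueError("at least one attempt is required")
    returncode = 1
    # The retry index counts down: retries - 1, ..., 1, 0.
    for retry in reversed(range(retries)):
        returncode = attempt()
        if returncode == 0:
            break
        if retry > 0:
            # The callable decides the delay from the index and the allowed count.
            time.sleep(sleep(retry, retries))
    return returncode


# Usage with a fake command that fails twice before succeeding.
outcomes = iter([1, 1, 0])
delays = []


def record_delay(retry: int, retries: int) -> float:
    delays.append((retry, retries))
    return 0.0  # do not actually wait in this example


final = run_with_retries(lambda: next(outcomes), retries=3, sleep=record_delay)
```

Because the sleep callable receives both the descending index and the allowed count, it can implement a constant delay or a back-off that grows with each failed attempt.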
Submodules
- reprospect.tools.ncu.cacher module
- reprospect.tools.ncu.metrics module
L1TEXCache, L1TEXCacheGlobalLoad, L1TEXCacheGlobalLoadInstructions, L1TEXCacheGlobalLoadRequests, L1TEXCacheGlobalLoadSectorHits, L1TEXCacheGlobalLoadSectorMisses, L1TEXCacheGlobalLoadSectors, L1TEXCacheGlobalLoadWavefronts, L1TEXCacheGlobalStore, L1TEXCacheGlobalStoreInstructions, L1TEXCacheGlobalStoreSectors, L1TEXCacheLocalStore, L1TEXCacheLocalStoreInstructions, LaunchBlock, LaunchGrid, Metric, MetricCorrelation, MetricCorrelationData, MetricCounter, MetricCounterRollUp, MetricData, MetricDeviceAttribute, MetricRatio, MetricRatioRollUp, PipeStage, Quantity, Unit, ValueType, XYZBase, counter_name_from(), gather()
- reprospect.tools.ncu.report module
Action, NvtxDomain, ProfilingMetricData, ProfilingMetrics, ProfilingResults, ProfilingResults.__init__(), ProfilingResults.aggregate_metrics(), ProfilingResults.assign_metrics(), ProfilingResults.data, ProfilingResults.iter_metrics(), ProfilingResults.query(), ProfilingResults.query_filter(), ProfilingResults.query_metrics(), ProfilingResults.query_single_next(), ProfilingResults.query_single_next_metrics(), ProfilingResults.to_tree(), Range, Report, load_ncu_report()
- reprospect.tools.ncu.session module