Dispatch

class examples.kokkos.graph.example_dispatch.TestDispatch

Bases: CMakeAwareTestCase

Trace the CUDA API calls during Kokkos::Experimental::Graph stages.

It uses examples/kokkos/graph/example_dispatch.cpp.

KOKKOS_TOOLS_NVTX_CONNECTOR_LIB

Used in TestNSYS.report().

classmethod get_target_name() -> str

class examples.kokkos.graph.example_dispatch.TestNSYS

Bases: TestDispatch

nsys-focused analysis.

NODE_COUNT: Final[int] = 5
static get(*, report: Report, kernels: DataFrame, label: str) -> DataFrame

Get the kernels from the kernels table that are correlated with the cudaGraphLaunch API call made in the NVTX region label.
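The correlation described above can be illustrated with a minimal, standard-library-only sketch. It is not the implementation of get(): the record types, field names (start, end, correlation_id, stream_id) and sample values are all hypothetical, and it assumes that kernels launched by a graph share a correlation ID with the cudaGraphLaunch call that launched them.

```python
from dataclasses import dataclass


# Hypothetical record types standing in for rows of the report tables.
@dataclass(frozen=True)
class Range:
    label: str
    start: int
    end: int


@dataclass(frozen=True)
class ApiCall:
    name: str
    start: int
    end: int
    correlation_id: int


@dataclass(frozen=True)
class Kernel:
    name: str
    correlation_id: int
    stream_id: int


def kernels_in_region(ranges, api_calls, kernels, label):
    """Return kernels correlated with a cudaGraphLaunch call inside the NVTX region `label`."""
    # Locate the NVTX region by its label (raises StopIteration if absent).
    region = next(r for r in ranges if r.label == label)
    # Collect correlation IDs of graph launches fully contained in the region.
    launch_ids = {
        call.correlation_id
        for call in api_calls
        if call.name.startswith("cudaGraphLaunch")
        and region.start <= call.start
        and call.end <= region.end
    }
    # Keep only the kernels that share one of those correlation IDs.
    return [k for k in kernels if k.correlation_id in launch_ids]


# Made-up sample data, for illustration only.
ranges = [Range("dispatch", 100, 200), Range("other", 300, 400)]
api_calls = [
    ApiCall("cudaGraphLaunch", 120, 130, 7),
    ApiCall("cudaLaunchKernel", 310, 320, 9),
]
kernels = [
    Kernel("node_a", 7, 13),
    Kernel("node_b", 7, 14),
    Kernel("unrelated", 9, 7),
]

selected = kernels_in_region(ranges, api_calls, kernels, "dispatch")
```

With this sample data, only the two kernels sharing the graph launch's correlation ID are selected; the kernel launched outside the region is dropped.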

pytestmark = [Mark(name='skipif', args=(True,), kwargs={'reason': 'needs a GPU'})]
report() -> Report

Analyse with nsys, using reprospect.tools.nsys.Cacher.

test_streams(report: Report) -> None

Each kernel gets a unique stream ID.

This means that, at the CUDA backend level, all nodes are presented to the kernel scheduler as independent and may be executed concurrently.

Note that CUDA does not provide a way to create a graph node and enforce the stream on which it will eventually run. This motivated a refactoring of the Kokkos::Experimental::Graph API, see https://github.com/kokkos/kokkos/pull/8191.
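The property checked by test_streams (one distinct stream ID per kernel) can be sketched as follows. This is only an illustration under stated assumptions, not the actual test body: the stream ID values are made up, and the helper all_unique is hypothetical. Only NODE_COUNT = 5 comes from the class above.

```python
from typing import Final, Sequence

NODE_COUNT: Final[int] = 5

# Made-up stream IDs, one per graph node kernel, for illustration only.
stream_ids: list[int] = [13, 14, 15, 16, 17]


def all_unique(values: Sequence[int]) -> bool:
    """Return True when no value appears more than once."""
    return len(set(values)) == len(values)


# One kernel is expected per graph node, each on its own stream.
has_expected_count = len(stream_ids) == NODE_COUNT
has_unique_streams = all_unique(stream_ids)
```

If two kernels had been reported on the same stream, all_unique would return False, which would indicate that those nodes were not presented to the scheduler as independent.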