Dispatch

class examples.kokkos.graph.example_dispatch.TestDispatch

Bases: CMakeAwareTestCase

Trace the CUDA API calls during Kokkos::Experimental::Graph stages.

It uses examples/kokkos/graph/example_dispatch.cpp.

KOKKOS_TOOLS_NVTX_CONNECTOR_LIB: Used in TestNSYS.report().

classmethod get_target_name() → str

class examples.kokkos.graph.example_dispatch.TestNSYS

Bases: TestDispatch

nsys-focused analysis.

NODE_COUNT: Final[int] = 5

static get(*, report: Report, kernels: DataFrame, label: str) → DataFrame : Get the kernels from the kernels table that are correlated to the cudaGraphLaunch API call in the NVTX region label.
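As an illustration of the correlation step, here is a minimal sketch in plain Python. It is not the implementation of TestNSYS.get. It assumes that API calls and kernels are available as rows carrying a correlation ID, and that API call names may carry a version suffix (hence the prefix match). The helper kernels_launched_in_region, the Interval type, and all row keys (name, correlationId, start, end, streamId) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Time span of an NVTX region (hypothetical type, timestamps in ns)."""
    start: int
    end: int


def kernels_launched_in_region(*, region, api_calls, kernels):
    """Return the kernels whose correlation ID matches a cudaGraphLaunch
    call that starts and ends inside the given region."""
    correlation_ids = {
        call["correlationId"]
        for call in api_calls
        # Assumption: names may be suffixed with an API version.
        if call["name"].startswith("cudaGraphLaunch")
        and region.start <= call["start"]
        and call["end"] <= region.end
    }
    return [k for k in kernels if k["correlationId"] in correlation_ids]


# Made-up rows, for illustration only.
region = Interval(start=100, end=200)
api_calls = [
    {"name": "cudaGraphLaunch_v10000", "correlationId": 7, "start": 110, "end": 120},
    {"name": "cudaLaunchKernel_v7000", "correlationId": 8, "start": 130, "end": 140},
    {"name": "cudaGraphLaunch_v10000", "correlationId": 9, "start": 210, "end": 220},
]
kernels = [
    {"name": "node_a", "correlationId": 7, "streamId": 13},
    {"name": "node_b", "correlationId": 7, "streamId": 14},
    {"name": "other", "correlationId": 8, "streamId": 7},
    {"name": "late", "correlationId": 9, "streamId": 15},
]

selected = kernels_launched_in_region(
    region=region, api_calls=api_calls, kernels=kernels
)
```

Only the kernels tied to the graph launch inside the region are kept. The kernel launched by a different API call and the one launched after the region ends are filtered out.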

pytestmark = [Mark(name='skipif', args=(True,), kwargs={'reason': 'needs a GPU'})]

report() → Report : Analyse with nsys, using reprospect.tools.nsys.Cacher.

test_streams(report: Report) → None

Each kernel gets a unique stream ID.

This means that, at the CUDA backend level, all nodes are presented to the kernel scheduler as independent and may be executed concurrently.

Note that CUDA does not provide a way to create a graph node and enforce the stream on which it will eventually run. This motivated a refactoring of the Kokkos::Experimental::Graph API; see https://github.com/kokkos/kokkos/pull/8191.
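The stream uniqueness property checked by test_streams could be sketched as follows. This is a minimal illustration in plain Python, not the actual test body. It assumes one kernel per graph node and a streamId key on each kernel row; the helper check_unique_streams and the sample rows are hypothetical.

```python
from collections import Counter

NODE_COUNT = 5  # mirrors TestNSYS.NODE_COUNT


def check_unique_streams(kernels, expected_count):
    """Assert that each kernel ran on its own stream and return the set
    of stream IDs."""
    stream_ids = [k["streamId"] for k in kernels]
    # Assumption: exactly one kernel per graph node.
    assert len(stream_ids) == expected_count, "unexpected number of kernels"
    shared = {s: n for s, n in Counter(stream_ids).items() if n > 1}
    assert not shared, f"streams shared by several kernels: {shared}"
    return set(stream_ids)


# Made-up rows, for illustration only.
sample_kernels = [
    {"name": f"node_{i}", "streamId": 13 + i} for i in range(NODE_COUNT)
]
streams = check_unique_streams(sample_kernels, NODE_COUNT)
```

If two nodes shared a stream, the second assertion would fail and report the offending stream IDs together with the number of kernels seen on each.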