Kokkos::atomic_add

Kokkos provides extended atomic support for objects of arbitrary size. Therefore, it has to support types that are not directly handled by the backend. This is achieved through the desul library [TrottLebrunGrandieArndt+22], which, depending on the size of the object and the targeted hardware, maps atomic operations to one of the following:

  1. atomic instruction

  2. CAS loop

  3. sharded lock table
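
To make the last two strategies concrete, here is a minimal host-side Python sketch. It is an illustration only, not desul code: Cell, cas_add, ShardedLockTable, locked_add and the table size of 64 are hypothetical names and values, and Python threading locks stand in for the hardware primitives.

```python
import threading


class Cell:
    """A memory location with an emulated compare-and-swap (hypothetical helper)."""

    def __init__(self, value):
        self.value = value
        self._guard = threading.Lock()  # stands in for hardware atomicity

    def compare_and_swap(self, expected, desired):
        """Store `desired` if the cell holds `expected`; return the value seen."""
        with self._guard:
            seen = self.value
            if seen == expected:
                self.value = desired
            return seen


def cas_add(cell, increment):
    """CAS loop: retry until no other thread modified the cell in between."""
    expected = cell.value
    while True:
        seen = cell.compare_and_swap(expected, expected + increment)
        if seen == expected:
            return seen  # value before the addition
        expected = seen  # lost the race: retry with the fresh value


class ShardedLockTable:
    """Fixed pool of locks; an address hashes to one shard (hypothetical helper)."""

    def __init__(self, shards=64):
        self._locks = [threading.Lock() for _ in range(shards)]

    def lock_for(self, address):
        return self._locks[hash(address) % len(self._locks)]


def locked_add(table, storage, address, increment):
    """Lock-based path: acquire the shard, load, add, store, release."""
    with table.lock_for(address):
        old = storage[address]
        storage[address] = old + increment
        return old
```

The CAS loop only works when the whole object fits in a single compare-and-swap, which is why larger objects fall back to the lock table.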

Historically, CUDA atomic operations supported objects of at most 64 bits. Since compute capability 9.0, CUDA also supports atomic CAS on objects of up to 128 bits, and there has been an effort in Kokkos to expose this support through desul. For instance, a Kokkos::atomic_add on a 128-bit aligned Kokkos::complex<double> should use the sharded lock table implementation only for compute capabilities below 9.0, and use the CAS-based implementation otherwise.
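
The rule above can be summarized as a small decision function. This is a simplified sketch, not part of Kokkos or of the documented package: expected_code_path is a hypothetical name, the compute capability is assumed to be encoded as major * 10 + minor, and the object is assumed to be aligned to its size and to have no native atomic add instruction.

```python
def expected_code_path(size_bits: int, compute_capability: int) -> str:
    """Return 'cas' or 'lock' for a type without a native atomic add instruction.

    Assumptions: the object is aligned to its size, and `compute_capability`
    is encoded as major * 10 + minor (90 for 9.0).
    """
    # Widest object that a single atomic CAS can cover on this architecture.
    widest_cas = 128 if compute_capability >= 90 else 64
    return "cas" if size_bits <= widest_cas else "lock"
```

For example, expected_code_path(128, 80) selects the lock-based path, while expected_code_path(128, 90) selects the CAS-based one.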

To check that Kokkos takes the right code path, the matchers documented below can be used.

The tests documented below verify that Kokkos::atomic_add maps to the right implementation by looking for an instruction sequence pattern.

class examples.kokkos.atomic.desul.AtomicAcquireMatcher

Bases: object

Matcher for the trial to acquire a lock through an atomic exchange.

classmethod build(arch: NVIDIAArch, compiler_id: str) -> OrderedInSequenceMatcher

class examples.kokkos.atomic.desul.AtomicReleaseMatcher

Bases: object

Matcher for the release of a lock through an atomic exchange.

classmethod build(arch: NVIDIAArch, compiler_id: str) -> InstructionMatcher

class examples.kokkos.atomic.desul.DeviceAtomicThreadFenceMatcher

Bases: object

Matcher for the device atomic thread fence block.

classmethod build(arch: NVIDIAArch) -> OrderedInSequenceMatcher

class examples.kokkos.atomic.desul.LockBasedAtomicMatcher(*, arch: NVIDIAArch, operation: Operation, compiler_id: str, size: int = 128, level: int = 20, load: SequenceMatcher | None = None, store: SequenceMatcher | None = None)

Bases: SequenceMatcher

Matcher for the desul lock-based atomic code path.

__init__(*, arch: NVIDIAArch, operation: Operation, compiler_id: str, size: int = 128, level: int = 20, load: SequenceMatcher | None = None, store: SequenceMatcher | None = None) -> None
collect(matched: list[InstructionMatch], new: InstructionMatch | list[InstructionMatch]) -> int
match(instructions: Sequence[Instruction | str]) -> list[InstructionMatch] | None

Note

For data types that require many loads or stores, the operation instructions might be interleaved, such that the sequence within the memory thread fences is not strictly load/operation/store.
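
The effect described in this note can be illustrated with a toy example. The sketch below does not use the documented matcher classes: matches_strictly and matches_per_group are hypothetical helpers, and the instruction strings are placeholders rather than real SASS. It shows why a strict load/operation/store pattern fails on an interleaved stream, while checking each group on its own still succeeds.

```python
def matches_strictly(instructions, expected):
    """True if `expected` appears as one contiguous run in `instructions`."""
    n = len(expected)
    return any(
        instructions[i:i + n] == expected
        for i in range(len(instructions) - n + 1)
    )


def matches_per_group(instructions, groups):
    """True if each group appears in its own order, whatever the interleaving."""

    def is_subsequence(needle):
        remaining = iter(instructions)
        # `item in remaining` consumes the iterator, which enforces the order.
        return all(item in remaining for item in needle)

    return all(is_subsequence(group) for group in groups)


# Placeholder streams: the compiler may schedule the second load late.
strict = ["LD a", "LD b", "ADD a", "ADD b", "ST a", "ST b"]
interleaved = ["LD a", "ADD a", "LD b", "ADD b", "ST a", "ST b"]
groups = [["LD a", "LD b"], ["ADD a", "ADD b"], ["ST a", "ST b"]]
```

Here matches_strictly(interleaved, strict) is False, whereas matches_per_group(interleaved, groups) is True.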

property next_index: int

class examples.kokkos.atomic.desul.Operation(*args, **kwargs)

Bases: Protocol

__init__(*args, **kwargs)
build(loads: Collection[InstructionMatch]) -> SequenceMatcher
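
Because Operation is a Protocol, any object that exposes a compatible build method can be passed as the operation argument, without inheriting from Operation. The sketch below illustrates this structural typing with stand-in types: OperationLike, AddTwoValues, describe, and the use of plain strings in place of InstructionMatch and SequenceMatcher are assumptions made for the example, not part of the documented package.

```python
from collections.abc import Collection
from typing import Protocol, runtime_checkable


@runtime_checkable
class OperationLike(Protocol):
    """Stand-in for the documented Operation protocol (strings replace matcher types)."""

    def build(self, loads: Collection[str]) -> list[str]: ...


class AddTwoValues:
    """Hypothetical operation: expects one add per loaded register."""

    def build(self, loads: Collection[str]) -> list[str]:
        return [f"ADD {register}" for register in loads]


def describe(operation: OperationLike, loads: Collection[str]) -> list[str]:
    """Accept anything that satisfies the protocol, no inheritance required."""
    return operation.build(loads)
```
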
examples.kokkos.atomic.desul.get_atomic_memory_suffix(compiler_id: str) -> Literal['G', '']

See tests.test.sass.test_atomic.TestAtomicMatcher.test_exch_device_ptr().

class examples.kokkos.atomic.add.TestCase

Bases: CMakeAwareTestCase

Derived types must define SIGNATURE_MATCHER.

SIGNATURE_MATCHER: ClassVar[Pattern[str]]
property cubin: Path
cuobjdump() -> CuObjDump
decoder(cuobjdump: CuObjDump) -> Decoder
test() -> None

Run the executable.

class examples.kokkos.atomic.example_add_complex64.AddComplex64

Bases: object

Addition of two 64-bit complex values.

build(loads: Collection[InstructionMatch] | None = None) -> OrderedInterleavedInSequenceMatcher | UnorderedInterleavedInSequenceMatcher

class examples.kokkos.atomic.example_add_complex64.TestAtomicAddComplex64

Bases: TestCase

Tests for Kokkos::complex<float>.

SIGNATURE_MATCHER: ClassVar[Pattern[str]] = re.compile('AtomicAddFunctor<Kokkos::View<Kokkos::complex<float>\\s*\\*\\s*, Kokkos::CudaSpace>>')
classmethod get_target_name() -> str
test_cas_atomic(decoder: Decoder) -> None

This test checks that Kokkos::atomic_add on Kokkos::complex<float> uses the CAS-based implementation.
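
As an illustration of how a SIGNATURE_MATCHER pattern behaves, the sketch below applies the pattern of this class to a few kernel signature fragments. The fragments are hypothetical strings written for the example, not taken from an actual binary; the point is only that the optional whitespace around the pointer star is tolerated and that other element types are rejected.

```python
import re

# Same pattern as the SIGNATURE_MATCHER of TestAtomicAddComplex64.
SIGNATURE_MATCHER = re.compile(
    r"AtomicAddFunctor<Kokkos::View<Kokkos::complex<float>\s*\*\s*, Kokkos::CudaSpace>>"
)

# Hypothetical signature fragments, for illustration only.
with_space = "AtomicAddFunctor<Kokkos::View<Kokkos::complex<float> *, Kokkos::CudaSpace>>"
without_space = "AtomicAddFunctor<Kokkos::View<Kokkos::complex<float>*, Kokkos::CudaSpace>>"
other_type = "AtomicAddFunctor<Kokkos::View<Kokkos::complex<double> *, Kokkos::CudaSpace>>"

found_with_space = SIGNATURE_MATCHER.search(with_space) is not None
found_without_space = SIGNATURE_MATCHER.search(without_space) is not None
found_other_type = SIGNATURE_MATCHER.search(other_type) is not None
```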

class examples.kokkos.atomic.example_add_complex128.AddComplex128

Bases: object

Addition of two 128-bit complex values.

  1. real parts

  2. imaginary parts

  3. possibly with NOP

build(loads: Collection[InstructionMatch] | None = None) -> UnorderedInSequenceMatcher

class examples.kokkos.atomic.example_add_complex128.TestAtomicAddComplex128

Bases: TestCase

Tests for Kokkos::complex<double>.

SIGNATURE_MATCHER: ClassVar[Pattern[str]] = re.compile('AtomicAddFunctor<Kokkos::View<Kokkos::complex<double>\\s*\\*\\s*, Kokkos::CudaSpace>>')
classmethod get_target_name() -> str
test_cas_atomic_as_of_hopper90(decoder: Decoder) -> None

This test checks that Kokkos::atomic_add on Kokkos::complex<double> uses the CAS-based implementation for compute capability 9.0 and above.

test_lock_atomic_before_hopper90(decoder: Decoder) -> None

This test checks that Kokkos::atomic_add on Kokkos::complex<double> uses the lock-based implementation for compute capabilities below 9.0.

class examples.kokkos.atomic.example_add_double256.AddDouble4(arch: NVIDIAArch)

Bases: object

Addition of two double4 values (regardless of alignment).

__init__(arch: NVIDIAArch) -> None
build(loads: Collection[InstructionMatch] | None = None) -> UnorderedInSequenceMatcher

class examples.kokkos.atomic.example_add_double256.Load256Matcher(arch: NVIDIAArch)

Bases: object

__init__(arch: NVIDIAArch) -> None
build() -> SequenceMatcher

class examples.kokkos.atomic.example_add_double256.Store256Matcher(arch: NVIDIAArch)

Bases: object

__init__(arch: NVIDIAArch) -> None
build() -> SequenceMatcher

class examples.kokkos.atomic.example_add_double256.TestAtomicAddDouble256

Bases: TestCase

Verify that Kokkos::atomic_add for double4 maps to the desul lock-based array implementation, regardless of alignment.

SIGNATURE_MATCHER: ClassVar[Pattern[str]] = re.compile('AtomicAddFunctor<Kokkos::View<reprospect::examples::kokkos::atomic::Double4Aligned32\\s*\\*\\s*, Kokkos::CudaSpace>>')
classmethod get_target_name() -> str
test_lock_atomic(decoder: Decoder) -> None

This test checks that Kokkos::atomic_add on double4 uses the lock-based implementation.

class examples.kokkos.atomic.example_add_int128.AddInt128

Bases: object

Addition of two __int128 values that uses a specific set of registers.

build(loads: Collection[InstructionMatch] | None = None) -> AddInt128Matcher

class examples.kokkos.atomic.example_add_int128.TestAtomicAddInt128

Bases: TestCase

Tests for __int128.

SIGNATURE_MATCHER: ClassVar[Pattern[str]] = re.compile('AtomicAddFunctor<Kokkos::View<__int128\\s*\\*\\s*, Kokkos::CudaSpace>>')
classmethod get_target_name() -> str
test_cas_atomic_as_of_hopper90(decoder: Decoder) -> None

This test checks that Kokkos::atomic_add on __int128 uses the CAS-based implementation for compute capability 9.0 and above.

test_lock_atomic_before_hopper90(decoder: Decoder) -> None

This test checks that Kokkos::atomic_add on __int128 uses the lock-based implementation for compute capabilities below 9.0.