{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "How do atomic operations translate from CUDA `C` to `SASS` ?\n", "=======================================================\n", "\n", "This example demonstrates how `ReProspect` can be used to quickly analyse how a given `C` operation is translated into its corresponding `SASS` instructions.\n", "\n", "This approach offers several advantages:\n", "- **Pedagogical** — useful for teaching and learning\n", "- **Comparative** — helps understand how the mapping from `C` to `SASS` evolves across architectures" ] }, { "cell_type": "markdown", "id": "1", "metadata": {}, "source": [ "Source code\n", "-----------\n", "\n", "Let's take a closer look at `atomicAdd`, described [here](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#atomicadd).\n", "The kernel atomically adds a source array to a destination array, element-wise." ] }, { "cell_type": "code", "execution_count": null, "id": "2", "metadata": {}, "outputs": [], "source": [ "CODE = \"\"\"\\\n", "#include \"cuda.h\"\n", "\n", "__global__ void my_kernel(int * __restrict__ const dst, const int* __restrict__ const src) {\n", " const auto index = blockIdx.x * blockDim.x + threadIdx.x;\n", " atomicAdd(&dst[index], src[index]);\n", "}\n", "\"\"\"" ] }, { "cell_type": "markdown", "id": "3", "metadata": {}, "source": [ "Compilation\n", "-----------\n", "\n", "Compile the source code for a few architectures of your choice.\n", "As this analysis won't actually run the executable, any architecture supported by ``nvcc`` should work." ] }, { "cell_type": "code", "execution_count": null, "id": "4", "metadata": {}, "outputs": [], "source": [ "import logging\n", "import pathlib\n", "import subprocess\n", "import tempfile\n", "\n", "from reprospect.tools import architecture\n", "\n", "logging.basicConfig(level=logging.INFO)\n", "\n", "ARCHES = [\n", " architecture.NVIDIAArch.from_compute_capability(80),\n", " architecture.NVIDIAArch.from_compute_capability(90),\n", " architecture.NVIDIAArch.from_compute_capability(120),\n", "]\n", "\n", "logging.info(subprocess.check_output(('nvcc', '--version')).decode())\n", "\n", "with tempfile.TemporaryDirectory(delete=False) as tmpdir:\n", " for arch in ARCHES:\n", " source = pathlib.Path(tmpdir) / f'atomic.{arch.as_sm}.cu'\n", " output = pathlib.Path(tmpdir) / f'atomic.{arch.as_sm}'\n", "\n", " source.write_text(CODE)\n", "\n", " subprocess.check_call(('nvcc', f'--generate-code=arch={arch.as_compute},code=[{arch.as_sm}]', '-O3', '-c', source, '-o', output))" ] }, { "cell_type": "markdown", "id": "5", "metadata": {}, "source": [ "Binary analysis\n", "---------------\n", "\n", "Let's inspect the generate `SASS` instructions." ] }, { "cell_type": "code", "execution_count": null, "id": "6", "metadata": {}, "outputs": [], "source": [ "from reprospect.tools import binaries\n", "\n", "cuobjdump = {}\n", "\n", "for arch in ARCHES:\n", " cuobjdump[arch], _ = binaries.CuObjDump.extract(\n", " file=pathlib.Path(tmpdir) / f'atomic.{arch.as_sm}',\n", " arch=arch,\n", " cwd=pathlib.Path(tmpdir),\n", " cubin=f'atomic.1.{arch.as_sm}.cubin',\n", " )" ] }, { "cell_type": "code", "execution_count": null, "id": "7", "metadata": { "tags": [ "hide-output" ] }, "outputs": [], "source": [ "from reprospect.tools.sass import Decoder\n", "\n", "SIGNATURE = 'my_kernel(int *, const int *)'\n", "\n", "for arch in ARCHES:\n", " logging.info(Decoder(code=cuobjdump[arch].functions[SIGNATURE].code))" ] }, { "cell_type": "markdown", "id": "8", "metadata": {}, "source": [ "For the kernel defined above, the atomic operation is always translated into a *Reduction Operation on Generic Memory* on the target architectures.\n", "\n", "However, the exact encoding varies with compute capability:\n", "- On architectures **prior to** ``Hopper``, it appears as ``RED``.\n", "- On architectures **starting from** ``Hopper``, it appears as ``REDG``." ] }, { "cell_type": "code", "execution_count": null, "id": "9", "metadata": {}, "outputs": [], "source": [ "for arch, dump in cuobjdump.items():\n", " if arch.compute_capability < 90:\n", " assert 'RED.E.ADD.STRONG.GPU' in dump.functions[SIGNATURE].code\n", " else:\n", " assert 'REDG.E.ADD.STRONG.GPU' in dump.functions[SIGNATURE].code" ] } ], "metadata": { "kernelspec": { "display_name": "venv-3.12", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" } }, "nbformat": 4, "nbformat_minor": 5 }