Division

Dividing complex numbers is more subtle than it appears. The naïve formula:

\[\frac{a + bi}{c + di} = \frac{ac + bd + (bc - ad)\,i}{c^2 + d^2}\]

computes \(c^2 + d^2\), which can overflow for large \(c\) or \(d\) even when the quotient itself is representable. Underflow for small inputs causes similar problems. Many algorithms have therefore been proposed to make complex division more robust [BS12], each with different trade-offs in numerical accuracy and performance.
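To make the overflow concrete, the sketch below contrasts the textbook formula with a variant that first rescales the denominator by a power of two. The function names and the use of `math.frexp`/`math.ldexp` are assumptions made for illustration. The scaling is in the spirit of logb/scalbn-style approaches, but it is not the implementation benchmarked in this example, and it makes no attempt to handle zero, infinite, or NaN operands.

```python
import math


def naive_divide(a: float, b: float, c: float, d: float) -> tuple[float, float]:
    """Textbook formula for (a + bi) / (c + di)."""
    denom = c * c + d * d  # becomes inf once |c| or |d| exceeds roughly 1.3e154
    return (a * c + b * d) / denom, (b * c - a * d) / denom


def scaled_divide(a: float, b: float, c: float, d: float) -> tuple[float, float]:
    """Same formula after scaling c and d by a power of two (illustrative only)."""
    # Exponent e such that max(|c|, |d|) = m * 2**e with 0.5 <= m < 1.
    _, e = math.frexp(max(abs(c), abs(d)))
    # Scaling by 2**-e only shifts the exponent (barring underflow of the
    # smaller component), so the larger of |cs|, |ds| lies in [0.5, 1).
    cs, ds = math.ldexp(c, -e), math.ldexp(d, -e)
    denom = cs * cs + ds * ds  # lies in [0.25, 2), so it cannot overflow
    return (
        math.ldexp((a * cs + b * ds) / denom, -e),
        math.ldexp((b * cs - a * ds) / denom, -e),
    )
```

With all four inputs set to `1e200`, `naive_divide` returns NaN for both components, whereas `scaled_divide` returns a real part close to 1 and an imaginary part of 0.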

Some implementations also claim ISO/IEC 60559 compliance [InternationalOfStandardizationInternationalECommission24], which mandates specific handling of edge cases such as infinite or zero operands. The compliance test can be used to validate several implementations against these requirements. Note that libraries such as CCCL expose an opt-out mechanism for this handling, which can save computation cycles when such operands are not expected.

This example focuses on performance using the Newton fractal [con25] as a benchmark, which requires one complex division per iteration per pixel.

Figure: Newton fractal for \(z^4 - 1\) with relaxation factor \(a = 1\).
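As a rough illustration of why each iteration costs one complex division, here is a minimal sketch of the relaxed Newton iteration for \(z^4 - 1\) at a single pixel, using Python's built-in `complex` type. The function name, tolerance, and iteration cap are hypothetical and not taken from the companion executable.

```python
def newton_iterations(
    z: complex, a: float = 1.0, tol: float = 1e-12, max_iter: int = 50
) -> tuple[complex, int]:
    """Relaxed Newton iteration for f(z) = z**4 - 1.

    Returns the final iterate and the number of iterations performed.
    """
    for k in range(max_iter):
        f = z**4 - 1
        if abs(f) < tol:
            return z, k
        # The single complex division of the iteration: f(z) / f'(z),
        # with f'(z) = 4 z**3. Raises ZeroDivisionError if z is exactly 0.
        z = z - a * f / (4 * z**3)
    return z, max_iter
```

A fractal image would typically color each pixel by the root it converges to and shade it by the iteration count; starting from `1.1 + 0.1j`, for instance, the iterates approach the root `1`.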

class examples.kokkos.complex.example_division.Method(*values)

Bases: StrEnum

ILogbScalbn = 'ILogbScalbn'
LogbScalbn = 'LogbScalbn'
NormDivision = 'NormDivision'
__str__()

Return str(self).

class examples.kokkos.complex.example_division.TestDivision

Bases: CMakeAwareTestCase

KOKKOS_TOOLS_NVTX_CONNECTOR_LIB

Used in TestNCU.report().

classmethod get_target_name() → str

class examples.kokkos.complex.example_division.TestNCU

Bases: TestDivision

Kernel profiling.

ELEMENT_COUNT: Final[int] = 256
METRICS: tuple[Metric | MetricCorrelation, ...] = (MetricCounter(name='smsp__inst_executed', pretty_name='smsp__inst_executed', subs=(<MetricCounterRollUp.SUM: 'sum'>,)), MetricCorrelation(name='sass__inst_executed_per_opcode'), MetricCounter(name='sm__inst_executed_pipe_fp64', pretty_name='sm__inst_executed_pipe_fp64', subs=(<MetricCounterRollUp.SUM: 'sum'>,)), MetricCounter(name='sm__pipe_fp64_cycles_active', pretty_name='sm__pipe_fp64_cycles_active', subs=(<MetricCounterRollUp.SUM: 'sum'>,)))
NVTX_INCLUDES: Final[tuple[str, ...]] = ('division',)
WARP_COUNT: Final[int] = 8
WARP_SIZE: Final[int] = 32
metrics(report: Report) → dict[Method, ProfilingMetrics]
pytestmark = [Mark(name='skipif', args=(True,), kwargs={'reason': 'needs a GPU'})]
report() → Report
test_fp64_division_instructions(metrics: dict[Method, ProfilingMetrics]) → None

The number of executed MUFU instructions is equal to the number of divisions performed per work item times WARP_COUNT.
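The arithmetic behind this expectation can be sketched as follows, assuming that the profiler counters report warp-level instructions, that each thread handles one work item, and that each division issues exactly one MUFU instruction. The helper name and the `divisions_per_work_item` parameter are hypothetical.

```python
WARP_SIZE = 32
WARP_COUNT = 8
ELEMENT_COUNT = WARP_COUNT * WARP_SIZE  # 256, matching TestNCU.ELEMENT_COUNT


def expected_mufu_count(divisions_per_work_item: int, warp_count: int = WARP_COUNT) -> int:
    """Warp-level MUFU count, assuming one MUFU per division per warp."""
    # An instruction executed by all 32 threads of a warp is counted once,
    # so the count scales with the number of warps, not the number of elements.
    return divisions_per_work_item * warp_count
```

Under these assumptions, a kernel performing one division per work item over 256 elements would report 8 MUFU instructions rather than 256.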

test_fp64_instructions_and_cycles(metrics: dict[Method, ProfilingMetrics]) → None

In terms of FP64 instructions executed and active pipeline cycles, Method.ILogbScalbn outperforms Method.LogbScalbn, which in turn outperforms Method.NormDivision.

test_fp64_predicate_instructions(metrics: dict[Method, ProfilingMetrics]) → None

Method.LogbScalbn and Method.ILogbScalbn differ in their DSETP count.

Method.LogbScalbn calls Kokkos::isfinite on the Kokkos::logb result, emitting a DSETP. Method.ILogbScalbn replaces this with an integer comparison, staying on the integer pipeline.

test_i2f_f2i_instructions(metrics: dict[Method, ProfilingMetrics]) → None

Confirms TestSASS.test_i2f_f2i_instructions().

class examples.kokkos.complex.example_division.TestSASS

Bases: TestDivision

Binary analysis.

SIGNATURE: Final[dict[Method, Pattern[str]]] = {Method.ILogbScalbn: re.compile('reprospect::examples::kokkos::complex::DivisorLogbScalbn<(\\(bool\\)1|true), (\\(bool\\)1|true)>'), Method.LogbScalbn: re.compile('reprospect::examples::kokkos::complex::DivisorLogbScalbn<(\\(bool\\)1|true), (\\(bool\\)0|false)>'), Method.NormDivision: re.compile('reprospect::examples::kokkos::complex::DivisorNormDivision<(\\(bool\\)1|true)>')}
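The alternation in each pattern accepts both the `(bool)1` and the `true` spelling of a boolean template argument. The sketch below reuses the three patterns with plain string keys and a hypothetical `classify` helper; the demangled kernel names in the usage note are made up for illustration.

```python
import re
from typing import Optional

PREFIX = r"reprospect::examples::kokkos::complex::"
SIGNATURE = {
    "ILogbScalbn": re.compile(PREFIX + r"DivisorLogbScalbn<(\(bool\)1|true), (\(bool\)1|true)>"),
    "LogbScalbn": re.compile(PREFIX + r"DivisorLogbScalbn<(\(bool\)1|true), (\(bool\)0|false)>"),
    "NormDivision": re.compile(PREFIX + r"DivisorNormDivision<(\(bool\)1|true)>"),
}


def classify(demangled: str) -> Optional[str]:
    """Return the method whose signature occurs in a demangled name, if any."""
    for method, pattern in SIGNATURE.items():
        if pattern.search(demangled):
            return method
    return None
```

For example, a name containing `DivisorLogbScalbn<(bool)1, (bool)1>` and one containing `DivisorLogbScalbn<true, true>` would both be classified as `ILogbScalbn`.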
property cubin: Path
cuobjdump() → CuObjDump
decoder(function: dict[Method, Function]) → dict[Method, Decoder]
detailed_register_usage(function: dict[Method, Function], nvdisasm: NVDisasm) → dict[Method, dict[RegisterType, tuple[int, int]]]
function(cuobjdump: CuObjDump) → dict[Method, Function]
nvdisasm(cuobjdump: CuObjDump) → NVDisasm
test_detailed_register_usage(detailed_register_usage: dict[Method, dict[RegisterType, tuple[int, int]]]) → None
test_i2f_f2i_instructions(decoder: dict[Method, Decoder]) → None

All methods need to execute 4 INT64 to FP64 conversion instructions to convert the clock64() reading to FP64.

Method.LogbScalbn needs additional INT32 to FP64 conversions, likely inside Kokkos::logb, as well as one F2I.F64 instruction for static_cast<int>(logbw).

test_norm_division_uses_more_rcp(decoder: dict[Method, Decoder]) → None

Division is implemented using a Newton-Raphson-like method. It starts by computing an approximation to the reciprocal of the denominator using the MUFU.RCP64H instruction.
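The idea of refining an approximate reciprocal can be sketched in a few lines. This is only an illustration of the Newton-Raphson step \(x \leftarrow x\,(2 - d\,x)\): the linear seed below is a stand-in for the hardware approximation, the function names are hypothetical, and the actual instruction sequence emitted by the compiler is not reproduced here.

```python
import math


def reciprocal_newton(d: float, iterations: int = 4) -> float:
    """Approximate 1/d for finite, nonzero d without using division."""
    m, e = math.frexp(d)  # d = m * 2**e with 0.5 <= |m| < 1
    sign = math.copysign(1.0, m)
    m = abs(m)
    # Crude seed for 1/m (relative error of at most 1/17 on [0.5, 1]),
    # standing in for the hardware reciprocal approximation.
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(iterations):
        # Newton-Raphson step for g(x) = 1/x - m; the relative error
        # is roughly squared at each step.
        x = x * (2.0 - m * x)
    return sign * math.ldexp(x, -e)  # 1/d = (1/m) * 2**-e


def divide_newton(n: float, d: float) -> float:
    """Compute n / d as n times the refined reciprocal of d."""
    return n * reciprocal_newton(d)
```

Because the error is squared at each step, four refinements of this seed are enough to reach roughly double-precision accuracy.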

References:

class examples.kokkos.complex.example_division_benchmarking.Method(*values)

Bases: StrEnum

ILogbScalbn = 'ILogbScalbn'
LogbScalbn = 'LogbScalbn'
NormDivision = 'NormDivision'
__str__()

Return str(self).

property is_ilog: bool
property is_norm: bool

class examples.kokkos.complex.example_division_benchmarking.Parameters

Bases: TypedDict

branching_or_compliance: bool
height: int
method: Method
width: int

class examples.kokkos.complex.example_division_benchmarking.TestDivision

Bases: CMakeAwareTestCase

Run the companion executable and make a nice visualization.

PATTERN: Final[Pattern[str]] = regex.Regex('^NewtonFractal<Divisor(?P<divisor>NormDivision|LogbScalbn)(?:<(?:(?P<branching_or_compliance>true|false))?(?:[, ]*(?P<ilogb>true|false))?>)?>/(?P<full>[A-Za-z]+)/width:(?P<width>\\d+)/height:(?P<height>\\d+)', flags=regex.V0)
TIME_UNIT: Final = 'us'

Time unit of the benchmark.

classmethod get_target_name() → str
classmethod params(*, name: str) → Parameters

Parse the name of a case and return parameters.
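A minimal sketch of such parsing is shown below, using the standard-library `re` module in place of `regex` and plain strings in place of the `Method` enum. The `parse_case` helper, the mapping from the `ilogb` group to a method name, and the case names in the usage note are assumptions; the actual `params` implementation may differ.

```python
import re

PATTERN = re.compile(
    r"^NewtonFractal<Divisor(?P<divisor>NormDivision|LogbScalbn)"
    r"(?:<(?:(?P<branching_or_compliance>true|false))?(?:[, ]*(?P<ilogb>true|false))?>)?>"
    r"/(?P<full>[A-Za-z]+)/width:(?P<width>\d+)/height:(?P<height>\d+)"
)


def parse_case(name: str) -> dict:
    """Extract benchmark parameters from a case name (illustrative only)."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized case name: {name!r}")
    if match["divisor"] == "NormDivision":
        method = "NormDivision"
    else:
        # Assumption: a second template argument of 'true' selects the ilogb variant.
        method = "ILogbScalbn" if match["ilogb"] == "true" else "LogbScalbn"
    return {
        # Groups that did not participate in the match are None, hence False here.
        "branching_or_compliance": match["branching_or_compliance"] == "true",
        "height": int(match["height"]),
        "method": method,
        "width": int(match["width"]),
    }
```

Given a made-up name such as `NewtonFractal<DivisorLogbScalbn<true, true>>/Run/width:512/height:256`, this returns the method `ILogbScalbn` with a width of 512 and a height of 256.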

pytestmark = [Mark(name='skipif', args=(True,), kwargs={'reason': 'needs a GPU'})]
raw() → dict[str, dict]

Run the benchmark and return the raw JSON-based results.

Warning

Be sure to remove --benchmark_min_time for better-converged results.

results(raw: dict) → DataFrame

Processed results.

test_visualize(results: DataFrame) → None

Create a visualization of the results.

class examples.kokkos.complex.example_division_plot.TestDivision

Bases: CMakeAwareTestCase

EXTENT: Final[tuple[int, ...]] = (-1, 1, -1, 1)
data() → tuple[ndarray, ndarray]

Run the executable and read the output arrays.

classmethod get_target_name() → str
pytestmark = [Mark(name='skipif', args=(True,), kwargs={'reason': 'needs a GPU'})]
test(data: tuple) → None

Plot the colors and iterations with an artistic touch.