Division
Dividing complex numbers is more subtle than it appears. The naïve formula
\[\frac{a + ib}{c + id} = \frac{(ac + bd) + i\,(bc - ad)}{c^2 + d^2}\]
computes \(c^2 + d^2\), which can overflow for large inputs and underflow for small ones. Many algorithms have therefore been proposed to make complex division more robust [BS12], each with different trade-offs between numerical accuracy and performance.
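The following minimal Python sketch of the naïve formula shows the overflow concretely. The function name `naive_divide` is ours and does not come from the example. For a divisor around \(10^{200}\), \(c^2\) overflows to infinity even though the exact quotient is perfectly representable.

```python
import math


def naive_divide(a: float, b: float, c: float, d: float) -> tuple[float, float]:
    """Textbook (a + ib) / (c + id): no scaling, no special-case handling."""
    denom = c * c + d * d  # overflows to inf when |c| or |d| exceeds about 1.34e154
    return (a * c + b * d) / denom, (b * c - a * d) / denom


# Well-scaled operands: the exact quotient (1 + 2i) / (3 + 4i) is 0.44 + 0.08i.
print(naive_divide(1.0, 2.0, 3.0, 4.0))

# The exact quotient 1 / 1e200 = 1e-200 is representable, but c * c overflows,
# so both parts collapse to zero.
print(naive_divide(1.0, 0.0, 1e200, 0.0))

# The exact quotient is 1 + 0i, yet inf / inf and inf - inf make both parts NaN.
print(naive_divide(1e200, 1e200, 1e200, 1e200))
```

Rescaling the divisor before squaring it, as the methods documented below do, removes both failure modes.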
Some implementations also claim ISO/IEC 60559 compliance [InternationalOfStandardizationInternationalECommission24], which mandates specific handling of edge cases such as infinite or zero operands.
The compliance test can be used to validate several implementations against these requirements.
Note that libraries such as CCCL expose an opt-out mechanism for this handling, which can save computation cycles when such operands are not expected.
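The Python sketch below makes this edge-case handling concrete. It models the NaN-recovery step on the example division routine in Annex G of the C standard; that choice of reference is ours, and the CCCL and Kokkos implementations may differ. The function takes a quotient that came out as NaN + iNaN and patches three cases: nonzero/zero, infinite/finite and finite/infinite. Skipping these branches is the kind of work an opt-out avoids. Python raises `ZeroDivisionError` on float division by zero, so the sketch receives the NaN quotient as an argument instead of computing it. The name `recover_nan_quotient` is hypothetical.

```python
import math


def recover_nan_quotient(a: float, b: float, c: float, d: float,
                         x: float, y: float) -> tuple[float, float]:
    """Patch a NaN + iNaN result x + iy of (a + ib) / (c + id).

    Mirrors the three recovery branches of the Annex G example; any other
    result is returned unchanged.
    """
    if not (math.isnan(x) and math.isnan(y)):
        return x, y
    if c == 0.0 and d == 0.0 and not (math.isnan(a) and math.isnan(b)):
        # nonzero / zero gives an infinity (0 / 0 stays NaN because inf * 0 is NaN)
        s = math.copysign(math.inf, c)
        return s * a, s * b
    if (math.isinf(a) or math.isinf(b)) and math.isfinite(c) and math.isfinite(d):
        # infinite / finite: infinite parts become +-1, finite parts +-0
        a = math.copysign(1.0 if math.isinf(a) else 0.0, a)
        b = math.copysign(1.0 if math.isinf(b) else 0.0, b)
        return math.inf * (a * c + b * d), math.inf * (b * c - a * d)
    if (math.isinf(c) or math.isinf(d)) and math.isfinite(a) and math.isfinite(b):
        # finite / infinite gives a (signed) zero
        c = math.copysign(1.0 if math.isinf(c) else 0.0, c)
        d = math.copysign(1.0 if math.isinf(d) else 0.0, d)
        return 0.0 * (a * c + b * d), 0.0 * (b * c - a * d)
    return x, y


# (1 + i) / 0, whose unguarded IEEE quotient would be NaN + iNaN
print(recover_nan_quotient(1.0, 1.0, 0.0, 0.0, math.nan, math.nan))
```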
This example focuses on performance, using the Newton fractal [con25] as a benchmark: each iteration requires one complex division per pixel.
Newton fractal for \(z^4 - 1\) with relaxation factor \(a = 1\).
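A Newton step for \(f(z) = z^4 - 1\) with relaxation factor \(a\) is \(z \leftarrow z - a\,\frac{z^4 - 1}{4 z^3}\). That quotient is the single complex division per iteration. The pure-Python sketch below records, for each pixel, which root was reached and after how many iterations. These are plausibly the two quantities the plotting test turns into colors. It is a sketch under assumptions: the names, tolerance, iteration cap and image extent are hypothetical, and it relies on Python's built-in complex division rather than any of the documented methods.

```python
ROOTS = (1 + 0j, 1j, -1 + 0j, -1j)  # the four roots of z**4 - 1


def newton(z: complex, a: float = 1.0, max_iter: int = 50,
           tol: float = 1e-10) -> tuple[int, int]:
    """Return (index of the root reached, iterations used), or (-1, iterations)."""
    for it in range(max_iter):
        for k, r in enumerate(ROOTS):
            if abs(z - r) < tol:
                return k, it
        if abs(z) < 1e-100:
            return -1, it  # f'(z) = 4 z**3 (nearly) vanishes: the step is undefined
        z = z - a * (z**4 - 1) / (4 * z**3)  # the one complex division of this iteration
    return -1, max_iter


def render(width: int, height: int, extent: float = 2.0) -> tuple[list[int], list[int]]:
    """Root index and iteration count for each pixel of [-extent, extent]^2, row by row."""
    roots, iters = [], []
    for j in range(height):
        y = extent - 2 * extent * j / max(height - 1, 1)
        for i in range(width):
            x = -extent + 2 * extent * i / max(width - 1, 1)
            k, n = newton(complex(x, y))
            roots.append(k)
            iters.append(n)
    return roots, iters


print(newton(2.0 + 0j))
```

Each pixel runs independently of every other pixel, which is why the benchmark maps naturally onto one GPU work item per pixel.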
- class examples.kokkos.complex.example_division.Method(*values)
Bases: StrEnum
- ILogbScalbn = 'ILogbScalbn'
- LogbScalbn = 'LogbScalbn'
- NormDivision = 'NormDivision'
- __str__()
Return str(self).
- class examples.kokkos.complex.example_division.TestDivision
Bases: CMakeAwareTestCase
- KOKKOS_TOOLS_NVTX_CONNECTOR_LIB
Used in TestNCU.report().
- classmethod get_target_name() → str
- class examples.kokkos.complex.example_division.TestNCU
Bases: TestDivision
Kernel profiling.
- METRICS: tuple[Metric | MetricCorrelation, ...] = (MetricCounter(name='smsp__inst_executed', pretty_name='smsp__inst_executed', subs=(<MetricCounterRollUp.SUM: 'sum'>,)), MetricCorrelation(name='sass__inst_executed_per_opcode'), MetricCounter(name='sm__inst_executed_pipe_fp64', pretty_name='sm__inst_executed_pipe_fp64', subs=(<MetricCounterRollUp.SUM: 'sum'>,)), MetricCounter(name='sm__pipe_fp64_cycles_active', pretty_name='sm__pipe_fp64_cycles_active', subs=(<MetricCounterRollUp.SUM: 'sum'>,)))
- metrics(report: Report) → dict[Method, ProfilingMetrics]
- pytestmark = [Mark(name='skipif', args=(True,), kwargs={'reason': 'needs a GPU'})]
- report() → Report
- test_fp64_division_instructions(metrics: dict[Method, ProfilingMetrics]) → None
The number of executed MUFU instructions is equal to the number of divisions performed per work item times WARP_COUNT.
- test_fp64_instructions_and_cycles(metrics: dict[Method, ProfilingMetrics]) → None
In terms of FP64 instructions executed and active pipeline cycles, Method.ILogbScalbn outperforms Method.LogbScalbn, which in turn outperforms Method.NormDivision.
- test_fp64_predicate_instructions(metrics: dict[Method, ProfilingMetrics]) → None
Method.LogbScalbn and Method.ILogbScalbn differ in their DSETP count. Method.LogbScalbn calls Kokkos::isfinite on the Kokkos::logb result, emitting a DSETP. Method.ILogbScalbn replaces this with an integer comparison, staying on the integer pipeline.
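The two scaling variants can be contrasted in a Python sketch. Both rescale the divisor by a power of two so that \(c^2 + d^2\) can neither overflow nor underflow, in the spirit of the Annex G example in the C standard. The `LogbScalbn`-like path obtains the exponent as a float, tests it with an `isfinite` check and then casts it to an integer. The `ILogbScalbn`-like path keeps an integer exponent throughout and uses an integer range check instead. The helpers, the sentinel values returned by `ilogb` and the exact range check are our assumptions, not the Kokkos code. The sketch also assumes a nonzero divisor, because Python raises `ZeroDivisionError` where C would produce an infinity or NaN.

```python
import math

# Hypothetical ilogb sentinels (C leaves FP_ILOGB0 / FP_ILOGBNAN partly implementation-defined).
ILOGB_ZERO_OR_NAN = -(2**31)
ILOGB_INF = 2**31 - 1
# Exponent range of finite, nonzero doubles, subnormals included.
MIN_EXP, MAX_EXP = -1074, 1023


def logb(x: float) -> float:
    """C-style logb: -inf for zero, +inf for infinities, NaN for NaN."""
    if x == 0.0:
        return -math.inf
    if not math.isfinite(x):
        return abs(x)
    return float(math.frexp(x)[1] - 1)  # frexp's mantissa lies in [0.5, 1)


def ilogb(x: float) -> int:
    """Integer exponent, using the hypothetical sentinels above."""
    if x == 0.0 or math.isnan(x):
        return ILOGB_ZERO_OR_NAN
    if math.isinf(x):
        return ILOGB_INF
    return math.frexp(x)[1] - 1


def scalbn(x: float, n: int) -> float:
    """x * 2**n; math.ldexp raises OverflowError where C's scalbn returns +-inf."""
    try:
        return math.ldexp(x, n)
    except OverflowError:
        return math.copysign(math.inf, x)


def divide_scaled(a: float, b: float, c: float, d: float,
                  use_ilogb: bool) -> tuple[float, float]:
    """(a + ib) / (c + id) with the divisor rescaled first; assumes a nonzero divisor."""
    m = max(abs(c), abs(d))
    if use_ilogb:
        # ILogbScalbn-like: integer exponent, integer range check (our assumption).
        e = ilogb(m)
        if not MIN_EXP <= e <= MAX_EXP:
            e = 0
    else:
        # LogbScalbn-like: float exponent, isfinite test, then a float-to-int cast.
        logbw = logb(m)
        e = int(logbw) if math.isfinite(logbw) else 0
    c, d = scalbn(c, -e), scalbn(d, -e)
    denom = c * c + d * d  # in [1, 8) for a finite, nonzero divisor
    return scalbn((a * c + b * d) / denom, -e), scalbn((b * c - a * d) / denom, -e)


print(divide_scaled(1.0, 0.0, 1e200, 0.0, use_ilogb=True))
```

Both variants return the same quotient. They differ only in where the exponent lives, which is what the DSETP and I2F/F2I counts above are sensitive to.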
- test_i2f_f2i_instructions(metrics: dict[Method, ProfilingMetrics]) → None
Confirms TestSASS.test_i2f_f2i_instructions().
- class examples.kokkos.complex.example_division.TestSASS
Bases: TestDivision
Binary analysis.
- SIGNATURE: Final[dict[Method, Pattern[str]]] = {Method.ILogbScalbn: re.compile('reprospect::examples::kokkos::complex::DivisorLogbScalbn<(\\(bool\\)1|true), (\\(bool\\)1|true)>'), Method.LogbScalbn: re.compile('reprospect::examples::kokkos::complex::DivisorLogbScalbn<(\\(bool\\)1|true), (\\(bool\\)0|false)>'), Method.NormDivision: re.compile('reprospect::examples::kokkos::complex::DivisorNormDivision<(\\(bool\\)1|true)>')}
- property cubin: Path
- cuobjdump() → CuObjDump
- detailed_register_usage(function: dict[Method, Function], nvdisasm: NVDisasm) → dict[Method, dict[RegisterType, tuple[int, int]]]
- nvdisasm(cuobjdump: CuObjDump) → NVDisasm
- test_detailed_register_usage(detailed_register_usage: dict[Method, dict[RegisterType, tuple[int, int]]]) → None
- test_i2f_f2i_instructions(decoder: dict[Method, Decoder]) → None
All methods need to execute 4 INT64 to FP64 conversion instructions to convert the clock64() reading to FP64. Method.LogbScalbn needs additional INT32 to FP64 conversions, likely inside Kokkos::logb, as well as one F2I.F64 instruction for static_cast<int>(logbw).
- class examples.kokkos.complex.example_division_benchmarking.Method(*values)
Bases: StrEnum
- ILogbScalbn = 'ILogbScalbn'
- LogbScalbn = 'LogbScalbn'
- NormDivision = 'NormDivision'
- __str__()
Return str(self).
- property is_ilog: bool
- property is_norm: bool
- class examples.kokkos.complex.example_division_benchmarking.Parameters
Bases: TypedDict
- class examples.kokkos.complex.example_division_benchmarking.TestDivision
Bases: CMakeAwareTestCase
Run the companion executable and make a nice visualization.
- PATTERN: Final[Pattern[str]] = regex.Regex('^NewtonFractal<Divisor(?P<divisor>NormDivision|LogbScalbn)(?:<(?:(?P<branching_or_compliance>true|false))?(?:[, ]*(?P<ilogb>true|false))?>)?>/(?P<full>[A-Za-z]+)/width:(?P<width>\\d+)/height:(?P<height>\\d+)', flags=regex.V0)
- classmethod get_target_name() → str
- classmethod params(*, name: str) → Parameters
Parse the name of a case and return parameters.
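As an illustration of how `params` might use `PATTERN`, the following sketch compiles the same expression with the standard `re` module. It then maps the template arguments to a method name, using the correspondence given by `TestSASS.SIGNATURE` (`DivisorLogbScalbn<true, true>` is `Method.ILogbScalbn`). The returned dictionary keys and the benchmark names in the usage lines are hypothetical, since the real `Parameters` fields are not listed here.

```python
import re

# The same expression as TestDivision.PATTERN, compiled with the standard re module.
PATTERN = re.compile(
    r"^NewtonFractal<Divisor(?P<divisor>NormDivision|LogbScalbn)"
    r"(?:<(?:(?P<branching_or_compliance>true|false))?(?:[, ]*(?P<ilogb>true|false))?>)?>"
    r"/(?P<full>[A-Za-z]+)/width:(?P<width>\d+)/height:(?P<height>\d+)"
)


def params(name: str) -> dict[str, object]:
    """Hypothetical stand-in for TestDivision.params (the real Parameters keys may differ)."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized benchmark name: {name!r}")
    method = match["divisor"]
    if method == "LogbScalbn" and match["ilogb"] == "true":
        method = "ILogbScalbn"  # DivisorLogbScalbn<..., true>, as in TestSASS.SIGNATURE
    return {
        "method": method,
        "branching_or_compliance": match["branching_or_compliance"] == "true",
        "full": match["full"],
        "width": int(match["width"]),
        "height": int(match["height"]),
    }


# Hypothetical benchmark name, built to match the pattern.
print(params("NewtonFractal<DivisorLogbScalbn<true, true>>/Full/width:512/height:512"))
```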
- pytestmark = [Mark(name='skipif', args=(True,), kwargs={'reason': 'needs a GPU'})]
- raw() → dict[str, dict]
Run the benchmark and return the raw JSON-based results.
Warning
Be sure to remove --benchmark_min_time for better-converged results.
- results(raw: dict) → DataFrame
Processed results.
- test_visualize(results: DataFrame) → None
Create a visualization of the results.
- class examples.kokkos.complex.example_division_plot.TestDivision
Bases: CMakeAwareTestCase
- data() → tuple[ndarray, ndarray]
Run the executable and read the output arrays.
- classmethod get_target_name() → str
- pytestmark = [Mark(name='skipif', args=(True,), kwargs={'reason': 'needs a GPU'})]
- test(data: tuple) → None
Plot the colors and iterations with an artistic touch.