Division
Dividing complex numbers is more subtle than it appears. The naïve formula:
computes \(c^2 + d^2\), which can overflow for large inputs. Underflow regimes exist as well. Many algorithms have therefore been proposed to make complex division more robust [BS12], each with different trade-offs in numerical accuracy and performance. These algorithms typically use a scaling strategy. A further challenge in complex division is the handling of edge cases such as infinite or zero operands.
The standard ISO/IEC 9899 (Annex G) [InternationalOfStandardizationInternationalECommission24] raises the issue of overflow
and underflow and specifies the expected outcomes for the edge cases to comply with IEC 60599 real floating-point arithmetic.
It also suggests an implementation that adopts a scaling approach using the scalbn function
and relies on branching to handle the edge cases.
Compilers such as LLVM
and libraries such as CCCL
follow this suggested implementation.
They also expose mechanisms to disable the branching,
which can save computation cycles when such operands are not expected.
The current implementation of complex division in Kokkos
does not follow the ISO/IEC 9899 (Annex G) implementation.
Instead, it adopts a scaling approach using a division, and it only partially
complies with the handling of the edge cases.
Their implementation thus entails more divisions than the ISO/IEC 9899 (Annex G) implementation, see
examples.kokkos.complex.example_division.TestSASS.test_norm_division_uses_more_rcp().
This example also explores a potential improvement of the
ISO/IEC 9899 (Annex G) implementation that replaces the use of logb followed by a cast to int
with a use of ilogb.
This change decreases the number of fp64 instructions and cycles, see
examples.kokkos.complex.example_division.TestNCU.test_fp64_instructions_and_cycles().
This example thus compares three implementations:
Both examples.kokkos.complex.example_division.Method.LogbScalbn
and examples.kokkos.complex.example_division.Method.ILogbScalbn use a common C++ implementation templated
on a flag that allows the handling of edge cases to be disabled
and another flag that controls the use of ilogb.
This example has three parts:
A
compliance testthat checks whether each implementation complies with the expected outcomes for the edge cases.A binary analysis in
examples.kokkos.complex.example_division.TestSASS()and a kernel profiling inexamples.kokkos.complex.example_division.TestNCU().A benchmark in
examples.kokkos.complex.example_division_benchmarking.TestDivision().
The benchmark compares the performance of the three implementations using the Newton fractal [con25], which requires one complex division per iteration per pixel:
Newton fractal for \(z^4 - 1\) with relaxation factor \(a = 1\).
- class examples.kokkos.complex.example_division.Method(*values)View on GitHub
Bases:
StrEnum- ILogbScalbn = 'ILogbScalbn'
Same as
Method.LogbScalbn, but using theilogbfunction.
- LogbScalbn = 'LogbScalbn'
Implementation that uses a scaling with the
scalbnfunction as suggested in [InternationalOfStandardizationInternationalECommission24].
- NormDivision = 'NormDivision'
Implementation that uses the division-based scaling currently implemented in Kokkos.
- __str__()
Return str(self).
- class examples.kokkos.complex.example_division.TestDivisionView on GitHub
Bases:
CMakeAwareTestCase- KOKKOS_TOOLS_NVTX_CONNECTOR_LIB
Used in
TestNCU.report().
- classmethod get_target_name() strView on GitHub
- class examples.kokkos.complex.example_division.TestNCUView on GitHub
Bases:
TestDivisionKernel profiling.
- METRICS: tuple[Metric | MetricCorrelation, ...] = (MetricCounter(name='smsp__inst_executed', pretty_name='smsp__inst_executed', subs=(<MetricCounterRollUp.SUM: 'sum'>,)), MetricCorrelation(name='sass__inst_executed_per_opcode'), MetricCounter(name='sm__inst_executed_pipe_fp64', pretty_name='sm__inst_executed_pipe_fp64', subs=(<MetricCounterRollUp.SUM: 'sum'>,)), MetricCounter(name='sm__pipe_fp64_cycles_active', pretty_name='sm__pipe_fp64_cycles_active', subs=(<MetricCounterRollUp.SUM: 'sum'>,)))
- metrics(report: Report) dict[Method, ProfilingMetrics]View on GitHub
- pytestmark = [Mark(name='skipif', args=(True,), kwargs={'reason': 'needs a GPU'})]
- report() ReportView on GitHub
- test_fp64_division_instructions(metrics: dict[Method, ProfilingMetrics]) NoneView on GitHub
The number of executed MUFU instructions is equal to the number of divisions performed per work item times
WARP_COUNT.
- test_fp64_instructions_and_cycles(metrics: dict[Method, ProfilingMetrics]) NoneView on GitHub
In terms of FP64 instructions executed and active pipeline cycles,
Method.ILogbScalbnoutperformsMethod.LogbScalbn, which in turn outperformsMethod.NormDivision.
- test_fp64_predicate_instructions(metrics: dict[Method, ProfilingMetrics]) NoneView on GitHub
Method.LogbScalbnandMethod.ILogbScalbndiffer in their DSETP count.Method.LogbScalbncallsKokkos::isfiniteon theKokkos::logbresult, emitting a DSETP.Method.ILogbScalbnreplaces this with an integer comparison, staying on the integer pipeline.
- test_i2f_f2i_instructions(metrics: dict[Method, ProfilingMetrics]) NoneView on GitHub
Confirms
TestSASS.test_i2f_f2i_instructions().
- class examples.kokkos.complex.example_division.TestSASSView on GitHub
Bases:
TestDivisionBinary analysis.
- SIGNATURE: Final[dict[Method, Pattern[str]]] = {Method.ILogbScalbn: re.compile('reprospect::examples::kokkos::complex::DivisorLogbScalbn<(\\(bool\\)1|true), (\\(bool\\)1|true)>'), Method.LogbScalbn: re.compile('reprospect::examples::kokkos::complex::DivisorLogbScalbn<(\\(bool\\)1|true), (\\(bool\\)0|false)>'), Method.NormDivision: re.compile('reprospect::examples::kokkos::complex::DivisorNormDivision<(\\(bool\\)1|true)>')}
- property cubin: PathView on GitHub
- cuobjdump() CuObjDumpView on GitHub
- detailed_register_usage(function: dict[Method, Function], nvdisasm: NVDisasm) dict[Method, dict[RegisterType, tuple[int, int]]]View on GitHub
- nvdisasm(cuobjdump: CuObjDump) NVDisasmView on GitHub
- test_detailed_register_usage(detailed_register_usage: dict[Method, dict[RegisterType, tuple[int, int]]]) NoneView on GitHub
- test_i2f_f2i_instructions(decoder: dict[Method, Decoder]) NoneView on GitHub
All methods need to execute 4 INT64 to FP64 conversion instructions to convert the
clock64()reading to FP64.Method.LogbScalbnneeds additional INT32 to FP64 conversions, likely insideKokkos::logb, as well as one F2I.F64 instruction forstatic_cast<int>(logbw).
- class examples.kokkos.complex.example_division_benchmarking.Method(*values)View on GitHub
Bases:
StrEnum- ILogbScalbn = 'ILogbScalbn'
- LogbScalbn = 'LogbScalbn'
- NormDivision = 'NormDivision'
- __str__()
Return str(self).
- property is_ilog: boolView on GitHub
- property is_norm: boolView on GitHub
- class examples.kokkos.complex.example_division_benchmarking.ParametersView on GitHub
Bases:
TypedDict
- class examples.kokkos.complex.example_division_benchmarking.TestDivisionView on GitHub
Bases:
CMakeAwareTestCaseRun the companion executable and make a nice visualization.
- PATTERN: Final[Pattern[str]] = regex.Regex('^NewtonFractal<Divisor(?P<divisor>NormDivision|LogbScalbn)(?:<(?:(?P<branching_or_compliance>true|false))?(?:[, ]*(?P<ilogb>true|false))?>)?>/(?P<full>[A-Za-z]+)/width:(?P<width>\\d+)/height:(?P<height>\\d+)', flags=regex.V0)
- classmethod get_target_name() strView on GitHub
- classmethod params(*, name: str) ParametersView on GitHub
Parse the name of a case and return parameters.
- pytestmark = [Mark(name='skipif', args=(True,), kwargs={'reason': 'needs a GPU'})]
- raw() dict[str, dict]View on GitHub
Run the benchmark and return the raw JSON-based results.
Warning
Be sure to remove –benchmark_min_time for better converged results.
- results(raw: dict) DataFrameView on GitHub
Processed results.
- test_visualize(results: DataFrame) NoneView on GitHub
Create a visualization of the results.
- class examples.kokkos.complex.example_division_plot.TestDivisionView on GitHub
Bases:
CMakeAwareTestCase- data() tuple[ndarray, ndarray]View on GitHub
Run the executable and read the output arrays.
- classmethod get_target_name() strView on GitHub
- pytestmark = [Mark(name='skipif', args=(True,), kwargs={'reason': 'needs a GPU'})]
- test(data: tuple) NoneView on GitHub
Plot the colors and iterations with some artist touch.