
    Parallel Algorithms for Summing Floating-Point Numbers

    Exactly summing n floating-point numbers is a fundamental problem with many applications in large-scale simulations and computational geometry. Unfortunately, round-off error in standard floating-point operations makes this problem very challenging. Moreover, all existing solutions rely on sequential algorithms, which cannot scale to the huge datasets that need to be processed. In this paper, we provide several efficient parallel algorithms for summing n floating-point numbers, so as to produce a faithfully rounded floating-point representation of the sum. We present algorithms in the PRAM, external-memory, and MapReduce models, and we also provide an experimental analysis of our MapReduce algorithms, due to their simplicity and practical efficiency.

    Comment: Conference version appears in SPAA 2016
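    The usual building block for such faithfully rounded sums is an error-free transformation; the following Python sketch of Knuth's classic two_sum (an illustration, not the paper's actual PRAM, external-memory, or MapReduce algorithms) returns both the rounded sum and its exact rounding error, so partial results from different workers can be combined without losing information:

    ```python
    def two_sum(a: float, b: float) -> tuple[float, float]:
        """Knuth's error-free transformation: s = fl(a + b) and
        a + b == s + e exactly, for any two IEEE-754 doubles."""
        s = a + b
        b_virtual = s - a                       # part of b absorbed into s
        e = (a - (s - b_virtual)) + (b - b_virtual)
        return s, e

    def chunk_sum(chunk: list[float]) -> tuple[float, float]:
        """Hypothetical per-worker step: a running sum plus the sum of the
        rounding errors (a real algorithm would keep the errors exact)."""
        s = c = 0.0
        for x in chunk:
            s, e = two_sum(s, x)
            c += e
        return s, c
    ```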

    A Reproducible Accurate Summation Algorithm for High-Performance Computing

    Floating-point (FP) addition is non-associative, and parallel reduction involving this operation is a serious issue, as noted in the DARPA Exascale Report [1]. Such large summations typically appear within fundamental numerical blocks such as dot products or numerical integrations. Hence, the result may vary from one parallel machine to another, or even from one run to another. These discrepancies worsen on heterogeneous architectures, such as clusters with GPUs or Intel Xeon Phi processors, which combine programming environments that may obey various floating-point models and offer different intermediate precisions or different operators [2,3]. Such non-determinism of floating-point calculations in parallel programs causes validation and debugging issues, and may lead to deadlocks [4]. The increasing power of current computers enables one to solve more and more complex problems. That, consequently, leads to a higher number of floating-point operations to be performed, each of them potentially causing a round-off error. Because of round-off error propagation, some problems must be solved with a wider floating-point format.

    Two approaches exist to perform floating-point addition without incurring round-off errors. The first aims at computing the error that occurs during rounding using FP expansions, which are based on error-free transformations. An FP expansion represents the result as an unevaluated sum of a fixed number of FP numbers, whose components are ordered in magnitude with minimal overlap so as to cover a wide range of exponents. FP expansions of sizes 2 and 4 are presented in [5] and [6], respectively. The main advantage of this solution is that the expansion can stay in registers during the computation. However, the accuracy is insufficient for the summation of numerous FP numbers or for sums with a huge dynamic range, and the complexity grows linearly with the size of the expansion.

    An alternative approach exploits the finite range of representable floating-point numbers by storing every bit in a very long vector of bits (an accumulator). The length of the accumulator is chosen such that every bit of information of the input format can be represented, covering the range from the minimum to the maximum representable floating-point value, independently of the sign. For instance, Kulisch [7] proposed an accumulator of 4288 bits to handle the accumulation of products of 64-bit IEEE floating-point values. The Kulisch accumulator produces the exact result of sums of very many floating-point numbers of arbitrary magnitude. However, this approach was long considered impractical, as it induces a very large memory overhead; furthermore, without dedicated hardware support, its performance is limited by indirect memory accesses that make vectorization challenging.

    We aim at addressing both issues of accuracy and reproducibility in the context of summation. We advocate computing the correctly rounded result of the exact sum. Besides offering strict reproducibility through an unambiguous definition of the expected result, our approach guarantees that the result has the best possible accuracy.
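    A toy model of the long-accumulator idea, assuming nothing about the authors' actual implementation: since every finite IEEE-754 double is an exact rational, one can accumulate exactly in software and round once at the end, which yields the correctly rounded, reproducible result the abstract advocates:

    ```python
    from fractions import Fraction

    def correctly_rounded_sum(xs: list[float]) -> float:
        """Exact accumulation followed by a single rounding, in the spirit
        of a Kulisch accumulator (a sketch, not an optimized implementation)."""
        acc = Fraction(0)
        for x in xs:
            acc += Fraction(x)   # Fraction(float) is exact: no rounding here
        return float(acc)        # one correctly rounded conversion at the end

    # Reproducible regardless of summation order:
    print(correctly_rounded_sum([1e100, 1.0, -1e100]))  # 1.0 (naive sum: 0.0)
    ```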

    Reproducible floating-point atomic addition in data-parallel environment


    Accurate and Efficiently Vectorized Sums and Dot Products in Julia

    Version submitted to the Correctness 2019 workshop.

    This paper presents an efficient, vectorized implementation of various summation and dot-product algorithms in the Julia programming language. These implementations are available under an open-source license in the AccurateArithmetic.jl Julia package. Besides naive algorithms, compensated algorithms are implemented: the Kahan-Babuška-Neumaier summation algorithm, and the Ogita-Rump-Oishi simply compensated summation and dot-product algorithms. These algorithms effectively double the working precision, producing much more accurate results while incurring little to no overhead, especially for large input vectors. This paper also builds upon this example to make a case for a more widespread use of Julia in the HPC community. Although the vectorization of compensated algorithms is not a particularly simple task, Julia makes it relatively easy and straightforward. It also allows structuring the code in small, composable building blocks, closely matching textbook algorithms yet efficiently compiled.
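    The package itself is written in Julia; as a language-neutral illustration, here is a Python sketch of the Kahan-Babuška-Neumaier algorithm the abstract names (the SIMD vectorization discussed in the paper is omitted):

    ```python
    def sum_kbn(xs: list[float]) -> float:
        """Kahan-Babuska-Neumaier compensated summation (sketch):
        effectively doubles the working precision of a naive sum."""
        s = 0.0   # running sum
        c = 0.0   # compensation: accumulated low-order bits lost by s
        for x in xs:
            t = s + x
            if abs(s) >= abs(x):
                c += (s - t) + x   # x's low-order bits were lost
            else:
                c += (x - t) + s   # s's low-order bits were lost
            s = t
        return s + c

    print(sum_kbn([1.0, 1e100, 1.0, -1e100]))  # 2.0; a naive loop returns 0.0
    ```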