Parallel Algorithms for Summing Floating-Point Numbers

Author: Eldawy Ahmed
Goodrich Michael T.
Publication date: 17/05/2016

The problem of exactly summing n floating-point numbers is a fundamental problem that has many applications in large-scale simulations and computational geometry. Unfortunately, due to the round-off error in standard floating-point operations, this problem becomes very challenging. Moreover, all existing solutions rely on sequential algorithms, which cannot scale to the huge datasets that need to be processed. In this paper, we provide several efficient parallel algorithms for summing n floating-point numbers, so as to produce a faithfully rounded floating-point representation of the sum. We present algorithms in PRAM, external-memory, and MapReduce models, and we also provide an experimental analysis of our MapReduce algorithms, due to their simplicity and practical efficiency. Comment: Conference version appears in SPAA 201

arXiv.org e-Print Archive

eScholarship - University of California
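
To illustrate the summation problem described in the abstract above, here is a minimal sketch of a map/reduce-style exact sum. It is not the authors' algorithm: it swaps in Python's arbitrary-precision `fractions.Fraction` as the exact partial-sum representation, and the names `map_chunk`, `exact_parallel_sum`, and `chunk_size` are hypothetical. Each chunk is summed exactly, the exact partial sums are merged, and rounding happens only once at the end.

```python
from fractions import Fraction
from functools import reduce
import math


def map_chunk(chunk):
    """Map step: exact partial sum of one chunk, with no rounding at all."""
    return sum((Fraction(x) for x in chunk), Fraction(0))


def exact_parallel_sum(values, chunk_size=4):
    """Sum floats exactly per chunk, merge the partials, round once at the end."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    # Each chunk is independent, so this list could be built by parallel workers.
    partials = [map_chunk(chunk) for chunk in chunks]
    # Reduce step: exact addition is associative, so the merge order is irrelevant.
    total = reduce(lambda a, b: a + b, partials, Fraction(0))
    return float(total)  # the only rounding step


values = [1e16, 1.0, -1e16, 1.0, 1e-3, 1e100, -1e100, 2.5]
naive = sum(values)                  # left-to-right float addition loses the small terms
exact = exact_parallel_sum(values)   # agrees with math.fsum(values)
```

Because the partial sums are exact, the result does not depend on how the input is split into chunks, which is the property a parallel formulation needs.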

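On Reducing the Memory Usage of a High-Accuracy Matrix Multiplication Algorithm Using BLAS and Its Performance (New Developments in Theory and Applications in Scientific and Technical Computing)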

Author: 尾崎克久
荻田武史
Publication venue: Research Institute for Mathematical Sciences, Kyoto University (京都大学数理解析研究所)
Publication date: 01/04/2012

Kyoto University Research Information Repository

Streaming Reduction Circuit

Author: Gerards Marco
Kokkeler André
Kuper Jan
Molenkamp Bert
Publication venue: IEEE Computer Society Press
Publication date: 01/01/2009

Reduction circuits are used to reduce rows of floating point values to single values. Binary floating point operators often have deep pipelines, which may cause hazards when many consecutive rows have to be reduced. We present an algorithm by which any number of consecutive rows of arbitrary lengths can be reduced by a pipelined commutative and associative binary operator in an efficient manner. The algorithm is simple to implement, has a low latency, produces results in-order, and requires only small buffers. Besides, it uses only a single pipeline for the involved operation. The complexity of the algorithm depends on the depth of the pipeline, not on the length of the input rows. In this paper we discuss an implementation of this algorithm and we prove its correctness.

University of Twente Research Information
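
The pipeline hazard mentioned in the abstract above can be pictured with a small software analogy. This is not the circuit or the algorithm from the paper: it is a sketch that assumes an operator with latency `depth`, and keeps `depth` independent partial results so that no step has to wait for the result it just issued. The function name `interleaved_reduce` and its parameters are hypothetical. The reordering is only valid because the operator is commutative and associative.

```python
def interleaved_reduce(row, depth, op=lambda a, b: a + b):
    """Reduce one row using `depth` independent partial results (lanes).

    Element i is folded into lane i % depth, so consecutive elements never
    depend on each other's result, which is what a pipelined operator needs.
    """
    if not row:
        raise ValueError("cannot reduce an empty row")
    lanes = []
    for i, x in enumerate(row):
        if i < depth:
            lanes.append(x)                          # first `depth` values seed the lanes
        else:
            lanes[i % depth] = op(lanes[i % depth], x)
    # Final combine of the (at most `depth`) partial results.
    result = lanes[0]
    for partial in lanes[1:]:
        result = op(result, partial)
    return result


rows = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [5], [2, 4, 6]]
results = [interleaved_reduce(row, depth=4) for row in rows]
```

Integers are used here so the regrouping cannot change the result; with floating point values the regrouped sum may differ from a left-to-right sum in the last bits.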

Compensated evaluation of tensor product surfaces in CAGD

Author: Delgado Gracia J.
Publication venue: MDPI AG
Publication date: 01/01/2020

In computer-aided geometric design, a polynomial surface is usually represented in Bézier form. Such a surface is usually evaluated with an extension of the de Casteljau algorithm. Using error-free transformations, a compensated version of this algorithm is presented, which is more accurate than the usual algorithm. A forward error analysis illustrating this fact is developed.

Multidisciplinary Digital Publishing Institute

Repositorio Universidad de Zaragoza
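
As background for the abstract above, the following sketch shows the two ingredients it names: an error-free transformation (here Knuth's TwoSum, which returns the rounded sum together with its exact rounding error) and the plain, uncompensated tensor-product de Casteljau evaluation. It is not the compensated algorithm from the paper, which would additionally carry such error terms through the recursion. The function names and the control-net layout (a list of rows of scalar coefficients) are assumptions made for illustration.

```python
def two_sum(a, b):
    """Error-free transformation: returns (s, e) with s = fl(a + b) and a + b = s + e."""
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


def de_casteljau(points, t):
    """Evaluate a Bezier curve at t by repeated linear interpolation."""
    pts = list(points)
    while len(pts) > 1:
        pts = [(1.0 - t) * p + t * q for p, q in zip(pts, pts[1:])]
    return pts[0]


def bezier_surface(control, u, v):
    """Tensor-product evaluation: reduce every row in v, then the results in u."""
    row_values = [de_casteljau(row, v) for row in control]
    return de_casteljau(row_values, u)


s, e = two_sum(1e16, 1.0)   # the 1.0 is lost in s but recovered exactly in e
height = bezier_surface([[0.0, 1.0], [2.0, 3.0]], 0.5, 0.5)
```

A compensated scheme would accumulate the `e` terms produced at each interpolation step and add the accumulated correction to the final value.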

Minimizing synchronizations in sparse iterative solvers for distributed supercomputers

Author: Gu T.-X.
Liu X.-P.
Zhu S.-X.
Publication date: 01/01/2013

Eliminating synchronizations is one of the important techniques for minimizing communication in modern high-performance computing. This paper discusses principles of reducing communication due to global synchronizations in sparse iterative solvers on distributed supercomputers. We demonstrate how to minimize global synchronizations by rescheduling a typical Krylov subspace method. The benefit of minimizing synchronizations is shown in theoretical analysis and is verified by numerical experiments using up to 900 processors. The experiments also show that the communication complexities of some structured sparse matrix-vector multiplications and of global communications on the underlying supercomputer are on the order of $P^{1/2.5}$ and $P^{4/5}$, respectively, where P is the number of processors. The experiments were carried out on a Dawning 5000A.

Oxford University Research Archive
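
One common way to reduce global synchronizations of the kind discussed in the abstract above is to reorder a solver so that several inner products can share a single global reduction. The sketch below illustrates only that idea; it is not the rescheduled method from the paper, and the abstract does not say which Krylov method is used. The `Comm` class is a toy stand-in for a real communicator that merely counts reductions, and all names here are hypothetical.

```python
class Comm:
    """Toy communicator: sums per-process tuples and counts global reductions."""

    def __init__(self):
        self.allreduce_calls = 0

    def allreduce(self, contributions):
        self.allreduce_calls += 1   # each call models one global synchronization
        return tuple(sum(column) for column in zip(*contributions))


def split(vec, nprocs):
    """Distribute a vector over nprocs simulated processes in contiguous blocks."""
    size = (len(vec) + nprocs - 1) // nprocs
    return [vec[i:i + size] for i in range(0, len(vec), size)]


def dots_unfused(comm, r_parts, w_parts):
    """Two inner products, each with its own global reduction."""
    (rr,) = comm.allreduce([(sum(x * x for x in r),) for r in r_parts])
    (rw,) = comm.allreduce(
        [(sum(x * y for x, y in zip(r, w)),) for r, w in zip(r_parts, w_parts)]
    )
    return rr, rw


def dots_fused(comm, r_parts, w_parts):
    """Both inner products packed into one message and reduced together."""
    local = [
        (sum(x * x for x in r), sum(x * y for x, y in zip(r, w)))
        for r, w in zip(r_parts, w_parts)
    ]
    return comm.allreduce(local)


r_parts = split([1.0, 2.0, 3.0, 4.0], nprocs=2)
w_parts = split([0.5, 1.0, 1.5, 2.0], nprocs=2)
unfused_comm, fused_comm = Comm(), Comm()
unfused = dots_unfused(unfused_comm, r_parts, w_parts)
fused = dots_fused(fused_comm, r_parts, w_parts)
```

Both variants return the same pair of values, but the fused variant performs one global reduction instead of two, which matters when the latency of a reduction grows with the number of processors.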