Parallel Algorithms for Summing Floating-Point Numbers
The problem of exactly summing n floating-point numbers is a fundamental
problem that has many applications in large-scale simulations and computational
geometry. Unfortunately, due to the round-off error in standard floating-point
operations, this problem becomes very challenging. Moreover, all existing
solutions rely on sequential algorithms which cannot scale to the huge datasets
that need to be processed.
In this paper, we provide several efficient parallel algorithms for summing n
floating-point numbers, so as to produce a faithfully rounded floating-point
representation of the sum. We present algorithms in PRAM, external-memory, and
MapReduce models, and we also provide an experimental analysis of our MapReduce
algorithms, due to their simplicity and practical efficiency.
Comment: Conference version appears in SPAA 201
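The difficulty the paper addresses can be seen with an error-free transformation. The sketch below is plain sequential Python, not the paper's PRAM/MapReduce algorithms: Knuth's two-sum recovers exactly the round-off that naive summation discards, and `math.fsum` serves as a reference faithfully rounded sum.

```python
import math

def two_sum(a, b):
    # Knuth's error-free transformation: s == fl(a + b) and a + b == s + e exactly
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

# Round-off makes naive summation inexact and order-dependent:
xs = [1e16, 1.0, -1e16, 1.0]
naive = sum(xs)        # 1.0: the first 1.0 is absorbed by 1e16
exact = math.fsum(xs)  # 2.0: faithfully rounded sum

# two_sum recovers exactly what the naive addition lost:
s, e = two_sum(1e16, 1.0)  # s == 1e16, e == 1.0
```

Distillation algorithms chain such error-free transformations so that no information is lost before the final rounding.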
Reproducibility, accuracy and performance of the Feltor code and library on parallel computer architectures
Feltor is a modular and free scientific software package. It allows
developing platform independent code that runs on a variety of parallel
computer architectures ranging from laptop CPUs to multi-GPU distributed memory
systems. Feltor consists of both a numerical library and a collection of
application codes built on top of the library. Its main targets are two- and
three-dimensional drift- and gyro-fluid simulations with discontinuous Galerkin
methods as the main numerical discretization technique. We observe that
numerical simulations of a recently developed gyro-fluid model produce
non-deterministic results in parallel computations. First, we show how we
restore accuracy and bitwise reproducibility algorithmically and
programmatically. In particular, we adopt an implementation of the exactly
rounded dot product based on long accumulators, which avoids accuracy losses
especially in parallel applications. However, reproducibility and accuracy
alone fail to indicate correct simulation behaviour. In fact, in the physical
model slightly different initial conditions lead to vastly different end
states. This behaviour translates to its numerical representation. Pointwise
convergence, even in principle, becomes impossible for long simulation times.
In a second part, we explore important performance tuning considerations. We
identify latency and memory bandwidth as the main performance indicators of our
routines. Based on these, we propose a parallel performance model that predicts
the execution time of algorithms implemented in Feltor and test our model on a
selection of parallel hardware architectures. We are able to predict the
execution time with a relative error of less than 25% for problem sizes between
0.1 and 1000 MB. Finally, we find that the product of latency and bandwidth
gives a minimum array size per compute node to achieve a scaling efficiency
above 50% (both strong and weak).
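The accuracy and reproducibility problem that the exactly rounded dot product solves can be illustrated with a simple stand-in. The sketch below is our own illustration, not Feltor's long-accumulator implementation: it evaluates the dot product exactly using rational arithmetic, so there is a single rounding at the end and the result is independent of summation order, which is the property needed for bitwise reproducibility across parallel reductions.

```python
from fractions import Fraction

def exact_dot(xs, ys):
    """Exactly rounded dot product via exact rational arithmetic: every float
    converts to a Fraction without error, so the only rounding happens once,
    in the final conversion back to float."""
    acc = sum((Fraction(x) * Fraction(y) for x, y in zip(xs, ys)), Fraction(0))
    return float(acc)

xs = [1e16, 2.0, -1e16]
ys = [1.0, 0.5, 1.0]
naive = sum(x * y for x, y in zip(xs, ys))  # 0.0: the 1.0 term is lost to round-off
exact = exact_dot(xs, ys)                   # 1.0, independent of summation order
```

Long accumulators achieve the same single-rounding guarantee with fixed-point hardware-friendly arithmetic rather than slow rationals.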
Rigorous numerical approaches in electronic structure theory
Electronic structure theory concerns the description of molecular properties according to the postulates of quantum mechanics. For practical purposes, this is realized entirely through numerical computation, the scope of which is constrained by computational costs that increase rapidly with the size of the system.
The significant progress made in this field over the past decades has been facilitated in part by the willingness of chemists to forego some mathematical rigour in exchange for greater efficiency. While such compromises allow large systems to be computed feasibly, there are lingering concerns over the impact that they have on the quality of the results produced. This research is motivated by two key issues that contribute to this loss of quality, namely i) the numerical errors accumulated through the use of finite precision arithmetic and the application of numerical approximations, and ii) the reliance on iterative methods that are not guaranteed to converge to the correct solution.
Taking the above issues into consideration, the aim of this thesis is to explore ways to perform electronic structure calculations with greater mathematical rigour through the application of rigorous numerical methods. Among these, we focus in particular on methods based on interval analysis and deterministic global optimization. The Hartree-Fock electronic structure method is used as the subject of this study due to its ubiquity within this domain.
We outline an approach for placing rigorous bounds on numerical error in Hartree-Fock computations. This is achieved through the application of interval analysis techniques, which are able to rigorously bound and propagate quantities affected by numerical errors. Using this approach, we implement a program called Interval Hartree-Fock. Given a closed-shell system and the current electronic state, this program is able to compute rigorous error bounds on quantities including i) the total energy, ii) molecular orbital energies, iii) molecular orbital coefficients, and iv) derived electronic properties.
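The interval-analysis machinery underlying such bounds can be sketched in a few lines. The following minimal Python interval type is a generic illustration, not the Interval Hartree-Fock code: each operation widens its endpoints outward by one ulp via `math.nextafter` (Python 3.9+), so the true real-valued result is always enclosed despite round-off.

```python
import math

class Interval:
    """Minimal interval type with outward rounding: after every operation the
    endpoints are nudged one ulp outward, so the exact real result of the
    operation is guaranteed to lie inside [lo, hi]."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __mul__(self, other):
        ps = [self.lo * other.lo, self.lo * other.hi,
              self.hi * other.lo, self.hi * other.hi]
        return Interval(math.nextafter(min(ps), -math.inf),
                        math.nextafter(max(ps), math.inf))

x = Interval(0.1, 0.1)  # the double nearest to 0.1
e = x + x + x           # rigorous enclosure: the exact sum lies in [e.lo, e.hi]
```

Propagating such enclosures through every arithmetic step of a computation is what yields rigorous bounds on the final quantities.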
Interval Hartree-Fock is adapted as an error analysis tool for studying the impact of numerical error in Hartree-Fock computations. It is used to investigate the effect of input related factors such as system size and basis set types on the numerical accuracy of the Hartree-Fock total energy. Consideration is also given to the impact of various algorithm design decisions. Examples include the application of different integral screening thresholds, the variation between single and double precision arithmetic in two-electron integral evaluation, and the adjustment of interpolation table granularity. These factors are relevant to both the usage of conventional Hartree-Fock code, and the development of Hartree-Fock code optimized for novel computing devices such as graphics processing units.
We then present an approach for solving the Hartree-Fock equations to within a guaranteed margin of error. This is achieved by treating the Hartree-Fock equations as a non-convex global optimization problem, which is then solved using deterministic global optimization. The main contribution of this work is the development of algorithms for handling quantum chemistry specific expressions such as the one and two-electron integrals within the deterministic global optimization framework. This approach was implemented as an extension to an existing open source solver.
Proof of concept calculations are performed for a variety of problems within Hartree-Fock theory, including i) point energy calculation, ii) geometry optimization, iii) basis set optimization, and iv) excited state calculation. Performance analyses of these calculations are also presented and discussed.
On the block wavelet transform applied to the boundary element method
This paper follows an earlier work by Bucher et al. [1] on the application of wavelet transforms to the boundary element method, which showed how to reuse models stored in compressed form to solve new models with the same geometry but arbitrary load cases, the so-called virtual assembly technique. The extension presented in this paper is a new computational procedure that performs the required two-dimensional wavelet transforms by blocks, in principle allowing the compression of matrices of arbitrary size. Details of the computer implementation that allows the use of this methodology for very large models or at high compression ratios are given. In a numerical application, a standard PC is used to solve a previously compressed 131,072 DOF model for 100 distinct load cases in less than 1 hour, or 33 seconds per load case.
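The core idea of wavelet matrix compression can be sketched compactly (our generic illustration assuming NumPy; the paper's block transform and virtual-assembly machinery are not reproduced): transform the matrix, then keep only the largest coefficients.

```python
import numpy as np

def haar_step(v):
    # one level of the orthonormal Haar transform along the last axis
    a = (v[..., 0::2] + v[..., 1::2]) / np.sqrt(2.0)  # averages
    d = (v[..., 0::2] - v[..., 1::2]) / np.sqrt(2.0)  # details
    return np.concatenate([a, d], axis=-1)

def compress(M, keep=0.25):
    # transform rows, then columns, then zero all but the largest coefficients
    W = haar_step(haar_step(M).T).T
    thresh = np.quantile(np.abs(W), 1.0 - keep)
    W[np.abs(W) < thresh] = 0.0
    return W

# a smooth matrix concentrates its energy in few wavelet coefficients
x = np.linspace(0.0, 1.0, 8)
M = np.add.outer(x, x)               # M[i, j] = x[i] + x[j]
W = compress(M)
sparsity = float(np.mean(W == 0.0))  # most entries are dropped
```

Because boundary-element matrices arising from smooth kernels are compressible in this sense, the compressed form can be stored once and reused for many load cases.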
Wavelet-Based High-Order Adaptive Modeling of Lossy Interconnects
Abstract—This paper presents a numerical-modeling strategy for the simulation of fast transients in lossy electrical interconnects. The proposed algorithm makes use of wavelet representations of the voltages and currents along the structure, with the aim of reducing the computational complexity of standard time-domain solvers. A special weak procedure for the implementation of possibly dynamic and nonlinear boundary conditions preserves stability as well as a high approximation order, thus leading to very accurate schemes. In addition, the wavelet expansion allows the solution to be computed using only a few significant coefficients, which are determined automatically at each time step. A dynamically refinable mesh is then used to perform sparse time-stepping. Several numerical results illustrate the high efficiency of the proposed algorithm, which has been tuned and optimized for best performance in the fast digital applications typically found on modern PCB structures.
Index Terms—Finite difference methods, time-domain analysis, transmission lines, wavelet transforms.
A fast multipole method for stellar dynamics
The approximate computation of all gravitational forces between
interacting particles via the fast multipole method (FMM) can be made as
accurate as direct summation, but requires less than
operations. FMM groups particles into spatially bounded cells and uses
cell-cell interactions to approximate the force at any position within the sink
cell by a Taylor expansion obtained from the multipole expansion of the source
cell. By employing a novel estimate for the errors incurred in this process, I
minimise the computational effort required for a given accuracy and obtain a
well-behaved distribution of force errors. For relative force errors of ,
the computational costs exhibit an empirical scaling of . My implementation
(running on a 16-core node) outperforms a GPU-based direct summation with
comparable force errors for .
Comment: 21 pages, 15 figures, accepted for publication in Journal for Computational Astrophysics and Cosmology
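The source-cell-to-Taylor-expansion idea can be sketched in one dimension (our toy illustration, not the paper's three-dimensional FMM with cell-cell interactions and error control): a distant cluster of masses is summarized by a few moments about its centre of mass.

```python
def direct_potential(x, sources):
    # exact O(N) sum of point-mass potentials -m / |x - s|  (1-D, G = 1)
    return sum(-m / abs(x - s) for s, m in sources)

def multipole_potential(x, sources):
    """Far-field approximation: expand each 1 / |x - s| about the cluster's
    centre of mass and keep the monopole and quadrupole terms (the dipole
    vanishes about the centre of mass)."""
    M = sum(m for _, m in sources)
    z = sum(m * s for s, m in sources) / M       # centre of mass
    r = abs(x - z)
    q2 = sum(m * (s - z) ** 2 for s, m in sources)
    return -M / r - q2 / r ** 3

sources = [(0.9, 1.0), (1.1, 1.0)]  # a tight cluster of two unit masses
approx = multipole_potential(10.0, sources)
exact = direct_potential(10.0, sources)
err = abs((approx - exact) / exact)  # tiny: the cluster is far from the target
```

The FMM applies this far-field idea hierarchically, translating multipole expansions of source cells into local Taylor expansions inside sink cells, which is what reduces the cost below direct summation.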