Parallel Algorithms for Summing Floating-Point Numbers
The problem of exactly summing n floating-point numbers is a fundamental
problem that has many applications in large-scale simulations and computational
geometry. Unfortunately, due to the round-off error in standard floating-point
operations, this problem becomes very challenging. Moreover, all existing
solutions rely on sequential algorithms which cannot scale to the huge datasets
that need to be processed.
In this paper, we provide several efficient parallel algorithms for summing n
floating-point numbers, so as to produce a faithfully rounded floating-point
representation of the sum. We present algorithms in PRAM, external-memory, and
MapReduce models, and we also provide an experimental analysis of our MapReduce
algorithms, due to their simplicity and practical efficiency.
Comment: Conference version appears in SPAA 201
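The difficulty the paper addresses can be seen with an error-free transformation. The sketch below is plain sequential Python, not the paper's PRAM/MapReduce algorithms: Knuth's two-sum recovers exactly the round-off that naive summation discards, and `math.fsum` serves as a reference faithfully rounded sum.

```python
import math

def two_sum(a, b):
    # Knuth's error-free transformation: s == fl(a + b) and a + b == s + e exactly
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

# Round-off makes naive summation inexact and order-dependent:
xs = [1e16, 1.0, -1e16, 1.0]
naive = sum(xs)        # 1.0: the first 1.0 is absorbed by 1e16
exact = math.fsum(xs)  # 2.0: faithfully rounded sum

# two_sum recovers exactly what the naive addition lost:
s, e = two_sum(1e16, 1.0)  # s == 1e16, e == 1.0
```

Distillation algorithms chain such error-free transformations so that no information is lost before the final rounding.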
Reproducibility, accuracy and performance of the Feltor code and library on parallel computer architectures
Feltor is a modular and free scientific software package. It allows
developing platform independent code that runs on a variety of parallel
computer architectures ranging from laptop CPUs to multi-GPU distributed memory
systems. Feltor consists of both a numerical library and a collection of
application codes built on top of the library. Its main targets are two- and
three-dimensional drift- and gyro-fluid simulations with discontinuous Galerkin
methods as the main numerical discretization technique. We observe that
numerical simulations of a recently developed gyro-fluid model produce
non-deterministic results in parallel computations. First, we show how we
restore accuracy and bitwise reproducibility algorithmically and
programmatically. In particular, we adopt an implementation of the exactly
rounded dot product based on long accumulators, which avoids accuracy losses
especially in parallel applications. However, reproducibility and accuracy
alone fail to indicate correct simulation behaviour. In fact, in the physical
model slightly different initial conditions lead to vastly different end
states. This behaviour translates to its numerical representation. Pointwise
convergence, even in principle, becomes impossible for long simulation times.
In a second part, we explore important performance tuning considerations. We
identify latency and memory bandwidth as the main performance indicators of our
routines. Based on these, we propose a parallel performance model that predicts
the execution time of algorithms implemented in Feltor and test our model on a
selection of parallel hardware architectures. We are able to predict the
execution time with a relative error of less than 25% for problem sizes between
0.1 and 1000 MB. Finally, we find that the product of latency and bandwidth
gives a minimum array size per compute node to achieve a scaling efficiency
above 50% (both strong and weak).
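The accuracy and reproducibility problem that the exactly rounded dot product solves can be illustrated with a simple stand-in. The sketch below is our own illustration, not Feltor's long-accumulator implementation: it evaluates the dot product exactly using rational arithmetic, so there is a single rounding at the end and the result is independent of summation order, which is the property needed for bitwise reproducibility across parallel reductions.

```python
from fractions import Fraction

def exact_dot(xs, ys):
    """Exactly rounded dot product via exact rational arithmetic: every float
    converts to a Fraction without error, so the only rounding happens once,
    in the final conversion back to float."""
    acc = sum((Fraction(x) * Fraction(y) for x, y in zip(xs, ys)), Fraction(0))
    return float(acc)

xs = [1e16, 2.0, -1e16]
ys = [1.0, 0.5, 1.0]
naive = sum(x * y for x, y in zip(xs, ys))  # 0.0: the 1.0 term is lost to round-off
exact = exact_dot(xs, ys)                   # 1.0, independent of summation order
```

Long accumulators achieve the same single-rounding guarantee with fixed-point hardware-friendly arithmetic rather than slow rationals.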
Rigorous numerical approaches in electronic structure theory
Electronic structure theory concerns the description of molecular properties according to the postulates of quantum mechanics. For practical purposes, this is realized entirely through numerical computation, the scope of which is constrained by computational costs that increase rapidly with the size of the system.
The significant progress made in this field over the past decades has been facilitated in part by the willingness of chemists to forego some mathematical rigour in exchange for greater efficiency. While such compromises allow large systems to be computed feasibly, there are lingering concerns over the impact that they have on the quality of the results produced. This research is motivated by two key issues that contribute to this loss of quality, namely i) the numerical errors accumulated through the use of finite precision arithmetic and the application of numerical approximations, and ii) the reliance on iterative methods that are not guaranteed to converge to the correct solution.
Taking the above issues into consideration, the aim of this thesis is to explore ways to perform electronic structure calculations with greater mathematical rigour through the application of rigorous numerical methods. Among these, we focus in particular on methods based on interval analysis and deterministic global optimization. The Hartree-Fock electronic structure method is used as the subject of this study due to its ubiquity within this domain.
We outline an approach for placing rigorous bounds on numerical error in Hartree-Fock computations. This is achieved through the application of interval analysis techniques, which are able to rigorously bound and propagate quantities affected by numerical errors. Using this approach, we implement a program called Interval Hartree-Fock. Given a closed-shell system and the current electronic state, this program is able to compute rigorous error bounds on quantities including i) the total energy, ii) molecular orbital energies, iii) molecular orbital coefficients, and iv) derived electronic properties.
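The interval-analysis machinery underlying such bounds can be sketched in a few lines. The following minimal Python interval type is a generic illustration, not the Interval Hartree-Fock code: each operation widens its endpoints outward by one ulp via `math.nextafter` (Python 3.9+), so the true real-valued result is always enclosed despite round-off.

```python
import math

class Interval:
    """Minimal interval type with outward rounding: after every operation the
    endpoints are nudged one ulp outward, so the exact real result of the
    operation is guaranteed to lie inside [lo, hi]."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __mul__(self, other):
        ps = [self.lo * other.lo, self.lo * other.hi,
              self.hi * other.lo, self.hi * other.hi]
        return Interval(math.nextafter(min(ps), -math.inf),
                        math.nextafter(max(ps), math.inf))

x = Interval(0.1, 0.1)  # the double nearest to 0.1
e = x + x + x           # rigorous enclosure: the exact sum lies in [e.lo, e.hi]
```

Propagating such enclosures through every arithmetic step of a computation is what yields rigorous bounds on the final quantities.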
Interval Hartree-Fock is adapted as an error analysis tool for studying the impact of numerical error in Hartree-Fock computations. It is used to investigate the effect of input related factors such as system size and basis set types on the numerical accuracy of the Hartree-Fock total energy. Consideration is also given to the impact of various algorithm design decisions. Examples include the application of different integral screening thresholds, the variation between single and double precision arithmetic in two-electron integral evaluation, and the adjustment of interpolation table granularity. These factors are relevant to both the usage of conventional Hartree-Fock code, and the development of Hartree-Fock code optimized for novel computing devices such as graphics processing units.
We then present an approach for solving the Hartree-Fock equations to within a guaranteed margin of error. This is achieved by treating the Hartree-Fock equations as a non-convex global optimization problem, which is then solved using deterministic global optimization. The main contribution of this work is the development of algorithms for handling quantum chemistry specific expressions such as the one and two-electron integrals within the deterministic global optimization framework. This approach was implemented as an extension to an existing open source solver.
Proof of concept calculations are performed for a variety of problems within Hartree-Fock theory, including i) point energy calculation, ii) geometry optimization, iii) basis set optimization, and iv) excited state calculation. Performance analyses of these calculations are also presented and discussed.
On the block wavelet transform applied to the boundary element method
This paper follows an earlier work by Bucher et al. [1] on the application of wavelet transforms to the boundary element method, which showed how to reuse models stored in compressed form to solve new models with the same geometry but arbitrary load cases, the so-called virtual assembly technique. The extension presented in this paper is a new computational procedure that performs the required two-dimensional wavelet transforms by blocks, in principle allowing the compression of matrices of arbitrary size. Details of the computer implementation that allows the use of this methodology for very large models or at high compression ratios are given. In a numerical application, a standard PC is used to solve a previously compressed 131,072 DOF model for 100 distinct load cases in less than 1 hour, or 33 seconds per load case.
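The core idea of wavelet matrix compression can be sketched compactly (our generic illustration assuming NumPy; the paper's block transform and virtual-assembly machinery are not reproduced): transform the matrix, then keep only the largest coefficients.

```python
import numpy as np

def haar_step(v):
    # one level of the orthonormal Haar transform along the last axis
    a = (v[..., 0::2] + v[..., 1::2]) / np.sqrt(2.0)  # averages
    d = (v[..., 0::2] - v[..., 1::2]) / np.sqrt(2.0)  # details
    return np.concatenate([a, d], axis=-1)

def compress(M, keep=0.25):
    # transform rows, then columns, then zero all but the largest coefficients
    W = haar_step(haar_step(M).T).T
    thresh = np.quantile(np.abs(W), 1.0 - keep)
    W[np.abs(W) < thresh] = 0.0
    return W

# a smooth matrix concentrates its energy in few wavelet coefficients
x = np.linspace(0.0, 1.0, 8)
M = np.add.outer(x, x)               # M[i, j] = x[i] + x[j]
W = compress(M)
sparsity = float(np.mean(W == 0.0))  # most entries are dropped
```

Because boundary-element matrices arising from smooth kernels are compressible in this sense, the compressed form can be stored once and reused for many load cases.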
Wavelet-Based High-Order Adaptive Modeling of Lossy Interconnects
Abstract—This paper presents a numerical-modeling strategy for the simulation of fast transients in lossy electrical interconnects. The proposed algorithm makes use of wavelet representations of the voltages and currents along the structure, with the aim of reducing the computational complexity of standard time-domain solvers. A special weak procedure for the implementation of possibly dynamic and nonlinear boundary conditions preserves stability as well as a high approximation order, thus leading to very accurate schemes. In addition, the wavelet expansion allows the solution to be computed using only a few significant coefficients, which are determined automatically at each time step. A dynamically refinable mesh is then used to perform sparse time-stepping. Several numerical results illustrate the high efficiency of the proposed algorithm, which has been tuned and optimized for best performance in the fast digital applications typically found on modern PCB structures.
Index Terms—Finite difference methods, time-domain analysis, transmission lines, wavelet transforms.
A fast multipole method for stellar dynamics
The approximate computation of all gravitational forces between
interacting particles via the fast multipole method (FMM) can be made as
accurate as direct summation, but requires less than
operations. FMM groups particles into spatially bounded cells and uses
cell-cell interactions to approximate the force at any position within the sink
cell by a Taylor expansion obtained from the multipole expansion of the source
cell. By employing a novel estimate for the errors incurred in this process, I
minimise the computational effort required for a given accuracy and obtain a
well-behaved distribution of force errors. For relative force errors of ,
the computational costs exhibit an empirical scaling of . My implementation
(running on a 16-core node) outperforms a GPU-based direct summation with
comparable force errors for .
Comment: 21 pages, 15 figures, accepted for publication in Journal for Computational Astrophysics and Cosmology
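The source-cell-to-Taylor-expansion idea can be sketched in one dimension (our toy illustration, not the paper's three-dimensional FMM with cell-cell interactions and error control): a distant cluster of masses is summarized by a few moments about its centre of mass.

```python
def direct_potential(x, sources):
    # exact O(N) sum of point-mass potentials -m / |x - s|  (1-D, G = 1)
    return sum(-m / abs(x - s) for s, m in sources)

def multipole_potential(x, sources):
    """Far-field approximation: expand each 1 / |x - s| about the cluster's
    centre of mass and keep the monopole and quadrupole terms (the dipole
    vanishes about the centre of mass)."""
    M = sum(m for _, m in sources)
    z = sum(m * s for s, m in sources) / M       # centre of mass
    r = abs(x - z)
    q2 = sum(m * (s - z) ** 2 for s, m in sources)
    return -M / r - q2 / r ** 3

sources = [(0.9, 1.0), (1.1, 1.0)]  # a tight cluster of two unit masses
approx = multipole_potential(10.0, sources)
exact = direct_potential(10.0, sources)
err = abs((approx - exact) / exact)  # tiny: the cluster is far from the target
```

The FMM applies this far-field idea hierarchically, translating multipole expansions of source cells into local Taylor expansions inside sink cells, which is what reduces the cost below direct summation.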