459 research outputs found
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.
Matrix-free GPU implementation of a preconditioned conjugate gradient solver for anisotropic elliptic PDEs
Many problems in geophysical and atmospheric modelling require the fast
solution of elliptic partial differential equations (PDEs) in "flat" three
dimensional geometries. In particular, an anisotropic elliptic PDE for the
pressure correction has to be solved at every time step in the dynamical core
of many numerical weather prediction models, and equations of a very similar
structure arise in global ocean models, subsurface flow simulations and gas and
oil reservoir modelling. The elliptic solve is often the bottleneck of the
forecast, and an algorithmically optimal method has to be used and implemented
efficiently. Graphics Processing Units have been shown to be highly efficient
for a wide range of applications in scientific computing, and recently
iterative solvers have been parallelised on these architectures. We describe
the GPU implementation and optimisation of a Preconditioned Conjugate Gradient
(PCG) algorithm for the solution of a three dimensional anisotropic elliptic
PDE for the pressure correction in NWP. Our implementation exploits the strong
vertical anisotropy of the elliptic operator in the construction of a suitable
preconditioner. As the algorithm is memory bound, performance can be improved
significantly by reducing the amount of global memory access. We achieve this
by using a matrix-free implementation which does not require explicit storage
of the matrix and instead recalculates the local stencil. Global memory access
can also be reduced by rewriting the algorithm using loop fusion and we show
that this further reduces the runtime on the GPU. We demonstrate the
performance of our matrix-free GPU code by comparing it to a sequential CPU
implementation and to a matrix-explicit GPU code which uses existing libraries.
The absolute performance of the algorithm for different problem sizes is
quantified in terms of floating point throughput and global memory bandwidth. Comment: 18 pages, 7 figures
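The matrix-free idea in the abstract above can be illustrated with a minimal CPU sketch in plain Python. It is not the authors' GPU code. Several things are assumptions made for illustration: the grid is a 2D slice (one horizontal and one vertical direction), the operator is a 5-point stencil with a hypothetical anisotropy ratio `lam`, boundaries are zero Dirichlet, and the preconditioner is a tridiagonal solve along each vertical column, which is one plausible way to exploit strong vertical coupling. The stencil coefficients are recomputed inside `apply_operator` rather than read from a stored matrix.

```python
def apply_operator(u, nx, nz, lam):
    """Matrix-free y = A u: 5-point stencil, vertical coupling scaled by lam."""
    y = [0.0] * (nx * nz)
    for i in range(nx):
        for k in range(nz):
            idx = i * nz + k
            val = (2.0 + 2.0 * lam) * u[idx]
            if i > 0:
                val -= u[idx - nz]          # horizontal neighbours
            if i < nx - 1:
                val -= u[idx + nz]
            if k > 0:
                val -= lam * u[idx - 1]     # vertical neighbours
            if k < nz - 1:
                val -= lam * u[idx + 1]
            y[idx] = val
    return y


def vertical_line_preconditioner(r, nx, nz, lam):
    """z = M^{-1} r, where M keeps only the diagonal and vertical couplings.

    Each column is an independent tridiagonal system (Thomas algorithm).
    """
    z = [0.0] * (nx * nz)
    diag, off = 2.0 + 2.0 * lam, -lam
    for i in range(nx):
        base = i * nz
        c = [0.0] * nz
        d = [0.0] * nz
        c[0] = off / diag
        d[0] = r[base] / diag
        for k in range(1, nz):              # forward elimination
            denom = diag - off * c[k - 1]
            c[k] = off / denom
            d[k] = (r[base + k] - off * d[k - 1]) / denom
        z[base + nz - 1] = d[nz - 1]
        for k in range(nz - 2, -1, -1):     # back substitution
            z[base + k] = d[k] - c[k] * z[base + k + 1]
    return z


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcg(b, nx, nz, lam, tol=1e-10, max_iter=500):
    """Preconditioned CG; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0
    r = list(b)
    z = vertical_line_preconditioner(r, nx, nz, lam)
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = apply_operator(p, nx, nz, lam)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = vertical_line_preconditioner(r, nx, nz, lam)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

Calling `pcg(b, nx, nz, lam)` with a right-hand side `b` stored column by column returns the approximate solution and the iteration count. On a GPU, the separate vector updates in the loop are the kind of steps that loop fusion would combine to reduce global memory traffic; this sketch keeps them separate for readability.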
Block Locally Optimal Preconditioned Eigenvalue Xolvers (BLOPEX) in hypre and PETSc
We describe our software package Block Locally Optimal Preconditioned
Eigenvalue Xolvers (BLOPEX) publicly released recently. BLOPEX is available as
a stand-alone serial library, as an external package to PETSc ("Portable,
Extensible Toolkit for Scientific Computation", a general purpose suite of
tools for the scalable solution of partial differential equations and related
problems developed by Argonne National Laboratory), and is also built into
hypre ("High Performance Preconditioners", a scalable linear solvers package
developed by Lawrence Livermore National Laboratory). The present BLOPEX
release includes only one solver: the Locally Optimal Block Preconditioned
Conjugate Gradient (LOBPCG) method for symmetric eigenvalue problems.
hypre provides users with advanced high-quality parallel preconditioners for
linear systems, in particular, with domain decomposition and multigrid
preconditioners. With BLOPEX, the same preconditioners can now be efficiently
used for symmetric eigenvalue problems. PETSc facilitates the integration of
independently developed application modules with strict attention to component
interoperability, and makes BLOPEX extremely easy to compile and use with
preconditioners that are available via PETSc. We present the LOBPCG algorithm
in BLOPEX for hypre and PETSc. We demonstrate numerically the scalability
of BLOPEX by testing it on a number of distributed and shared memory parallel
systems, including a Beowulf system, SUN Fire 880, an AMD dual-core Opteron
workstation, and IBM BlueGene/L supercomputer, using PETSc domain decomposition
and hypre multigrid preconditioning. We test BLOPEX on a model problem,
the standard 7-point finite-difference approximation of the 3-D Laplacian, with
the problem size in the range . Comment: Submitted to SIAM Journal on Scientific Computing
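The model problem named in the abstract above is convenient for checking an eigensolver because its eigenpairs are known in closed form. The following is a minimal sketch in plain Python, not part of BLOPEX. It assumes a cubic n x n x n grid, homogeneous Dirichlet boundaries, and an unscaled stencil (6 on the diagonal, -1 for each neighbour); the abstract states none of these details. Under those assumptions the eigenvalue of mode (p, q, r) is 4 sin^2(p pi / (2(n+1))) + 4 sin^2(q pi / (2(n+1))) + 4 sin^2(r pi / (2(n+1))).

```python
import math


def laplacian_7pt(u, n):
    """Matrix-free 7-point stencil on an n x n x n grid (zero Dirichlet boundary)."""
    def at(i, j, k):
        if 0 <= i < n and 0 <= j < n and 0 <= k < n:
            return u[(i * n + j) * n + k]
        return 0.0                           # value outside the grid

    out = [0.0] * (n ** 3)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                out[(i * n + j) * n + k] = (
                    6.0 * at(i, j, k)
                    - at(i - 1, j, k) - at(i + 1, j, k)
                    - at(i, j - 1, k) - at(i, j + 1, k)
                    - at(i, j, k - 1) - at(i, j, k + 1)
                )
    return out


def analytic_eigenpair(n, p, q, r):
    """Closed-form eigenvalue and eigenvector for mode (p, q, r), 1 <= p, q, r <= n."""
    h = math.pi / (n + 1)
    value = sum(4.0 * math.sin(0.5 * m * h) ** 2 for m in (p, q, r))
    vector = [
        math.sin(p * h * (i + 1)) * math.sin(q * h * (j + 1)) * math.sin(r * h * (k + 1))
        for i in range(n) for j in range(n) for k in range(n)
    ]
    return value, vector


def rayleigh_quotient(u, n):
    """(u, A u) / (u, u), the quantity an eigensolver such as LOBPCG minimises."""
    Au = laplacian_7pt(u, n)
    return sum(a * b for a, b in zip(u, Au)) / sum(a * a for a in u)
```

The Rayleigh quotient of the vector returned by `analytic_eigenpair(n, 1, 1, 1)` equals the smallest eigenvalue, so it can serve as a reference value when testing an iterative eigensolver on this problem.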
ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers
Solving the electronic structure from a generalized or standard eigenproblem
is often the bottleneck in large scale calculations based on Kohn-Sham
density-functional theory. This problem must be addressed by essentially all
current electronic structure codes, based on similar matrix expressions, and by
high-performance computation. We here present a unified software interface,
ELSI, to access different strategies that address the Kohn-Sham eigenvalue
problem. Currently supported algorithms include the dense generalized
eigensolver library ELPA, the orbital minimization method implemented in
libOMM, and the pole expansion and selected inversion (PEXSI) approach with
lower computational complexity for semilocal density functionals. The ELSI
interface aims to simplify the implementation and optimal use of the different
strategies, by offering (a) a unified software framework designed for the
electronic structure solvers in Kohn-Sham density-functional theory; (b)
reasonable default parameters for a chosen solver; (c) automatic conversion
between input and internal working matrix formats, and in the future (d)
recommendation of the optimal solver depending on the specific problem.
Comparative benchmarks are shown for system sizes up to 11,520 atoms (172,800
basis functions) on distributed memory supercomputing architectures. Comment: 55 pages, 14 figures, 2 tables
Parallel eigensolvers in plane-wave Density Functional Theory
We consider the problem of parallelizing electronic structure computations in
plane-wave Density Functional Theory. Because of the limited scalability of
Fourier transforms, parallelism has to be found at the eigensolver level. We
show how a recently proposed algorithm based on Chebyshev polynomials can scale
into the tens of thousands of processors, outperforming block conjugate
gradient algorithms for large computations.
Comparison of Krylov subspace methods with preconditioning techniques for solving boundary value problems
In this paper, we attempt to establish the usefulness of the Lanczos solver with preconditioning over preconditioned Conjugate Gradient (CG) solvers. We present a detailed comparative study with respect to convergence, speed, and CPU time, considering appropriate boundary value problems.