1,074 research outputs found
Block Locally Optimal Preconditioned Eigenvalue Xolvers (BLOPEX) in hypre and PETSc
We describe our software package Block Locally Optimal Preconditioned
Eigenvalue Xolvers (BLOPEX) publicly released recently. BLOPEX is available as
a stand-alone serial library, as an external package to PETSc (``Portable,
Extensible Toolkit for Scientific Computation'', a general purpose suite of
tools for the scalable solution of partial differential equations and related
problems developed by Argonne National Laboratory), and is also built into {\it
hypre} (``High Performance Preconditioners'', scalable linear solvers package
developed by Lawrence Livermore National Laboratory). The present BLOPEX
release includes only one solver--the Locally Optimal Block Preconditioned
Conjugate Gradient (LOBPCG) method for symmetric eigenvalue problems. {\it
hypre} provides users with advanced high-quality parallel preconditioners for
linear systems, in particular, with domain decomposition and multigrid
preconditioners. With BLOPEX, the same preconditioners can now be efficiently
used for symmetric eigenvalue problems. PETSc facilitates the integration of
independently developed application modules with strict attention to component
interoperability, and makes BLOPEX extremely easy to compile and use with
preconditioners that are available via PETSc. We present the LOBPCG algorithm
in BLOPEX for {\it hypre} and PETSc. We demonstrate numerically the scalability
of BLOPEX by testing it on a number of distributed and shared memory parallel
systems, including a Beowulf system, SUN Fire 880, an AMD dual-core Opteron
workstation, and IBM BlueGene/L supercomputer, using PETSc domain decomposition
and {\it hypre} multigrid preconditioning. We test BLOPEX on a model problem,
the standard 7-point finite-difference approximation of the 3-D Laplacian, with
the problem size in the range .Comment: Submitted to SIAM Journal on Scientific Computin
Inner product computation for sparse iterative solvers on\ud distributed supercomputer
Recent years have witnessed that iterative Krylov methods without re-designing are not suitable for distribute supercomputers because of intensive global communications. It is well accepted that re-engineering Krylov methods for prescribed computer architecture is necessary and important to achieve higher performance and scalability. The paper focuses on simple and practical ways to re-organize Krylov methods and improve their performance for current heterogeneous distributed supercomputers. In construct with most of current software development of Krylov methods which usually focuses on efficient matrix vector multiplications, the paper focuses on the way to compute inner products on supercomputers and explains why inner product computation on current heterogeneous distributed supercomputers is crucial for scalable Krylov methods. Communication complexity analysis shows that how the inner product computation can be the bottleneck of performance of (inner) product-type iterative solvers on distributed supercomputers due to global communications. Principles of reducing such global communications are discussed. The importance of minimizing communications is demonstrated by experiments using up to 900 processors. The experiments were carried on a Dawning 5000A, one of the fastest and earliest heterogeneous supercomputers in the world. Both the analysis and experiments indicates that inner product computation is very likely to be the most challenging kernel for inner product-based iterative solvers to achieve exascale
A Fast Parallel Poisson Solver on Irregular Domains Applied to Beam Dynamic Simulations
We discuss the scalable parallel solution of the Poisson equation within a
Particle-In-Cell (PIC) code for the simulation of electron beams in particle
accelerators of irregular shape. The problem is discretized by Finite
Differences. Depending on the treatment of the Dirichlet boundary the resulting
system of equations is symmetric or `mildly' nonsymmetric positive definite. In
all cases, the system is solved by the preconditioned conjugate gradient
algorithm with smoothed aggregation (SA) based algebraic multigrid (AMG)
preconditioning. We investigate variants of the implementation of SA-AMG that
lead to considerable improvements in the execution times. We demonstrate good
scalability of the solver on distributed memory parallel processor with up to
2048 processors. We also compare our SAAMG-PCG solver with an FFT-based solver
that is more commonly used for applications in beam dynamics
- …