
    Block Locally Optimal Preconditioned Eigenvalue Xolvers (BLOPEX) in hypre and PETSc

    We describe our recently released software package Block Locally Optimal Preconditioned Eigenvalue Xolvers (BLOPEX). BLOPEX is available as a stand-alone serial library, as an external package to PETSc (``Portable, Extensible Toolkit for Scientific Computation'', a general-purpose suite of tools for the scalable solution of partial differential equations and related problems, developed by Argonne National Laboratory), and is also built into {\it hypre} (``High Performance Preconditioners'', a scalable linear solvers package developed by Lawrence Livermore National Laboratory). The present BLOPEX release includes only one solver: the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method for symmetric eigenvalue problems. {\it hypre} provides users with advanced high-quality parallel preconditioners for linear systems, in particular domain decomposition and multigrid preconditioners. With BLOPEX, the same preconditioners can now be used efficiently for symmetric eigenvalue problems. PETSc facilitates the integration of independently developed application modules with strict attention to component interoperability, and makes BLOPEX easy to compile and use with the preconditioners available through PETSc. We present the LOBPCG algorithm in BLOPEX for {\it hypre} and PETSc. We demonstrate the scalability of BLOPEX numerically by testing it on a number of distributed- and shared-memory parallel systems, including a Beowulf cluster, a SUN Fire 880, an AMD dual-core Opteron workstation, and an IBM BlueGene/L supercomputer, using PETSc domain decomposition and {\it hypre} multigrid preconditioning. We test BLOPEX on a model problem, the standard 7-point finite-difference approximation of the 3-D Laplacian, with problem sizes in the range $10^5$-$10^8$. Comment: Submitted to SIAM Journal on Scientific Computing.
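
    As an illustration of the model problem above, here is a minimal sketch of LOBPCG applied to the 7-point finite-difference 3-D Laplacian, using SciPy's lobpcg (which descends from the BLOPEX reference implementation). The grid size n and the simple Jacobi preconditioner are illustrative stand-ins for the paper's much larger problems and its hypre/PETSc multigrid and domain-decomposition preconditioners.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lobpcg

n = 20  # grid points per dimension (illustrative; the paper uses 10^5-10^8 unknowns)
I = sp.identity(n, format="csr")
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")

# 7-point 3-D Laplacian assembled via Kronecker sums:
# A = T (x) I (x) I + I (x) T (x) I + I (x) I (x) T
A = (sp.kron(sp.kron(T, I), I)
     + sp.kron(sp.kron(I, T), I)
     + sp.kron(sp.kron(I, I), T)).tocsr()

# Jacobi (diagonal) preconditioner -- a cheap placeholder for the
# multigrid preconditioners BLOPEX obtains from hypre or PETSc.
M = sp.diags(1.0 / A.diagonal(), format="csr")

rng = np.random.default_rng(0)
X = rng.standard_normal((A.shape[0], 4))  # block of 4 initial vectors

# Smallest eigenpairs of the symmetric problem A x = lambda x.
eigvals, eigvecs = lobpcg(A, X, M=M, tol=1e-8, largest=False, maxiter=200)
print(eigvals)
```

    With a multigrid V-cycle in place of the Jacobi preconditioner, as BLOPEX arranges through {\it hypre}, the iteration count typically becomes nearly independent of the mesh size.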

    Inner product computation for sparse iterative solvers on distributed supercomputers

    Recent years have shown that Krylov iterative methods are not suitable for distributed supercomputers without redesign, because of their intensive global communication. It is well accepted that re-engineering Krylov methods for a prescribed computer architecture is necessary and important for achieving higher performance and scalability. This paper focuses on simple and practical ways to reorganize Krylov methods and improve their performance on current heterogeneous distributed supercomputers. In contrast with most current software development for Krylov methods, which usually focuses on efficient matrix-vector multiplication, the paper focuses on how inner products are computed on supercomputers and explains why inner product computation on current heterogeneous distributed supercomputers is crucial for scalable Krylov methods. A communication complexity analysis shows how inner product computation can become the performance bottleneck of (inner) product-type iterative solvers on distributed supercomputers because of global communication. Principles for reducing such global communication are discussed. The importance of minimizing communication is demonstrated by experiments using up to 900 processors, carried out on a Dawning 5000A, one of the fastest and earliest heterogeneous supercomputers in the world. Both the analysis and the experiments indicate that inner product computation is very likely to be the most challenging kernel for inner-product-based iterative solvers to reach exascale performance.
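
    A minimal sketch of the communication principle discussed here, using mpi4py (an assumption for illustration; the paper's experiments used native MPI codes): each distributed inner product costs one global reduction, so packing several partial sums into a single allreduce cuts the number of global synchronization points per iteration.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Local slices of three distributed vectors (sizes are illustrative).
rng = np.random.default_rng(comm.Get_rank())
r = rng.standard_normal(100_000)
z = rng.standard_normal(100_000)
p = rng.standard_normal(100_000)

# Naive: two separate global reductions, i.e. two synchronization points,
# as in a textbook CG iteration that computes (r, z) and (p, p) separately.
rz = comm.allreduce(np.dot(r, z), op=MPI.SUM)
pp = comm.allreduce(np.dot(p, p), op=MPI.SUM)

# Fused: pack both partial sums and reduce once, halving the number of
# global communications (and their latency cost) for these two products.
local = np.array([np.dot(r, z), np.dot(p, p)])
glob = np.empty(2)
comm.Allreduce(local, glob, op=MPI.SUM)
rz_fused, pp_fused = glob
```

    On large processor counts the latency of each allreduce grows with the machine diameter, which is why reducing the number of reductions, rather than their payload size, is the lever that matters.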

    A Fast Parallel Poisson Solver on Irregular Domains Applied to Beam Dynamics Simulations

    We discuss the scalable parallel solution of the Poisson equation within a Particle-In-Cell (PIC) code for the simulation of electron beams in particle accelerators of irregular shape. The problem is discretized by finite differences. Depending on the treatment of the Dirichlet boundary, the resulting system of equations is symmetric or `mildly' nonsymmetric positive definite. In all cases, the system is solved by the preconditioned conjugate gradient algorithm with smoothed aggregation (SA) based algebraic multigrid (AMG) preconditioning. We investigate variants of the implementation of SA-AMG that lead to considerable improvements in execution times. We demonstrate good scalability of the solver on a distributed-memory parallel computer with up to 2048 processors. We also compare our SA-AMG PCG solver with an FFT-based solver that is more commonly used for applications in beam dynamics.
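
    A minimal sketch of the solver combination described above, with PyAMG standing in for the paper's parallel SA-AMG implementation and a regular-grid Poisson matrix from pyamg.gallery in place of the irregular-domain finite-difference discretization:

```python
import numpy as np
import pyamg
from scipy.sparse.linalg import cg

# 2-D finite-difference Poisson matrix (a stand-in for the paper's
# irregular-domain discretization with Dirichlet boundaries).
A = pyamg.gallery.poisson((200, 200), format="csr")
b = np.random.default_rng(0).standard_normal(A.shape[0])

ml = pyamg.smoothed_aggregation_solver(A)  # build the SA-AMG hierarchy
M = ml.aspreconditioner(cycle="V")         # expose one V-cycle as a LinearOperator

iters = 0
def count(xk):
    global iters
    iters += 1

# Conjugate gradients preconditioned by the SA-AMG V-cycle.
x, info = cg(A, b, M=M, callback=count)
print(info, iters)  # info == 0 on convergence; iters is the CG iteration count
```

    The attraction of SA-AMG over an FFT-based solver is that it needs no regular tensor-product grid, which is what makes it usable on the irregular accelerator geometries the paper targets.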