9,577 research outputs found

An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling

Author: Ghysels Pieter
Li Xiaoye S.
Napov Artem
Rouet Francois-Henry
Williams Samuel
Publication date: 25/02/2015

We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups of up to 7-fold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK -- STRUctured Matrices PACKage, which also has a distributed memory component for dense rank-structured matrices.
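The randomized compression idea mentioned in this abstract can be illustrated with a small sketch. This is not STRUMPACK code and it does not build an HSS matrix or an interpolative decomposition; it only shows the simpler randomized range finder: multiply a block by a random matrix, orthonormalize the samples, and project. The function names (`matmul`, `orthonormalize`, `randomized_low_rank`) and the tolerance are assumptions made for this illustration.

```python
import random


def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def orthonormalize(Y, rel_tol=1e-10):
    """Modified Gram-Schmidt on the columns of Y.

    Columns that are numerically dependent on earlier ones are dropped,
    so the returned Q may have fewer columns than Y.
    """
    basis = []
    for v in transpose(Y):
        original_norm = sum(x * x for x in v) ** 0.5
        v = list(v)
        for q in basis:
            dot = sum(x * y for x, y in zip(q, v))
            v = [y - dot * x for x, y in zip(q, v)]
        norm = sum(x * x for x in v) ** 0.5
        if original_norm > 0.0 and norm > rel_tol * original_norm:
            basis.append([x / norm for x in v])
    return transpose(basis)


def randomized_low_rank(A, k, seed=0):
    """Return (Q, B) with A approximately equal to Q B, using k random samples."""
    rng = random.Random(seed)
    n_cols = len(A[0])
    # Gaussian test matrix Omega (n_cols x k); Y = A Omega samples the range of A.
    omega = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n_cols)]
    Q = orthonormalize(matmul(A, omega))
    B = matmul(transpose(Q), A)  # project A onto the sampled range
    return Q, B
```

If the block has numerical rank at most `k`, the product `Q B` reproduces it to roughly machine precision; for blocks whose singular values decay slowly, a few extra samples beyond the target rank are usually taken.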

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

DI-fusion

Parallel accelerated cyclic reduction preconditioner for three-dimensional elliptic PDEs with variable coefficients

Author: Chávez Gustavo
Keyes David
Turkiyyah George
Zampini Stefano
Publication venue: 'Elsevier BV'
Publication date: 23/12/2017

We present a robust and scalable preconditioner for the solution of large-scale linear systems that arise from the discretization of elliptic PDEs amenable to rank compression. The preconditioner is based on hierarchical low-rank approximations and the cyclic reduction method. The setup and application phases of the preconditioner achieve log-linear complexity in memory footprint and number of operations, and numerical experiments exhibit good weak and strong scalability at large processor counts in a distributed memory environment. Numerical experiments with linear systems that feature symmetry and nonsymmetry, definiteness and indefiniteness, constant and variable coefficients demonstrate the preconditioner's applicability and robustness. Furthermore, it is possible to control the number of iterations via the accuracy threshold of the hierarchical matrix approximations and their arithmetic operations, and the tuning of the admissibility condition parameter. Together, these parameters allow for optimization of the memory requirements and performance of the preconditioner. Comment: 24 pages, Elsevier Journal of Computational and Applied Mathematics, Dec 201

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

A Direct Elliptic Solver Based on Hierarchically Low-rank Schur Complements

Author: A. Aminfar
B.L. Buzbee
I. Ibragimov
J. Xia
J. Xia
J. Xia
L. Grasedyck
P. Amestoy
P. Swarztrauber
P.G. Schmitz
P.G. Schmitz
R.W. Hockney
S. Ambikasaran
S. Chandrasekaran
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/04/2016

A parallel fast direct solver for rank-compressible block tridiagonal linear systems is presented. Algorithmic synergies between Cyclic Reduction and Hierarchical matrix arithmetic operations result in a solver with $O(N \log^2 N)$ arithmetic complexity and $O(N \log N)$ memory footprint. We provide a baseline for performance and applicability by comparing with well-known implementations of the $\mathcal{H}$-LU factorization and algebraic multigrid with a parallel implementation that leverages the concurrency features of the method. Numerical experiments reveal that this method is comparable with other fast direct solvers based on Hierarchical Matrices such as $\mathcal{H}$-LU and that it can tackle problems where algebraic multigrid fails to converge.
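As a rough illustration of the cyclic reduction idea named in this abstract, the sketch below solves a scalar tridiagonal system by eliminating the odd-indexed unknowns, recursing on the even-indexed ones, and back-substituting. It is a simplification, not the paper's method: the paper works on block tridiagonal systems with hierarchical-matrix arithmetic, while this example uses plain floats. The function name `cyclic_reduction` and the storage convention (`a` sub-diagonal with `a[0] = 0`, `b` diagonal, `c` super-diagonal with `c[-1] = 0`) are assumptions for the illustration, and no pivoting is performed.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for x.

    Assumes a[0] == 0, c[-1] == 0, and nonzero pivots at every level
    (true, for example, for diagonally dominant systems).
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    ra, rb, rc, rd = [], [], [], []
    for i in range(0, n, 2):
        # Multipliers that eliminate the odd neighbours x[i-1] and x[i+1].
        alpha = a[i] / b[i - 1] if i > 0 else 0.0
        gamma = c[i] / b[i + 1] if i + 1 < n else 0.0
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)  # clamped; unused when multiplier is 0
        ra.append(-alpha * a[lo])
        rb.append(b[i] - alpha * c[lo] - gamma * a[hi])
        rc.append(-gamma * c[hi])
        rd.append(d[i] - alpha * d[lo] - gamma * d[hi])

    # The even-indexed unknowns satisfy a tridiagonal system of about half the size.
    x = [0.0] * n
    x[0::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back-substitute for the odd-indexed unknowns.
    for i in range(1, n, 2):
        right = x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - a[i] * x[i - 1] - c[i] * right) / b[i]
    return x
```

Within one level, the eliminations for different even indices are independent of each other, which is the source of the concurrency that parallel cyclic reduction exploits.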

arXiv.org e-Print Archive

Crossref

A simple multigrid scheme for solving the Poisson equation with arbitrary domain boundaries

Author: Almgren
Brandt
Brandt
Cheng
Colella
Gibou
Huang
Jessop
Johansen
Khokhlov
Knebe
Kravtsov
Liu
Llinares
Martin
McCorquodale
Miniati
Osher
Popinet
Press
Ricker
Romain Teyssier
Rudman
Saad
Sussman
Sussman
Teyssier
Thomas Guillet
Tiret
van Leer
Wesseling
Ye
Publication venue: 'Elsevier BV'
Publication date: 09/04/2011

We present a new multigrid scheme for solving the Poisson equation with Dirichlet boundary conditions on a Cartesian grid with irregular domain boundaries. This scheme was developed in the context of the Adaptive Mesh Refinement (AMR) schemes based on a graded-octree data structure. The Poisson equation is solved on a level-by-level basis, using a "one-way interface" scheme in which boundary conditions are interpolated from the previous coarser level solution. Such a scheme is particularly well suited for self-gravitating astrophysical flows requiring an adaptive time stepping strategy. By constructing a multigrid hierarchy covering the active cells of each AMR level, we have designed a memory-efficient algorithm that can benefit fully from the multigrid acceleration. We present a simple method for capturing the boundary conditions across the multigrid hierarchy, based on a second-order accurate reconstruction of the boundaries of the multigrid levels. In case of very complex boundaries, small scale features become smaller than the discretization cell size of coarse multigrid levels and convergence problems arise. We propose a simple solution to address these issues. Using our scheme, the convergence rate usually depends on the grid size for complex grids, but good linear convergence is maintained. The proposed method was successfully implemented on distributed memory architectures in the RAMSES code, for which we present and discuss convergence and accuracy properties as well as timing performance. Comment: 33 pages, 15 figures, accepted for publication in Journal of Computational Physics
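To make the multigrid acceleration mentioned in this abstract concrete, here is a minimal V-cycle sketch for the one-dimensional Poisson problem $-u'' = f$ on $(0, 1)$ with homogeneous Dirichlet conditions. It is far simpler than the scheme in the paper: there is no AMR, no octree, and no irregular boundary treatment, and it is not RAMSES code. The choices made here (weighted Jacobi smoothing with weight 2/3, full-weighting restriction, linear interpolation, a grid of $2^k - 1$ interior points) and all function names are assumptions for the illustration.

```python
def residual(u, f, h):
    """r = f - A u for the standard 3-point Laplacian with zero boundary values."""
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append(f[i] - (2.0 * u[i] - left - right) / (h * h))
    return r


def smooth(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps; updates u in place and returns it."""
    n = len(u)
    for _ in range(sweeps):
        old = u[:]
        for i in range(n):
            left = old[i - 1] if i > 0 else 0.0
            right = old[i + 1] if i < n - 1 else 0.0
            u[i] = (1.0 - omega) * old[i] + omega * 0.5 * (left + right + h * h * f[i])
    return u


def v_cycle(u, f, h):
    """One V-cycle; len(u) must be 2**k - 1. The input list u is modified."""
    n = len(u)
    if n == 1:
        return [f[0] * h * h / 2.0]  # exact solve on the coarsest grid
    u = smooth(u, f, h)
    r = residual(u, f, h)
    nc = (n - 1) // 2
    # Full-weighting restriction: coarse point j sits at fine index 2j + 1.
    rc = [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2] for j in range(nc)]
    ec = v_cycle([0.0] * nc, rc, 2.0 * h)
    # Linear interpolation of the coarse correction back to the fine grid.
    for j in range(nc):
        u[2 * j + 1] += ec[j]
    for j in range(nc + 1):
        left = ec[j - 1] if j > 0 else 0.0
        right = ec[j] if j < nc else 0.0
        u[2 * j] += 0.5 * (left + right)
    return smooth(u, f, h)


def solve_poisson_1d(f, cycles=10):
    """Run a fixed number of V-cycles from a zero initial guess."""
    n = len(f)
    h = 1.0 / (n + 1)
    u = [0.0] * n
    for _ in range(cycles):
        u = v_cycle(u, f, h)
    return u
```

A typical call samples `f` at the interior points `x_i = (i + 1) / (n + 1)` with, say, `n = 63`, and passes the resulting list to `solve_poisson_1d`.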

arXiv.org e-Print Archive

Crossref

Open Research Exeter

ZORA