Parallel accelerated cyclic reduction preconditioner for three-dimensional elliptic PDEs with variable coefficients
We present a robust and scalable preconditioner for the solution of
large-scale linear systems that arise from the discretization of elliptic PDEs
amenable to rank compression. The preconditioner is based on hierarchical
low-rank approximations and the cyclic reduction method. The setup and
application phases of the preconditioner achieve log-linear complexity in
memory footprint and number of operations, and numerical experiments exhibit
good weak and strong scalability at large processor counts in a distributed
memory environment. Numerical experiments with linear systems that feature
symmetry and nonsymmetry, definiteness and indefiniteness, constant and
variable coefficients demonstrate the applicability and robustness of the
preconditioner. Furthermore, it is possible to control the number of iterations via
the accuracy threshold of the hierarchical matrix approximations and their
arithmetic operations, and the tuning of the admissibility condition parameter.
Together, these parameters allow for optimization of the memory requirements
and performance of the preconditioner.
Comment: 24 pages, Elsevier Journal of Computational and Applied Mathematics,
Dec 201
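The sketch below is a rough illustration of the cyclic reduction idea that the preconditioner builds on. It applies plain scalar cyclic reduction to a tridiagonal system. Each level eliminates every other unknown, which halves the system until one unknown remains, and the eliminated unknowns are then recovered by back-substitution. It is a minimal sketch under our own assumptions: dense NumPy arrays, n = 2^k - 1 unknowns, and a well-conditioned (e.g. diagonally dominant) matrix. The function name `cyclic_reduction` is hypothetical. In the accelerated method described above, the same elimination pattern acts on matrix blocks held in hierarchical low-rank form, and this sketch does not attempt that.

```python
import numpy as np


def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by scalar cyclic reduction (illustrative sketch).

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], with a[0] = c[-1] = 0.
    The number of unknowns must be n = 2**k - 1 so that every level halves cleanly.
    """
    a, b, c, d = (np.asarray(v, dtype=float) for v in (a, b, c, d))
    n = b.size
    assert n >= 1 and (n + 1) & n == 0, "n must be 2**k - 1"
    if n == 1:
        return d / b
    # Keep the odd-indexed rows and eliminate their even-indexed neighbors from them.
    i = np.arange(1, n, 2)
    alpha = -a[i] / b[i - 1]
    gamma = -c[i] / b[i + 1]
    a2 = alpha * a[i - 1]
    b2 = b[i] + alpha * c[i - 1] + gamma * a[i + 1]
    c2 = gamma * c[i + 1]
    d2 = d[i] + alpha * d[i - 1] + gamma * d[i + 1]
    x = np.zeros(n)
    x[i] = cyclic_reduction(a2, b2, c2, d2)  # half-size system, same structure
    # Back-substitute the eliminated (even-indexed) unknowns.
    j = np.arange(0, n, 2)
    left = np.where(j > 0, x[np.maximum(j - 1, 0)], 0.0)
    right = np.where(j < n - 1, x[np.minimum(j + 1, n - 1)], 0.0)
    x[j] = (d[j] - a[j] * left - c[j] * right) / b[j]
    return x
```

For example, take n = 31 with `4` on the diagonal and `-1` on the off-diagonals. The result then satisfies the assembled system up to rounding. All rows within one level are independent, which is what makes the method attractive for parallel machines.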
An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling
We present a sparse linear system solver that is based on a multifrontal
variant of Gaussian elimination, and exploits low-rank approximation of the
resulting dense frontal matrices. We use hierarchically semiseparable (HSS)
matrices, which have low-rank off-diagonal blocks, to approximate the frontal
matrices. For HSS matrix construction, a randomized sampling algorithm is used
together with interpolative decompositions. The combination of the randomized
compression with a fast ULV HSS factorization leads to a solver with lower
computational complexity than the standard multifrontal method for many
applications, resulting in speedups of up to a factor of 7 for problems in our test
suite. The implementation targets many-core systems by using task parallelism
with dynamic runtime scheduling. Numerical experiments show performance
improvements over state-of-the-art sparse direct solvers. The implementation
achieves high performance and good scalability on a range of modern shared
memory parallel systems, including the Intel Xeon Phi (MIC). The code is part
of a software package called STRUMPACK -- STRUctured Matrices PACKage, which
also has a distributed-memory component for dense rank-structured matrices.
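The randomized sampling and interpolative decomposition (ID) steps mentioned above can be illustrated for a single low-rank block. This is a sketch under our own assumptions:

- a Gaussian sketch compresses the rows of the block;
- column-pivoted QR on the sketch selects k skeleton columns;
- the remaining columns are then expressed as combinations of those skeleton columns.

The function name and the oversampling parameter `p` are hypothetical. An HSS representation applies such compressions hierarchically with nested bases, which this sketch does not show.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular


def randomized_column_id(A, k, p=10, seed=0):
    """Rank-k column ID, A ~= A[:, idx] @ T, computed from a random row sketch."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Random sketch: k + p random combinations of the rows of A. For a block of
    # numerical rank <= k, linear relations among its columns survive the sketch.
    Y = rng.standard_normal((k + p, m)) @ A
    # Column-pivoted QR of the small sketch ranks the columns; the first k pivots
    # become the skeleton columns.
    _, R, perm = qr(Y, mode="economic", pivoting=True)
    idx = perm[:k]
    # Interpolation matrix: skeleton columns map to the identity, and the other
    # columns to R11^{-1} R12 from the pivoted QR factor.
    T = np.empty((k, n))
    T[:, perm[:k]] = np.eye(k)
    T[:, perm[k:]] = solve_triangular(R[:k, :k], R[:k, k:])
    return idx, T
```

Interaction blocks between well-separated point clusters, such as `1.0 / (x[:, None] - y[None, :])` with `x` in [0, 1] and `y` in [2, 3], typically have small numerical rank. A modest `k` is then usually enough for a small relative error.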
Hierarchical interpolative factorization for elliptic operators: differential equations
This paper introduces the hierarchical interpolative factorization for
elliptic partial differential equations (HIF-DE) in two (2D) and three
dimensions (3D). This factorization takes the form of an approximate
generalized LU/LDL decomposition that facilitates the efficient inversion of
the discretized operator. HIF-DE is based on the multifrontal method but uses
skeletonization on the separator fronts to sparsify the dense frontal matrices
and thus reduce the cost. We conjecture that this strategy yields linear
complexity in 2D and quasilinear complexity in 3D. Estimated linear complexity
in 3D can be achieved by skeletonizing the compressed fronts themselves, which
amounts geometrically to a recursive dimensional reduction scheme. Numerical
experiments support our claims and further demonstrate the performance of our
algorithm as a fast direct solver and preconditioner. MATLAB codes are freely
available.
Comment: 37 pages, 13 figures, 12 tables; to appear, Comm. Pure Appl. Math.
arXiv admin note: substantial text overlap with arXiv:1307.266
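Skeletonization rests on the interpolative decomposition. The notation here is chosen for illustration and is not taken from the paper. Suppose the columns of a block $A$ split into redundant indices $R$ and skeleton indices $S$, with $A_{:,R} \approx A_{:,S} T$ for some interpolation matrix $T$. A block unit-triangular change of basis then approximately zeroes the redundant columns:

$$
\begin{bmatrix} A_{:,R} & A_{:,S} \end{bmatrix}
\begin{bmatrix} I & 0 \\ -T & I \end{bmatrix}
=
\begin{bmatrix} A_{:,R} - A_{:,S}\,T & A_{:,S} \end{bmatrix}
\approx
\begin{bmatrix} 0 & A_{:,S} \end{bmatrix}.
$$

In skeletonization-based factorizations, this transform is applied (symmetrically) to the off-diagonal interactions of a group of degrees of freedom. Roughly speaking, the redundant degrees of freedom can then be eliminated by a local Schur complement that only updates the skeleton block, with no new fill in the rest of the front.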
SlabLU: A Two-Level Sparse Direct Solver for Elliptic PDEs
The paper describes a sparse direct solver for the linear systems that arise
from the discretization of an elliptic PDE on a two-dimensional domain. The
solver is designed to reduce communication costs and perform well on GPUs; it
uses a two-level framework, which is easier to implement and optimize than
traditional multi-frontal schemes based on hierarchical nested dissection
orderings. The scheme decomposes the domain into thin subdomains, or "slabs".
Within each slab, a local factorization is executed that exploits the geometry
of the local domain. A global factorization is then obtained through the LU
factorization of a block-tridiagonal reduced coefficient matrix. The solver has
$O(N^{5/3})$ complexity for the factorization step, and $O(N^{7/6})$ for each
solve once the factorization is completed.
The solver described is compatible with a range of different local
discretizations, and numerical experiments demonstrate its performance for
regular discretizations of rectangular and curved geometries. The technique
becomes particularly efficient when combined with very high-order convergent
multi-domain spectral collocation schemes. With this discretization, a
Helmholtz problem on a domain of size $1000\lambda \times 1000\lambda$ (for
which $N = 100\,\mathrm{M}$) is solved in 15 minutes to 6 correct digits on a
high-powered desktop with GPU acceleration.
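The global step described above is an LU factorization of a block-tridiagonal reduced matrix. It can be sketched with dense blocks as a block Thomas algorithm. This is a minimal sketch under our own assumptions:

- the blocks `L`, `D`, `U` are arbitrary dense square matrices of equal size;
- no pivoting across block rows is needed (e.g. the matrix is block diagonally dominant);
- everything runs on the CPU.

In SlabLU the blocks would come from the slab-local factorizations, and the function names here are hypothetical. Splitting the code into a factor routine and a solve routine mirrors the paper's distinction between the factorization step and each subsequent solve.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve


def block_tridiag_factor(L, D, U):
    """Block LU of a block-tridiagonal matrix.

    Block row i reads L[i] x[i-1] + D[i] x[i] + U[i] x[i+1]; L[0] and U[-1] are unused.
    Returns the multipliers M[i] = L[i] S[i-1]^{-1} and LU factors of the pivot blocks S[i].
    """
    nb = len(D)
    facs = [lu_factor(D[0])]
    M = [None]
    for i in range(1, nb):
        # M_i = L_i S_{i-1}^{-1}, via the transposed solve S_{i-1}^T M_i^T = L_i^T.
        Mi = lu_solve(facs[i - 1], L[i].T, trans=1).T
        M.append(Mi)
        # Schur complement update: S_i = D_i - M_i U_{i-1}.
        facs.append(lu_factor(D[i] - Mi @ U[i - 1]))
    return M, facs


def block_tridiag_solve(M, facs, U, f):
    """Forward and backward block sweeps using a stored block LU factorization."""
    nb = len(facs)
    g = [f[0]]
    for i in range(1, nb):
        g.append(f[i] - M[i] @ g[i - 1])
    x = [None] * nb
    x[-1] = lu_solve(facs[-1], g[-1])
    for i in range(nb - 2, -1, -1):
        x[i] = lu_solve(facs[i], g[i] - U[i] @ x[i + 1])
    return x
```

Once `block_tridiag_factor` has run, each new right-hand side costs only the two block sweeps in `block_tridiag_solve`.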
Improving multifrontal methods by means of block low-rank representations
Submitted for publication to SIAM.
Matrices coming from elliptic Partial Differential Equations (PDEs) have been shown to have a low-rank property: well-defined off-diagonal blocks of their Schur complements can be approximated by low-rank products. Given a suitable ordering of the matrix, which gives the blocks a geometric meaning, such approximations can be computed using an SVD or a rank-revealing QR factorization. The resulting representation substantially reduces the memory requirement and gives efficient ways to perform many of the basic dense linear algebra operations. Several strategies have been proposed to exploit this property. We propose a low-rank format called Block Low-Rank (BLR), and explain how it can be used to reduce the memory footprint and the complexity of direct solvers for sparse matrices based on the multifrontal method. We present experimental results that show how the BLR format delivers gains comparable to those obtained with hierarchical formats such as Hierarchical matrices (H-matrices) and Hierarchically Semi-Separable (HSS) matrices, but provides much greater flexibility and ease of use, which are essential in the context of a general-purpose, algebraic solver.
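Here is a minimal sketch of the BLR format, under our own assumptions. A dense square matrix is cut into a flat (non-hierarchical) grid of equal-size blocks. Diagonal blocks are kept dense. Each off-diagonal block is replaced by a truncated-SVD product $XY^T$ whenever that saves memory. The names `blr_compress`, `blr_to_dense`, and `blr_memory` are hypothetical. The paper applies the format inside the fronts of a multifrontal solver and also mentions rank-revealing QR. This sketch only compresses a standalone matrix with the SVD.

```python
import numpy as np


def blr_compress(A, block, tol):
    """Compress a square matrix into a flat BLR grid of blocks (illustrative sketch)."""
    n = A.shape[0]
    edges = list(range(0, n, block)) + [n]
    nblk = len(edges) - 1
    blocks = {}
    for bi in range(nblk):
        for bj in range(nblk):
            B = A[edges[bi]:edges[bi + 1], edges[bj]:edges[bj + 1]]
            if bi == bj:
                blocks[bi, bj] = ("dense", B.copy())  # diagonal blocks stay full
                continue
            U, s, Vt = np.linalg.svd(B, full_matrices=False)
            # Keep singular values above tol relative to the block's largest one.
            r = int(np.count_nonzero(s > tol * s[0])) if s.size and s[0] > 0 else 0
            m, k = B.shape
            if r * (m + k) < m * k:  # store X @ Y.T only if it is cheaper
                blocks[bi, bj] = ("lowrank", (U[:, :r] * s[:r], Vt[:r].T))
            else:
                blocks[bi, bj] = ("dense", B.copy())
    return blocks, edges


def blr_to_dense(blocks, edges):
    """Expand a BLR representation back to a dense matrix (for checking)."""
    n = edges[-1]
    A = np.zeros((n, n))
    for (bi, bj), (kind, data) in blocks.items():
        rows = slice(edges[bi], edges[bi + 1])
        cols = slice(edges[bj], edges[bj + 1])
        A[rows, cols] = data[0] @ data[1].T if kind == "lowrank" else data
    return A


def blr_memory(blocks):
    """Number of stored floating-point entries."""
    return sum((data[0].size + data[1].size) if kind == "lowrank" else data.size
               for kind, data in blocks.values())
```

Because every block in BLR has the same flat status, the bookkeeping is a simple grid, unlike the tree-based H and HSS formats. This reflects the ease-of-use argument in the abstract.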
A distributed-memory package for dense Hierarchically Semi-Separable matrix computations using randomization
We present a distributed-memory library for computations with dense
structured matrices. A matrix is considered structured if its off-diagonal
blocks can be approximated by a rank-deficient matrix with low numerical rank.
Here, we use Hierarchically Semi-Separable representations (HSS). Such matrices
appear in many applications, e.g., finite element and boundary element
methods. Exploiting this structure allows for fast solution of linear
systems and/or fast computation of matrix-vector products, which are the two
main building blocks of matrix computations. The compression algorithm that we
use, which computes the HSS form of a dense input matrix, relies on randomized
sampling with a novel adaptive sampling mechanism. We discuss the
parallelization of this algorithm and also present the parallelization of
structured matrix-vector product, structured factorization and solution
routines. The efficiency of the approach is demonstrated on large problems from
different academic and industrial applications, on up to 8,000 cores.
This work is part of a more global effort, the STRUMPACK (STRUctured Matrices
PACKage) software package for computations with sparse and dense structured
matrices. Hence, although useful in their own right, the routines also
represent a step toward a distributed-memory sparse solver.
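The abstract does not describe the package's own adaptive sampling mechanism. As a generic point of comparison only, the sketch below shows a standard adaptive randomized range finder, not STRUMPACK's algorithm. Random samples $A\Omega$ are drawn in small batches, and sampling stops once a fresh batch is already captured by the basis built so far, up to a relative tolerance. The stopping rule (largest residual sample norm relative to the first batch) is a simplification of our own choosing, and the function and parameter names are hypothetical.

```python
import numpy as np


def adaptive_range_finder(sample, ncols, tol, block=8, max_rank=200, seed=0):
    """Orthonormal basis Q with A ~= Q @ Q.T @ A, built from products A @ Omega.

    `sample(Omega)` must return A @ Omega; A itself is never accessed directly.
    """
    rng = np.random.default_rng(seed)
    Q, ref = None, None
    while Q is None or Q.shape[1] < max_rank:
        Y = sample(rng.standard_normal((ncols, block)))
        if Q is None:
            Q = np.zeros((Y.shape[0], 0))
            ref = np.linalg.norm(Y, axis=0).max()  # scale of the first batch
        # Remove what the current basis already captures (two passes for stability).
        for _ in range(2):
            Y = Y - Q @ (Q.T @ Y)
        if np.linalg.norm(Y, axis=0).max() <= tol * ref:
            break  # the fresh samples lie (almost) in span(Q): stop sampling
        # Add only the well-determined new directions to the basis.
        U, s, _ = np.linalg.svd(Y, full_matrices=False)
        Q = np.hstack([Q, U[:, s > tol * ref]])
    return Q
```

Taking `sample` as a callable means the sketch needs only products with A, never its individual entries. The number of samples then adapts to the numerical rank rather than being fixed in advance.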