1,571 research outputs found
Recommended from our members
Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'
An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling
We present a sparse linear system solver that is based on a multifrontal
variant of Gaussian elimination, and exploits low-rank approximation of the
resulting dense frontal matrices. We use hierarchically semiseparable (HSS)
matrices, which have low-rank off-diagonal blocks, to approximate the frontal
matrices. For HSS matrix construction, a randomized sampling algorithm is used
together with interpolative decompositions. The combination of the randomized
compression with a fast ULV HSS factorization leads to a solver with lower
computational complexity than the standard multifrontal method for many
applications, resulting in speedups up to 7 fold for problems in our test
suite. The implementation targets many-core systems by using task parallelism
with dynamic runtime scheduling. Numerical experiments show performance
improvements over state-of-the-art sparse direct solvers. The implementation
achieves high performance and good scalability on a range of modern shared
memory parallel systems, including the Intel Xeon Phi (MIC). The code is part
of a software package called STRUMPACK -- STRUctured Matrices PACKage, which
also has a distributed memory component for dense rank-structured matrices
A fast semi-direct least squares algorithm for hierarchically block separable matrices
We present a fast algorithm for linear least squares problems governed by
hierarchically block separable (HBS) matrices. Such matrices are generally
dense but data-sparse and can describe many important operators including those
derived from asymptotically smooth radial kernels that are not too oscillatory.
The algorithm is based on a recursive skeletonization procedure that exposes
this sparsity and solves the dense least squares problem as a larger,
equality-constrained, sparse one. It relies on a sparse QR factorization
coupled with iterative weighted least squares methods. In essence, our scheme
consists of a direct component, comprised of matrix compression and
factorization, followed by an iterative component to enforce certain equality
constraints. At most two iterations are typically required for problems that
are not too ill-conditioned. For an HBS matrix with
having bounded off-diagonal block rank, the algorithm has optimal complexity. If the rank increases with the spatial dimension as is
common for operators that are singular at the origin, then this becomes
in 1D, in 2D, and
in 3D. We illustrate the performance of the method on
both over- and underdetermined systems in a variety of settings, with an
emphasis on radial basis function approximation and efficient updating and
downdating.Comment: 24 pages, 8 figures, 6 tables; to appear in SIAM J. Matrix Anal. App
Sparse Approximate Multifrontal Factorization with Butterfly Compression for High Frequency Wave Equations
We present a fast and approximate multifrontal solver for large-scale sparse
linear systems arising from finite-difference, finite-volume or finite-element
discretization of high-frequency wave equations. The proposed solver leverages
the butterfly algorithm and its hierarchical matrix extension for compressing
and factorizing large frontal matrices via graph-distance guided entry
evaluation or randomized matrix-vector multiplication-based schemes. Complexity
analysis and numerical experiments demonstrate
computation and memory complexity when applied to an sparse system arising from 3D high-frequency Helmholtz and Maxwell problems
SlabLU: A Two-Level Sparse Direct Solver for Elliptic PDEs
The paper describes a sparse direct solver for the linear systems that arise
from the discretization of an elliptic PDE on a two dimensional domain. The
solver is designed to reduce communication costs and perform well on GPUs; it
uses a two-level framework, which is easier to implement and optimize than
traditional multi-frontal schemes based on hierarchical nested dissection
orderings. The scheme decomposes the domain into thin subdomains, or "slabs".
Within each slab, a local factorization is executed that exploits the geometry
of the local domain. A global factorization is then obtained through the LU
factorization of a block-tridiagonal reduced coefficient matrix. The solver has
complexity for the factorization step, and for each
solve once the factorization is completed.
The solver described is compatible with a range of different local
discretizations, and numerical experiments demonstrate its performance for
regular discretizations of rectangular and curved geometries. The technique
becomes particularly efficient when combined with very high-order convergent
multi-domain spectral collocation schemes. With this discretization, a
Helmholtz problem on a domain of size (for
which N=100 \mbox{M}) is solved in 15 minutes to 6 correct digits on a
high-powered desktop with GPU acceleration
- …