Search CORE

1,571 research outputs found

Recommended from our members

Preparing sparse solvers for exascale computing.

Author: Anzt Hartwig
Boman Erik
Curfman McInnes Lois
Falgout Rob
Ghysels Pieter
Heroux Michael
Li Xiaoye
Meier Yang Ulrike
Rajamanickam Sivasankaran
Rupp Karl
Smith Barry
Tran Mills Richard
Yamazaki Ichitaro
Publication venue: eScholarship, University of California
Publication date: 01/03/2020
Field of study

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'

eScholarship - University of California

An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling

Author: Ghysels Pieter
Li Xiaoye S.
Napov Artem
Rouet Francois-Henry
Williams Samuel
Publication venue
Publication date: 25/02/2015
Field of study

We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination, and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups up to 7 fold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK -- STRUctured Matrices PACKage, which also has a distributed memory component for dense rank-structured matrices

arXiv.org e-Print Archive

eScholarship - University of California

DI-fusion

A fast semi-direct least squares algorithm for hierarchically block separable matrices

Author: Greengard Leslie
Ho Kenneth L.
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2014
Field of study

We present a fast algorithm for linear least squares problems governed by hierarchically block separable (HBS) matrices. Such matrices are generally dense but data-sparse and can describe many important operators including those derived from asymptotically smooth radial kernels that are not too oscillatory. The algorithm is based on a recursive skeletonization procedure that exposes this sparsity and solves the dense least squares problem as a larger, equality-constrained, sparse one. It relies on a sparse QR factorization coupled with iterative weighted least squares methods. In essence, our scheme consists of a direct component, comprised of matrix compression and factorization, followed by an iterative component to enforce certain equality constraints. At most two iterations are typically required for problems that are not too ill-conditioned. For an

M \times N

HBS matrix with

M \geq N

having bounded off-diagonal block rank, the algorithm has optimal

\mathcal{O} (M + N)

complexity. If the rank increases with the spatial dimension as is common for operators that are singular at the origin, then this becomes

\mathcal{O} (M + N)

in 1D,

\mathcal{O} (M + N^{3/2})

in 2D, and

\mathcal{O} (M + N^{2})

in 3D. We illustrate the performance of the method on both over- and underdetermined systems in a variety of settings, with an emphasis on radial basis function approximation and efficient updating and downdating.Comment: 24 pages, 8 figures, 6 tables; to appear in SIAM J. Matrix Anal. App

arXiv.org e-Print Archive

CiteSeerX

Sparse Approximate Multifrontal Factorization with Butterfly Compression for High Frequency Wave Equations

Author: Claus Lisa
Ghysels Pieter
Li Xiaoye Sherry
Liu Yang
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2020
Field of study

We present a fast and approximate multifrontal solver for large-scale sparse linear systems arising from finite-difference, finite-volume or finite-element discretization of high-frequency wave equations. The proposed solver leverages the butterfly algorithm and its hierarchical matrix extension for compressing and factorizing large frontal matrices via graph-distance guided entry evaluation or randomized matrix-vector multiplication-based schemes. Complexity analysis and numerical experiments demonstrate

\mathcal{O}(N\log^2 N)

computation and

\mathcal{O}(N)

memory complexity when applied to an

N\times N

sparse system arising from 3D high-frequency Helmholtz and Maxwell problems

arXiv.org e-Print Archive

eScholarship - University of California

SlabLU: A Two-Level Sparse Direct Solver for Elliptic PDEs

Author: Martinsson Per-Gunnar
Yesypenko Anna
Publication venue
Publication date: 18/08/2023
Field of study

The paper describes a sparse direct solver for the linear systems that arise from the discretization of an elliptic PDE on a two dimensional domain. The solver is designed to reduce communication costs and perform well on GPUs; it uses a two-level framework, which is easier to implement and optimize than traditional multi-frontal schemes based on hierarchical nested dissection orderings. The scheme decomposes the domain into thin subdomains, or "slabs". Within each slab, a local factorization is executed that exploits the geometry of the local domain. A global factorization is then obtained through the LU factorization of a block-tridiagonal reduced coefficient matrix. The solver has complexity

O(N^{5/3})

for the factorization step, and

O(N^{7/6})

for each solve once the factorization is completed. The solver described is compatible with a range of different local discretizations, and numerical experiments demonstrate its performance for regular discretizations of rectangular and curved geometries. The technique becomes particularly efficient when combined with very high-order convergent multi-domain spectral collocation schemes. With this discretization, a Helmholtz problem on a domain of size

1000 \lambda \times 1000 \lambda

(for which N=100 \mbox{M}) is solved in 15 minutes to 6 correct digits on a high-powered desktop with GPU acceleration

arXiv.org e-Print Archive