44 research outputs found

    An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling

    Full text link
    We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination, and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups up to 7 fold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK -- STRUctured Matrices PACKage, which also has a distributed memory component for dense rank-structured matrices

    A Direct Elliptic Solver Based on Hierarchically Low-rank Schur Complements

    Full text link
    A parallel fast direct solver for rank-compressible block tridiagonal linear systems is presented. Algorithmic synergies between Cyclic Reduction and Hierarchical matrix arithmetic operations result in a solver with O(Nlog2N)O(N \log^2 N) arithmetic complexity and O(NlogN)O(N \log N) memory footprint. We provide a baseline for performance and applicability by comparing with well known implementations of the H\mathcal{H}-LU factorization and algebraic multigrid with a parallel implementation that leverages the concurrency features of the method. Numerical experiments reveal that this method is comparable with other fast direct solvers based on Hierarchical Matrices such as H\mathcal{H}-LU and that it can tackle problems where algebraic multigrid fails to converge

    Parallel accelerated cyclic reduction preconditioner for three-dimensional elliptic PDEs with variable coefficients

    Full text link
    We present a robust and scalable preconditioner for the solution of large-scale linear systems that arise from the discretization of elliptic PDEs amenable to rank compression. The preconditioner is based on hierarchical low-rank approximations and the cyclic reduction method. The setup and application phases of the preconditioner achieve log-linear complexity in memory footprint and number of operations, and numerical experiments exhibit good weak and strong scalability at large processor counts in a distributed memory environment. Numerical experiments with linear systems that feature symmetry and nonsymmetry, definiteness and indefiniteness, constant and variable coefficients demonstrate the preconditioner applicability and robustness. Furthermore, it is possible to control the number of iterations via the accuracy threshold of the hierarchical matrix approximations and their arithmetic operations, and the tuning of the admissibility condition parameter. Together, these parameters allow for optimization of the memory requirements and performance of the preconditioner.Comment: 24 pages, Elsevier Journal of Computational and Applied Mathematics, Dec 201

    A distributed-memory package for dense Hierarchically Semi-Separable matrix computations using randomization

    Full text link
    We present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use Hierarchically Semi-Separable representations (HSS). Such matrices appear in many applications, e.g., finite element methods, boundary element methods, etc. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, that computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm and also present the parallelization of structured matrix-vector product, structured factorization and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. This work is part of a more global effort, the STRUMPACK (STRUctured Matrices PACKage) software package for computations with sparse and dense structured matrices. Hence, although useful on their own right, the routines also represent a step in the direction of a distributed-memory sparse solver

    A Distributed-Memory Randomized Structured Multifrontal Method for Sparse Direct Solutions

    Get PDF
    We design a distributed-memory randomized structured multifrontal solver for large sparse matrices. Two layers of hierarchical tree parallelism are used. A sequence of innovative parallel methods are developed for randomized structured frontal matrix operations, structured update matrix computation, skinny extend-add operation, selected entry extraction from structured matrices, etc. Several strategies are proposed to reuse computations and reduce communications. Unlike an earlier parallel structured multifrontal method that still involves large dense intermediate matrices, our parallel solver performs the major operations in terms of skinny matrices and fully structured forms. It thus significantly enhances the efficiency and scalability. Systematic communication cost analysis shows that the numbers of words are reduced by factors of about O(n/r)O(\sqrt{n}/r) in two dimensions and about O(n2/3/r)O(n^{2/3}/r) in three dimensions, where nn is the matrix size and rr is an off-diagonal numerical rank bound of the intermediate frontal matrices. The efficiency and parallel performance are demonstrated with the solution of some large discretized PDEs in two and three dimensions. Nice scalability and significant savings in the cost and memory can be observed from the weak and strong scaling tests, especially for some 3D problems discretized on unstructured meshes

    Sparse Approximate Multifrontal Factorization with Butterfly Compression for High Frequency Wave Equations

    Full text link
    We present a fast and approximate multifrontal solver for large-scale sparse linear systems arising from finite-difference, finite-volume or finite-element discretization of high-frequency wave equations. The proposed solver leverages the butterfly algorithm and its hierarchical matrix extension for compressing and factorizing large frontal matrices via graph-distance guided entry evaluation or randomized matrix-vector multiplication-based schemes. Complexity analysis and numerical experiments demonstrate O(Nlog2N)\mathcal{O}(N\log^2 N) computation and O(N)\mathcal{O}(N) memory complexity when applied to an N×NN\times N sparse system arising from 3D high-frequency Helmholtz and Maxwell problems
    corecore