    Sparse Supernodal Solver Using Block Low-Rank Compression

    This paper presents two approaches using a Block Low-Rank (BLR) compression technique to reduce the memory footprint and/or the time-to-solution of the sparse supernodal solver PaStiX. This flat, non-hierarchical compression method exploits the low-rank property of the blocks appearing during the factorization of sparse linear systems that arise from the discretization of partial differential equations. The first approach, called Minimal Memory, illustrates the maximum memory gain that can be obtained with the BLR compression method, while the second approach, called Just-In-Time, mainly focuses on reducing the computational complexity and thus the time-to-solution. Singular Value Decomposition (SVD) and Rank-Revealing QR (RRQR), as compression kernels, are compared in terms of factorization time, memory consumption, and numerical properties. Experiments on a single node with 24 threads and 128 GB of memory are presented on a set of matrices from real-life problems. We demonstrate a memory footprint reduction of up to 4.4 times using the Minimal Memory strategy and a computational time speedup of up to 3.3 times with the Just-In-Time strategy.
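
    As an illustration of the kind of per-block kernel the BLR approach relies on, the following NumPy sketch compresses a single dense block with a truncated SVD at a given tolerance. It is a minimal illustration of the idea, not PaStiX code; the function name, the truncation rule, and the sample block are assumptions.

```python
import numpy as np

def svd_compress(block, tol):
    """Compress a dense block into low-rank factors U @ V with a truncated SVD.

    Keeps only the singular values above tol * sigma_max, mirroring the kind of
    per-block threshold a BLR compression kernel applies (illustrative rule).
    """
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    rank = int(np.sum(s > tol * s[0]))              # numerical rank at tolerance tol
    if rank * (block.shape[0] + block.shape[1]) >= block.size:
        return block, None                          # not worth compressing: keep full rank
    return U[:, :rank] * s[:rank], Vt[:rank, :]     # store as (m x r) and (r x n) factors

# Example on a smooth (hence numerically low-rank) interaction block.
x = np.linspace(0.0, 1.0, 200)
block = 1.0 / (3.0 + np.abs(x[:, None] - x[None, :]))
U, V = svd_compress(block, tol=1e-8)
print(U.shape, V.shape)                             # (200, r) and (r, 200) with r << 200
print(np.linalg.norm(block - U @ V) / np.linalg.norm(block))
```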

    Recent advances in sparse direct solvers

    Direct methods for the solution of sparse systems of linear equations of the form Ax = b are used in a wide range of numerical simulation applications. Such methods are based on the decomposition of the matrix into a product of triangular factors (e.g., A = LU), followed by triangular solves. They are known for their numerical accuracy and robustness, but are also characterized by high memory consumption and a large amount of computation. Here we survey some research directions being investigated by the sparse direct solver community to alleviate these issues: memory-aware scheduling techniques, low-rank approximations, and distributed/shared memory hybrid programming.
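
    The factorize-then-solve workflow described above can be run end to end with SciPy's SuperLU wrapper; the sketch below does so on a toy 1D Poisson matrix. The matrix and right-hand side are illustrative assumptions; solvers such as PaStiX follow the same two phases at much larger scale.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Toy example: 1D Poisson matrix standing in for a PDE discretization.
n = 1000
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

# Factorization phase: A = LU (with a fill-reducing column permutation chosen by SuperLU).
lu = splu(A)

# Solve phase: two triangular solves per right-hand side; the factors can be reused.
x = lu.solve(b)
print(np.linalg.norm(A @ x - b))   # residual should be near machine precision
```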

    Sparse Supernodal Solver Using Block Low-Rank Compression: design, performance and analysis

    This paper presents two approaches using a Block Low-Rank (BLR) compression technique to reduce the memory footprint and/or the time-to-solution of the sparse supernodal solver PaStiX. This flat, non-hierarchical compression method exploits the low-rank property of the blocks appearing during the factorization of sparse linear systems that arise from the discretization of partial differential equations. The first approach, called Minimal Memory, illustrates the maximum memory gain that can be obtained with the BLR compression method, while the second approach, called Just-In-Time, mainly focuses on reducing the computational complexity and thus the time-to-solution. Singular Value Decomposition (SVD) and Rank-Revealing QR (RRQR), as compression kernels, are compared in terms of factorization time, memory consumption, and numerical properties. Experiments on a single node with 24 threads and 128 GB of memory are performed to evaluate the potential of both strategies. On a set of matrices from real-life problems, we demonstrate a memory footprint reduction of up to 4 times using the Minimal Memory strategy and a computational time speedup of up to 3.5 times with the Just-In-Time strategy. Then, we study the impact of the configuration parameters of the BLR solver, which allowed us to solve a 3D Laplacian of 36 million unknowns on a single node, while the full-rank solver stopped at 8 million unknowns due to memory limitations.
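
    The RRQR kernel compared against SVD above can be sketched with SciPy's column-pivoted QR: the magnitude of the diagonal of R serves as a cheap rank-revealing criterion. This is a generic illustration of the technique, not the PaStiX kernel; the truncation rule, function name, and test block are assumptions.

```python
import numpy as np
from scipy.linalg import qr

def rrqr_compress(block, tol):
    """Approximate a dense block by low-rank factors using column-pivoted QR.

    The diagonal of R is roughly decreasing in magnitude, so columns with
    |R[k, k]| <= tol * |R[0, 0]| are dropped. This avoids the full SVD cost at
    the price of a slightly less tight rank.
    """
    Q, R, perm = qr(block, mode="economic", pivoting=True)
    diag = np.abs(np.diag(R))
    rank = int(np.sum(diag > tol * diag[0]))
    # Undo the column permutation on the right factor so that block ~= Q[:, :rank] @ V.
    V = np.empty_like(R[:rank, :])
    V[:, perm] = R[:rank, :]
    return Q[:, :rank], V

# Usage on the same kind of smooth interaction block as before (illustrative data).
x = np.linspace(0.0, 1.0, 200)
block = 1.0 / (3.0 + np.abs(x[:, None] - x[None, :]))
U, V = rrqr_compress(block, tol=1e-8)
print(U.shape, V.shape, np.linalg.norm(block - U @ V) / np.linalg.norm(block))
```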

    Sparse supernodal solver using block low-rank compression: Design, performance and analysis

    In this work, we present two approaches using a Block Low-Rank (BLR) compression technique to reduce the memory footprint and/or the time-to-solution of the sparse supernodal solver PaStiX. This flat, non-hierarchical compression method exploits the low-rank property of the blocks appearing during the factorization of sparse linear systems that arise from the discretization of partial differential equations. The proposed solver can be used either as a direct solver at a lower precision or as a very robust preconditioner. The first approach, called Minimal Memory, illustrates the maximum memory gain that can be obtained with the BLR compression method, while the second approach, called Just-In-Time, mainly focuses on reducing the computational complexity and thus the time-to-solution. Singular Value Decomposition (SVD), Rank-Revealing QR (RRQR), and other variants using randomized compression kernels are compared in terms of factorization time, memory consumption, and numerical properties. On a set of matrices from real-life problems, we demonstrate a memory footprint reduction of up to 4 times using the Minimal Memory strategy and a computational time speedup of up to 3.5 times with the Just-In-Time strategy.
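
    The "randomized compression kernels" mentioned above can be illustrated by a basic randomized range finder: sketch the block with a Gaussian test matrix, orthonormalize, and form the second factor. This is a generic sketch under assumptions (fixed target rank and oversampling), not the solver's actual kernel.

```python
import numpy as np

def randomized_compress(block, rank, oversample=10, rng=None):
    """Low-rank approximation block ~= Q @ B built from a random sketch.

    Multiplies the block by a small Gaussian test matrix, orthonormalizes the
    result to get an approximate column basis Q, then forms B = Q^T @ block.
    Cost is dominated by two passes over the block instead of a full SVD.
    """
    rng = np.random.default_rng(rng)
    m, n = block.shape
    omega = rng.standard_normal((n, rank + oversample))   # random test matrix
    Y = block @ omega                                      # sample the column space
    Q, _ = np.linalg.qr(Y)                                 # approximate orthonormal basis
    B = Q.T @ block                                        # (rank + oversample) x n factor
    return Q, B

# Illustrative use on a smooth, numerically low-rank block.
x = np.linspace(0.0, 1.0, 300)
block = np.exp(-np.abs(x[:, None] - x[None, :]) - 1.0)
Q, B = randomized_compress(block, rank=20, rng=0)
print(np.linalg.norm(block - Q @ B) / np.linalg.norm(block))
```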

    Sparse supernodal solver with low-rank compression for solving the frequency-domain Maxwell equations discretized by a high order HDG method

    In this talk, we present the use of the PaStiX sparse direct solver in a Schwarz method for solving the frequency-domain Maxwell equations discretized by a high order HDG method. More precisely, the sparse solver is used to solve a system on each sub-domain, while iterative refinement is performed to obtain the global solution. Recently, low-rank compression has been added to PaStiX in order to reduce the time-to-solution or the memory footprint of the solver. The resulting low-rank solver can be used either as a direct solver at a lower accuracy or as a good preconditioner for iterative methods. We will investigate the use of low-rank compression for the frequency-domain Maxwell equations on large systems to assess the compressibility of this equation.
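
    The interplay between an approximate factorization and iterative refinement mentioned above can be sketched as a simple residual-correction loop: the approximate factors are reused to correct the residual until the solution reaches full accuracy. The setup below is purely illustrative; the lower-accuracy factorization is emulated by factoring a slightly perturbed matrix, standing in for a compressed factorization.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def refine(A, b, approx_solve, tol=1e-12, max_iter=30):
    """Iterative refinement: x += M^{-1} (b - A x), with M^{-1} an approximate solve.

    When the approximate solve comes from a lower-accuracy factorization, a few
    refinement steps recover a solution close to full direct-solve accuracy.
    """
    x = approx_solve(b)
    for _ in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        x = x + approx_solve(r)
    return x

# Illustrative problem: a diagonally dominant tridiagonal matrix; the approximate
# factorization is an exact LU of a perturbed matrix (a stand-in assumption).
n = 2000
A = sp.diags([-1.0, 3.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")
b = np.random.default_rng(0).standard_normal(n)
lu_approx = splu((A + 1e-2 * sp.identity(n, format="csc")).tocsc())
x = refine(A, b, lu_approx.solve)
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```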

    Sparse Supernodal Solver exploiting Low-Rankness Property

    In this talk, we will present recent advances on PaStiX, a supernodal sparse direct solver, which has been enhanced by the introduction of Block Low-Rank compression. We will describe different strategies leading to memory consumption gains and/or time-to-solution reduction. Finally, the implementation on top of runtime systems (PaRSEC, StarPU) will be compared with the static scheduling used in previous experiments.

    An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling

    We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups of up to 7-fold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK -- STRUctured Matrices PACKage, which also has a distributed memory component for dense rank-structured matrices.
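
    A key ingredient of randomized HSS construction is that off-diagonal blocks are compressed from products of the whole operator with random vectors, without forming the blocks explicitly. The sketch below shows that sampling step for a single off-diagonal block; the operator, index sets, and fixed sample size are illustrative assumptions, and this is not the STRUMPACK algorithm itself.

```python
import numpy as np

def sample_offdiag_basis(matvec, n, rows, cols, k, rng=None):
    """Approximate column basis of the off-diagonal block A[rows, cols] using
    only products of the full operator A with random vectors.

    The operator is never read block by block; a few sketches A @ Omega are
    computed and the relevant row range is extracted from the result.
    """
    rng = np.random.default_rng(rng)
    omega = np.zeros((n, k))
    omega[cols, :] = rng.standard_normal((len(cols), k))  # random input on the column range
    Y = matvec(omega)                                       # one batch of matvecs with A
    Q, _ = np.linalg.qr(Y[rows, :])                         # orthonormal basis for the sampled rows
    return Q

# Illustrative operator: a dense smooth kernel wrapped as a matvec (an assumption;
# in practice the operator would be a partially factored frontal matrix).
n = 400
x = np.linspace(0.0, 1.0, n)
A = 1.0 / (1.0 + (x[:, None] - x[None, :]) ** 2)
rows, cols = np.arange(0, 200), np.arange(200, 400)
Q = sample_offdiag_basis(lambda V: A @ V, n, rows, cols, k=15, rng=0)
block = A[np.ix_(rows, cols)]
print(np.linalg.norm(block - Q @ (Q.T @ block)) / np.linalg.norm(block))
```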

    Sparse Approximate Multifrontal Factorization with Butterfly Compression for High Frequency Wave Equations

    We present a fast and approximate multifrontal solver for large-scale sparse linear systems arising from finite-difference, finite-volume or finite-element discretization of high-frequency wave equations. The proposed solver leverages the butterfly algorithm and its hierarchical matrix extension for compressing and factorizing large frontal matrices via graph-distance guided entry evaluation or randomized matrix-vector multiplication-based schemes. Complexity analysis and numerical experiments demonstrate O(N log^2 N) computation and O(N) memory complexity when applied to an N × N sparse system arising from 3D high-frequency Helmholtz and Maxwell problems.