537 research outputs found

    An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling

    We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups of up to 7-fold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared-memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK (STRUctured Matrices PACKage), which also has a distributed-memory component for dense rank-structured matrices.
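The randomized sampling idea in this abstract can be illustrated on a single low-rank off-diagonal block. The following is a minimal sketch under stated assumptions, not STRUMPACK's API: the function names (`sample_range`, `orthonormal_basis`, `project`) are hypothetical, the block is a small dense list of rows, and the sampled columns are orthonormalized with modified Gram-Schmidt rather than the interpolative decomposition the abstract mentions.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(columns, rel_tol=1e-8):
    """Modified Gram-Schmidt; drops columns that are numerically dependent."""
    basis = []
    for v in columns:
        w = list(v)
        for q in basis:
            h = dot(q, w)
            w = [wk - h * qk for wk, qk in zip(w, q)]
        norm_w = dot(w, w) ** 0.5
        if norm_w > rel_tol * dot(v, v) ** 0.5:
            basis.append([wk / norm_w for wk in w])
    return basis


def sample_range(block, num_samples, seed=0):
    """Multiply the block by Gaussian random vectors and orthonormalize the results."""
    rng = random.Random(seed)
    n_cols = len(block[0])
    samples = []
    for _ in range(num_samples):
        omega = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
        samples.append([dot(row, omega) for row in block])  # one column of block @ Omega
    return orthonormal_basis(samples)


def project(basis, block):
    """Return Q Q^T block, where the columns of Q are the basis vectors."""
    m, n = len(block), len(block[0])
    out = [[0.0] * n for _ in range(m)]
    for j in range(n):
        col = [block[i][j] for i in range(m)]
        coeffs = [dot(q, col) for q in basis]
        for i in range(m):
            out[i][j] = sum(c * q[i] for c, q in zip(coeffs, basis))
    return out
```

With more samples than the true rank, the surplus samples fall inside the span already found and are discarded, so the size of the returned basis serves as a numerical rank estimate for the block.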

    Augmented Block-Arnoldi Recycling CFD Solvers

    One of the limitations of recycled GCRO methods is the large amount of computation required to orthogonalize the basis vectors of the newly generated Krylov subspace for the approximate solution when combined with those of the recycle subspace. Recent advancements in low-synchronization Gram-Schmidt and generalized minimal residual algorithms (Swirydowicz et al. \cite{2020-swirydowicz-nlawa}, Carson et al. \cite{Carson2022}, and Lund \cite{Lund2022}) can be incorporated, thereby mitigating the loss of orthogonality of the basis vectors. An augmented Arnoldi formulation of recycling leads to a matrix decomposition, and the associated algorithm can also be viewed as a block Krylov method. Generalizations of both classical and modified block Gram-Schmidt algorithms have been proposed by Carson et al. \cite{Carson2022}. Here, an inverse compact $WY$ modified Gram-Schmidt algorithm is applied for the inter-block orthogonalization scheme with a block lower triangular correction matrix $T_k$ at iteration $k$. When combined with a weighted (oblique inner product) projection step, the inverse compact $WY$ scheme leads to significant (over $10\times$ in certain cases) reductions in the number of solver iterations per linear system. The weight is also interpreted in terms of the angle between restart residuals in LGMRES, as defined by Baker et al. \cite{Baker2005}. In many cases, the recycle subspace eigenspectrum can substitute for a preconditioner.

    Adaptively restarted block Krylov subspace methods with low-synchronization skeletons

    With the recent realization of exascale performance by Oak Ridge National Laboratory's Frontier supercomputer, reducing communication in kernels like QR factorization has become even more imperative. Low-synchronization Gram-Schmidt methods, first introduced in [K. \'{S}wirydowicz, J. Langou, S. Ananthan, U. Yang, and S. Thomas, Low Synchronization Gram-Schmidt and Generalized Minimum Residual Algorithms, Numer. Lin. Alg. Appl., Vol. 28(2), e2343, 2020], have been shown to improve the scalability of the Arnoldi method in high-performance distributed computing. Block versions of low-synchronization Gram-Schmidt show further potential for speeding up algorithms, as column-batching allows for maximizing cache usage with matrix-matrix operations. In this work, low-synchronization block Gram-Schmidt variants from [E. Carson, K. Lund, M. Rozlo\v{z}n\'{i}k, and S. Thomas, Block Gram-Schmidt algorithms and their stability properties, Lin. Alg. Appl., 638, pp. 150--195, 2022] are transformed into block Arnoldi variants for use in block full orthogonalization methods (BFOM) and block generalized minimal residual methods (BGMRES). An adaptive restarting heuristic is developed to handle instabilities that arise with the increasing condition number of the Krylov basis. The performance, accuracy, and stability of these methods are assessed via a flexible benchmarking tool written in MATLAB. The modularity of the tool additionally permits generalized block inner products, like the global inner product.
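To make the notion of a block Gram-Schmidt skeleton concrete, here is a minimal sketch of plain block modified Gram-Schmidt: an inter-block projection step followed by an intra-block orthogonalization. It is an illustration under assumptions, not the low-synchronization variants of the cited papers and not the MATLAB benchmarking tool; the function names `block_mgs` and `intra_block_mgs` are hypothetical, vectors are stored as Python lists of columns, and the standard Euclidean inner product is assumed.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def intra_block_mgs(block):
    """Orthonormalize the columns of one block with modified Gram-Schmidt.

    Returns (Q, R) with block = Q R; Q is a list of columns, R[i][j] is row i, column j.
    """
    s = len(block)
    Q = []
    R = [[0.0] * s for _ in range(s)]
    for j, v in enumerate(block):
        w = list(v)
        for i, q in enumerate(Q):
            R[i][j] = dot(q, w)
            w = [wk - R[i][j] * qk for wk, qk in zip(w, q)]
        R[j][j] = dot(w, w) ** 0.5
        Q.append([wk / R[j][j] for wk in w])
    return Q, R


def block_mgs(X, s):
    """Block modified Gram-Schmidt on the columns of X, processed s columns at a time."""
    n = len(X)
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for start in range(0, n, s):
        W = [list(v) for v in X[start:start + s]]
        # Inter-block step: project the new block against each earlier block in turn.
        for b in range(0, start, s):
            prev = Q[b:b + s]
            for j, w in enumerate(W):
                coeffs = [dot(q, w) for q in prev]  # one column of Q_b^T W
                for i, c in enumerate(coeffs):
                    R[b + i][start + j] = c
                W[j] = [wk - sum(c * q[k] for c, q in zip(coeffs, prev))
                        for k, wk in enumerate(w)]
        # Intra-block step: orthonormalize what remains of the new block.
        Qb, Rb = intra_block_mgs(W)
        for i in range(len(W)):
            for j in range(len(W)):
                R[start + i][start + j] = Rb[i][j]
        Q.extend(Qb)
    return Q, R
```

Each inter-block projection is one block inner product followed by one block update. In a distributed setting each block inner product implies a global reduction, which is the synchronization cost that low-synchronization variants aim to reduce.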

    Adaptively Restarted Block Krylov Subspace Methods with Low-Synchronization Skeletons
