
    Parallel matrix inversion techniques

    In this paper, we present techniques for inverting sparse, symmetric, positive definite matrices on parallel and distributed computers. We propose two algorithms, one for SIMD implementation and the other for MIMD implementation. These algorithms are modified versions of Gaussian elimination that take into account the sparseness of the matrix. Our algorithms perform better than the general parallel Gaussian elimination algorithm. To demonstrate the usefulness of our technique, we implemented the snake problem using our sparse matrix algorithm. Our studies reveal that the proposed sparse matrix inversion algorithm significantly reduces the time taken to obtain the solution of the snake problem. We also present the results of our experimental work.
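
    A minimal sequential sketch of the underlying idea (factor once, then reuse the factorization for every column of the inverse) is shown below. It uses SciPy's sparse LU rather than the paper's SIMD/MIMD algorithms, which are not reproduced here, and the small tridiagonal test matrix is a hypothetical example.

```python
# Minimal sequential sketch (not the paper's SIMD/MIMD algorithms): invert a
# sparse symmetric positive definite matrix by factoring it once with sparse
# Gaussian elimination (LU) and then solving against each identity column.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def sparse_spd_inverse(A):
    """Dense inverse of a sparse SPD matrix A, computed column by column."""
    n = A.shape[0]
    lu = spla.splu(A.tocsc())                 # sparse LU factorization, done once
    inv = np.empty((n, n))
    eye = np.eye(n)
    for j in range(n):                        # column j of the inverse is A^{-1} e_j
        inv[:, j] = lu.solve(eye[:, j])
    return inv

# Hypothetical test case: a small SPD tridiagonal matrix (shifted 1D Laplacian).
A = sp.diags([-1.0, 2.5, -1.0], offsets=[-1, 0, 1], shape=(6, 6), format="csc")
A_inv = sparse_spd_inverse(A)
print(np.allclose(A @ A_inv, np.eye(6)))      # True up to round-off
```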

    Symmetric indefinite triangular factorization revealing the rank profile matrix

    We present a novel recursive algorithm for reducing a symmetric matrix to a triangular factorization which reveals the rank profile matrix. That is, the algorithm computes a factorization P^T A P = L D L^T, where P is a permutation matrix, L is lower triangular with a unit diagonal, and D is symmetric block diagonal with 1x1 and 2x2 antidiagonal blocks. The novel algorithm requires O(n^2 r^{ω-2}) arithmetic operations. Furthermore, experimental results demonstrate that our algorithm can even be slightly more than twice as fast as state-of-the-art unsymmetric Gaussian elimination in most cases: it performs roughly half the arithmetic operations while achieving approximately the same computational speed. By adapting the pivoting strategy developed in the unsymmetric case, we show how to recover the rank profile matrix from the permutation matrix and the support of the block-diagonal matrix. There is an obstruction in characteristic 2 to revealing the rank profile matrix, which requires relaxing the shape of the block diagonal by allowing the 2-dimensional blocks to have a non-zero bottom-right coefficient. This relaxed decomposition can then be transformed into a standard P L D L^T P^T decomposition at a negligible cost.
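
    For a concrete sense of the shape of such a factorization, here is a short SciPy-based illustration of a conventional symmetric-indefinite LDL^T factorization (Bunch-Kaufman style pivoting) on a hypothetical test matrix. It is not the recursive, rank-profile-revealing algorithm of the paper, and its 2x2 blocks are full rather than antidiagonal.

```python
# Illustration only: a conventional symmetric-indefinite LDL^T factorization
# via SciPy (Bunch-Kaufman style pivoting), producing factors of the same
# shape as P^T A P = L D L^T with D block diagonal. This is NOT the paper's
# recursive, rank-profile-revealing algorithm.
import numpy as np
from scipy.linalg import ldl

A = np.array([[0., 1., 2.],
              [1., 0., 3.],
              [2., 3., 0.]])                  # symmetric indefinite, zero diagonal

L_perm, D, perm = ldl(A, lower=True)          # A = L_perm @ D @ L_perm.T
L = L_perm[perm, :]                           # row shuffle makes L lower triangular
P = np.eye(3)[:, perm]                        # matching permutation matrix

print(np.allclose(P.T @ A @ P, L @ D @ L.T))  # True up to round-off
print(D)                                      # block diagonal (1x1 and/or 2x2 blocks)
```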

    Linearly scaling direct method for accurately inverting sparse banded matrices

    In many problems in Computational Physics and Chemistry, one finds a special kind of sparse matrix, termed a "banded matrix". These matrices, which are defined as having non-zero entries only within a given distance from the main diagonal, often need to be inverted in order to solve the associated linear system of equations. In this work, we introduce a new O(n) algorithm for solving such a system, where n x n is the size of the matrix. We produce the analytical recursive expressions that allow one to directly obtain the solution, as well as the pseudocode for its computer implementation. Moreover, we review the different options for parallelizing the method, describe the extension to matrices that are banded plus a small number of non-zero entries outside the band, and use the same ideas to produce a method for obtaining the full inverse matrix. Finally, we show that the new algorithm is competitive, both in accuracy and in numerical efficiency, when compared to a standard method based on Gaussian elimination. We do this using sets of large random banded matrices, as well as those that appear when the 1D Poisson equation is solved by finite differences. Comment: 24 pages, 5 figures, submitted to J. Comp. Phy
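
    As a point of reference for the linear scaling, here is a standard O(n) solver for the simplest banded case (a tridiagonal system, i.e. the Thomas algorithm), applied to a 1D Poisson finite-difference matrix. The paper's own recursive expressions and general-bandwidth method are not reproduced here, and the helper name thomas_solve is ours.

```python
# Standard O(n) tridiagonal solver (Thomas algorithm) as a minimal example of
# a linearly scaling banded solve; not the paper's general-bandwidth method.
import numpy as np

def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system in O(n). Row i reads:
    lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i]."""
    n = len(diag)
    b = np.array(diag, dtype=float)
    c = np.array(upper, dtype=float)
    d = np.array(rhs, dtype=float)
    for i in range(1, n):                      # forward elimination
        w = lower[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    x = np.empty(n)                            # back substitution
    x[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (d[i] - c[i] * x[i + 1]) / b[i]
    return x

# 1D Poisson by finite differences: -x[i-1] + 2*x[i] - x[i+1] = h^2 * f[i]
n = 8
lower = np.r_[0.0, np.full(n - 1, -1.0)]
upper = np.r_[np.full(n - 1, -1.0), 0.0]
diag = np.full(n, 2.0)
rhs = np.ones(n) / (n + 1) ** 2
x = thomas_solve(lower, diag, upper, rhs)

# Cross-check against a dense solve of the same system.
A = np.diag(diag) + np.diag(lower[1:], -1) + np.diag(upper[:-1], 1)
print(np.allclose(x, np.linalg.solve(A, rhs)))  # True
```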

    Computing the Rank Profile Matrix

    The row (resp. column) rank profile of a matrix describes the staircase shape of its row (resp. column) echelon form. In an ISSAC'13 paper, we proposed a recursive Gaussian elimination that can compute simultaneously the row and column rank profiles of a matrix, as well as those of all of its leading sub-matrices, in the same time as state-of-the-art Gaussian elimination algorithms. Here we first study the conditions making a Gaussian elimination algorithm reveal this information. To this end, we propose the definition of a new matrix invariant, the rank profile matrix, summarizing all information on the row and column rank profiles of all the leading sub-matrices. We also explore the conditions for a Gaussian elimination algorithm to compute all or part of this invariant, through the corresponding PLUQ decomposition. As a consequence, we show that the classical iterative CUP decomposition algorithm can actually be adapted to compute the rank profile matrix. Used, in a Crout variant, as a base case for our ISSAC'13 implementation, it delivers a significant improvement in efficiency. Second, the row and column echelon forms of a matrix are usually computed via different dedicated triangular decompositions. We show here that, from some PLUQ decompositions, it is possible to recover the row and column echelon forms of a matrix, and of any of its leading sub-matrices, thanks to an elementary post-processing algorithm.
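
    A small exact-arithmetic illustration of the row and column rank profiles themselves can be obtained from reduced echelon forms, e.g. with SymPy, as sketched below; this is not the paper's PLUQ-based algorithm and does not construct the full rank profile matrix invariant, and the example matrix is hypothetical.

```python
# Exact-arithmetic illustration of row/column rank profiles via reduced row
# echelon forms (SymPy); not the paper's PLUQ-based method, and it does not
# build the full rank profile matrix invariant.
from sympy import Matrix

A = Matrix([[0, 0, 1, 2],
            [0, 0, 2, 4],
            [1, 3, 0, 0]])                    # rank 2

# Column rank profile: indices of the earliest linearly independent columns,
# i.e. the pivot columns of rref(A).
_, col_rank_profile = A.rref()
# Row rank profile: the same computation applied to the transpose.
_, row_rank_profile = A.T.rref()

print("column rank profile:", col_rank_profile)   # (0, 2)
print("row rank profile:", row_rank_profile)      # (0, 2)
```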

    On the equivalence of Gaussian elimination and Gauss-Jordan reduction in solving linear equations

    A novel general approach to round-off error analysis using error complexity concepts is described. This approach is applied to the analysis of the Gaussian elimination and Gauss-Jordan schemes for solving linear equations. The results show that the two algorithms are equivalent in terms of our error complexity measures. Thus the inherently parallel Gauss-Jordan scheme can be implemented with confidence if parallel computers are available.
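
    To make the structural difference between the two schemes concrete, here are minimal textbook versions of both (dense, no pivoting, purely illustrative; they are our own sketches, not the paper's error-complexity analysis): Gaussian elimination triangularizes and then back-substitutes, while Gauss-Jordan eliminates above and below each pivot so that no back substitution is needed, which is what makes it attractive on parallel machines.

```python
# Minimal textbook versions of the two schemes compared in the paper.
import numpy as np

def gaussian_elimination(A, b):
    """Triangularize, then back-substitute (no pivoting, for clarity)."""
    A = A.astype(float); b = b.astype(float)
    n = len(b)
    for k in range(n):                        # forward elimination
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    x = np.zeros(n)                           # back substitution
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

def gauss_jordan(A, b):
    """Eliminate above and below every pivot; no back substitution needed."""
    A = A.astype(float); b = b.astype(float)
    n = len(b)
    for k in range(n):
        b[k] /= A[k, k]
        A[k] /= A[k, k]                       # normalize the pivot row
        for i in range(n):                    # eliminate pivot column everywhere else
            if i != k:
                b[i] -= A[i, k] * b[k]
                A[i] -= A[i, k] * A[k]
    return b                                  # reduced system is I x = b

A = np.array([[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 5.0]])
b = np.array([7.0, 4.0, 7.0])
print(np.allclose(gaussian_elimination(A, b), gauss_jordan(A, b)))  # True
```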

    LU factorization with panel rank revealing pivoting and its communication avoiding version

    We present the LU decomposition with panel rank revealing pivoting (LU_PRRP), an LU factorization algorithm based on strong rank revealing QR panel factorization. LU_PRRP is more stable than Gaussian elimination with partial pivoting (GEPP). Our extensive numerical experiments show that the new factorization scheme is as numerically stable as GEPP in practice, but it is more resistant to pathological cases and easily solves the Wilkinson matrix and the Foster matrix. We also present CALU_PRRP, a communication avoiding version of LU_PRRP that minimizes communication. CALU_PRRP is based on tournament pivoting, with the selection of the pivots at each step of the tournament being performed via strong rank revealing QR factorization. CALU_PRRP is more stable than CALU, the communication avoiding version of GEPP; it is also more stable in practice and is resistant to pathological cases on which GEPP and CALU fail. Comment: No. RR-7867 (2012)
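
    The Wilkinson matrix mentioned above is the classical example of why GEPP alone can be fragile. The short experiment below (plain GEPP via SciPy, not LU_PRRP) reproduces its well-known 2^(n-1) element growth; the helper name wilkinson_like is ours.

```python
# Plain GEPP (via SciPy), not LU_PRRP: on the classical Wilkinson-type matrix,
# partial pivoting produces element growth of 2^(n-1) in the U factor, the
# kind of pathological behavior that LU_PRRP is designed to avoid.
import numpy as np
from scipy.linalg import lu

def wilkinson_like(n):
    A = np.eye(n) - np.tril(np.ones((n, n)), -1)   # 1 on the diagonal, -1 below
    A[:, -1] = 1.0                                  # last column of ones
    return A

n = 30
A = wilkinson_like(n)
P, L, U = lu(A)                                     # GEPP: A = P @ L @ U
growth = np.abs(U).max() / np.abs(A).max()
print(growth, 2.0 ** (n - 1))                       # growth matches 2^(n-1)
```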

    An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling

    We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups of up to 7x for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared-memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK -- STRUctured Matrices PACKage, which also has a distributed-memory component for dense rank-structured matrices.
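
    The randomized-sampling idea behind the HSS compression can be sketched with a bare-bones randomized range finder applied to a numerically low-rank block (here a hypothetical kernel between two well-separated point clusters). STRUMPACK's actual HSS construction, with interpolative decompositions, nested bases and the ULV factorization, is far more involved than this sketch.

```python
# Bare-bones randomized range finder, illustrating the kind of randomized
# sampling used when compressing low-rank off-diagonal blocks; not
# STRUMPACK's HSS construction.
import numpy as np

def randomized_low_rank(B, rank, oversampling=10):
    """Return Q, C with B ~= Q @ C, where Q has `rank` orthonormal columns."""
    rng = np.random.default_rng(0)
    S = rng.standard_normal((B.shape[1], rank + oversampling))
    Y = B @ S                        # sample the range of B
    Q, _ = np.linalg.qr(Y)           # orthonormal basis for the sampled range
    Q = Q[:, :rank]
    C = Q.T @ B                      # project B onto that basis
    return Q, C

# Hypothetical low-rank "off-diagonal block": a smooth kernel between two
# well-separated point clusters, whose singular values decay rapidly.
x = np.linspace(0.0, 1.0, 200)[:, None]
y = np.linspace(5.0, 6.0, 200)[None, :]
B = 1.0 / np.abs(x - y)

Q, C = randomized_low_rank(B, rank=12)
print(np.linalg.norm(B - Q @ C) / np.linalg.norm(B))   # small relative error
```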