
    Linear solvers for power grid optimization problems: a review of GPU-accelerated linear solvers

    The linear equations that arise in interior methods for constrained optimization are sparse, symmetric, and indefinite, and become extremely ill-conditioned as the interior method converges. These linear systems present a challenge for existing solver frameworks based on sparse LU or LDL^T decompositions. We benchmark five well-known direct linear solver packages using matrices extracted from power grid optimization problems. The achieved solution accuracy varies greatly among the packages. None of the tested packages delivers significant GPU acceleration for our test cases.
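
    To make the setting concrete, the following sketch (Python with scipy is assumed; the matrix is illustrative, not one of the benchmark problems) builds a small saddle point matrix of the kind interior methods produce and solves it directly. Sparse LU here stands in for the LDL^T factorizations the benchmarked packages use.

        import numpy as np
        import scipy.sparse as sp
        import scipy.sparse.linalg as spla

        # Illustrative KKT-style saddle point matrix [[H, J^T], [J, 0]]:
        # H symmetric positive definite, J a constraint Jacobian.
        rng = np.random.default_rng(0)
        n, m = 50, 10
        H = sp.random(n, n, density=0.1, random_state=rng)
        H = H @ H.T + sp.identity(n)                      # make the (1,1) block SPD
        J = sp.random(m, n, density=0.3, random_state=rng)
        K = sp.bmat([[H, J.T], [J, None]], format="csc")  # symmetric indefinite

        rhs = rng.standard_normal(n + m)
        x = spla.splu(K).solve(rhs)                       # LU stand-in for LDL^T
        print(np.linalg.norm(K @ x - rhs))                # residual check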

    A comparative study of null-space factorizations for sparse symmetric saddle point systems

    Null-space methods for solving saddle point systems of equations have long been used to transform an indefinite system into a symmetric positive definite one of smaller dimension. A number of independent works in the literature have observed that a null-space method can be interpreted as a matrix factorization. We review these findings, highlight links between them, and bring them into a unified framework. We also investigate the suitability of using null-space factorizations to derive sparse direct methods, and present numerical results for both practical and academic problems.
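
    A minimal dense sketch of the basic null-space method, assuming numpy/scipy and a generic saddle point system (this illustrates the classical transformation, not any particular factorization from the paper):

        import numpy as np
        from scipy.linalg import null_space, lstsq

        # Saddle point system [[A, B^T], [B, 0]] [x; y] = [f; g],
        # with A SPD on the null space of B and B of full row rank.
        rng = np.random.default_rng(1)
        n, m = 20, 8
        A = rng.standard_normal((n, n))
        A = A @ A.T + n * np.eye(n)                   # SPD (1,1) block
        B = rng.standard_normal((m, n))
        f, g = rng.standard_normal(n), rng.standard_normal(m)

        Z = null_space(B)                             # columns span null(B)
        x_p = lstsq(B, g)[0]                          # particular solution, B x_p = g
        z = np.linalg.solve(Z.T @ A @ Z, Z.T @ (f - A @ x_p))  # reduced SPD solve
        x = x_p + Z @ z
        y = lstsq(B.T, f - A @ x)[0]                  # recover the multiplier
        print(np.linalg.norm(B @ x - g), np.linalg.norm(A @ x + B.T @ y - f))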

    On maximum volume submatrices and cross approximation for symmetric semidefinite and diagonally dominant matrices

    The problem of finding a k × k submatrix of maximum volume of a matrix A is of interest in a variety of applications. For example, it yields a quasi-best low-rank approximation constructed from the rows and columns of A. We show that such a submatrix can always be chosen to be a principal submatrix if A is symmetric semidefinite or diagonally dominant. We then analyze the low-rank approximation error returned by a greedy method for volume maximization, cross approximation with complete pivoting. Our bound for general matrices extends an existing result for symmetric semidefinite matrices and yields new error estimates for diagonally dominant matrices. In particular, for doubly diagonally dominant matrices the error is shown to remain within a modest factor of the best approximation error. We also illustrate how applying our results to cross approximation for functions leads to new and better convergence results.
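
    The greedy method the abstract refers to can be sketched in a few lines of numpy (an illustration, not the paper's code): repeatedly pick the largest-magnitude entry of the residual and subtract the corresponding rank-one cross.

        import numpy as np

        def cross_approx(A, k):
            """Greedy rank-k cross approximation with complete pivoting:
            pick the largest-magnitude entry of the residual, subtract the
            corresponding rank-one cross, repeat."""
            R = np.array(A, dtype=float)
            approx = np.zeros_like(R)
            for _ in range(k):
                i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
                if R[i, j] == 0.0:
                    break                              # exact rank already reached
                cross = np.outer(R[:, j], R[i, :]) / R[i, j]
                approx += cross
                R -= cross
            return approx

        rng = np.random.default_rng(2)
        X = rng.standard_normal((100, 5))
        A = X @ X.T                                    # rank-5 symmetric PSD matrix
        print(np.linalg.norm(A - cross_approx(A, 5)))  # ~ machine precision

    For a symmetric positive semidefinite matrix the largest-magnitude entry is attained on the diagonal, so in this example the chosen rows and columns form a principal submatrix, consistent with the result above.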

    A distributed-memory package for dense Hierarchically Semi-Separable matrix computations using randomization

    We present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by matrices of low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, e.g., finite element methods and boundary element methods. Exploiting this structure allows for fast solution of linear systems and fast computation of matrix-vector products, which are two main building blocks of matrix computations. The compression algorithm that we use, which computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm, and also present the parallelization of the structured matrix-vector product, structured factorization, and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. This work is part of a broader effort, the STRUMPACK (STRUctured Matrices PACKage) software package for computations with sparse and dense structured matrices. Hence, although useful in their own right, the routines also represent a step toward a distributed-memory sparse solver.
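
    The randomized sampling primitive behind such compression can be illustrated with a generic range finder (a sketch assuming only matrix-vector product access; this is not STRUMPACK's actual interface):

        import numpy as np

        def randomized_range(matvec, n, k, oversample=10, rng=None):
            """Orthonormal basis Q with A ~= Q (Q^T A), built from k + oversample
            random probes; only matrix-vector products with A are needed."""
            rng = rng or np.random.default_rng()
            Omega = rng.standard_normal((n, k + oversample))
            Q, _ = np.linalg.qr(matvec(Omega))         # sample and orthonormalize
            return Q

        rng = np.random.default_rng(3)
        A = rng.standard_normal((200, 8)) @ rng.standard_normal((8, 200))
        Q = randomized_range(lambda V: A @ V, 200, 8, rng=rng)
        B = Q.T @ A                                    # small coupling factor
        print(np.linalg.norm(A - Q @ B))               # ~ 0: rank-8 structure captured

    In HSS compression the same idea is applied to the off-diagonal blocks, and an adaptive mechanism like the one the abstract mentions grows the number of probes until an error estimate is met.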

    Reducing Communication in the Solution of Linear Systems

    There is a growing performance gap between computation and communication on modern computers, making it crucial to develop algorithms with lower latency and bandwidth requirements. Because systems of linear equations are important for numerous scientific and engineering applications, I have studied several approaches for reducing communication in those problems. First, I developed optimizations to dense LU with partial pivoting, which downstream applications can adopt with little to no effort. Second, I considered two techniques that completely replace pivoting in dense LU, which can provide significantly higher speedups, albeit without the numerical guarantees of partial pivoting. One technique uses randomized preprocessing, while the other is a novel combination of block factorization and additive perturbation. Finally, I investigated the use of mixed precision in GMRES for solving sparse systems, which reduces the volume of data movement and thus the pressure on memory bandwidth.
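
    As one illustration of the mixed-precision idea (a common pattern, not necessarily the exact scheme studied in the thesis), the sketch below factors the matrix in single precision and applies that factorization as a preconditioner inside double-precision GMRES, assuming scipy:

        import numpy as np
        import scipy.sparse as sp
        import scipy.sparse.linalg as spla

        rng = np.random.default_rng(4)
        n = 500
        A = (sp.random(n, n, density=0.01, random_state=rng)
             + 5 * sp.identity(n)).tocsc()
        b = rng.standard_normal(n)

        lu32 = spla.splu(A.astype(np.float32))         # factor in single precision
        M = spla.LinearOperator(
            (n, n),
            matvec=lambda v: lu32.solve(v.astype(np.float32)).astype(np.float64))

        x, info = spla.gmres(A, b, M=M)                # double-precision iteration
        print(info, np.linalg.norm(A @ x - b))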

    Parallel accelerated cyclic reduction preconditioner for three-dimensional elliptic PDEs with variable coefficients

    We present a robust and scalable preconditioner for the solution of large-scale linear systems that arise from the discretization of elliptic PDEs amenable to rank compression. The preconditioner is based on hierarchical low-rank approximations and the cyclic reduction method. The setup and application phases of the preconditioner achieve log-linear complexity in both memory footprint and number of operations, and numerical experiments exhibit good weak and strong scalability at large processor counts in a distributed-memory environment. Numerical experiments with linear systems that feature symmetry and nonsymmetry, definiteness and indefiniteness, and constant and variable coefficients demonstrate the preconditioner's applicability and robustness. Furthermore, it is possible to control the number of iterations via the accuracy threshold of the hierarchical matrix approximations and their arithmetic operations, and via the tuning of the admissibility condition parameter. Together, these parameters allow the memory requirements and performance of the preconditioner to be optimized.
    Comment: 24 pages, Elsevier Journal of Computational and Applied Mathematics, Dec 201
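
    The cyclic reduction idea underlying the preconditioner can be illustrated on a plain tridiagonal system (a scalar sketch of my own; the paper works blockwise, with hierarchical low-rank compression of the blocks): eliminate the odd-indexed unknowns, recurse on the half-size system, then back-substitute.

        import numpy as np

        def cyclic_reduction(a, b, c, d):
            """Solve a tridiagonal system with sub-diagonal a, diagonal b,
            super-diagonal c, right-hand side d (a[0] and c[-1] unused)."""
            n = len(b)
            if n == 1:
                return d / b
            # Append a dummy row so index e + 1 is always valid.
            a, b, c, d = (np.append(a, 0.0), np.append(b, 1.0),
                          np.append(c, 0.0), np.append(d, 0.0))
            e = np.arange(0, n, 2)                     # even rows are kept
            al = np.where(e > 0, -a[e] / b[e - 1], 0.0)  # eliminate row above
            ga = -c[e] / b[e + 1]                      # eliminate row below
            a2 = al * a[e - 1]
            b2 = b[e] + al * c[e - 1] + ga * a[e + 1]
            c2 = ga * c[e + 1]
            d2 = d[e] + al * d[e - 1] + ga * d[e + 1]
            x = np.zeros(n + 1)
            x[e] = cyclic_reduction(a2, b2, c2, d2)    # half-size recursion
            o = np.arange(1, n, 2)                     # back-substitute odd rows
            x[o] = (d[o] - a[o] * x[o - 1] - c[o] * x[o + 1]) / b[o]
            return x[:n]

        n = 1023
        a = np.full(n, -1.0); b = np.full(n, 2.0); c = np.full(n, -1.0)
        d = np.random.default_rng(5).standard_normal(n)
        x = cyclic_reduction(a, b, c, d)
        res = b * x
        res[1:] += a[1:] * x[:-1]
        res[:-1] += c[:-1] * x[1:]
        print(np.linalg.norm(res - d))                 # small residual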