13 research outputs found

    Improving the numerical stability of fast matrix multiplication

    Fast algorithms for matrix multiplication, namely those that perform asymptotically fewer scalar operations than the classical algorithm, have been considered primarily of theoretical interest. Apart from Strassen's original algorithm, few fast algorithms have been efficiently implemented or used in practical applications. However, there exist many practical alternatives to Strassen's algorithm with varying performance and numerical properties. Fast algorithms are known to be numerically stable, but because their error bounds are slightly weaker than those of the classical algorithm, they are not used even in cases where they provide a performance benefit. We argue in this paper that the numerical sacrifice of fast algorithms, particularly for the typical use cases of practical algorithms, is not prohibitive, and we explore ways to improve the accuracy both theoretically and empirically. The numerical accuracy of fast matrix multiplication depends on properties of the algorithm and of the input matrices, and we consider both contributions independently. We generalize and tighten previous error analyses of fast algorithms and compare their properties. We discuss algorithmic techniques for improving the error guarantees from two perspectives: manipulating the algorithms, and reducing input anomalies by various forms of diagonal scaling. Finally, we benchmark performance and demonstrate our improved numerical accuracy.
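
For readers unfamiliar with the terms in this abstract, the following is a minimal sketch of Strassen's algorithm together with one simple form of diagonal scaling (scaling the rows of A and the columns of B before multiplying, then undoing the scaling). It is an illustration only, not the paper's implementation: the function names, the `cutoff` parameter, the choice of max-norm scaling, and the restriction to square matrices whose size is a power of two are all assumptions made here.

```python
def classical(A, B):
    """Classical O(n^3) product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(X, Y):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]


def split(M):
    """Split a matrix of even size into its four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, cutoff=1):
    """Strassen's recursion: 7 block products instead of 8.

    Assumes square inputs whose size is a power of two (hypothetical
    restriction chosen to keep the sketch short).
    """
    n = len(A)
    if n <= cutoff:
        return classical(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [r + s for r, s in zip(C11, C12)]
    bottom = [r + s for r, s in zip(C21, C22)]
    return top + bottom


def scaled_strassen(A, B, cutoff=1):
    """Scale rows of A and columns of B by their largest magnitude,
    multiply with Strassen, then undo the scaling (illustrative choice)."""
    n = len(A)
    r = [max(abs(x) for x in row) or 1.0 for row in A]
    c = [max(abs(B[i][j]) for i in range(n)) or 1.0 for j in range(n)]
    As = [[A[i][k] / r[i] for k in range(n)] for i in range(n)]
    Bs = [[B[k][j] / c[j] for j in range(n)] for k in range(n)]
    Cs = strassen(As, Bs, cutoff)
    return [[Cs[i][j] * r[i] * c[j] for j in range(n)] for i in range(n)]
```

On integer inputs `strassen` reproduces the classical product exactly; with floating-point inputs the two results differ by rounding error, which is the quantity the abstract's error analysis is concerned with.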

    Factoring matrices with a tree-structured sparsity pattern

    Let A be a matrix whose sparsity pattern is a tree with maximal degree d_max. We show that if the columns of A are ordered using minimum degree on A + A∗, then factoring A using a sparse LU with partial pivoting algorithm generates only O(d_max n) fill, requires only O(d_max n) operations, and is much more stable than LU with partial pivoting on a general matrix. We also propose an even more efficient and just-as-stable algorithm called sibling-dominant pivoting. This algorithm is a strict partial pivoting algorithm that modifies the column preordering locally to minimize fill and work. It leads to only O(n) work and fill. More conventional column pre-ordering methods that are based (usually implicitly) on the sparsity pattern of A∗A are not as efficient as the approaches that we propose in this paper.
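
To illustrate why ordering matters for a tree-structured pattern, the sketch below counts fill during purely symbolic elimination on a graph. This is a deliberate simplification and not the paper's method: it ignores pivoting entirely, and the helper names (`count_fill`, `min_degree_order`, `star`) are hypothetical. It only shows that a greedy minimum-degree ordering on a tree keeps eliminating leaves, whereas eliminating a high-degree vertex first connects all of its neighbours.

```python
from itertools import combinations


def count_fill(adj, order):
    """Eliminate vertices in `order` and return the number of fill edges.

    `adj` maps each vertex to the set of its neighbours (symmetric pattern).
    """
    g = {v: set(nb) for v, nb in adj.items()}
    fill = 0
    for v in order:
        nbrs = g.pop(v)
        for u in nbrs:
            g[u].discard(v)
        # Eliminating v makes its remaining neighbours pairwise adjacent.
        for u, w in combinations(sorted(nbrs), 2):
            if w not in g[u]:
                g[u].add(w)
                g[w].add(u)
                fill += 1
    return fill


def min_degree_order(adj):
    """Greedy minimum-degree ordering; ties are broken by smallest label."""
    g = {v: set(nb) for v, nb in adj.items()}
    order = []
    while g:
        v = min(g, key=lambda x: (len(g[x]), x))
        nbrs = g.pop(v)
        for u in nbrs:
            g[u].discard(v)
        for u, w in combinations(sorted(nbrs), 2):
            g[u].add(w)
            g[w].add(u)
        order.append(v)
    return order


def star(n):
    """Star tree: vertex 0 is the hub, vertices 1..n-1 are leaves."""
    adj = {0: set(range(1, n))}
    for i in range(1, n):
        adj[i] = {0}
    return adj


tree = star(6)
hub_first = count_fill(tree, [0, 1, 2, 3, 4, 5])          # hub eliminated first
leaves_first = count_fill(tree, min_degree_order(tree))   # minimum degree
```

For the 6-vertex star, eliminating the hub first creates an edge between every pair of the 5 leaves, while the minimum-degree ordering creates none.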

    Tuning the Coarse Space Construction in a Spectral AMG Solver

    In this paper, we discuss strategies for computing subsets of eigenvectors of matrices corresponding to subdomains of finite element meshes that achieve a compromise between two conflicting goals. The subset of eigenvectors is required in the construction of coarse spaces used in algebraic multigrid methods (AMG) as well as in certain domain decomposition (DD) methods. The quality of the coarse spaces depends on the number of eigenvectors: a larger number improves the approximation properties of the coarse space and impacts the overall performance and convergence of the associated AMG or DD algorithms. However, a large number of eigenvectors negatively affects the sparsity of the corresponding coarse matrices, which can become fairly dense. The sparsity of the coarse matrices can be controlled to a certain extent by the size of the subdomains (unions of finite elements), referred to as agglomerates. If the agglomerates are too large, then the cost of the eigensolvers increases and eventually can become unacceptable for the purpose of constructing the AMG or DD solvers. This paper investigates strategies to optimize the solution of the partial eigenproblems of interest. In particular, we examine direct and iterative eigensolvers for computing those subsets. Our experiments with a well-known model of an oil-reservoir simulation benchmark indicate that iterative eigensolvers can lead to significant improvements in the overall performance of an AMG solver that exploits such spectral construction of coarse spaces.

    a

    a blocked Aasen’s algorithm wit