
    Minimizing Communication in Linear Algebra

    In 1981 Hong and Kung proved a lower bound on the amount of communication needed to perform dense matrix multiplication using the conventional O(n^3) algorithm, where the input matrices are too large to fit in the small, fast memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and extended it to the parallel case. In both cases the lower bound may be expressed as Ω(#arithmetic operations / √M), where M is the size of the fast memory (or local memory in the parallel case). Here we generalize these results to a much wider variety of algorithms, including LU factorization, Cholesky factorization, LDL^T factorization, QR factorization, and algorithms for eigenvalues and singular values, i.e., essentially all direct methods of linear algebra. The proof works for dense or sparse matrices, and for sequential or parallel algorithms. In addition to lower bounds on the amount of data moved (bandwidth) we get lower bounds on the number of messages required to move it (latency). We illustrate how to extend our lower bound technique to compositions of linear algebra operations (like computing powers of a matrix), to decide whether it is enough to call a sequence of simpler optimal algorithms (like matrix multiplication) to minimize communication, or whether we can do better. We give examples of both. We also show how to extend our lower bounds to certain graph-theoretic problems. We point out recently designed algorithms for dense LU, Cholesky, QR, eigenvalue and SVD problems that attain these lower bounds; implementations of LU and QR show large speedups over conventional linear algebra algorithms in standard libraries like LAPACK and ScaLAPACK. Many open problems remain.
    Comment: 27 pages, 2 tables
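
    To make the Ω(#arithmetic operations / √M) bound concrete: for classical O(n^3) matrix multiplication it says at least Ω(n^3/√M) words must move between slow and fast memory, and blocked (tiled) multiplication attains this by choosing a block size so that three blocks fit in fast memory at once. Below is a minimal NumPy sketch of that blocking idea; the function name, the default value of M, and the block-size choice are illustrative assumptions, not code from the paper.

        # Cache-blocked matrix multiplication (illustrative sketch).
        # With block edge b ~ sqrt(M/3), one block each of A, B and C fits in a
        # fast memory of M words, and the total data traffic is O(n^3 / sqrt(M)),
        # matching the Hong-Kung lower bound up to a constant factor.
        import numpy as np

        def blocked_matmul(A, B, M=3 * 64 * 64):
            n = A.shape[0]
            b = max(1, int((M // 3) ** 0.5))  # block edge: 3 * b^2 <= M words
            C = np.zeros((n, n))
            for i in range(0, n, b):
                for j in range(0, n, b):
                    for k in range(0, n, b):
                        # NumPy slicing clips at the matrix edge, so n need not
                        # be a multiple of b.
                        C[i:i+b, j:j+b] += A[i:i+b, k:k+b] @ B[k:k+b, j:j+b]
            return C

        A, B = np.random.rand(256, 256), np.random.rand(256, 256)
        assert np.allclose(blocked_matmul(A, B), A @ B)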

    Fast Sparse Matrix Multiplication

    Let A and B be two n × n matrices over a ring R (e.g., the reals or the integers), each containing at most m non-zero elements. We present a new algorithm that multiplies A and B using O(m^{0.7} n^{1.2} + n^{2+o(1)}) algebraic operations (i.e., multiplications, additions and subtractions) over R. The naive matrix multiplication algorithm, on the other hand, may need to perform Ω(mn) operations to accomplish the same task. For m ≤ n^{1.14}, the new algorithm performs an almost optimal number of only n^{2+o(1)} operations. For m ≤ n^{1.68}, the new algorithm is also faster than the best known matrix multiplication algorithm for dense matrices, which uses O(n^{2.38}) algebraic operations. The new algorithm is obtained using a surprisingly straightforward combination of a simple combinatorial idea and existing fast rectangular matrix multiplication algorithms. We also obtain improved algorithms for the multiplication of more than two sparse matrices.
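
    For contrast with the Ω(mn) baseline, here is a minimal sketch of the naive sparse product in a dict-of-rows format (the representation and names are illustrative, not from the paper). Every nonzero a_ik of A is combined with every nonzero in row k of B, which is exactly how the cost can degenerate toward Θ(mn) when dense rows and columns line up.

        # Naive sparse matrix multiplication over a dict-of-rows format:
        # A, B map row index -> {column index: value} for nonzero entries only.
        from collections import defaultdict

        def sparse_matmul(A, B):
            C = defaultdict(dict)
            for i, row in A.items():
                for k, a_ik in row.items():
                    # Every nonzero a_ik touches all nonzeros of row k of B.
                    for j, b_kj in B.get(k, {}).items():
                        C[i][j] = C[i].get(j, 0) + a_ik * b_kj
            return dict(C)

        A = {0: {0: 1, 1: 2}, 1: {1: 3}}
        B = {0: {1: 4}, 1: {0: 5, 1: 6}}
        print(sparse_matmul(A, B))  # {0: {1: 16, 0: 10}, 1: {0: 15, 1: 18}}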

    Computing minimal interpolation bases

    We consider the problem of computing univariate polynomial matrices over a field that represent minimal solution bases for a general interpolation problem, some forms of which are the vector M-Padé approximation problem in [Van Barel and Bultheel, Numerical Algorithms 3, 1992] and the rational interpolation problem in [Beckermann and Labahn, SIAM J. Matrix Anal. Appl. 22, 2000]. Particular instances of this problem include the bivariate interpolation steps of Guruswami-Sudan hard-decision and Kötter-Vardy soft-decision decoding of Reed-Solomon codes, the multivariate interpolation step of list-decoding of folded Reed-Solomon codes, and Hermite-Padé approximation. In the mentioned references, the problem is solved using iterative algorithms based on recurrence relations. Here, we discuss a fast, divide-and-conquer version of this recurrence, taking advantage of fast matrix computations over the scalars and over the polynomials. This new algorithm is deterministic, and for computing shifted minimal bases of relations between m vectors of size σ it uses O~(m^{ω−1}(σ + |s|)) field operations, where ω is the exponent of matrix multiplication and |s| is the sum of the entries of the input shift s, with min(s) = 0. This complexity bound improves in particular on earlier algorithms in the case of bivariate interpolation for soft decoding, while matching the fastest existing algorithms for simultaneous Hermite-Padé approximation.
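
    To fix ideas, Hermite-Padé approximation is the simplest instance of such an interpolation problem; the formulation below is the standard one from the cited literature, stated here for orientation rather than quoted from the paper. Given power series f_1, ..., f_m over a field K and an order σ, one seeks a basis of the K[x]-module of all polynomial vectors p = (p_1, ..., p_m) such that

        \[
            \sum_{i=1}^{m} p_i(x)\, f_i(x) \equiv 0 \pmod{x^{\sigma}},
        \]

    with the basis rows required to have minimal degrees with respect to the shift s (row degrees weighted componentwise by s). In outline, the iterative algorithms of Van Barel-Bultheel and Beckermann-Labahn build such a basis one order at a time, while the divide-and-conquer version splits the order σ in half, recurses on each half, and combines the results by polynomial matrix multiplication.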

    An elementary algorithm for computing the determinant of pentadiagonal Toeplitz matrices

    Over the last 25 years, various fast algorithms for computing the determinant of a pentadiagonal Toeplitz matrix have been developed. In this paper, we give a new kind of elementary algorithm requiring 56·⌊(n−4)/k⌋ + 30k + O(log n) operations, where k ≥ 4 is an integer that can be chosen freely at the beginning of the algorithm. For example, we can compute det(T_n) in n + O(log n) and 82√n + O(log n) operations if we choose k as 56 and ⌊√(28(n−4)/15)⌋, respectively. For various applications, it is enough to test whether the determinant of a pentadiagonal Toeplitz matrix is zero or not. As another result of this paper, we use modular arithmetic to give a fast algorithm that determines when the determinants of such matrices are non-zero. This second algorithm works only for Toeplitz matrices with rational entries.
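
    The two choices of k quoted in the abstract come from balancing the two terms of the operation count; the short derivation below is not from the paper, but fills in the arithmetic the abstract leaves implicit. Minimizing the leading terms

        \[
            f(k) = \frac{56(n-4)}{k} + 30k, \qquad
            f'(k) = -\frac{56(n-4)}{k^2} + 30 = 0
            \;\Longrightarrow\; k^* = \sqrt{\tfrac{28}{15}(n-4)},
        \]
        \[
            f(k^*) = 2\sqrt{56 \cdot 30 \, (n-4)} = 2\sqrt{1680}\,\sqrt{n-4} \approx 82\sqrt{n},
        \]

    recovers the 82√n + O(log n) count, while the fixed choice k = 56 collapses the first term to 56·⌊(n−4)/56⌋ ≤ n−4, giving the stated n + O(log n) count (the 30k term is then the constant 1680).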