Minimizing Communication in Linear Algebra
In 1981 Hong and Kung proved a lower bound on the amount of communication
needed to perform dense matrix multiplication using the conventional
algorithm, where the input matrices were too large to fit in the small, fast
memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and
extended it to the parallel case. In both cases the lower bound may be
expressed as Ω(#arithmetic operations / √M), where M is the size
of the fast memory (or local memory in the parallel case). Here we generalize
these results to a much wider variety of algorithms, including LU
factorization, Cholesky factorization, LDL^T factorization, QR factorization,
algorithms for eigenvalues and singular values, i.e., essentially all direct
methods of linear algebra. The proof works for dense or sparse matrices, and
for sequential or parallel algorithms. In addition to lower bounds on the
amount of data moved (bandwidth) we get lower bounds on the number of messages
required to move it (latency). We illustrate how to extend our lower bound
technique to compositions of linear algebra operations (like computing powers
of a matrix), to decide whether it is enough to call a sequence of simpler
optimal algorithms (like matrix multiplication) to minimize communication, or
if we can do better. We give examples of both. We also show how to extend our
lower bounds to certain graph-theoretic problems.
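As a worked instance, the conventional algorithm for n×n matrix multiplication performs Θ(n³) arithmetic operations, so the bandwidth bound specializes as below. The latency bound then follows by dividing by M, assuming (as is natural in this two-level model) that a single message carries at most M words.

```latex
\#\text{words moved} = \Omega\!\left(\frac{n^3}{\sqrt{M}}\right),
\qquad
\#\text{messages} \;\ge\; \frac{\#\text{words moved}}{M}
  = \Omega\!\left(\frac{n^3}{M^{3/2}}\right)
```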
We point out recently designed algorithms for dense LU, Cholesky, QR,
eigenvalue and the SVD problems that attain these lower bounds; implementations
of LU and QR show large speedups over conventional linear algebra algorithms in
standard libraries like LAPACK and ScaLAPACK. Many open problems remain. Comment: 27 pages, 2 tables
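For the classical matrix multiplication case only (not the LU, QR or eigenvalue algorithms the abstract refers to), a tiled loop already attains the Ω(#operations/√M) bandwidth bound up to a constant factor. The sketch below assumes a fast memory of M words that must hold one b×b tile each of A, B and C, so b = ⌊√(M/3)⌋, and counts every tile read or written once. The function name `blocked_matmul` and this counting convention are assumptions of the sketch.

```python
import math


def blocked_matmul(A, B, M):
    """Multiply n x n matrices with b x b tiles, b = floor(sqrt(M / 3)),
    and count words moved between slow memory and a fast memory of M words."""
    n = len(A)
    b = max(1, math.isqrt(M // 3))  # three tiles of b*b words must fit in M
    C = [[0] * n for _ in range(n)]
    words = 0
    for i0 in range(0, n, b):
        for j0 in range(0, n, b):
            rows = range(i0, min(i0 + b, n))
            cols = range(j0, min(j0 + b, n))
            words += len(rows) * len(cols)  # read the C tile
            for k0 in range(0, n, b):
                ks = range(k0, min(k0 + b, n))
                # read one tile of A and one tile of B
                words += len(rows) * len(ks) + len(ks) * len(cols)
                for i in rows:
                    for j in cols:
                        C[i][j] += sum(A[i][k] * B[k][j] for k in ks)
            words += len(rows) * len(cols)  # write the C tile back
    return C, words
```

For n = 12 and M = 48 the tile size is 4 and the count is 2·144 + 2·1728/4 = 1152 words. In general the count is about 2n² + 2n³/b ≈ 2n² + 2√3·n³/√M, so the leading term has the n³/√M form of the lower bound, up to a constant factor.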
Fast Sparse Matrix Multiplication
Let A and B be two n×n matrices over a ring R (e.g., the reals or the integers), each containing at most m non-zero elements. We present a new algorithm that multiplies A and B using O(m^0.7 n^1.2 + n^(2+o(1))) algebraic operations (i.e., multiplications, additions and subtractions) over R. The naive matrix multiplication algorithm, on the other hand, may need to perform Ω(mn) operations to accomplish the same task. For m ≤ n^1.14, the new algorithm performs an almost optimal number of only n^(2+o(1)) operations. For m ≤ n^1.68, the new algorithm is also faster than the best known matrix multiplication algorithm for dense matrices, which uses O(n^2.38) algebraic operations. The new algorithm is obtained using a surprisingly straightforward combination of a simple combinatorial idea and existing fast rectangular matrix multiplication algorithms. We also obtain improved algorithms for the multiplication of more than two sparse matrices.
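One way to read the "simple combinatorial idea" is as a split of the product C = Σ_k (column k of A)(row k of B) into outer products. The expensive indices k go to a dense rectangular product, and the cheap ones are expanded entry by entry. The sketch below follows that reading under stated assumptions. Sparse matrices are dicts `{(row, col): value}`. The parameter `num_heavy` (a hypothetical name) fixes how many indices are treated as heavy, and schoolbook loops stand in for fast rectangular matrix multiplication. It is not the paper's implementation.

```python
def sparse_matmul(A, B, n, num_heavy):
    """Multiply n x n sparse matrices stored as {(row, col): value} dicts.

    C is the sum over k of the outer products (column k of A)(row k of B).
    The num_heavy indices with the largest cost |col_k(A)| * |row_k(B)| are
    handled by a dense n x h times h x n product; the rest use sparse outer
    products. The dense part is schoolbook code, standing in (by assumption)
    for fast rectangular matrix multiplication.
    """
    a_col = [[] for _ in range(n)]  # nonzeros of column k of A as (i, value)
    b_row = [[] for _ in range(n)]  # nonzeros of row k of B as (j, value)
    for (i, k), v in A.items():
        a_col[k].append((i, v))
    for (k, j), v in B.items():
        b_row[k].append((j, v))

    # Most expensive outer products first.
    order = sorted(range(n), key=lambda k: len(a_col[k]) * len(b_row[k]), reverse=True)
    heavy, light = order[:num_heavy], order[num_heavy:]

    C = {}
    # Light indices: work proportional to the number of nonzero pairs.
    for k in light:
        for i, av in a_col[k]:
            for j, bv in b_row[k]:
                C[(i, j)] = C.get((i, j), 0) + av * bv

    # Heavy indices: dense product restricted to the heavy k's.
    for i in range(n):
        for j in range(n):
            s = sum(A.get((i, k), 0) * B.get((k, j), 0) for k in heavy)
            if s:
                C[(i, j)] = C.get((i, j), 0) + s

    # Drop entries that cancelled to zero.
    return {key: v for key, v in C.items() if v != 0}
```

Here `num_heavy` is a free parameter. Choosing it so that the two parts cost roughly the same is what makes such a split pay off.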
Computing minimal interpolation bases
We consider the problem of computing univariate polynomial matrices over a field that represent minimal solution bases for a general interpolation problem, some forms of which are the vector M-Padé approximation problem in [Van Barel and Bultheel, Numerical Algorithms 3, 1992] and the rational interpolation problem in [Beckermann and Labahn, SIAM J. Matrix Anal. Appl. 22, 2000]. Particular instances of this problem include the bivariate interpolation steps of Guruswami-Sudan hard-decision and Kötter-Vardy soft-decision decodings of Reed-Solomon codes, the multivariate interpolation step of list-decoding of folded Reed-Solomon codes, and Hermite-Padé approximation. In the mentioned references, the problem is solved using iterative algorithms based on recurrence relations. Here, we discuss a fast, divide-and-conquer version of this recurrence, taking advantage of fast matrix computations over the scalars and over the polynomials. This new algorithm is deterministic, and for computing shifted minimal bases of relations between m vectors of size σ it uses Õ(m^(ω−1)(σ + |s|)) field operations, where ω is the exponent of matrix multiplication, and |s| is the sum of the entries of the input shift s, with min(s) = 0. This complexity bound improves in particular on earlier algorithms in the case of bivariate interpolation for soft decoding, while matching the fastest existing algorithms for simultaneous Hermite-Padé approximation.
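To make "iterative algorithms based on recurrence relations" concrete, here is a minimal sketch of a one-order-at-a-time recurrence for the simplest instance: approximation at x = 0 (Hermite-Padé-type order bases). It is not the paper's divide-and-conquer algorithm. It assumes power series truncated to σ coefficients over GF(p) with a hypothetical prime p = 97, and the function names are illustrative. Choosing the pivot with the smallest shifted degree is the usual way such recurrences keep the basis minimal, but the check below only verifies the order condition.

```python
P = 97  # hypothetical prime; all arithmetic is over GF(P)


def coeff_of_product(a, f, k, p=P):
    """Coefficient of x^k in a(x) * f(x); polynomials are coefficient lists."""
    return sum(a[t] * f[k - t] for t in range(len(a)) if 0 <= k - t < len(f)) % p


def sub_scaled(a, b, c, p=P):
    """Return a(x) - c * b(x) over GF(p)."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [(x - c * y) % p for x, y in zip(a, b)]


def order_basis(F, sigma, shift, p=P):
    """Iteratively build an m x m polynomial matrix whose rows r satisfy
    sum_j r[j] * F[j] = 0 mod x^sigma. deg[i] is shift[i] plus the number
    of times row i has been multiplied by x."""
    m = len(F)
    basis = [[[1] if i == j else [0] for j in range(m)] for i in range(m)]
    deg = list(shift)
    for k in range(sigma):
        # Invariant: before step k, every row cancels F modulo x^k.
        res = [sum(coeff_of_product(basis[i][j], F[j], k, p) for j in range(m)) % p
               for i in range(m)]
        nonzero = [i for i in range(m) if res[i]]
        if not nonzero:
            continue
        # Pivot: the row with the smallest shifted degree (ties: smallest index).
        piv = min(nonzero, key=lambda i: (deg[i], i))
        inv = pow(res[piv], p - 2, p)  # inverse via Fermat's little theorem
        for i in nonzero:
            if i != piv:
                c = res[i] * inv % p
                basis[i] = [sub_scaled(basis[i][j], basis[piv][j], c, p)
                            for j in range(m)]
        # Multiplying the pivot row by x moves its residual to x^(k+1).
        basis[piv] = [[0] + e for e in basis[piv]]
        deg[piv] += 1
    return basis, deg
```

Each of the σ steps touches every row, so this recurrence is quadratic in σ. That cost is the motivation for the divide-and-conquer version the abstract describes.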
An elementary algorithm for computing the determinant of pentadiagonal Toeplitz matrices
Over the last 25 years, various fast algorithms for computing the determinant of pentadiagonal Toeplitz matrices were developed. In this paper, we give a new kind of elementary algorithm requiring 56⋅⌊(n−4)/k⌋ + 30k + O(log n) operations, where k ≥ 4 is an integer that can be chosen freely at the beginning of the algorithm. For example, we can compute det(T_n) in n + O(log n) and 82√n + O(log n) operations if we choose k as 56 and ⌊√(28(n−4)/15)⌋, respectively. For various applications, it is enough to test whether the determinant of a pentadiagonal Toeplitz matrix is zero or not. As another result of this paper, we use modular arithmetic to give a fast algorithm that determines when the determinants of such matrices are non-zero. This second algorithm works only for Toeplitz matrices with rational entries.
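The abstract does not spell out the modular test. The sketch below therefore shows only the general principle behind such tests, using plain O(n³) Gaussian elimination rather than the paper's elementary algorithm. For an integer matrix, a nonzero determinant modulo a prime p certifies det ≠ 0, while a zero result is inconclusive. A matrix with rational entries can first be scaled by a common denominator D, which multiplies the determinant by Dⁿ ≠ 0. The helper names and the default prime 10⁹ + 7 are assumptions of this sketch.

```python
def pentadiagonal_toeplitz(n, t):
    """n x n Toeplitz matrix with T[i][j] = t[j - i] for offsets -2..2, else 0.
    t is a dict {offset: integer entry} (hypothetical helper)."""
    return [[t.get(j - i, 0) if abs(j - i) <= 2 else 0 for j in range(n)]
            for i in range(n)]


def det_mod_p(M, p):
    """Determinant of an integer matrix modulo a prime p, by elimination over GF(p)."""
    A = [[x % p for x in row] for row in M]
    n = len(A)
    det = 1
    for col in range(n):
        piv = next((r for r in range(col, n) if A[r][col]), None)
        if piv is None:
            return 0  # singular modulo p
        if piv != col:
            A[col], A[piv] = A[piv], A[col]
            det = -det  # a row swap flips the sign
        det = det * A[col][col] % p
        inv = pow(A[col][col], p - 2, p)
        for r in range(col + 1, n):
            f = A[r][col] * inv % p
            if f:
                A[r] = [(x - f * y) % p for x, y in zip(A[r], A[col])]
    return det % p


def certainly_nonzero(M, p=1_000_000_007):
    """True proves det(M) != 0; False is inconclusive (det may be a multiple of p)."""
    return det_mod_p(M, p) != 0
```

For example, the tridiagonal all-ones matrix of order 4 (a degenerate pentadiagonal Toeplitz matrix) has determinant −1, so `certainly_nonzero` returns `True` for it. For order 5 the determinant is 0, and the function returns `False`.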
- …