103 research outputs found
Minimizing Communication for Eigenproblems and the Singular Value Decomposition
Algorithms have two costs: arithmetic and communication. The latter
represents the cost of moving data, either between levels of a memory
hierarchy, or between processors over a network. Communication often dominates
arithmetic and represents a rapidly increasing proportion of the total cost, so
we seek algorithms that minimize communication. In \cite{BDHS10} lower bounds
were presented on the amount of communication required for essentially all
-like algorithms for linear algebra, including eigenvalue problems and
the SVD. Conventional algorithms, including those currently implemented in
(Sca)LAPACK, perform asymptotically more communication than these lower bounds
require. In this paper we present parallel and sequential eigenvalue algorithms
(for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms
that do attain these lower bounds, and analyze their convergence and
communication costs.Comment: 43 pages, 11 figure
Communication-optimal Parallel and Sequential Cholesky Decomposition
Numerical algorithms have two kinds of costs: arithmetic and communication,
by which we mean either moving data between levels of a memory hierarchy (in
the sequential case) or over a network connecting processors (in the parallel
case). Communication costs often dominate arithmetic costs, so it is of
interest to design algorithms minimizing communication. In this paper we first
extend known lower bounds on the communication cost (both for bandwidth and for
latency) of conventional (O(n^3)) matrix multiplication to Cholesky
factorization, which is used for solving dense symmetric positive definite
linear systems. Second, we compare the costs of various Cholesky decomposition
implementations to these lower bounds and identify the algorithms and data
structures that attain them. In the sequential case, we consider both the
two-level and hierarchical memory models. Combined with prior results in [13,
14, 15], this gives a set of communication-optimal algorithms for O(n^3)
implementations of the three basic factorizations of dense linear algebra: LU
with pivoting, QR and Cholesky. But it goes beyond this prior work on
sequential LU by optimizing communication for any number of levels of memory
hierarchy.Comment: 29 pages, 2 tables, 6 figure
A 3D Parallel Algorithm for QR Decomposition
Interprocessor communication often dominates the runtime of large matrix
computations. We present a parallel algorithm for computing QR decompositions
whose bandwidth cost (communication volume) can be decreased at the cost of
increasing its latency cost (number of messages). By varying a parameter to
navigate the bandwidth/latency tradeoff, we can tune this algorithm for
machines with different communication costs
Faster all-pairs shortest paths via circuit complexity
We present a new randomized method for computing the min-plus product
(a.k.a., tropical product) of two matrices, yielding a faster
algorithm for solving the all-pairs shortest path problem (APSP) in dense
-node directed graphs with arbitrary edge weights. On the real RAM, where
additions and comparisons of reals are unit cost (but all other operations have
typical logarithmic cost), the algorithm runs in time
and is correct with high probability.
On the word RAM, the algorithm runs in time for edge weights in . Prior algorithms used either time for
various , or time for various
and .
The new algorithm applies a tool from circuit complexity, namely the
Razborov-Smolensky polynomials for approximately representing
circuits, to efficiently reduce a matrix product over the algebra to
a relatively small number of rectangular matrix products over ,
each of which are computable using a particularly efficient method due to
Coppersmith. We also give a deterministic version of the algorithm running in
time for some , which utilizes the
Yao-Beigel-Tarui translation of circuits into "nice" depth-two
circuits.Comment: 24 pages. Updated version now has slightly faster running time. To
appear in ACM Symposium on Theory of Computing (STOC), 201
- …