447 research outputs found
Minimizing Communication for Eigenproblems and the Singular Value Decomposition
Algorithms have two costs: arithmetic and communication. The latter
represents the cost of moving data, either between levels of a memory
hierarchy, or between processors over a network. Communication often dominates
arithmetic and represents a rapidly increasing proportion of the total cost, so
we seek algorithms that minimize communication. In \cite{BDHS10} lower bounds
were presented on the amount of communication required for essentially all
-like algorithms for linear algebra, including eigenvalue problems and
the SVD. Conventional algorithms, including those currently implemented in
(Sca)LAPACK, perform asymptotically more communication than these lower bounds
require. In this paper we present parallel and sequential eigenvalue algorithms
(for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms
that do attain these lower bounds, and analyze their convergence and
communication costs.Comment: 43 pages, 11 figure
Accurate and Efficient Expression Evaluation and Linear Algebra
We survey and unify recent results on the existence of accurate algorithms
for evaluating multivariate polynomials, and more generally for accurate
numerical linear algebra with structured matrices. By "accurate" we mean that
the computed answer has relative error less than 1, i.e., has some correct
leading digits. We also address efficiency, by which we mean algorithms that
run in polynomial time in the size of the input. Our results will depend
strongly on the model of arithmetic: Most of our results will use the so-called
Traditional Model (TM). We give a set of necessary and sufficient conditions to
decide whether a high accuracy algorithm exists in the TM, and describe
progress toward a decision procedure that will take any problem and provide
either a high accuracy algorithm or a proof that none exists. When no accurate
algorithm exists in the TM, it is natural to extend the set of available
accurate operations by a library of additional operations, such as , dot
products, or indeed any enumerable set which could then be used to build
further accurate algorithms. We show how our accurate algorithms and decision
procedure for finding them extend to this case. Finally, we address other
models of arithmetic, and the relationship between (im)possibility in the TM
and (in)efficient algorithms operating on numbers represented as bit strings.Comment: 49 pages, 6 figures, 1 tabl
Minimizing Communication in Linear Algebra
In 1981 Hong and Kung proved a lower bound on the amount of communication
needed to perform dense, matrix-multiplication using the conventional
algorithm, where the input matrices were too large to fit in the small, fast
memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and
extended it to the parallel case. In both cases the lower bound may be
expressed as (#arithmetic operations / ), where M is the size
of the fast memory (or local memory in the parallel case). Here we generalize
these results to a much wider variety of algorithms, including LU
factorization, Cholesky factorization, factorization, QR factorization,
algorithms for eigenvalues and singular values, i.e., essentially all direct
methods of linear algebra. The proof works for dense or sparse matrices, and
for sequential or parallel algorithms. In addition to lower bounds on the
amount of data moved (bandwidth) we get lower bounds on the number of messages
required to move it (latency). We illustrate how to extend our lower bound
technique to compositions of linear algebra operations (like computing powers
of a matrix), to decide whether it is enough to call a sequence of simpler
optimal algorithms (like matrix multiplication) to minimize communication, or
if we can do better. We give examples of both. We also show how to extend our
lower bounds to certain graph theoretic problems.
We point out recently designed algorithms for dense LU, Cholesky, QR,
eigenvalue and the SVD problems that attain these lower bounds; implementations
of LU and QR show large speedups over conventional linear algebra algorithms in
standard libraries like LAPACK and ScaLAPACK. Many open problems remain.Comment: 27 pages, 2 table
Computation- and Space-Efficient Implementation of SSA
The computational complexity of different steps of the basic SSA is
discussed. It is shown that the use of the general-purpose "blackbox" routines
(e.g. found in packages like LAPACK) leads to huge waste of time resources
since the special Hankel structure of the trajectory matrix is not taken into
account. We outline several state-of-the-art algorithms (for example,
Lanczos-based truncated SVD) which can be modified to exploit the structure of
the trajectory matrix. The key components here are hankel matrix-vector
multiplication and hankelization operator. We show that both can be computed
efficiently by the means of Fast Fourier Transform. The use of these methods
yields the reduction of the worst-case computational complexity from O(N^3) to
O(k N log(N)), where N is series length and k is the number of eigentriples
desired.Comment: 27 pages, 8 figure
- …