349 research outputs found
Communication-optimal Parallel and Sequential Cholesky Decomposition
Numerical algorithms have two kinds of costs: arithmetic and communication,
by which we mean either moving data between levels of a memory hierarchy (in
the sequential case) or over a network connecting processors (in the parallel
case). Communication costs often dominate arithmetic costs, so it is of
interest to design algorithms minimizing communication. In this paper we first
extend known lower bounds on the communication cost (both for bandwidth and for
latency) of conventional (O(n^3)) matrix multiplication to Cholesky
factorization, which is used for solving dense symmetric positive definite
linear systems. Second, we compare the costs of various Cholesky decomposition
implementations to these lower bounds and identify the algorithms and data
structures that attain them. In the sequential case, we consider both the
two-level and hierarchical memory models. Combined with prior results in [13,
14, 15], this gives a set of communication-optimal algorithms for O(n^3)
implementations of the three basic factorizations of dense linear algebra: LU
with pivoting, QR and Cholesky. But it goes beyond this prior work on
sequential LU by optimizing communication for any number of levels of memory
hierarchy.Comment: 29 pages, 2 tables, 6 figure
Minimizing Communication for Eigenproblems and the Singular Value Decomposition
Algorithms have two costs: arithmetic and communication. The latter
represents the cost of moving data, either between levels of a memory
hierarchy, or between processors over a network. Communication often dominates
arithmetic and represents a rapidly increasing proportion of the total cost, so
we seek algorithms that minimize communication. In \cite{BDHS10} lower bounds
were presented on the amount of communication required for essentially all
-like algorithms for linear algebra, including eigenvalue problems and
the SVD. Conventional algorithms, including those currently implemented in
(Sca)LAPACK, perform asymptotically more communication than these lower bounds
require. In this paper we present parallel and sequential eigenvalue algorithms
(for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms
that do attain these lower bounds, and analyze their convergence and
communication costs.Comment: 43 pages, 11 figure
Toward accurate polynomial evaluation in rounded arithmetic
Given a multivariate real (or complex) polynomial and a domain ,
we would like to decide whether an algorithm exists to evaluate
accurately for all using rounded real (or complex) arithmetic.
Here ``accurately'' means with relative error less than 1, i.e., with some
correct leading digits. The answer depends on the model of rounded arithmetic:
We assume that for any arithmetic operator , for example or , its computed value is , where is bounded by some constant where , but
is otherwise arbitrary. This model is the traditional one used to
analyze the accuracy of floating point algorithms.Our ultimate goal is to
establish a decision procedure that, for any and , either exhibits
an accurate algorithm or proves that none exists. In contrast to the case where
numbers are stored and manipulated as finite bit strings (e.g., as floating
point numbers or rational numbers) we show that some polynomials are
impossible to evaluate accurately. The existence of an accurate algorithm will
depend not just on and , but on which arithmetic operators and
which constants are are available and whether branching is permitted. Toward
this goal, we present necessary conditions on for it to be accurately
evaluable on open real or complex domains . We also give sufficient
conditions, and describe progress toward a complete decision procedure. We do
present a complete decision procedure for homogeneous polynomials with
integer coefficients, {\cal D} = \C^n, and using only the arithmetic
operations , and .Comment: 54 pages, 6 figures; refereed version; to appear in Foundations of
Computational Mathematics: Santander 2005, Cambridge University Press, March
200
Accurate and Efficient Expression Evaluation and Linear Algebra
We survey and unify recent results on the existence of accurate algorithms
for evaluating multivariate polynomials, and more generally for accurate
numerical linear algebra with structured matrices. By "accurate" we mean that
the computed answer has relative error less than 1, i.e., has some correct
leading digits. We also address efficiency, by which we mean algorithms that
run in polynomial time in the size of the input. Our results will depend
strongly on the model of arithmetic: Most of our results will use the so-called
Traditional Model (TM). We give a set of necessary and sufficient conditions to
decide whether a high accuracy algorithm exists in the TM, and describe
progress toward a decision procedure that will take any problem and provide
either a high accuracy algorithm or a proof that none exists. When no accurate
algorithm exists in the TM, it is natural to extend the set of available
accurate operations by a library of additional operations, such as , dot
products, or indeed any enumerable set which could then be used to build
further accurate algorithms. We show how our accurate algorithms and decision
procedure for finding them extend to this case. Finally, we address other
models of arithmetic, and the relationship between (im)possibility in the TM
and (in)efficient algorithms operating on numbers represented as bit strings.Comment: 49 pages, 6 figures, 1 tabl
Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures
The QR factorization and the SVD are two fundamental matrix decompositions
with applications throughout scientific computing and data analysis. For
matrices with many more rows than columns, so-called "tall-and-skinny
matrices," there is a numerically stable, efficient, communication-avoiding
algorithm for computing the QR factorization. It has been used in traditional
high performance computing and grid computing environments. For MapReduce
environments, existing methods to compute the QR decomposition use a
numerically unstable approach that relies on indirectly computing the Q factor.
In the best case, these methods require only two passes over the data. In this
paper, we describe how to compute a stable tall-and-skinny QR factorization on
a MapReduce architecture in only slightly more than 2 passes over the data. We
can compute the SVD with only a small change and no difference in performance.
We present a performance comparison between our new direct TSQR method, a
standard unstable implementation for MapReduce (Cholesky QR), and the classic
stable algorithm implemented for MapReduce (Householder QR). We find that our
new stable method has a large performance advantage over the Householder QR
method. This holds both in a theoretical performance model as well as in an
actual implementation
Computing stable eigendecompositions of matrices
AbstractIf a matrix T is known only to within a tolerance ϵ (because of measurement or roundoff errors), then it may be difficult to compute an eigendecomposition of T, since its invariant subspaces are discontinuous functions of its entries. In this paper we show how to compute a stable decomposition of an uncertain matrix T which varies continuously and boundedly as T varies in a ball of radius ϵ
LU factorization with panel rank revealing pivoting and its communication avoiding version
We present the LU decomposition with panel rank revealing pivoting (LU_PRRP),
an LU factorization algorithm based on strong rank revealing QR panel
factorization. LU_PRRP is more stable than Gaussian elimination with partial
pivoting (GEPP). Our extensive numerical experiments show that the new
factorization scheme is as numerically stable as GEPP in practice, but it is
more resistant to pathological cases and easily solves the Wilkinson matrix and
the Foster matrix. We also present CALU_PRRP, a communication avoiding version
of LU_PRRP that minimizes communication. CALU_PRRP is based on tournament
pivoting, with the selection of the pivots at each step of the tournament being
performed via strong rank revealing QR factorization. CALU_PRRP is more stable
than CALU, the communication avoiding version of GEPP. CALU_PRRP is also more
stable in practice and is resistant to pathological cases on which GEPP and
CALU fail.Comment: No. RR-7867 (2012
- …