
    Communication-optimal Parallel and Sequential Cholesky Decomposition

    Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case). Communication costs often dominate arithmetic costs, so it is of interest to design algorithms minimizing communication. In this paper we first extend known lower bounds on the communication cost (both for bandwidth and for latency) of conventional O(n^3) matrix multiplication to Cholesky factorization, which is used for solving dense symmetric positive definite linear systems. Second, we compare the costs of various Cholesky decomposition implementations to these lower bounds and identify the algorithms and data structures that attain them. In the sequential case, we consider both the two-level and hierarchical memory models. Combined with prior results in [13, 14, 15], this gives a set of communication-optimal algorithms for O(n^3) implementations of the three basic factorizations of dense linear algebra: LU with pivoting, QR and Cholesky. But it goes beyond this prior work on sequential LU by optimizing communication for any number of levels of memory hierarchy.
    Comment: 29 pages, 2 tables, 6 figures
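
    As a rough illustration of the blocking idea behind communication-optimal sequential Cholesky, here is a minimal Python/NumPy sketch (my own, not the paper's implementation): the factorization proceeds one block column at a time, and choosing the block size b on the order of the square root of the fast-memory size M is what lets blocked algorithms approach the bandwidth lower bounds the paper discusses.

        import numpy as np

        def blocked_cholesky(A, b):
            """Right-looking blocked Cholesky: returns lower-triangular L with
            A = L @ L.T for symmetric positive definite A. In a two-level memory
            model the block size b would be tuned so b x b blocks fit in fast
            memory (b ~ sqrt(M))."""
            A = A.copy()
            n = A.shape[0]
            for k in range(0, n, b):
                e = min(k + b, n)
                # Factor the b x b diagonal block with an unblocked Cholesky.
                A[k:e, k:e] = np.linalg.cholesky(A[k:e, k:e])
                if e < n:
                    # Triangular solve for the panel below the diagonal block.
                    A[e:, k:e] = np.linalg.solve(A[k:e, k:e], A[e:, k:e].T).T
                    # Symmetric rank-b update of the trailing matrix; this is
                    # where the O(n^3) arithmetic (and the data traffic) lives.
                    A[e:, e:] -= A[e:, k:e] @ A[e:, k:e].T
            return np.tril(A)

        # Toy check against NumPy's unblocked Cholesky.
        rng = np.random.default_rng(0)
        G = rng.standard_normal((8, 8))
        A = G @ G.T + 8 * np.eye(8)
        assert np.allclose(blocked_cholesky(A, 3), np.linalg.cholesky(A))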

    Minimizing Communication for Eigenproblems and the Singular Value Decomposition

    Algorithms have two costs: arithmetic and communication. The latter represents the cost of moving data, either between levels of a memory hierarchy, or between processors over a network. Communication often dominates arithmetic and represents a rapidly increasing proportion of the total cost, so we seek algorithms that minimize communication. In [BDHS10], lower bounds were presented on the amount of communication required for essentially all O(n^3)-like algorithms for linear algebra, including eigenvalue problems and the SVD. Conventional algorithms, including those currently implemented in (Sca)LAPACK, perform asymptotically more communication than these lower bounds require. In this paper we present parallel and sequential eigenvalue algorithms (for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms that do attain these lower bounds, and analyze their convergence and communication costs.
    Comment: 43 pages, 11 figures
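
    For a feel for the scale of these bounds, the toy calculation below (my own illustration, not from the paper) evaluates the classical Ω(#flops/√M) words-moved bound and the corresponding Ω(#flops/M^{3/2}) messages bound against which O(n^3) algorithms are measured, given a matrix dimension n and a fast-memory (or per-processor memory) size M in words.

        import math

        def words_lower_bound(n, M):
            """Omega(#flops / sqrt(M)) bound on words moved, up to constants."""
            return n ** 3 / math.sqrt(M)

        def messages_lower_bound(n, M):
            """Dividing by the largest message size M gives Omega(n^3 / M^1.5)."""
            return n ** 3 / M ** 1.5

        # Example: n = 10^4 matrix, fast memory of 10^6 words (~8 MB of doubles).
        print(f"words moved >= ~{words_lower_bound(10**4, 10**6):.2e}")
        print(f"messages    >= ~{messages_lower_bound(10**4, 10**6):.2e}")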

    Toward accurate polynomial evaluation in rounded arithmetic

    Given a multivariate real (or complex) polynomial p and a domain D, we would like to decide whether an algorithm exists to evaluate p(x) accurately for all x ∈ D using rounded real (or complex) arithmetic. Here "accurately" means with relative error less than 1, i.e., with some correct leading digits. The answer depends on the model of rounded arithmetic: we assume that for any arithmetic operation op(a,b), for example a+b or a⋅b, its computed value is op(a,b)⋅(1+δ), where |δ| is bounded by some constant ε with 0 < ε ≪ 1, but δ is otherwise arbitrary. This model is the traditional one used to analyze the accuracy of floating point algorithms. Our ultimate goal is to establish a decision procedure that, for any p and D, either exhibits an accurate algorithm or proves that none exists. In contrast to the case where numbers are stored and manipulated as finite bit strings (e.g., as floating point numbers or rational numbers), we show that some polynomials p are impossible to evaluate accurately. The existence of an accurate algorithm depends not just on p and D, but on which arithmetic operations and which constants are available and whether branching is permitted. Toward this goal, we present necessary conditions on p for it to be accurately evaluable on open real or complex domains D. We also give sufficient conditions, and describe progress toward a complete decision procedure. We do present a complete decision procedure for homogeneous polynomials p with integer coefficients, D = ℂ^n, and using only the arithmetic operations +, − and ⋅.
    Comment: 54 pages, 6 figures; refereed version; to appear in Foundations of Computational Mathematics: Santander 2005, Cambridge University Press, March 200
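
    As a toy rendering of this rounding model (my own sketch, not the paper's machinery), the snippet below evaluates a straight-line program while an adversary picks each δ with |δ| ≤ ε, and shows that a cancellation such as (x+y)−y can lose every correct digit, while a pure product like (x⋅y)⋅y stays accurate.

        import itertools

        EPS = 1e-8  # the bound on |delta| in the traditional model

        def worst_relative_error(ops, exact):
            """Run the straight-line program `ops` under every choice of
            delta = +/-EPS per operation (each computed value is multiplied
            by 1 + delta) and return the worst relative error vs. `exact`.
            EPS >> machine epsilon, so the doubles used here are effectively
            exact and only the injected deltas matter."""
            worst = 0.0
            for deltas in itertools.product((-EPS, EPS), repeat=len(ops)):
                v = None
                for op, d in zip(ops, deltas):
                    v = op(v) * (1 + d)
                worst = max(worst, abs(v - exact) / abs(exact))
            return worst

        x, y = 1.0, 1e12
        # (x + y) - y: catastrophic cancellation; worst error ~ (y/x)*EPS >> 1.
        print(worst_relative_error([lambda _: x + y, lambda t: t - y], x))
        # (x * y) * y: products only; worst error ~ 2*EPS, i.e. accurate.
        print(worst_relative_error([lambda _: x * y, lambda t: t * y], x * y * y))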

    Accurate and Efficient Expression Evaluation and Linear Algebra

    We survey and unify recent results on the existence of accurate algorithms for evaluating multivariate polynomials, and more generally for accurate numerical linear algebra with structured matrices. By "accurate" we mean that the computed answer has relative error less than 1, i.e., has some correct leading digits. We also address efficiency, by which we mean algorithms that run in polynomial time in the size of the input. Our results depend strongly on the model of arithmetic: most of them use the so-called Traditional Model (TM). We give a set of necessary and sufficient conditions to decide whether a high-accuracy algorithm exists in the TM, and describe progress toward a decision procedure that will take any problem and provide either a high-accuracy algorithm or a proof that none exists. When no accurate algorithm exists in the TM, it is natural to extend the set of available accurate operations by a library of additional operations, such as x+y+z, dot products, or indeed any enumerable set, which could then be used to build further accurate algorithms. We show how our accurate algorithms and the decision procedure for finding them extend to this case. Finally, we address other models of arithmetic, and the relationship between (im)possibility in the TM and (in)efficient algorithms operating on numbers represented as bit strings.
    Comment: 49 pages, 6 figures, 1 table
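
    As a small illustration of the extended-operations idea (my own example; the paper treats libraries of accurate operations abstractly), Python's math.fsum can play the role of an accurate summation primitive: built on it, a dot product keeps correct leading digits in a case where the naive loop returns no correct digits at all.

        import math

        def naive_dot(xs, ys):
            """Left-to-right dot product; each + and * rounds, so severe
            cancellation can destroy all correct leading digits."""
            s = 0.0
            for xi, yi in zip(xs, ys):
                s += xi * yi
            return s

        def accurate_dot(xs, ys):
            """Dot product built on an accurate summation primitive:
            math.fsum returns the correctly rounded sum of its inputs."""
            return math.fsum(xi * yi for xi, yi in zip(xs, ys))

        # Huge terms that cancel to a tiny exact value (exact answer: 1.0).
        xs = [1e16, 1.0, -1e16]
        ys = [1.0, 1.0, 1.0]
        print(naive_dot(xs, ys))     # 0.0 -- no correct digits
        print(accurate_dot(xs, ys))  # 1.0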

    Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures

    The QR factorization and the SVD are two fundamental matrix decompositions with applications throughout scientific computing and data analysis. For matrices with many more rows than columns, so-called "tall-and-skinny" matrices, there is a numerically stable, efficient, communication-avoiding algorithm for computing the QR factorization. It has been used in traditional high performance computing and grid computing environments. For MapReduce environments, existing methods to compute the QR decomposition use a numerically unstable approach that relies on indirectly computing the Q factor. In the best case, these methods require only two passes over the data. In this paper, we describe how to compute a stable tall-and-skinny QR factorization on a MapReduce architecture in only slightly more than two passes over the data. We can compute the SVD with only a small change and no difference in performance. We present a performance comparison between our new direct TSQR method, a standard unstable implementation for MapReduce (Cholesky QR), and the classic stable algorithm implemented for MapReduce (Householder QR). We find that our new stable method has a large performance advantage over the Householder QR method, both in a theoretical performance model and in an actual implementation.
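
    The map/reduce structure of TSQR is easy to sketch. Below is a minimal NumPy stand-in (my own sketch; a real MapReduce job would distribute the row blocks, and the paper's direct TSQR additionally reconstructs Q stably): each "map" task computes a local QR of its block of rows, and a "reduce" task computes the QR of the stacked R factors, giving the R of the full matrix.

        import numpy as np

        def tsqr_R(A, n_blocks):
            """One-level TSQR for a tall-and-skinny A: R factor only."""
            blocks = np.array_split(A, n_blocks, axis=0)
            # "Map": each task sees only its own rows and emits a small R.
            local_Rs = [np.linalg.qr(B)[1] for B in blocks]
            # "Reduce": QR of the (n_blocks * n) x n stack of local R factors.
            return np.linalg.qr(np.vstack(local_Rs))[1]

        rng = np.random.default_rng(1)
        A = rng.standard_normal((10_000, 8))            # tall and skinny
        R = tsqr_R(A, n_blocks=16)
        # Agrees with a direct QR up to the sign of each row of R.
        assert np.allclose(np.abs(R), np.abs(np.linalg.qr(A)[1]))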

    Computing stable eigendecompositions of matrices

    If a matrix T is known only to within a tolerance ε (because of measurement or roundoff errors), then it may be difficult to compute an eigendecomposition of T, since its invariant subspaces are discontinuous functions of its entries. In this paper we show how to compute a stable decomposition of an uncertain matrix T which varies continuously and boundedly as T varies in a ball of radius ε.
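
    A two-by-two toy (mine, not from the paper) shows the discontinuity: perturbing one entry of a defective matrix by ±ε moves the eigenvalues by O(√ε) and flips them between real and complex, so any decomposition that separates them is unstable, while the cluster of both eigenvalues together (here spanning the whole space) varies continuously.

        import numpy as np

        # Nearly defective: at eps = 0 there is one eigenvalue (1) with only
        # a single eigenvector, so nearby invariant subspaces are ill-determined.
        def T(eps):
            return np.array([[1.0, 1.0],
                             [eps, 1.0]])

        for eps in (1e-8, -1e-8):
            w, _ = np.linalg.eig(T(eps))
            print(f"eps = {eps:+.0e}: eigenvalues = {w}")
        # eps = +1e-8 gives real eigenvalues 1 +/- 1e-4;
        # eps = -1e-8 gives complex eigenvalues 1 +/- 1e-4j.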

    LU factorization with panel rank revealing pivoting and its communication avoiding version

    We present the LU decomposition with panel rank revealing pivoting (LU_PRRP), an LU factorization algorithm based on strong rank revealing QR panel factorization. LU_PRRP is more stable than Gaussian elimination with partial pivoting (GEPP). Our extensive numerical experiments show that the new factorization scheme is as numerically stable as GEPP in practice, but it is more resistant to pathological cases, easily solving the Wilkinson matrix and the Foster matrix. We also present CALU_PRRP, a communication-avoiding version of LU_PRRP that minimizes communication. CALU_PRRP is based on tournament pivoting, with the selection of the pivots at each step of the tournament being performed via strong rank revealing QR factorization. CALU_PRRP is more stable than CALU, the communication-avoiding version of GEPP, and in practice it is resistant to pathological cases on which GEPP and CALU fail.
    Comment: No. RR-7867 (2012)
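
    To convey the tournament-pivoting flavor (a simplified sketch of mine, substituting SciPy's QR with column pivoting for the strong rank revealing QR the paper requires), the snippet below has each block of a tall panel nominate b candidate pivot rows, then plays the winners off against each other in a final round to choose the panel's b pivot rows.

        import numpy as np
        from scipy.linalg import qr  # assumes SciPy is available

        def select_pivot_rows(W, b):
            """Pick b 'good' pivot rows of block W via QR with column pivoting
            on W.T (a stand-in for strong rank revealing QR)."""
            _, _, piv = qr(W.T, pivoting=True)
            return piv[:b]

        def tournament_pivots(panel, b, block_rows):
            """Flat two-level tournament on a tall m x b panel; a parallel code
            would instead reduce the winners along a binary tree."""
            candidates = []
            for start in range(0, panel.shape[0], block_rows):
                block = panel[start:start + block_rows]
                local = select_pivot_rows(block, b)
                candidates.extend(start + local)      # convert to global indices
            candidates = np.array(candidates)
            final = select_pivot_rows(panel[candidates], b)
            return candidates[final]

        rng = np.random.default_rng(2)
        panel = rng.standard_normal((1024, 4))
        print(tournament_pivots(panel, b=4, block_rows=256))  # 4 pivot rows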