
    Fast linear algebra is stable

    In an earlier paper, we showed that a large class of fast recursive matrix multiplication algorithms is stable in a normwise sense, and that in fact if multiplication of $n$-by-$n$ matrices can be done by any algorithm in $O(n^{\omega + \eta})$ operations for any $\eta > 0$, then it can be done stably in $O(n^{\omega + \eta})$ operations for any $\eta > 0$. Here we extend this result to show that essentially all standard linear algebra operations, including LU decomposition, QR decomposition, linear equation solving, matrix inversion, solving least squares problems, (generalized) eigenvalue problems and the singular value decomposition, can also be done stably (in a normwise sense) in $O(n^{\omega + \eta})$ operations. Comment: 26 pages; final version; to appear in Numerische Mathematik
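
    For concreteness, the best-known member of the class of fast recursive multiplication algorithms the abstract refers to is Strassen's method, which runs in $O(n^{\log_2 7})$ operations. The sketch below (in NumPy; the function name strassen and the power-of-two/cutoff assumptions are illustrative, not from the paper) shows the recursive structure whose normwise stability the authors analyze:

        import numpy as np

        def strassen(A, B, cutoff=64):
            """Strassen multiplication of n-by-n matrices, n a power of two.
            Falls back to ordinary multiplication below `cutoff`."""
            n = A.shape[0]
            if n <= cutoff:
                return A @ B
            k = n // 2
            A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
            B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]
            # Seven recursive half-size products instead of eight.
            M1 = strassen(A11 + A22, B11 + B22, cutoff)
            M2 = strassen(A21 + A22, B11, cutoff)
            M3 = strassen(A11, B12 - B22, cutoff)
            M4 = strassen(A22, B21 - B11, cutoff)
            M5 = strassen(A11 + A12, B22, cutoff)
            M6 = strassen(A21 - A11, B11 + B12, cutoff)
            M7 = strassen(A12 - A22, B21 + B22, cutoff)
            C = np.empty_like(A)
            C[:k, :k] = M1 + M4 - M5 + M7
            C[:k, k:] = M3 + M5
            C[k:, :k] = M2 + M4
            C[k:, k:] = M1 - M2 + M3 + M6
            return C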

    Novel Monte Carlo Methods for Large-Scale Linear Algebra Operations

    Linear algebra operations play an important role in scientific computing and data analysis. With increasing data volume and complexity in the Big Data era, linear algebra operations are important tools for processing massive datasets. On one hand, the advent of modern high-performance computing architectures with ever-increasing computing power has greatly enhanced our capability to deal with large volumes of data. On the other hand, many classical, deterministic numerical linear algebra algorithms have difficulty scaling to large datasets. Monte Carlo methods, which are based on statistical sampling, exhibit many attractive properties when dealing with large datasets, including fast approximate results, memory efficiency, reduced data accesses, natural parallelism, and inherent fault tolerance. In this dissertation, we present new Monte Carlo methods for a set of fundamental and ubiquitous large-scale linear algebra operations, including solving large-scale linear systems, constructing low-rank matrix approximations, and approximating extreme eigenvalues/eigenvectors, across modern distributed and parallel computing architectures. First, we revisit the classical Ulam-von Neumann Monte Carlo algorithm and derive the necessary and sufficient condition for its convergence. To support a broader family of linear systems, we develop Krylov subspace Monte Carlo solvers that go beyond the Neumann series. New algorithms in these solvers include (1) a Breakdown-Free Block Conjugate Gradient algorithm to address the rank deficiency problem that can occur in block Krylov subspace methods; (2) a Block Conjugate Gradient for Least Squares (BCGLS) algorithm to stably approximate least squares solutions of general linear systems; (3) a BCGLS algorithm with deflation to accelerate convergence; and (4) a Monte Carlo Generalized Minimal Residual algorithm based on sampling matrix-vector products to provide fast approximate solutions. Second, we design a rank-revealing randomized Singular Value Decomposition (R3SVD) algorithm for adaptively constructing low-rank matrix approximations that satisfy application-specific accuracy requirements. Third, we study the block power method on Markov Chain Monte Carlo transition matrices and find that convergence actually depends on the number of independent vectors in the block. Accordingly, we develop a sliding-window power method for finding the stationary distribution, which has been applied successfully to model a stochastic luminal calcium release site. Fourth, we exploit hybrid CPU-GPU computing platforms to accelerate the Breakdown-Free Block Conjugate Gradient algorithm and the randomized Singular Value Decomposition algorithm. Finally, we design a Gaussian variant of Freivalds' algorithm to efficiently verify the correctness of matrix-matrix multiplication while avoiding the undetectable fault patterns encountered in deterministic verification algorithms.
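
    Of the techniques listed, the Gaussian variant of Freivalds' algorithm is the simplest to sketch: to check whether $AB = C$, compare $A(Bx)$ with $Cx$ for random Gaussian vectors $x$, at $O(n^2)$ cost per trial. The function name, trial count, and tolerance rule below are illustrative assumptions, not the dissertation's exact formulation:

        import numpy as np

        def freivalds_gaussian(A, B, C, trials=3, rtol=1e-8):
            """Probabilistically verify A @ B == C using Gaussian test vectors."""
            rng = np.random.default_rng()
            scale = np.linalg.norm(A, "fro") * np.linalg.norm(B, "fro")
            for _ in range(trials):
                x = rng.standard_normal(B.shape[1])
                r = A @ (B @ x) - C @ x            # O(n^2) work per trial
                if np.linalg.norm(r) > rtol * max(scale, 1.0) * np.linalg.norm(x):
                    return False                   # mismatch detected
            return True                            # equal with high probability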

    Minimizing Communication for Eigenproblems and the Singular Value Decomposition

    Algorithms have two costs: arithmetic and communication. The latter represents the cost of moving data, either between levels of a memory hierarchy or between processors over a network. Communication often dominates arithmetic and represents a rapidly increasing proportion of the total cost, so we seek algorithms that minimize communication. In \cite{BDHS10}, lower bounds were presented on the amount of communication required for essentially all $O(n^3)$-like algorithms for linear algebra, including eigenvalue problems and the SVD. Conventional algorithms, including those currently implemented in (Sca)LAPACK, perform asymptotically more communication than these lower bounds require. In this paper we present parallel and sequential eigenvalue algorithms (for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms that do attain these lower bounds, and we analyze their convergence and communication costs. Comment: 43 pages, 11 figures
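
    As a rough model of the kind of bound being discussed (the constants here are illustrative assumptions, not the paper's exact analysis): for $n \times n$ matrix multiplication with a fast memory of $M$ words, the bandwidth lower bound is $\Omega(n^3/\sqrt{M})$, and a classically blocked algorithm with block size $b \approx \sqrt{M/3}$ attains it up to a constant factor:

        import math

        def words_moved(n, M):
            """Bandwidth-cost model for n-by-n matmul with a cache of M words."""
            lower = n**3 / math.sqrt(M)        # Omega(n^3 / sqrt(M)) lower bound
            b = int(math.sqrt(M / 3))          # block size: three b-by-b blocks fit
            blocked = 2 * (n / b)**3 * b * b + n * n   # block reads of A, B + writes of C
            return lower, blocked

        # Example: n = 4096 and a fast memory of 2**20 words (~8 MB of doubles).
        lower, blocked = words_moved(4096, 2**20)
        print(f"lower bound ~{lower:.2e} words, blocked algorithm ~{blocked:.2e} words")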

    A Backward Stable Algorithm for Computing the CS Decomposition via the Polar Decomposition

    We introduce a backward stable algorithm for computing the CS decomposition of a partitioned $2n \times n$ matrix with orthonormal columns, or of a rank-deficient partial isometry. The algorithm computes two $n \times n$ polar decompositions (which can be carried out in parallel), followed by an eigendecomposition of a judiciously crafted $n \times n$ Hermitian matrix. We prove that the algorithm is backward stable whenever the aforementioned decompositions are computed in a backward stable way. Since the polar decomposition and the symmetric eigendecomposition are highly amenable to parallelization, the algorithm inherits this feature. We illustrate this fact by invoking recently developed algorithms for the polar decomposition and symmetric eigendecomposition that leverage Zolotarev's best rational approximations of the sign function. Numerical examples demonstrate that the resulting algorithm for computing the CS decomposition enjoys excellent numerical stability.
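
    The first building block named above, the polar decomposition, can be sketched via the SVD. This illustrates only the definition of the factorization; the paper instead advocates Zolotarev-based iterations, and the judiciously crafted Hermitian matrix for the eigendecomposition step is not reproduced here:

        import numpy as np

        def polar(A):
            """Polar decomposition A = U H, with U having orthonormal columns
            and H Hermitian positive semidefinite, computed via the SVD for
            clarity rather than the Zolotarev-based iterations in the paper."""
            W, s, Vh = np.linalg.svd(A, full_matrices=False)
            U = W @ Vh                      # orthonormal-columns factor
            H = (Vh.conj().T * s) @ Vh      # Hermitian factor V diag(s) V^*
            return U, H

        # The CS decomposition algorithm applies this to the two n-by-n blocks
        # Q1, Q2 of a 2n-by-n matrix with orthonormal columns, in parallel:
        # U1, H1 = polar(Q1); U2, H2 = polar(Q2)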