25,787 research outputs found
Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed—either explicitly or
implicitly—to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, robustness, and/or speed. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k))
floating-point operations (flops) in contrast to O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multiprocessor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data
Revisiting the Nystrom Method for Improved Large-Scale Machine Learning
We reconsider randomized algorithms for the low-rank approximation of
symmetric positive semi-definite (SPSD) matrices such as Laplacian and kernel
matrices that arise in data analysis and machine learning applications. Our
main results consist of an empirical evaluation of the performance quality and
running time of sampling and projection methods on a diverse suite of SPSD
matrices. Our results highlight complementary aspects of sampling versus
projection methods; they characterize the effects of common data preprocessing
steps on the performance of these algorithms; and they point to important
differences between uniform sampling and nonuniform sampling methods based on
leverage scores. In addition, our empirical results illustrate that existing
theory is so weak that it does not provide even a qualitative guide to
practice. Thus, we complement our empirical results with a suite of worst-case
theoretical bounds for both random sampling and random projection methods.
These bounds are qualitatively superior to existing bounds---e.g. improved
additive-error bounds for spectral and Frobenius norm error and relative-error
bounds for trace norm error---and they point to future directions to make these
algorithms useful in even larger-scale machine learning applications.Comment: 60 pages, 15 color figures; updated proof of Frobenius norm bounds,
added comparison to projection-based low-rank approximations, and an analysis
of the power method applied to SPSD sketche
Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions
Low-rank matrix approximations, such as the truncated singular value
decomposition and the rank-revealing QR decomposition, play a central role in
data analysis and scientific computing. This work surveys and extends recent
research which demonstrates that randomization offers a powerful tool for
performing low-rank matrix approximation. These techniques exploit modern
computational architectures more fully than classical methods and open the
possibility of dealing with truly massive data sets.
This paper presents a modular framework for constructing randomized
algorithms that compute partial matrix decompositions. These methods use random
sampling to identify a subspace that captures most of the action of a matrix.
The input matrix is then compressed---either explicitly or implicitly---to this
subspace, and the reduced matrix is manipulated deterministically to obtain the
desired low-rank factorization. In many cases, this approach beats its
classical competitors in terms of accuracy, speed, and robustness. These claims
are supported by extensive numerical experiments and a detailed error analysis
- …