Revisiting the Nyström Method for Improved Large-Scale Machine Learning
We reconsider randomized algorithms for the low-rank approximation of
symmetric positive semi-definite (SPSD) matrices such as Laplacian and kernel
matrices that arise in data analysis and machine learning applications. Our
main results consist of an empirical evaluation of the performance quality and
running time of sampling and projection methods on a diverse suite of SPSD
matrices. Our results highlight complementary aspects of sampling versus
projection methods; they characterize the effects of common data preprocessing
steps on the performance of these algorithms; and they point to important
differences between uniform sampling and nonuniform sampling methods based on
leverage scores. In addition, our empirical results illustrate that existing
theory is so weak that it does not provide even a qualitative guide to
practice. Thus, we complement our empirical results with a suite of worst-case
theoretical bounds for both random sampling and random projection methods.
These bounds are qualitatively superior to existing bounds---e.g. improved
additive-error bounds for spectral and Frobenius norm error and relative-error
bounds for trace norm error---and they point to future directions to make these
algorithms useful in even larger-scale machine learning applications.
Comment: 60 pages, 15 color figures; updated proof of Frobenius norm bounds,
added comparison to projection-based low-rank approximations, and an analysis
of the power method applied to SPSD sketches.
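To make the setup concrete, the following is a minimal sketch of the basic Nyström method with uniform column sampling, the simplest of the sampling schemes compared above; the RBF test matrix and the landmark count m are illustrative choices, not the paper's benchmark suite.

```python
import numpy as np

def nystrom(A, m, rng=np.random.default_rng(0)):
    """Nystrom approximation (rank <= m) of an SPSD matrix A."""
    n = A.shape[0]
    idx = rng.choice(n, size=m, replace=False)  # uniform landmark sampling
    C = A[:, idx]                               # n x m column sketch
    W = C[idx, :]                               # m x m intersection block
    return C @ np.linalg.pinv(W) @ C.T          # A_hat = C W^+ C^T

# Illustrative test on an RBF kernel matrix built from random data
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 5))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)
K_hat = nystrom(K, m=50)
print(np.linalg.norm(K - K_hat, "fro") / np.linalg.norm(K, "fro"))
```

Nonuniform variants replace the `rng.choice` step with sampling proportional to (approximate) leverage scores; everything downstream is unchanged.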
Fixed-Rank Approximation of a Positive-Semidefinite Matrix from Streaming Data
Several important applications, such as streaming PCA and semidefinite
programming, involve a large-scale positive-semidefinite (psd) matrix that is
presented as a sequence of linear updates. Because of storage limitations, it
may only be possible to retain a sketch of the psd matrix. This paper develops
a new algorithm for fixed-rank psd approximation from a sketch. The approach
combines the Nyström approximation with a novel mechanism for rank truncation.
Theoretical analysis establishes that the proposed method can achieve any
prescribed relative error in the Schatten 1-norm and that it exploits the
spectral decay of the input matrix. Computer experiments show that the proposed
method dominates alternative techniques for fixed-rank psd matrix approximation
across a wide range of examples.
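A minimal sketch of this sketch-and-truncate pipeline, assuming a Gaussian test matrix and the shifted Nyström recovery outlined in the abstract; the helper names here are illustrative, not the authors' reference implementation.

```python
import numpy as np

def init_sketch(n, m, rng=np.random.default_rng(0)):
    Omega = rng.standard_normal((n, m))  # fixed random test matrix (stored)
    Y = np.zeros((n, m))                 # running sketch Y = A @ Omega (stored)
    return Omega, Y

def linear_update(Y, Omega, H, theta=1.0):
    # process the stream A <- A + theta * H without ever storing A
    return Y + theta * (H @ Omega)

def fixed_rank_psd_approx(Y, Omega, r):
    n = Y.shape[0]
    nu = np.sqrt(n) * np.finfo(Y.dtype).eps * np.linalg.norm(Y, 2)
    Ynu = Y + nu * Omega                     # small shift for numerical stability
    B = Omega.T @ Ynu
    L = np.linalg.cholesky((B + B.T) / 2.0)  # B = L @ L.T
    E = np.linalg.solve(L, Ynu.T).T          # E @ E.T = Ynu @ inv(B) @ Ynu.T
    U, s, _ = np.linalg.svd(E, full_matrices=False)
    lam = np.maximum(s[:r] ** 2 - nu, 0.0)   # undo the shift, clip at zero
    return (U[:, :r] * lam) @ U[:, :r].T     # rank-r PSD approximation
```

The sketch size m controls both storage and accuracy; relative-error guarantees of the kind stated above require m to grow with the target rank r.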
The Singular Value Decomposition, Applications and Beyond
The singular value decomposition (SVD) is not only a classical theory in
matrix computation and analysis, but also a powerful tool in machine learning
and modern data analysis. In this tutorial we first study the basic notion of
SVD and then show the central role of SVD in matrix analysis. Using
majorization theory, we consider variational principles of singular values and
eigenvalues. Built on SVD and a theory of symmetric gauge functions, we discuss
unitarily invariant norms, which are then used to formulate general results for
matrix low rank approximation. We study the subdifferentials of unitarily
invariant norms. These results are potentially useful in many machine
learning problems such as matrix completion and matrix data classification.
Finally, we discuss matrix low rank approximation and its recent developments
such as randomized SVD, approximate matrix multiplication, CUR decomposition,
and Nyström approximation. Randomized algorithms are important approaches to
large-scale SVD as well as fast matrix computations.
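Since randomized SVD anchors this last part, here is a minimal sketch of the standard range-finder variant (Gaussian sketch, optional power iterations, small deterministic SVD); the oversampling p and power count q are illustrative defaults.

```python
import numpy as np

def randomized_svd(A, k, p=10, q=2, rng=np.random.default_rng(0)):
    """Approximate rank-k SVD of A via a randomized range finder."""
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))  # Gaussian test matrix, oversampled
    Q, _ = np.linalg.qr(A @ Omega)           # orthonormal basis for the range
    for _ in range(q):                       # power iterations sharpen decay
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    B = Q.T @ A                              # small (k+p) x n problem
    Uh, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Uh)[:, :k], s[:k], Vt[:k, :]
```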
Reconstructing Kernel-based Machine Learning Force Fields with Super-linear Convergence
Kernel machines have sustained continuous progress in the field of quantum
chemistry. In particular, they have proven to be successful in the low-data
regime of force field reconstruction. This is because many physical invariances
and symmetries can be incorporated into the kernel function, offsetting the
need for much larger datasets. So far, however, the scalability of this
approach has been hindered by its cubic runtime in the number of training
points. While it is known that iterative Krylov subspace solvers can overcome
these burdens, they crucially rely on effective preconditioners, which are
elusive in practice.
Practical preconditioners need to be computationally efficient and numerically
robust at the same time. Here, we consider the broad class of Nyström-type
methods to construct preconditioners based on successively more sophisticated
low-rank approximations of the original kernel matrix, each of which provides a
different set of computational trade-offs. All considered methods estimate the
relevant subspace spanned by the kernel matrix columns using different
strategies to identify a representative set of inducing points. Our
comprehensive study covers the full spectrum of approaches, starting from naive
random sampling to leverage score estimates and incomplete Cholesky
factorizations, up to exact singular value decompositions.
Comment: 18 pages, 12 figures, preprint.
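To show how such a preconditioner is used, here is a minimal sketch of the simplest end of that spectrum: a uniform-sampling Nyström preconditioner applied inside SciPy's conjugate gradient solver for a regularized kernel system (K + lam*I) alpha = y; the landmark count m and ridge lam are illustrative, and the inverse is applied cheaply via the Woodbury identity.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def nystrom_preconditioner(K, m, lam, rng=np.random.default_rng(0)):
    n = K.shape[0]
    idx = rng.choice(n, size=m, replace=False)  # naive random inducing points
    C, W = K[:, idx], K[np.ix_(idx, idx)]
    w, V = np.linalg.eigh(W)
    pos = w > 1e-10 * w.max()                   # drop numerically null directions
    Z = C @ (V[:, pos] / np.sqrt(w[pos]))       # K_hat = Z @ Z.T
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    d = s ** 2                                  # K_hat = U @ diag(d) @ U.T
    def apply_inv(v):                           # (K_hat + lam*I)^{-1} v, Woodbury
        Uv = U.T @ v
        return U @ (Uv / (d + lam)) + (v - U @ Uv) / lam
    return LinearOperator((n, n), matvec=apply_inv)

def solve_kernel_system(K, y, lam, m=100):
    M = nystrom_preconditioner(K, m, lam)
    alpha, info = cg(K + lam * np.eye(K.shape[0]), y, M=M)
    assert info == 0, "CG did not converge"
    return alpha
```

Swapping the uniform `rng.choice` for leverage-score sampling or an incomplete Cholesky pivot rule changes only the landmark selection; the Woodbury application stays the same.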
Sampling-based Nyström Approximation and Kernel Quadrature
We analyze the Nyström approximation of a positive definite kernel associated
with a probability measure. We first prove an improved error bound for the
conventional Nyström approximation with i.i.d. sampling and singular-value
decomposition in the continuous regime; the proof techniques are borrowed from
statistical learning theory. We further introduce a refined selection of
subspaces in Nyström approximation with theoretical guarantees that is
applicable to non-i.i.d. landmark points. Finally, we discuss their
application to convex kernel quadrature and give novel theoretical guarantees
as well as numerical observations.
Comment: 27 pages.
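A minimal sketch of the continuous-regime objects involved: i.i.d. landmarks drawn from the measure, an eigendecomposition of the scaled kernel matrix standing in for the integral operator, and the Nyström extension of its eigenfunctions. The Gaussian kernel and mu = N(0, 1) are illustrative assumptions; the refined non-i.i.d. subspace selection and the quadrature construction from the paper are not reproduced here.

```python
import numpy as np

def gauss_kernel(X, Y, ell=1.0):
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * ell ** 2))

rng = np.random.default_rng(0)
m, d = 200, 1
Xm = rng.standard_normal((m, d))      # i.i.d. landmarks from mu = N(0, 1)
lam, V = np.linalg.eigh(gauss_kernel(Xm, Xm) / m)  # empirical integral operator
order = np.argsort(lam)[::-1]         # sort the spectrum in decreasing order
lam, V = lam[order], V[:, order]

def nystrom_eigenfunction(X, j):
    """Nystrom extension of the j-th approximate eigenfunction to points X."""
    return gauss_kernel(X, Xm) @ V[:, j] / (np.sqrt(m) * lam[j])

# Low-rank kernel reconstruction: k(x, y) ~= sum_j lam_j phi_j(x) phi_j(y)
Xt = np.linspace(-2, 2, 5).reshape(-1, 1)
r = 10
Phi = np.stack([nystrom_eigenfunction(Xt, j) for j in range(r)], axis=1)
print(np.abs(gauss_kernel(Xt, Xt) - (Phi * lam[:r]) @ Phi.T).max())
```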