Convergence of Gradient Descent for Low-Rank Matrix Approximation
This paper provides a proof of global convergence of gradient search for low-rank matrix approximation. Such approximations have recently been of interest for large-scale problems, as well as for dictionary learning for sparse signal representations and matrix completion. The proof is based on the interpretation of the problem as an optimization on the Grassmann manifold and the Fubini-Study distance on this space.
Matrix Coherence and the Nystrom Method
The Nystrom method is an efficient technique to speed up large-scale learning
applications by generating low-rank approximations. Crucial to the performance
of this technique is the assumption that a matrix can be well approximated by
working exclusively with a subset of its columns. In this work we relate this
assumption to the concept of matrix coherence and connect matrix coherence to
the performance of the Nystrom method. Making use of related work in the
compressed sensing and the matrix completion literature, we derive novel
coherence-based bounds for the Nystrom method in the low-rank setting. We then
present empirical results that corroborate these theoretical bounds. Finally,
we present more general empirical results for the full-rank setting that
convincingly demonstrate the ability of matrix coherence to measure the degree
to which information can be extracted from a subset of columns.
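
The column-subset idea described in this abstract can be illustrated with a minimal NumPy sketch. It is not taken from the paper: the function name `nystrom_approximation`, the index argument `idx`, and the pseudo-inverse cutoff `rcond` are assumptions chosen for illustration, and the columns are simply supplied by the caller rather than sampled by any particular scheme.

```python
import numpy as np

def nystrom_approximation(K, idx, rcond=1e-10):
    """Standard Nystrom form C W^+ C^T built from the columns of K listed in idx.

    K     : (n, n) symmetric positive semi-definite matrix
    idx   : indices of the sampled columns (hypothetical argument name)
    rcond : relative cutoff for the pseudo-inverse (an assumed default)
    """
    C = K[:, idx]                # n x m block of sampled columns
    W = K[np.ix_(idx, idx)]      # m x m block where sampled rows and columns meet
    return C @ np.linalg.pinv(W, rcond=rcond) @ C.T
```

When K has rank r and the sampled block W also has rank r, this construction reproduces K up to rounding error, which is the low-rank setting the abstract refers to.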
Sharp analysis of low-rank kernel matrix approximations
We consider supervised learning problems within the positive-definite kernel
framework, such as kernel ridge regression, kernel logistic regression or the
support vector machine. With kernels leading to infinite-dimensional feature
spaces, a common practical limiting difficulty is the necessity of computing
the kernel matrix, which most frequently leads to algorithms with running time
at least quadratic in the number of observations n, i.e., O(n^2). Low-rank
approximations of the kernel matrix are often considered as they allow the
reduction of running time complexities to O(p^2 n), where p is the rank of the
approximation. The practicality of such methods thus depends on the required
rank p. In this paper, we show that in the context of kernel ridge regression,
for approximations based on a random subset of columns of the original kernel
matrix, the rank p may be chosen to be linear in the degrees of freedom
associated with the problem, a quantity which is classically used in the
statistical analysis of such methods, and is often seen as the implicit number
of parameters of non-parametric estimators. This result enables simple
algorithms that have sub-quadratic running time complexity, but provably
exhibit the same predictive performance as existing algorithms, for any given
problem instance, and not only for worst-case situations.
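
As a rough illustration of the quantities in this abstract, the sketch below computes the degrees of freedom in the usual kernel ridge regression form tr(K (K + n*lam*I)^{-1}) and evaluates fitted values from a column-subset approximation in O(p^2 n) time. It is a sketch under stated assumptions, not the paper's algorithm: the function names, the `n * lam` scaling of the regularizer, and the eigenvalue cutoff `tol` are all illustrative choices.

```python
import numpy as np

def degrees_of_freedom(K, lam):
    """tr(K (K + n*lam*I)^{-1}) for a symmetric PSD kernel matrix K and lam > 0."""
    n = K.shape[0]
    eig = np.clip(np.linalg.eigvalsh(K), 0.0, None)  # clip tiny negative rounding errors
    return float(np.sum(eig / (eig + n * lam)))

def lowrank_krr_fitted_values(K, y, lam, idx, tol=1e-10):
    """Fitted values of kernel ridge regression with K replaced by C W^+ C^T.

    Only the p columns in idx are used, so the cost is O(p^2 n) rather than
    the O(n^3) of a dense solve. `tol` is an assumed relative cutoff.
    """
    n = K.shape[0]
    C = K[:, idx]                                  # n x p sampled columns
    W = K[np.ix_(idx, idx)]                        # p x p intersection block
    evals, evecs = np.linalg.eigh(W)
    keep = evals > tol * evals.max()               # drop numerically null directions
    Phi = C @ (evecs[:, keep] / np.sqrt(evals[keep]))  # Phi @ Phi.T equals C W^+ C^T
    A = Phi.T @ Phi + n * lam * np.eye(Phi.shape[1])   # small p x p system
    return Phi @ np.linalg.solve(A, Phi.T @ y)
```

The low-rank solve relies on the identity L (L + cI)^{-1} = Phi (Phi^T Phi + cI)^{-1} Phi^T for L = Phi Phi^T, so only a p x p linear system is ever factorized.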
Revisiting the Nystrom Method for Improved Large-Scale Machine Learning
We reconsider randomized algorithms for the low-rank approximation of
symmetric positive semi-definite (SPSD) matrices such as Laplacian and kernel
matrices that arise in data analysis and machine learning applications. Our
main results consist of an empirical evaluation of the performance quality and
running time of sampling and projection methods on a diverse suite of SPSD
matrices. Our results highlight complementary aspects of sampling versus
projection methods; they characterize the effects of common data preprocessing
steps on the performance of these algorithms; and they point to important
differences between uniform sampling and nonuniform sampling methods based on
leverage scores. In addition, our empirical results illustrate that existing
theory is so weak that it does not provide even a qualitative guide to
practice. Thus, we complement our empirical results with a suite of worst-case
theoretical bounds for both random sampling and random projection methods.
These bounds are qualitatively superior to existing bounds---e.g. improved
additive-error bounds for spectral and Frobenius norm error and relative-error
bounds for trace norm error---and they point to future directions to make these
algorithms useful in even larger-scale machine learning applications.
Comment: 60 pages, 15 color figures; updated proof of Frobenius norm bounds,
added comparison to projection-based low-rank approximations, and an analysis
of the power method applied to SPSD sketches
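
The contrast between uniform and leverage-score sampling mentioned in this abstract can be sketched as follows. This is a minimal illustration rather than the paper's procedure: the names `leverage_scores` and `sample_columns` are hypothetical, the scores are computed from an exact eigendecomposition (larger problems would need an approximation), and the column rescaling used in some analyses is omitted.

```python
import numpy as np

def leverage_scores(A, k):
    """Rank-k leverage scores of a symmetric PSD matrix A.

    The i-th score is the squared norm of row i of the matrix whose
    columns are the top-k eigenvectors, so the scores sum to k.
    """
    _, evecs = np.linalg.eigh(A)   # eigenvalues are returned in ascending order
    U = evecs[:, -k:]              # eigenvectors of the k largest eigenvalues
    return np.sum(U ** 2, axis=1)

def sample_columns(A, k, m, rng):
    """Draw m column indices with probability proportional to leverage score."""
    scores = leverage_scores(A, k)
    probs = scores / scores.sum()
    return rng.choice(A.shape[0], size=m, replace=True, p=probs)
```

Uniform sampling corresponds to replacing `probs` with a constant vector; the two schemes coincide when every row of the eigenvector matrix has the same norm.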