Fast approximation of matrix coherence and statistical leverage
The statistical leverage scores of a matrix are the squared row norms of
the matrix containing its (top) left singular vectors, and the coherence is the
largest leverage score. These quantities are of interest in recently popular
problems such as matrix completion and Nyström-based low-rank matrix
approximation, as well as in large-scale statistical data analysis applications
more generally; moreover, they are of interest because they define the key
structural nonuniformity that must be dealt with in developing fast randomized
matrix algorithms. Our main result is a randomized algorithm that takes as
input an arbitrary $n \times d$ matrix $A$, with $n \gg d$, and that returns as
output relative-error approximations to all $n$ of the statistical leverage
scores. The proposed algorithm runs (under assumptions on the precise values of
$n$ and $d$) in $O(nd \log n)$ time, as opposed to the $O(nd^2)$ time required
by the naïve algorithm that involves computing an orthogonal basis for the
range of $A$. Our analysis may be viewed in terms of computing a relative-error
approximation to an underconstrained least-squares approximation problem, or,
relatedly, it may be viewed as an application of Johnson-Lindenstrauss-type
ideas. Several practically important extensions of our basic result are also
described, including the approximation of so-called cross-leverage scores, the
extension of these ideas to matrices with $n \approx d$, and the extension to
streaming environments.
Comment: 29 pages; conference version is in ICML; journal version is in JML
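A minimal NumPy sketch of the quantities defined above, assuming a full-column-rank $n \times d$ matrix. It computes exact leverage scores from a thin QR factorization, plus an approximation in the spirit of the Johnson-Lindenstrauss view mentioned above: a random subspace embedding yields a triangular factor $R$, and a second random projection estimates the row norms of $AR^{-1}$. The helper names and the sketch sizes `r1` and `r2` are illustrative assumptions. Dense Gaussian matrices stand in for the fast structured transforms the stated running time depends on, so this shows the idea rather than the speed, and it is not presented as the paper's exact algorithm.

```python
import numpy as np


def exact_leverage_scores(A):
    """Leverage scores of a full-column-rank n x d matrix A: squared row norms
    of an orthonormal basis Q for range(A), via thin QR in O(n d^2) time."""
    Q, _ = np.linalg.qr(A)                           # Q: n x d, orthonormal columns
    return np.sum(Q**2, axis=1)


def approx_leverage_scores(A, r1, r2, rng):
    """Sketch-based approximation of all n leverage scores (hypothetical helper).

    Assumption: dense Gaussian matrices stand in for the fast structured
    transforms an O(n d log n) running time would need."""
    n, d = A.shape
    Pi1 = rng.standard_normal((r1, n)) / np.sqrt(r1)  # subspace embedding, r1 > d
    _, R = np.linalg.qr(Pi1 @ A)                      # R: d x d; A @ inv(R) is nearly orthonormal
    Pi2 = rng.standard_normal((d, r2)) / np.sqrt(r2)  # JL projection of each row of A @ inv(R)
    # Apply inv(R) to Pi2 first: the n-row product then costs O(n d r2),
    # cheaper than forming A @ inv(R) in O(n d^2) whenever r2 < d.
    Omega = A @ np.linalg.solve(R, Pi2)               # n x r2
    return np.sum(Omega**2, axis=1)


rng = np.random.default_rng(0)
n, d = 5000, 10
A = rng.standard_normal((n, d))
A[:5] *= 50.0                                         # make five rows high-leverage
exact = exact_leverage_scores(A)
approx = approx_leverage_scores(A, r1=500, r2=400, rng=rng)
print("sum of exact scores (equals d):", exact.sum())
print("coherence (largest leverage score):", exact.max())
print("median relative error:", np.median(np.abs(approx / exact - 1.0)))
```

The exact scores always sum to $d$, because they are the squared entries of $Q$, which has $d$ orthonormal columns. The five rescaled rows receive by far the largest scores, so the coherence is set by them.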
Uniform Sampling for Matrix Approximation
Random sampling has become a critical tool in solving massive matrix
problems. For linear regression, a small, manageable set of data rows can be
randomly selected to approximate a tall, skinny data matrix, improving
processing time significantly. For theoretical performance guarantees, each row
must be sampled with probability proportional to its statistical leverage
score. Unfortunately, leverage scores are difficult to compute.
A simple alternative is to sample rows uniformly at random. While this often
works, uniform sampling will eliminate critical row information for many
natural instances. We take a fresh look at uniform sampling by examining what
information it does preserve. Specifically, we show that uniform sampling
yields a matrix that, in some sense, well approximates a large fraction of the
original. While this weak form of approximation is not enough for solving
linear regression directly, it is enough to compute a better approximation.
This observation leads to simple iterative row sampling algorithms for matrix
approximation that run in input-sparsity time and preserve row structure and
sparsity at all intermediate steps. In addition to an improved understanding of
uniform sampling, our main proof introduces a structural result of independent
interest: we show that every matrix can be made to have low coherence by
reweighting a small subset of its rows.
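The following NumPy sketch illustrates the failure mode described above and the leverage-score sampling baseline. It is not the iterative algorithm of the paper. The synthetic matrix is an assumption, built so that a single row carries all information about the last coefficient. That row has leverage score exactly 1, so leverage-score sampling keeps it with high probability, while uniform sampling of 200 out of 20,000 rows usually misses it. The helpers `sampled_lstsq` and `residual` are hypothetical names for illustration.

```python
import numpy as np


def leverage_scores(A):
    """Squared row norms of an orthonormal basis for range(A)."""
    Q, _ = np.linalg.qr(A)
    return np.sum(Q**2, axis=1)


def sampled_lstsq(A, b, probs, s, rng):
    """Hypothetical helper: solve min ||A x - b|| from s rows drawn i.i.d. with
    probabilities probs, each rescaled by 1 / sqrt(s * p_i) so that the sampled
    normal equations are unbiased estimates of the full ones."""
    idx = rng.choice(A.shape[0], size=s, replace=True, p=probs)
    w = 1.0 / np.sqrt(s * probs[idx])
    x, *_ = np.linalg.lstsq(w[:, None] * A[idx], w * b[idx], rcond=None)
    return x


def residual(A, b, x):
    return np.linalg.norm(A @ x - b)


rng = np.random.default_rng(1)
n, d, s = 20000, 5, 200
A = rng.standard_normal((n, d))
A[:, -1] = 0.0
A[0, -1] = 1.0            # only row 0 says anything about the last coefficient
x_true = np.arange(1.0, d + 1)
b = A @ x_true + 0.01 * rng.standard_normal(n)

x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)
lev = leverage_scores(A)
x_lev = sampled_lstsq(A, b, lev / lev.sum(), s, rng)
x_uni = sampled_lstsq(A, b, np.full(n, 1.0 / n), s, rng)

print("leverage score of row 0:", lev[0])
print("optimal residual:         ", residual(A, b, x_opt))
print("leverage-sampled residual:", residual(A, b, x_lev))
print("uniform-sampled residual: ", residual(A, b, x_uni))
```

When row 0 is not drawn (roughly a 99% chance with these sizes), the sampled last column is all zeros. `lstsq` then returns the minimum-norm solution, which sets the last coefficient to 0 and inflates the residual on the full problem.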
Efficient Algorithms and Error Analysis for the Modified Nystrom Method
Many kernel methods suffer from high time and space complexities and are thus
prohibitive in big-data applications. To tackle the computational challenge,
the Nyström method has been extensively used to reduce time and space
complexities by sacrificing some accuracy. The Nyström method speeds up
computation by constructing an approximation of the kernel matrix using only a
few columns of the matrix. Recently, a variant of the Nyström method called
the modified Nyström method has demonstrated significant improvement over the
standard Nyström method in approximation accuracy, both theoretically and
empirically.
In this paper, we propose two algorithms that make the modified Nyström
method practical. First, we devise a simple column selection algorithm with a
provable error bound. Our algorithm is more efficient and easier to implement
than the state-of-the-art algorithm, and nearly as accurate. Second, with the
selected columns at hand, we propose an algorithm that computes the
approximation with lower time complexity than the approach in previous work.
Furthermore, we prove that the modified Nyström method is exact under certain
conditions, and we establish a lower bound on the error of the modified Nyström
method.
Comment: 9-page paper plus appendix. In Proceedings of the 17th International
Conference on Artificial Intelligence and Statistics (AISTATS) 2014,
Reykjavik, Iceland. JMLR: W&CP volume 3
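A minimal NumPy comparison of the two Nyström variants, assuming a Gaussian kernel matrix and uniformly chosen columns; the paper's own column selection algorithm is not reproduced here. Both variants approximate $K$ by $CUC^T$, where $C$ holds the selected columns. The standard method uses $U = W^{+}$, where $W$ is the intersection block. The modified method uses $U = C^{+}K(C^{+})^T$, which minimizes $\|K - CUC^T\|_F$ over all $U$. That optimality is why the modified error cannot exceed the standard one for the same columns, and also why it costs more: forming $C^{+}K$ reads every entry of $K$. The kernel width `sigma` and the sizes are illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(X, Y, sigma):
    """Dense Gaussian (RBF) kernel matrix between the rows of X and Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def standard_nystrom(K, idx):
    """C W^+ C^T, where W is the c x c block of K at the selected indices."""
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T


def modified_nystrom(K, idx):
    """C U C^T with U = C^+ K (C^+)^T, the Frobenius-optimal U for this C.
    Forming C^+ K reads all of K, so this costs O(n^2 c)."""
    C = K[:, idx]
    C_pinv = np.linalg.pinv(C)                    # c x n
    return C @ (C_pinv @ K @ C_pinv.T) @ C.T


rng = np.random.default_rng(2)
X = rng.standard_normal((500, 5))
K = gaussian_kernel(X, X, sigma=1.5)
idx = rng.choice(500, size=30, replace=False)     # assumption: uniform column selection

errors = {}
for name, method in [("standard", standard_nystrom), ("modified", modified_nystrom)]:
    K_hat = method(K, idx)
    errors[name] = np.linalg.norm(K - K_hat, "fro") / np.linalg.norm(K, "fro")
    print(f"{name} Nystrom relative Frobenius error: {errors[name]:.3e}")
```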
Revisiting the Nystrom Method for Improved Large-Scale Machine Learning
We reconsider randomized algorithms for the low-rank approximation of
symmetric positive semi-definite (SPSD) matrices such as Laplacian and kernel
matrices that arise in data analysis and machine learning applications. Our
main results consist of an empirical evaluation of the performance quality and
running time of sampling and projection methods on a diverse suite of SPSD
matrices. Our results highlight complementary aspects of sampling versus
projection methods; they characterize the effects of common data preprocessing
steps on the performance of these algorithms; and they point to important
differences between uniform sampling and nonuniform sampling methods based on
leverage scores. In addition, our empirical results illustrate that existing
theory is so weak that it does not provide even a qualitative guide to
practice. Thus, we complement our empirical results with a suite of worst-case
theoretical bounds for both random sampling and random projection methods.
These bounds are qualitatively superior to existing bounds (e.g., improved
additive-error bounds for spectral and Frobenius norm error and relative-error
bounds for trace norm error), and they point to future directions to make these
algorithms useful in even larger-scale machine learning applications.
Comment: 60 pages, 15 color figures; updated proof of Frobenius norm bounds,
added comparison to projection-based low-rank approximations, and an analysis
of the power method applied to SPSD sketches
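The sampling and projection methods compared above can be written with one template, the SPSD sketch $CW^{+}C^T$ with $C = KS$ and $W = S^TKS$, differing only in the sketching matrix $S$. This NumPy sketch builds $S$ in three ways: uniform column sampling, column sampling according to leverage scores of the top-$k$ eigenspace, and a Gaussian projection. It then reports spectral, Frobenius, and trace norm errors. The test matrix, the sizes `n`, `k`, `l`, the truncation `rcond`, and the helper names are assumptions for illustration, and no ranking of the methods should be read from a single random instance. The error $K - CW^{+}C^T$ is positive semidefinite for any $S$, so its trace norm equals its trace.

```python
import numpy as np


def spsd_sketch(K, S):
    """SPSD sketch C W^+ C^T with C = K S and W = S^T K S."""
    C = K @ S
    W = S.T @ C
    return C @ np.linalg.pinv(W, rcond=1e-10) @ C.T


def selection_matrix(n, probs, l, rng):
    """Hypothetical helper: n x l matrix of distinct standard basis columns
    e_i, with indices drawn without replacement according to probs."""
    S = np.zeros((n, l))
    S[rng.choice(n, size=l, replace=False, p=probs), np.arange(l)] = 1.0
    return S


def sketch_errors(K, K_hat):
    E = K - K_hat          # positive semidefinite, so its trace norm is its trace
    return np.linalg.norm(E, 2), np.linalg.norm(E, "fro"), np.trace(E)


rng = np.random.default_rng(3)
n, k, l = 400, 10, 40
decay = 0.9 ** np.arange(100)                     # assumed decaying feature scales
X = rng.standard_normal((n, 100)) * decay * rng.exponential(1.0, size=(n, 1))
K = X @ X.T                                       # linear kernel matrix, SPSD
eigvals, V = np.linalg.eigh(K)                    # eigenvalues in ascending order
lev_k = np.sum(V[:, -k:]**2, axis=1)              # leverage scores w.r.t. top-k eigenvectors

sketches = {
    "uniform sampling": selection_matrix(n, np.full(n, 1.0 / n), l, rng),
    "leverage sampling": selection_matrix(n, lev_k / lev_k.sum(), l, rng),
    "Gaussian projection": rng.standard_normal((n, l)),
}
results = {}
for name, S in sketches.items():
    results[name] = sketch_errors(K, spsd_sketch(K, S))
    spec, fro, tr = results[name]
    print(f"{name:20s} spectral {spec:.3e}  Frobenius {fro:.3e}  trace {tr:.3e}")
```

Because each sketch has rank at most `l`, its spectral error can never fall below the $(l+1)$-th largest eigenvalue of $K$, and its trace error can never fall below the sum of the eigenvalues beyond the $l$-th. These floors are a useful reference when reading the printed numbers.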
Far-Field Compression for Fast Kernel Summation Methods in High Dimensions
We consider fast kernel summations in high dimensions: given a large set of
points in $d$ dimensions (with $d \gg 3$) and a pair-potential function (the
kernel function), we compute a weighted sum of all pairwise kernel
interactions for each point in the set. Direct summation is equivalent to a
(dense) matrix-vector multiplication and scales quadratically with the number
of points. Fast kernel summation algorithms reduce this cost to log-linear or
linear complexity.
Treecodes and Fast Multipole Methods (FMMs) deliver tremendous speedups by
constructing approximate representations of interactions of points that are far
from each other. In algebraic terms, these representations correspond to
low-rank approximations of blocks of the overall interaction matrix. Existing
approaches require an excessive number of kernel evaluations as $d$ and the
number of points in the dataset increase.
To address this issue, we use a randomized algebraic approach in which we
first sample the rows of a block and then construct its approximate, low-rank
interpolative decomposition. We examine the feasibility of this approach
theoretically and experimentally. We provide a new theoretical result showing a
tighter bound on the reconstruction error from uniformly sampling rows than the
existing state-of-the-art. We demonstrate that our sampling approach is
competitive with existing (but prohibitively expensive) methods from the
literature. We also construct kernel matrices for the Laplacian, Gaussian, and
polynomial kernels, all commonly used in physics and data analysis. We
explore the numerical properties of blocks of these matrices, and show that
they are amenable to our approach. Depending on the data set, our randomized
algorithm can successfully compute low-rank approximations in high dimensions.
We report results for data sets with ambient dimensions from four to 1,000.
Comment: 43 pages, 21 figures
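A minimal NumPy sketch of the row-sampling idea described above. It rests on assumptions not taken from the paper: a Gaussian kernel, two synthetic point clouds separated along one axis, and a column-pivoted Gram-Schmidt skeleton selection with tolerance `tol` (the hypothetical helper `greedy_skeleton`). Uniformly sampled target rows of a far-field block are used to choose skeleton source columns $J$ and an interpolation matrix $P$, giving the interpolative decomposition $K \approx K_{:,J}P$. The approximate sum then needs kernel evaluations only on the sampled rows and on the skeleton columns. The direct $O(mn)$ sum is formed only to measure the error.

```python
import numpy as np


def gaussian_kernel(X, Y, h):
    """Gaussian kernel exp(-|x - y|^2 / (2 h^2)) between the rows of X and Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * h**2))


def greedy_skeleton(M, tol):
    """Column-pivoted Gram-Schmidt (hypothetical helper): pick columns of M
    until every residual column norm is at most tol * ||M||_F."""
    R = M.astype(float)                           # astype returns a copy
    thresh = tol * np.linalg.norm(M)
    J = []
    while len(J) < min(M.shape):
        norms = np.linalg.norm(R, axis=0)
        j = int(np.argmax(norms))
        if norms[j] <= thresh:
            break
        J.append(j)
        q = R[:, j] / norms[j]
        R -= np.outer(q, q @ R)                   # remove the new direction from every column
    return J


rng = np.random.default_rng(4)
d, m, n, s, h = 8, 400, 400, 200, 4.0
sources = 0.5 * rng.standard_normal((n, d))
offset = np.zeros(d)
offset[0] = 4.0
targets = offset + 0.5 * rng.standard_normal((m, d))  # far from the sources
w = rng.standard_normal(n)                             # source weights

rows = rng.choice(m, size=s, replace=False)            # uniform row sampling
K_rows = gaussian_kernel(targets[rows], sources, h)    # s * n kernel evaluations
J = greedy_skeleton(K_rows, tol=1e-4)
P, *_ = np.linalg.lstsq(K_rows[:, J], K_rows, rcond=None)  # |J| x n interpolation matrix
K_skel = gaussian_kernel(targets, sources[J], h)       # m * |J| kernel evaluations
u_fast = K_skel @ (P @ w)                              # block approx: K ~ K[:, J] @ P

u_direct = gaussian_kernel(targets, sources, h) @ w    # direct O(m n) sum, for checking only
rel_err = np.linalg.norm(u_fast - u_direct) / np.linalg.norm(u_direct)
print("skeleton size:", len(J))
print("kernel evaluations:", s * n + m * len(J), "versus", m * n, "for direct summation")
print("relative error of the fast sum:", rel_err)
```

The columns of `P` indexed by `J` form an identity block, which is what makes this an interpolative decomposition: the skeleton columns are reproduced exactly, and every other column is expressed in terms of them.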