The Dual JL Transforms and Superfast Matrix Algorithms
We call a matrix algorithm superfast (a.k.a. running at sublinear cost) if it
involves far fewer flops and memory cells than the matrix has entries. Using
such algorithms is highly desirable or even imperative in computations for Big
Data, which involve immense matrices and are quite typically reduced to solving
a linear least squares problem and/or computing a low rank approximation of an
input matrix. The known algorithms for these problems are not superfast, but we
prove that certain superfast modifications of them output reasonable or even
nearly optimal solutions for large input classes. We also propose, analyze, and
test a novel superfast algorithm for iterative refinement of any crude but
sufficiently close low rank approximation of a matrix. The results of our
numerical tests are in good accordance with our formal study.

Comment: 36.1 pages, 5 figures, and 1 table. arXiv admin note: text overlap
with arXiv:1710.07946, arXiv:1906.0411
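
As a concrete illustration of the sublinear-cost idea, here is a minimal CUR-type sketch in Python: it reads only a few sampled rows and columns of the matrix, so its cost is sublinear in the number of matrix entries. The function name `sublinear_cur` and the uniform sampling are illustrative assumptions only; the paper's superfast algorithms use structured (dual JL) sketches and come with the formal guarantees discussed above.

```python
import numpy as np

def sublinear_cur(A, k, c=None, r=None, seed=None):
    """CUR-type low rank approximation that reads only a few rows and
    columns of A -- cost sublinear in the number of entries of A.
    Uniform sampling is a simplification; it is NOT the paper's
    dual JL construction."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    c = c or 4 * k                                   # sampled columns
    r = r or 4 * k                                   # sampled rows
    J = rng.choice(n, size=min(c, n), replace=False)
    I = rng.choice(m, size=min(r, m), replace=False)
    C = A[:, J]                 # m x c panel of sampled columns
    R = A[I, :]                 # r x n panel of sampled rows
    W = A[np.ix_(I, J)]         # r x c intersection submatrix
    U = np.linalg.pinv(W)       # c x r link matrix
    return C, U, R              # A is approximated by C @ U @ R
```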
Dimensionality Reduction for k-Means Clustering and Low Rank Approximation
We show how to approximate a data matrix $A$ with a much smaller
sketch $\tilde{A}$ that can be used to solve a general class of
constrained $k$-rank approximation problems to within $(1+\epsilon)$ error.
Importantly, this class of problems includes $k$-means clustering and
unconstrained low rank approximation (i.e. principal component analysis). By
reducing data points to just $O(k)$ dimensions, our methods generically
accelerate any exact, approximate, or heuristic algorithm for these ubiquitous
problems.
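
The Python sketch below illustrates the generic acceleration pattern just described: reduce the data to a small number of dimensions, then run any off-the-shelf $k$-means solver on the sketch. It assumes a plain Gaussian JL projection as the sketching step (the paper analyzes several sketches and gives the target dimension precisely); `sketch_then_cluster` is a hypothetical helper name.

```python
import numpy as np
from sklearn.cluster import KMeans

def sketch_then_cluster(A, k, d, seed=0):
    """Project n points in R^m down to d dimensions with a Gaussian
    JL map, then run any k-means solver on the sketch. With d chosen
    per the paper's bounds (e.g. O(k), or O(log k / eps^2) for the
    (9+eps) result), the clustering cost on A is preserved up to the
    stated factors."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    S = rng.standard_normal((m, d)) / np.sqrt(d)  # JL sketching matrix
    A_sketch = A @ S                              # n x d sketched data
    return KMeans(n_clusters=k, n_init=10).fit_predict(A_sketch)
```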
For $k$-means dimensionality reduction, we provide $(1+\epsilon)$ relative
error results for many common sketching techniques, including random row
projection, column selection, and approximate SVD. For approximate principal
component analysis, we give a simple alternative to known algorithms that has
applications in the streaming setting. Additionally, we extend recent work on
column-based matrix reconstruction, giving column subsets that not only `cover'
a good subspace for $A$, but can be used directly to compute this
subspace.
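
One of the sketching techniques listed above, approximate SVD, can be realized for instance with a standard randomized range finder, as in the generic sketch below; this is one common instance of the technique, not the paper's specific construction, and `approx_svd_reduction` is an illustrative name.

```python
import numpy as np

def approx_svd_reduction(A, k, oversample=10, seed=0):
    """Randomized range finder: compute an approximate top-k right
    singular subspace of A and project the n data points onto it,
    giving an n x k reduced representation usable in place of A
    for k-means."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    G = rng.standard_normal((m, k + oversample))     # random test matrix
    Q, _ = np.linalg.qr(A @ G)                       # basis for range(A G)
    B = Q.T @ A                                      # small (k+p) x m matrix
    _, _, Vt = np.linalg.svd(B, full_matrices=False)
    V_k = Vt[:k].T                                   # m x k approx. singular vectors
    return A @ V_k                                   # n x k reduced data
```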
Finally, for $k$-means clustering, we show how to achieve a $(9+\epsilon)$
approximation by Johnson-Lindenstrauss projecting data points to just
$O(\log k / \epsilon^2)$ dimensions. This gives the first result that leverages the
specific structure of $k$-means to achieve dimension independent of input size
and sublinear in $k$.