1,320 research outputs found
Improved Practical Matrix Sketching with Guarantees
Matrices have become essential data representations for many large-scale
problems in data analytics, and hence matrix sketching is a critical task.
Although much research has focused on improving the error/size tradeoff under
various sketching paradigms, the many forms of error bounds make these
approaches hard to compare in theory and in practice. This paper attempts to
categorize and compare most known methods under row-wise streaming updates with
provable guarantees, and then to tweak some of these methods to gain practical
improvements while retaining guarantees.
For instance, we observe that a simple heuristic iSVD, with no guarantees,
tends to outperform all known approaches in terms of size/error trade-off. We
modify the best performing method with guarantees FrequentDirections under the
size/error trade-off to match the performance of iSVD and retain its
guarantees. We also demonstrate some adversarial datasets where iSVD performs
quite poorly. In comparing techniques in the time/error trade-off, techniques
based on hashing or sampling tend to perform better. In this setting we modify
the most studied sampling regime to retain error guarantee but obtain dramatic
improvements in the time/error trade-off.
Finally, we provide easy replication of our studies on APT, a new testbed
which makes available not only code and datasets, but also a computing platform
with fixed environmental settings.Comment: 27 page
Isometric sketching of any set via the Restricted Isometry Property
In this paper we show that for the purposes of dimensionality reduction
certain class of structured random matrices behave similarly to random Gaussian
matrices. This class includes several matrices for which matrix-vector multiply
can be computed in log-linear time, providing efficient dimensionality
reduction of general sets. In particular, we show that using such matrices any
set from high dimensions can be embedded into lower dimensions with near
optimal distortion. We obtain our results by connecting dimensionality
reduction of any set to dimensionality reduction of sparse vectors via a
chaining argument.Comment: 17 page
Practical sketching algorithms for low-rank matrix approximation
This paper describes a suite of algorithms for constructing low-rank
approximations of an input matrix from a random linear image of the matrix,
called a sketch. These methods can preserve structural properties of the input
matrix, such as positive-semidefiniteness, and they can produce approximations
with a user-specified rank. The algorithms are simple, accurate, numerically
stable, and provably correct. Moreover, each method is accompanied by an
informative error bound that allows users to select parameters a priori to
achieve a given approximation quality. These claims are supported by numerical
experiments with real and synthetic data
Toward a unified theory of sparse dimensionality reduction in Euclidean space
Let be a sparse Johnson-Lindenstrauss
transform [KN14] with non-zeroes per column. For a subset of the unit
sphere, given, we study settings for required to
ensure i.e. so that preserves the norm of every
simultaneously and multiplicatively up to . We
introduce a new complexity parameter, which depends on the geometry of , and
show that it suffices to choose and such that this parameter is small.
Our result is a sparse analog of Gordon's theorem, which was concerned with a
dense having i.i.d. Gaussian entries. We qualitatively unify several
results related to the Johnson-Lindenstrauss lemma, subspace embeddings, and
Fourier-based restricted isometries. Our work also implies new results in using
the sparse Johnson-Lindenstrauss transform in numerical linear algebra,
classical and model-based compressed sensing, manifold learning, and
constrained least squares problems such as the Lasso
Simple and Deterministic Matrix Sketching
We adapt a well known streaming algorithm for approximating item frequencies
to the matrix sketching setting. The algorithm receives the rows of a large
matrix one after the other in a streaming fashion. It
maintains a sketch matrix B \in \R^ {1/\eps \times m} such that for any unit
vector [\|Ax\|^2 \ge \|Bx\|^2 \ge \|Ax\|^2 - \eps \|A\|_{f}^2 \.] Sketch
updates per row in require O(m/\eps^2) operations in the worst case. A
slight modification of the algorithm allows for an amortized update time of
O(m/\eps) operations per row. The presented algorithm stands out in that it
is: deterministic, simple to implement, and elementary to prove. It also
experimentally produces more accurate sketches than widely used approaches
while still being computationally competitive
- …