New bounds for circulant Johnson-Lindenstrauss embeddings
This paper analyzes circulant Johnson-Lindenstrauss (JL) embeddings which, as
an important class of structured random JL embeddings, are formed by
randomizing the column signs of a circulant matrix generated by a random
vector. With the help of recent decoupling techniques and matrix-valued
Bernstein inequalities, we obtain a new bound
for Gaussian circulant JL embeddings.
Moreover, by using the Laplace transform technique (also called Bernstein's trick), we extend the result to the subgaussian case. The bounds in this paper offer a small improvement over the current best bounds for Gaussian circulant JL embeddings in certain parameter regimes and are derived using more direct methods.
Comment: 11 pages; accepted by Communications in Mathematical Sciences
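As a rough illustration of the construction analyzed above, the following Python sketch (the helper name, the choice of keeping the first m coordinates, and the scaling are illustrative simplifications, not the paper's exact construction) flips the signs of the input coordinates, circularly convolves with a random Gaussian vector via the FFT, and keeps m coordinates.

```python
import numpy as np

def circulant_jl_embed(x, m, rng):
    """Embed x in R^n into R^m via a sign-randomized Gaussian circulant matrix."""
    n = x.shape[0]
    g = rng.standard_normal(n)           # Gaussian vector generating the circulant matrix
    signs = rng.choice([-1.0, 1.0], n)   # random column-sign flips
    # Circulant matrix-vector product = circular convolution, computed via the FFT.
    y = np.fft.ifft(np.fft.fft(g) * np.fft.fft(signs * x)).real
    return y[:m] / np.sqrt(m)            # keep m coordinates and rescale

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y = circulant_jl_embed(x, m=128, rng=rng)
print(np.linalg.norm(y) / np.linalg.norm(x))   # concentrates around 1
```

Computing the product through the FFT avoids forming the n-by-n circulant matrix explicitly, so the map costs O(n log n) per vector.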
Improved analysis of the subsampled randomized Hadamard transform
This paper presents an improved analysis of a structured dimension-reduction
map called the subsampled randomized Hadamard transform. This argument
demonstrates that the map preserves the Euclidean geometry of an entire
subspace of vectors. The new proof is much simpler than previous approaches,
and it offers, for the first time, optimal constants in the estimate on the number of dimensions required for the embedding.
Comment: 8 pages. To appear in Advances in Adaptive Data Analysis, special issue "Sparse Representation of Data and Images." v2--v4 include minor corrections.
Sketching via hashing: from heavy hitters to compressed sensing to sparse Fourier transform
Sketching via hashing is a popular and useful method for processing large data sets. Its basic idea is as follows. Suppose that we have a large multi-set of elements S=[formula], and we would like to identify the elements that occur “frequently” in S. The algorithm starts by selecting a hash function h that maps the elements into an array c[1…m]. The array entries are initialized to 0. Then, for each element a ∈ S, the algorithm increments c[h(a)]. At the end of the process, each array entry c[j] contains the count of all data elements a ∈ S mapped to j.
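The counting step described above can be sketched in a few lines of Python (a single hash array for illustration; practical heavy-hitter sketches such as Count-Min keep several independent arrays and combine them).

```python
import random

def hash_sketch(stream, m, seed=0):
    """One pass over the stream: increment the counter each element hashes to."""
    salt = random.Random(seed).getrandbits(64)
    c = [0] * m                      # array c[0..m-1], initialized to 0
    for a in stream:
        c[hash((salt, a)) % m] += 1  # c[h(a)] += 1
    return c, salt

stream = ["x"] * 50 + ["y"] * 3 + ["z"] * 2   # "x" occurs frequently
c, salt = hash_sketch(stream, m=16)
# Collisions can only inflate a counter, so the entry "x" hashes to is at least 50.
print(c[hash((salt, "x")) % 16])
```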
Stable Recovery Of Sparse Vectors From Random Sinusoidal Feature Maps
Random sinusoidal features are a popular approach for speeding up
kernel-based inference in large datasets. Prior to the inference stage, the
approach suggests performing dimensionality reduction by first multiplying each
data vector by a random Gaussian matrix, and then computing an element-wise
sinusoid. Theoretical analysis shows that a sufficient number of such features can be used reliably for subsequent inference in kernel classification and regression.
In this work, we demonstrate that with a mild increase in the dimension of
the embedding, it is also possible to reconstruct the data vector from such
random sinusoidal features, provided that the underlying data is sparse enough.
In particular, we propose a numerically stable algorithm for reconstructing the
data vector given the nonlinear features, and analyze its sample complexity.
Our algorithm can be extended to other types of structured inverse problems,
such as demixing a pair of sparse (but incoherent) vectors. We support the
efficacy of our approach via numerical experiments.
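For reference, the forward feature map described in the first paragraph can be sketched as follows (the bandwidth parameter, function name, and dimensions are illustrative assumptions; the reconstruction algorithm analyzed in the paper is not shown here).

```python
import numpy as np

def sinusoidal_features(X, m, bandwidth=1.0, rng=None):
    """Map each row of X (shape num_samples x n) to m random sinusoidal features."""
    rng = rng or np.random.default_rng()
    A = rng.standard_normal((m, X.shape[1])) / bandwidth   # random Gaussian matrix
    return np.sin(X @ A.T)                                  # element-wise sinusoid

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 64))     # 5 data vectors in R^64
Z = sinusoidal_features(X, m=256, rng=rng)
print(Z.shape)                       # (5, 256)
```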
Subspace clustering of dimensionality-reduced data
Subspace clustering refers to the problem of clustering unlabeled
high-dimensional data points into a union of low-dimensional linear subspaces,
assumed unknown. In practice one may have access to dimensionality-reduced
observations of the data only, resulting, e.g., from "undersampling" due to
complexity and speed constraints on the acquisition device. More pertinently,
even if one has access to the high-dimensional data set it is often desirable
to first project the data points into a lower-dimensional space and to perform
the clustering task there; this reduces storage requirements and computational
cost. The purpose of this paper is to quantify the impact of
dimensionality-reduction through random projection on the performance of the
sparse subspace clustering (SSC) and the thresholding based subspace clustering
(TSC) algorithms. We find that for both algorithms dimensionality reduction
down to the order of the subspace dimensions is possible without incurring
significant performance degradation. The mathematical engine behind our
theorems is a result quantifying how the affinities between subspaces change
under random dimensionality-reducing projections.
Comment: ISIT 201
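As a rough sketch of the pipeline studied here (the Gaussian projection, the simplified thresholded-correlation affinity standing in for TSC, and all parameter values are illustrative assumptions; the final spectral-clustering step on the affinity matrix is omitted):

```python
import numpy as np

def project_and_tsc_affinity(X, p, q, rng):
    """X: d x N matrix whose columns are data points; returns an N x N affinity."""
    d, N = X.shape
    Phi = rng.standard_normal((p, d)) / np.sqrt(p)       # random projection to R^p
    Y = Phi @ X
    Y = Y / np.linalg.norm(Y, axis=0, keepdims=True)     # normalize projected points
    C = np.abs(Y.T @ Y)                                  # absolute correlations
    np.fill_diagonal(C, 0.0)
    A = np.zeros_like(C)
    for j in range(N):                                   # keep the q largest entries per point
        nn = np.argsort(C[:, j])[-q:]
        A[nn, j] = C[nn, j]
    return np.maximum(A, A.T)                            # symmetric affinity for spectral clustering

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 60))   # 60 points in R^200 (in practice, from a union of subspaces)
A = project_and_tsc_affinity(X, p=20, q=5, rng=rng)
print(A.shape)                       # (60, 60)
```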
Isometric sketching of any set via the Restricted Isometry Property
In this paper we show that, for the purposes of dimensionality reduction, a certain class of structured random matrices behaves similarly to random Gaussian matrices. This class includes several matrices for which the matrix-vector product can be computed in log-linear time, providing efficient dimensionality reduction of general sets. In particular, we show that, using such matrices, any set can be embedded from a high-dimensional space into a lower-dimensional one with near-optimal distortion. We obtain our results by connecting dimensionality reduction of any set to dimensionality reduction of sparse vectors via a chaining argument.
Comment: 17 pages
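To illustrate what embedding a finite set with small distortion means, the following sketch (a hypothetical check; a plain Gaussian matrix stands in for the structured fast transforms studied in the paper) measures the worst relative error of pairwise squared distances under a random linear map.

```python
import numpy as np

def set_distortion(embed, points):
    """Worst relative error of pairwise squared distances under the map `embed`."""
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            num = np.linalg.norm(embed(points[i]) - embed(points[j])) ** 2
            den = np.linalg.norm(points[i] - points[j]) ** 2
            worst = max(worst, abs(num / den - 1.0))
    return worst

rng = np.random.default_rng(4)
n, m = 512, 128
G = rng.standard_normal((m, n)) / np.sqrt(m)    # Gaussian stand-in for a structured matrix
points = [rng.standard_normal(n) for _ in range(20)]
print(set_distortion(lambda x: G @ x, points))  # small when m is well above log(number of points)
```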