Dimensionality reduction with subgaussian matrices: a unified theory
We present a theory for Euclidean dimensionality reduction with subgaussian
matrices which unifies several restricted isometry property and
Johnson-Lindenstrauss type results obtained earlier for specific data sets. In
particular, we recover and, in several cases, improve results for sets of
sparse and structured sparse vectors, low-rank matrices and tensors, and smooth
manifolds. In addition, we establish a new Johnson-Lindenstrauss embedding for
data sets taking the form of an infinite union of subspaces of a Hilbert space.
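As a rough illustration of the Johnson-Lindenstrauss type embeddings this abstract refers to, the following sketch projects a finite point set with an i.i.d. Gaussian matrix (one particular subgaussian choice) and measures how well squared pairwise distances survive. The function names, the sizes, and the 1/sqrt(m) scaling convention are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def gaussian_projection(X, m, seed=0):
    """Map the rows of X from R^d to R^m using an i.i.d. Gaussian matrix.

    Entries are N(0, 1/m), so squared norms are preserved in expectation.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    A = rng.standard_normal((d, m)) / np.sqrt(m)
    return X @ A

def pairwise_sq_dists(X):
    """All squared Euclidean distances between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.maximum(D, 0.0)  # clip tiny negatives caused by rounding

def max_distortion(X, Y):
    """Largest relative error in squared pairwise distance after embedding."""
    iu = np.triu_indices(X.shape[0], k=1)  # each unordered pair once
    dx = pairwise_sq_dists(X)[iu]
    dy = pairwise_sq_dists(Y)[iu]
    return np.max(np.abs(dy / dx - 1.0))

# Illustrative sizes only: 50 points in R^2000 reduced to R^1000.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 2000))
Y = gaussian_projection(X, m=1000)
distortion = max_distortion(X, Y)
print(f"max relative distortion of squared distances: {distortion:.3f}")
```

Lowering m in this sketch generally increases the observed distortion, which is the trade-off between embedding dimension and accuracy that results of this type quantify.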
Approximation and Streaming Algorithms for Projective Clustering via Random Projections
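Let $P$ be a set of $n$ points in $\mathbb{R}^d$. In the projective
clustering problem, given $k$, $q$ and norm $\rho \in [1, \infty]$, we have to
compute a set $\mathcal{F}$ of $k$ $q$-dimensional flats such that
$(\sum_{p \in P} d(p, \mathcal{F})^\rho)^{1/\rho}$ is minimized; here
$d(p, \mathcal{F})$ represents the (Euclidean) distance of $p$ to the closest flat in
$\mathcal{F}$. We let $f_k^q(P, \rho)$ denote the minimal value and interpret
$f_k^q(P, \infty)$ to be $\max_{r \in P} d(r, \mathcal{F})$. When $\rho = 1, 2$ and
$\infty$ and $q = 0$, the problem corresponds to the $k$-median, $k$-means and the
$k$-center clustering problems respectively.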
For every , and , we show that the
orthogonal projection of onto a randomly chosen flat of dimension
will -approximate
. This result combines the concepts of geometric coresets and
subspace embeddings based on the Johnson-Lindenstrauss Lemma. As a consequence,
an orthogonal projection of to an dimensional randomly chosen subspace
-approximates projective clusterings for every and
simultaneously. Note that the dimension of this subspace is independent of the
number of clusters.
Using this dimension reduction result, we obtain new approximation and
streaming algorithms for projective clustering problems. For example, given a
stream of points, we show how to compute an -approximate
projective clustering for every and simultaneously using only
space. Compared to
standard streaming algorithms with space requirement, our approach
is a significant improvement when the number of input points and their
dimensions are of the same order of magnitude.
Comment: Canadian Conference on Computational Geometry (CCCG 2015)
Subspace clustering of dimensionality-reduced data
Subspace clustering refers to the problem of clustering unlabeled
high-dimensional data points into a union of low-dimensional linear subspaces,
assumed unknown. In practice one may have access to dimensionality-reduced
observations of the data only, resulting, e.g., from "undersampling" due to
complexity and speed constraints on the acquisition device. More pertinently,
even if one has access to the high-dimensional data set it is often desirable
to first project the data points into a lower-dimensional space and to perform
the clustering task there; this reduces storage requirements and computational
cost. The purpose of this paper is to quantify the impact of
dimensionality-reduction through random projection on the performance of the
sparse subspace clustering (SSC) and the thresholding based subspace clustering
(TSC) algorithms. We find that for both algorithms dimensionality reduction
down to the order of the subspace dimensions is possible without incurring
significant performance degradation. The mathematical engine behind our
theorems is a result quantifying how the affinities between subspaces change
under random dimensionality-reducing projections.
Comment: ISIT 201
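To make the notion of affinity between subspaces concrete, here is a minimal sketch that compares the affinity of two subspaces before and after a Gaussian random projection. It assumes the normalized definition common in the subspace clustering literature, aff(U, V) = ||U^T V||_F / sqrt(min(dim U, dim V)) for orthonormal bases U and V. That definition, the helper names, and the dimensions are assumptions for illustration and may differ from the paper's exact setup.

```python
import numpy as np

def orthonormal_basis(M):
    """Orthonormal basis for the column span of M (reduced QR)."""
    Q, _ = np.linalg.qr(M)
    return Q

def affinity(U, V):
    """Normalized affinity ||U^T V||_F / sqrt(min(dim U, dim V)).

    U and V must have orthonormal columns; the value lies in [0, 1].
    """
    k = min(U.shape[1], V.shape[1])
    return np.linalg.norm(U.T @ V, "fro") / np.sqrt(k)

# Hypothetical sizes: ambient dimension n, subspace dimension d, reduced dimension p.
n, d, p = 500, 5, 50
rng = np.random.default_rng(0)

# Two random d-dimensional subspaces of R^n.
U = orthonormal_basis(rng.standard_normal((n, d)))
V = orthonormal_basis(rng.standard_normal((n, d)))

# Gaussian random projection from R^n to R^p.
Phi = rng.standard_normal((p, n)) / np.sqrt(p)

# Bases of the projected subspaces must be re-orthonormalized.
U_p = orthonormal_basis(Phi @ U)
V_p = orthonormal_basis(Phi @ V)

aff_before = affinity(U, V)
aff_after = affinity(U_p, V_p)
print(f"affinity before: {aff_before:.3f}, after: {aff_after:.3f}")
```

Varying p in this sketch gives a feel for how much the reduced dimension can shrink before the two subspaces become hard to tell apart.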