Sparse Subspace Clustering: Algorithm, Theory, and Applications
Many real-world problems involve collections of high-dimensional data, such as
images, videos, text and web documents, and DNA microarray data. Often, such
high-dimensional data lie close to low-dimensional structures corresponding to
the classes or categories to which the data belong. In this paper, we propose
and study an algorithm, called
Sparse Subspace Clustering (SSC), to cluster data points that lie in a union of
low-dimensional subspaces. The key idea is that, among infinitely many possible
representations of a data point in terms of other points, a sparse
representation corresponds to selecting a few points from the same subspace.
This motivates solving a sparse optimization program whose solution is used in
a spectral clustering framework to infer the clustering of data into subspaces.
Since solving the sparse optimization program is in general NP-hard, we
consider a convex relaxation and show that, under appropriate conditions on the
arrangement of subspaces and the distribution of data, the proposed
minimization program succeeds in recovering the desired sparse representations.
The proposed algorithm is computationally efficient and can handle data points
near the intersections of subspaces. Another key advantage of the proposed
algorithm with respect to the state of the art is that it can deal with data
nuisances, such as noise, sparse outlying entries, and missing entries,
directly by incorporating the model of the data into the sparse optimization
program. We demonstrate the effectiveness of the proposed algorithm through
experiments on synthetic data as well as the two real-world problems of motion
segmentation and face clustering.
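To make the two-stage pipeline concrete, below is a minimal Python sketch: each point is expressed sparsely in terms of the others via a Lasso relaxation, and the resulting coefficients define an affinity for spectral clustering. This is an illustrative reading of the abstract, not the paper's implementation: the regularization weight tau is a hypothetical parameter, and the noise/outlier terms and the paper's own solver are omitted.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def ssc(X, n_clusters, tau=0.01):
    """Minimal SSC-style sketch. X is D x N with data points as columns."""
    D, N = X.shape
    C = np.zeros((N, N))
    for i in range(N):
        # Express point i sparsely in terms of all other points
        # (convex Lasso relaxation of the NP-hard sparse program).
        Xi = np.delete(X, i, axis=1)
        lasso = Lasso(alpha=tau, fit_intercept=False, max_iter=10000)
        lasso.fit(Xi, X[:, i])
        c = lasso.coef_
        # Re-insert the coefficients, keeping the diagonal at zero.
        C[i, :i] = c[:i]
        C[i, i + 1:] = c[i:]
    # Symmetrized affinity from the sparse coefficients,
    # then spectral clustering on the resulting graph.
    W = np.abs(C) + np.abs(C).T
    return SpectralClustering(n_clusters=n_clusters,
                              affinity='precomputed').fit_predict(W)
```

In practice the columns of X are commonly normalized to unit norm before the self-expression step, and noise is handled by extending the optimization program rather than by tuning tau alone.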
Scalable Sparse Subspace Clustering by Orthogonal Matching Pursuit
Subspace clustering methods based on l1, l2, or nuclear norm
regularization have become very popular due to their simplicity, theoretical
guarantees and empirical success. However, the choice of the regularizer can
greatly impact both theory and practice. For instance, l1 regularization
is guaranteed to give a subspace-preserving affinity (i.e., there are no
connections between points from different subspaces) under broad conditions
(e.g., arbitrary subspaces and corrupted data). However, it requires solving a
large-scale convex optimization problem. On the other hand, l2 and nuclear
norm regularization provide efficient closed-form solutions, but
require very strong assumptions to guarantee a subspace-preserving affinity,
e.g., independent subspaces and uncorrupted data. In this paper we study a
subspace clustering method based on orthogonal matching pursuit. We show that
the method is both computationally efficient and guaranteed to give a
subspace-preserving affinity under broad conditions. Experiments on synthetic
data verify our theoretical analysis, and applications in handwritten digit and
face clustering show that our approach achieves the best trade-off between
accuracy and efficiency.
Comment: 13 pages, 1 figure, 2 tables. Accepted to CVPR 2016 as an oral presentation.
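The greedy variant replaces the convex program with orthogonal matching pursuit, which caps the number of nonzero coefficients per point. A rough sketch under the same illustrative conventions as above, using scikit-learn's OrthogonalMatchingPursuit with a hypothetical sparsity level k (the paper's exact stopping criterion may differ):

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit
from sklearn.cluster import SpectralClustering

def ssc_omp(X, n_clusters, k=10):
    """OMP-based subspace clustering sketch. X is D x N, k = sparsity level."""
    D, N = X.shape
    C = np.zeros((N, N))
    for i in range(N):
        # Greedily select at most k other points to represent point i.
        Xi = np.delete(X, i, axis=1)
        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
        omp.fit(Xi, X[:, i])
        c = omp.coef_
        C[i, :i] = c[:i]
        C[i, i + 1:] = c[i:]
    W = np.abs(C) + np.abs(C).T
    return SpectralClustering(n_clusters=n_clusters,
                              affinity='precomputed').fit_predict(W)
```

Each greedy pursuit touches only k atoms per point, which is what makes the method scale to far larger data sets than the convex l1 program.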
lp-Recovery of the Most Significant Subspace among Multiple Subspaces with Outliers
We assume data sampled from a mixture of d-dimensional linear subspaces with
spherically symmetric distributions within each subspace and an additional
outlier component with spherically symmetric distribution within the ambient
space (for simplicity we may assume that all distributions are uniform on their
corresponding unit spheres). We also assume mixture weights for the different
components. We say that one of the underlying subspaces of the model is most
significant if its mixture weight is higher than the sum of the mixture weights
of all other subspaces. We study the recovery of the most significant subspace
by minimizing the lp-averaged distances of data points from d-dimensional
subspaces, where p>0. Unlike other lp minimization problems, this minimization
is non-convex for all p>0 and thus requires different methods for its analysis.
We show that if 0<p<=1, then for any fraction of outliers the most significant
subspace can be recovered by lp minimization with overwhelming probability
(which depends on the generating distribution and its parameters). We show that
when adding small noise around the underlying subspaces the most significant
subspace can be nearly recovered by lp minimization for any 0<p<=1 with an
error proportional to the noise level. On the other hand, if p>1 and there is
more than one underlying subspace, then with overwhelming probability the most
significant subspace cannot be recovered or nearly recovered. This last result
does not require spherically symmetric outliers.
Comment: This is a revised version of the part of 1002.1994 that deals with
single subspace recovery. V3: Improved estimates (in particular for Lemma 3.1
and for estimates relying on it), asymptotic dependence of probabilities and
constants on D and d, and further clarifications; for simplicity it assumes
uniform distributions on spheres. V4: minor revision for the published
version.
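For concreteness, the objective in question can be evaluated in a few lines. The sketch below computes the lp-averaged distance energy of a data set relative to a candidate subspace given by an orthonormal basis; the function name and interface are illustrative, and the paper studies the minimizer of this energy rather than any particular solver.

```python
import numpy as np

def lp_energy(X, B, p):
    """lp-averaged distance objective from the abstract.

    X: (N, D) array of data points (one per row).
    B: (D, d) orthonormal basis of a candidate d-dimensional subspace.
    p: exponent, p > 0.
    Returns sum_i dist(x_i, span(B))**p.
    """
    # Residual of each point after orthogonal projection onto span(B);
    # its norm is the point-to-subspace distance.
    residuals = X - X @ B @ B.T
    dists = np.linalg.norm(residuals, axis=1)
    return np.sum(dists ** p)
```

Recovering the most significant subspace then amounts to minimizing this energy over all d-dimensional subspaces, which, as the abstract notes, is non-convex for every p > 0.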