Algorithms and Hardness for Robust Subspace Recovery
We consider a fundamental problem in unsupervised learning called
\emph{subspace recovery}: given a collection of $m$ points in $\mathbb{R}^n$,
if many but not necessarily all of these points are contained in a
$d$-dimensional subspace $T$, can we find it? The points contained in $T$ are
called {\em inliers} and the remaining points are {\em outliers}. This problem
has received considerable attention in computer science and in statistics. Yet
efficient algorithms from computer science are not robust to {\em adversarial}
outliers, and the estimators from robust statistics are hard to compute in high
dimensions.
Are there algorithms for subspace recovery that are both robust to outliers
and efficient? We give an algorithm that finds $T$ when it contains more than a
$d/n$ fraction of the points. Hence, for say $d = n/2$, this estimator
is both easy to compute and well-behaved when there are a constant fraction of
outliers. We prove that it is Small Set Expansion hard to find $T$ when the
fraction of errors is any larger, thus giving evidence that our estimator is an
{\em optimal} compromise between efficiency and robustness.
As it turns out, this basic problem has a surprising number of connections to
other areas including small set expansion, matroid theory and functional
analysis that we make use of here. Comment: Appeared in Proceedings of COLT 2013
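To make the problem statement concrete, here is a minimal RANSAC-style baseline in numpy. This is not the paper's estimator: a naive sampler succeeds only when all $d$ sampled points are inliers, so at inlier fraction $\alpha$ its expected number of trials grows like $\alpha^{-d}$, exactly the exponential dependence that the polynomial-time guarantee above avoids. The function name and tolerance are illustrative.

```python
import numpy as np

def ransac_subspace(points, d, n_trials=2000, tol=1e-8, seed=0):
    """Naive RANSAC-style subspace recovery: sample d points, take their
    span, and keep the candidate subspace with the most near-inliers.
    A sketch for illustration only, not the COLT 2013 algorithm."""
    m, n = points.shape  # m points in R^n
    rng = np.random.default_rng(seed)
    best_basis, best_count = None, -1
    for _ in range(n_trials):
        idx = rng.choice(m, size=d, replace=False)
        # Orthonormal basis for the span of the d sampled points.
        q, _ = np.linalg.qr(points[idx].T)
        # Distance of every point to the candidate subspace.
        residual = points - (points @ q) @ q.T
        count = int(np.sum(np.linalg.norm(residual, axis=1) < tol))
        if count > best_count:
            best_basis, best_count = q, count
    return best_basis, best_count
```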
Smoothed Analysis in Unsupervised Learning via Decoupling
Smoothed analysis is a powerful paradigm in overcoming worst-case
intractability in unsupervised learning and high-dimensional data analysis.
While polynomial time smoothed analysis guarantees have been obtained for
worst-case intractable problems like tensor decompositions and learning
mixtures of Gaussians, such guarantees have been hard to obtain for several
other important problems in unsupervised learning. A core technical challenge
in analyzing algorithms is obtaining lower bounds on the least singular value
for random matrix ensembles with dependent entries, that are given by
low-degree polynomials of a few base underlying random variables.
In this work, we address this challenge by obtaining high-confidence lower
bounds on the least singular value of new classes of structured random matrix
ensembles of the above kind. We then use these bounds to design algorithms with
polynomial time smoothed analysis guarantees for the following three important
problems in unsupervised learning:
1. Robust subspace recovery, when the fraction of inliers in the
$d$-dimensional subspace is at least $(d/n)^\ell$ for any constant integer
$\ell > 0$. This contrasts with the known worst-case intractability when the
inlier fraction is less than $d/n$, and the previous smoothed analysis result,
which needed an inlier fraction greater than $d/n$ (Hardt and Moitra, 2013).
2. Learning overcomplete hidden Markov models, where the size of the state
space is any polynomial in the dimension of the observations. This gives the
first polynomial time guarantees for learning overcomplete HMMs in a smoothed
analysis model.
3. Higher order tensor decompositions, where we generalize the so-called
FOOBI algorithm of Cardoso to find order-$\ell$ rank-one tensors in a subspace.
This allows us to obtain polynomially robust decomposition algorithms for
$2\ell$'th order tensors with rank $O(n^\ell)$. Comment: 44 pages
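The least-singular-value condition driving these results can be illustrated numerically. The sketch below is a toy of my own construction, not taken from the paper: it uses a Khatri-Rao product, a standard ensemble whose entries are degree-2 polynomials of the base vectors. A degenerate instance is exactly singular, while a small random perturbation of the base vectors (the smoothed-analysis model) lifts the least singular value away from zero.

```python
import numpy as np

def khatri_rao(A, B):
    # Column-wise Kronecker product: entries are degree-2 polynomials
    # of the base vectors, with heavily dependent entries.
    return np.einsum('ik,jk->ijk', A, B).reshape(-1, A.shape[1])

rng = np.random.default_rng(1)
n, k = 10, 40                       # k columns in R^{n^2}, k <= n(n+1)/2
a = rng.standard_normal((n, 1))
A = np.tile(a, (1, k))              # worst-case: all base vectors equal
M = khatri_rao(A, A)                # rank 1, so least singular value is 0

rho = 0.1                           # smoothing: perturb the base vectors
A_pert = A + rho * rng.standard_normal((n, k))
M_pert = khatri_rao(A_pert, A_pert)

print(np.linalg.svd(M, compute_uv=False)[-1])       # ~0 (machine epsilon)
print(np.linalg.svd(M_pert, compute_uv=False)[-1])  # bounded away from 0
```

The paper's bounds make the second print statement quantitative: with high probability over the perturbation, the least singular value is at least inverse-polynomial in the dimension.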
Robust Subspace Learning: Robust PCA, Robust Subspace Tracking, and Robust Subspace Recovery
PCA is one of the most widely used dimension reduction techniques. A related
easier problem is "subspace learning" or "subspace estimation". Given
relatively clean data, both are easily solved via singular value decomposition
(SVD). The problem of subspace learning or PCA in the presence of outliers is
called robust subspace learning or robust PCA (RPCA). For long data sequences,
if one tries to use a single lower dimensional subspace to represent the data,
the required subspace dimension may end up being quite large. For such data, a
better model is to assume that it lies in a low-dimensional subspace that can
change over time, albeit gradually. The problem of tracking such data (and the
subspaces) while being robust to outliers is called robust subspace tracking
(RST). This article provides a magazine-style overview of the entire field of
robust subspace learning and tracking. In particular, solutions for three
problems are discussed in detail: RPCA via sparse+low-rank matrix decomposition
(S+LR), RST via S+LR, and "robust subspace recovery (RSR)". RSR assumes that an
entire data vector is either an outlier or an inlier. The S+LR formulation
instead assumes that outliers occur on only a few data vector indices and hence
are well modeled as sparse corruptions. Comment: To appear, IEEE Signal Processing Magazine, July 2018
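As a concrete instance of the S+LR formulation, here is a minimal numpy sketch of Principal Component Pursuit (Candès et al., 2011), the convex program $\min \|L\|_* + \lambda\|S\|_1$ subject to $L + S = M$, solved by a standard ADMM loop. The parameter defaults follow common conventions in that literature; this is an illustration of the S+LR idea, not any of the specific algorithms surveyed in the article.

```python
import numpy as np

def pcp_admm(M, lam=None, mu=None, n_iter=200):
    """Minimal Principal Component Pursuit sketch:
    min ||L||_* + lam*||S||_1  s.t.  L + S = M, solved by ADMM."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))        # standard PCP weight
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum())  # common step-size heuristic
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    soft = lambda X, t: np.sign(X) * np.maximum(np.abs(X) - t, 0.0)
    for _ in range(n_iter):
        # L-step: singular value thresholding recovers the low-rank part.
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = U @ np.diag(soft(s, 1.0 / mu)) @ Vt
        # S-step: entrywise soft thresholding isolates sparse outliers.
        S = soft(M - L + Y / mu, lam / mu)
        # Dual update enforces the constraint L + S = M.
        Y = Y + mu * (M - L - S)
    return L, S
```

Note the contrast with RSR drawn above: PCP's sparsity penalty models corruptions on a few entries per data vector, whereas RSR treats whole vectors as outliers.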