
    Algorithms and Hardness for Robust Subspace Recovery

    We consider a fundamental problem in unsupervised learning called \emph{subspace recovery}: given a collection of $m$ points in $\mathbb{R}^n$, if many but not necessarily all of these points are contained in a $d$-dimensional subspace $T$, can we find it? The points contained in $T$ are called \emph{inliers} and the remaining points are \emph{outliers}. This problem has received considerable attention in computer science and in statistics. Yet efficient algorithms from computer science are not robust to \emph{adversarial} outliers, and the estimators from robust statistics are hard to compute in high dimensions. Are there algorithms for subspace recovery that are both robust to outliers and efficient? We give an algorithm that finds $T$ when it contains more than a $\frac{d}{n}$ fraction of the points. Hence, for say $d = n/2$, this estimator is both easy to compute and well-behaved when there is a constant fraction of outliers. We prove that it is Small Set Expansion hard to find $T$ when the fraction of errors is any larger, thus giving evidence that our estimator is an \emph{optimal} compromise between efficiency and robustness. As it turns out, this basic problem has a surprising number of connections to other areas, including small set expansion, matroid theory, and functional analysis, that we make use of here.
    Comment: Appeared in Proceedings of COLT 2013
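    For intuition about the $\frac{d}{n}$ threshold, here is a minimal toy sketch (Python with NumPy; all names and parameters are hypothetical, and this is a naive RANSAC-style baseline, not the paper's estimator): sample $d$ points, take their span, and count how many of the $m$ points lie in it. A sample of $d$ pure inliers spans $T$ exactly, but with inlier fraction $\alpha$ that happens with probability roughly $\alpha^d$, so this baseline needs exponentially many trials; avoiding that blow-up while tolerating adversarial outliers is exactly the compromise the paper's algorithm addresses.

        import numpy as np

        def sample_instance(n=20, d=10, m=400, alpha=0.7, seed=0):
            """m points in R^n; an alpha fraction lie in a random d-dim subspace T."""
            rng = np.random.default_rng(seed)
            basis, _ = np.linalg.qr(rng.standard_normal((n, d)))  # orthonormal basis of T
            num_in = int(alpha * m)
            inliers = rng.standard_normal((num_in, d)) @ basis.T
            outliers = rng.standard_normal((m - num_in, n))
            return np.vstack([inliers, outliers])

        def ransac_subspace(points, d, trials=2000, tol=1e-8, seed=1):
            """Naive baseline: sample d points, span them, count points in the span."""
            rng = np.random.default_rng(seed)
            m, _ = points.shape
            best_count, best_basis = -1, None
            for _ in range(trials):
                idx = rng.choice(m, size=d, replace=False)
                q, _ = np.linalg.qr(points[idx].T)  # orthonormal basis of the sample's span
                # Distance of every point to the candidate subspace.
                resid = np.linalg.norm(points - (points @ q) @ q.T, axis=1)
                count = int((resid < tol).sum())
                if count > best_count:
                    best_count, best_basis = count, q
            return best_basis, best_count

        points = sample_instance()
        _, count = ransac_subspace(points, d=10)
        print(f"best candidate subspace contains {count} of {len(points)} points")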

    Smoothed Analysis in Unsupervised Learning via Decoupling

    Smoothed analysis is a powerful paradigm for overcoming worst-case intractability in unsupervised learning and high-dimensional data analysis. While polynomial time smoothed analysis guarantees have been obtained for worst-case intractable problems like tensor decompositions and learning mixtures of Gaussians, such guarantees have been hard to obtain for several other important problems in unsupervised learning. A core technical challenge in analyzing algorithms is obtaining lower bounds on the least singular value of random matrix ensembles with dependent entries that are given by low-degree polynomials of a few base underlying random variables. In this work, we address this challenge by obtaining high-confidence lower bounds on the least singular value of new classes of structured random matrix ensembles of the above kind (see the numerical sketch after this abstract). We then use these bounds to design algorithms with polynomial time smoothed analysis guarantees for the following three important problems in unsupervised learning:
    1. Robust subspace recovery, when the fraction $\alpha$ of inliers in the $d$-dimensional subspace $T \subset \mathbb{R}^n$ is at least $\alpha > (d/n)^\ell$ for any constant integer $\ell > 0$. This contrasts with the known worst-case intractability when $\alpha < d/n$, and the previous smoothed analysis result, which needed $\alpha > d/n$ (Hardt and Moitra, 2013).
    2. Learning overcomplete hidden Markov models, where the size of the state space is any polynomial in the dimension of the observations. This gives the first polynomial time guarantees for learning overcomplete HMMs in a smoothed analysis model.
    3. Higher-order tensor decompositions, where we generalize the so-called FOOBI algorithm of Cardoso to find order-$\ell$ rank-one tensors in a subspace. This allows us to obtain polynomially robust decomposition algorithms for $2\ell$-th order tensors with rank $O(n^\ell)$.
    Comment: 44 pages
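    The technical core above, lower-bounding the least singular value of a matrix whose dependent entries are low-degree polynomials of a few base random variables, can be probed numerically. The sketch below is illustrative only: the specific ensemble (columns are flattened outer products $x_i x_i^T$ of smoothed vectors, so each entry is a degree-2 polynomial of the base Gaussians) and all names are assumptions of this example, not the paper's construction.

        import numpy as np

        def smoothed_sigma_min(base, perturb=0.01, trials=200, seed=0):
            """Empirical sigma_min of a dependent-entry ensemble: column i of M is
            the flattened outer product of a smoothed vector x_i = a_i + noise."""
            rng = np.random.default_rng(seed)
            n, k = base.shape
            sigmas = []
            for _ in range(trials):
                x = base + perturb * rng.standard_normal((n, k))
                # Columns live in R^{n^2}; entries within a column are dependent,
                # being degree-2 polynomials of the same n base variables.
                cols = [np.outer(x[:, i], x[:, i]).ravel() for i in range(k)]
                M = np.stack(cols, axis=1)  # shape (n*n, k)
                sigmas.append(np.linalg.svd(M, compute_uv=False)[-1])
            return np.percentile(sigmas, 1), np.median(sigmas)

        n, k = 10, 40  # k > n: the x_i are overcomplete, yet the lifted columns stay independent
        base = np.random.default_rng(1).standard_normal((n, k))
        lo, med = smoothed_sigma_min(base)
        print(f"sigma_min: 1st percentile ~ {lo:.2e}, median ~ {med:.2e}")

    The quantity of interest is the lower tail: in this toy run even the 1st percentile of $\sigma_{\min}$ stays bounded away from zero, which is the flavor of high-confidence bound the paper establishes analytically for its structured ensembles.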