
    A geometric analysis of subspace clustering with outliers

    This paper considers the problem of clustering a collection of unlabeled data points assumed to lie near a union of lower-dimensional planes. As is common in computer vision or unsupervised learning applications, we do not know in advance how many subspaces there are, nor do we have any information about their dimensions. We develop a novel geometric analysis of an algorithm named sparse subspace clustering (SSC) [In IEEE Conference on Computer Vision and Pattern Recognition, 2009. CVPR 2009 (2009) 2790-2797. IEEE], which significantly broadens the range of problems where it is provably effective. For instance, we show that SSC can recover multiple subspaces, each of dimension comparable to the ambient dimension. We also prove that SSC can correctly cluster data points even when the subspaces of interest intersect. Further, we develop an extension of SSC that succeeds when the data set is corrupted with possibly overwhelmingly many outliers. Underlying our analysis are clear geometric insights, which may bear on other sparse recovery problems. A numerical study complements our theoretical analysis and demonstrates the effectiveness of these methods. Comment: Published at http://dx.doi.org/10.1214/12-AOS1034 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
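
    The analysis above concerns the SSC pipeline of Elhamifar and Vidal: express each point as a sparse combination of the others, symmetrize the coefficient magnitudes into an affinity, and spectrally cluster. A minimal sketch of that pipeline, as commonly implemented, is below; the lasso solver, the regularization weight lam, and the assumption that the number of clusters is known up front are illustrative choices, not the paper's.

    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.cluster import SpectralClustering

    def ssc(X, n_clusters, lam=0.01):
        """Sketch of sparse subspace clustering. X: (D, N), points as columns."""
        D, N = X.shape
        C = np.zeros((N, N))
        for i in range(N):
            # Express point i as a sparse combination of the remaining points.
            idx = np.delete(np.arange(N), i)
            lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
            lasso.fit(X[:, idx], X[:, i])
            C[idx, i] = lasso.coef_
        W = np.abs(C) + np.abs(C).T  # symmetric affinity from |C|
        return SpectralClustering(n_clusters=n_clusters,
                                  affinity='precomputed').fit_predict(W)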

    Provable Self-Representation Based Outlier Detection in a Union of Subspaces

    Many computer vision tasks involve processing large amounts of data contaminated by outliers, which need to be detected and rejected. While outlier detection methods based on robust statistics have existed for decades, only recently have methods based on sparse and low-rank representation been developed, along with guarantees of correct outlier detection when the inliers lie in one or more low-dimensional subspaces. This paper proposes a new outlier detection method that combines tools from sparse representation with random walks on a graph. By exploiting the property that data points can be expressed as sparse linear combinations of each other, we obtain an asymmetric affinity matrix among data points, which we use to construct a weighted directed graph. By defining a suitable Markov chain from this graph, we establish a connection between inliers/outliers and essential/inessential states of the Markov chain, which allows us to detect outliers by using random walks. We provide a theoretical analysis that justifies the correctness of our method under geometric and connectivity assumptions. Experimental results on image databases demonstrate its superiority with respect to state-of-the-art sparse and low-rank outlier detection methods. Comment: 16 pages. CVPR 2017 spotlight oral presentation.
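
    To make the random-walk idea concrete, a hedged sketch follows. It assumes the self-representation magnitudes C are already computed (e.g., by a sparse solver as in the SSC sketch above), normalizes the rows into a transition matrix, and runs a t-step walk from the uniform distribution; points that retain almost no probability mass behave like inessential states and are flagged as outliers. The iteration count t and threshold tau are illustrative, and the paper's precise construction and guarantees differ in the details.

    import numpy as np

    def outliers_by_random_walk(C, t=1000, tau=1e-4):
        """C: (N, N) nonnegative self-representation magnitudes, zero diagonal."""
        A = np.abs(C)
        P = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)  # row-stochastic
        pi = np.full(P.shape[0], 1.0 / P.shape[0])               # uniform start
        for _ in range(t):
            pi = pi @ P                                          # t-step walk
        # Inessential states (outliers) lose essentially all probability mass.
        return pi < tau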

    Randomized hybrid linear modeling by local best-fit flats

    The hybrid linear modeling problem is to identify a set of d-dimensional affine sets in a D-dimensional Euclidean space. It arises, for example, in object tracking and structure from motion. The hybrid linear model can be considered the second simplest (behind linear) manifold model of data. In this paper we present a very simple geometric method for hybrid linear modeling based on selecting a set of local best-fit flats that minimize a global l1 error measure. The size of the local neighborhoods is determined automatically by Jones' l2 beta numbers; it is proven under certain geometric conditions that good local neighborhoods exist and are found by our method. We also demonstrate how to use this algorithm for fast determination of the number of affine subspaces. We give extensive experimental evidence demonstrating the state-of-the-art accuracy and speed of the algorithm on synthetic and real hybrid linear data. Comment: To appear in the proceedings of CVPR 2010.
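
    As a sketch of the geometric method, the version below fits a d-dimensional flat to the k nearest neighbors of every point by PCA and then greedily keeps the K flats that most reduce the global l1 error. The fixed neighborhood size k and fixed number of flats K are simplifications: the paper chooses neighborhood sizes automatically via Jones' l2 beta numbers and can estimate the number of subspaces.

    import numpy as np

    def local_flat(X, i, k, d):
        """Fit a d-dim affine flat to the k nearest neighbors of point i. X: (N, D)."""
        nbrs = X[np.argsort(np.linalg.norm(X - X[i], axis=1))[:k]]
        mu = nbrs.mean(axis=0)
        _, _, Vt = np.linalg.svd(nbrs - mu, full_matrices=False)
        return mu, Vt[:d]  # a point on the flat and an orthonormal basis

    def dist_to_flat(X, mu, B):
        Y = X - mu
        return np.linalg.norm(Y - (Y @ B.T) @ B, axis=1)

    def lbf(X, K, d, k=20):
        """Greedily pick K local best-fit flats minimizing the global l1 error."""
        N = X.shape[0]
        flats = [local_flat(X, i, k, d) for i in range(N)]
        dists = np.stack([dist_to_flat(X, mu, B) for mu, B in flats])  # (N, N)
        chosen, best = [], np.full(N, np.inf)
        for _ in range(K):
            gains = np.minimum(dists, best).sum(axis=1)  # l1 error if flat j is added
            j = int(np.argmin(gains))
            chosen.append(j)
            best = np.minimum(best, dists[j])
        return np.argmin(dists[chosen], axis=0)  # assign each point to its nearest flat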

    Noisy Subspace Clustering via Thresholding

    We consider the problem of clustering noisy high-dimensional data points into a union of low-dimensional subspaces and a set of outliers. The number of subspaces, their dimensions, and their orientations are unknown. A probabilistic performance analysis of the thresholding-based subspace clustering (TSC) algorithm introduced recently in [1] shows that TSC succeeds in the noisy case, even when the subspaces intersect. Our results reveal an explicit tradeoff between the allowed noise level and the affinity of the subspaces. We furthermore find that the simple outlier detection scheme introduced in [1] provably succeeds in the noisy case. Comment: Presented at the IEEE Int. Symp. Inf. Theory (ISIT) 2013, Istanbul, Turkey. The version posted here corrects a minor error in the published version. Specifically, the exponent -c n_l in the success probability of Theorem 1 and in the corresponding proof outline has been corrected to -c(n_l-1).
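
    For reference, TSC itself is simple enough to state in a few lines: normalize the points, keep for each point its q largest absolute inner products with the other points, and spectrally cluster the resulting graph. The sketch below assumes the number of subspaces L is given (the TSC papers also estimate it from the spectrum of the graph Laplacian); the exp(-2 arccos(.)) edge weighting mirrors the construction used there, and q is the threshold parameter.

    import numpy as np
    from sklearn.cluster import SpectralClustering

    def tsc(X, L, q=10):
        """Sketch of thresholding-based subspace clustering. X: (D, N), points as columns."""
        Xn = X / np.linalg.norm(X, axis=0, keepdims=True)  # unit-norm columns
        G = np.abs(Xn.T @ Xn)                              # |<x_i, x_j>|
        np.fill_diagonal(G, 0.0)
        Z = np.zeros_like(G)
        for i in range(G.shape[0]):
            nbrs = np.argsort(G[i])[-q:]                   # q largest affinities
            Z[i, nbrs] = np.exp(-2.0 * np.arccos(np.clip(G[i, nbrs], 0.0, 1.0)))
        A = Z + Z.T                                        # symmetric adjacency
        return SpectralClustering(n_clusters=L,
                                  affinity='precomputed').fit_predict(A)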

    lp-Recovery of the Most Significant Subspace among Multiple Subspaces with Outliers

    We assume data sampled from a mixture of d-dimensional linear subspaces with spherically symmetric distributions within each subspace and an additional outlier component with a spherically symmetric distribution within the ambient space (for simplicity we may assume that all distributions are uniform on their corresponding unit spheres). We also assume mixture weights for the different components. We say that one of the underlying subspaces of the model is most significant if its mixture weight is higher than the sum of the mixture weights of all other subspaces. We study the recovery of the most significant subspace by minimizing the lp-averaged distances of data points from d-dimensional subspaces, where p > 0. Unlike other lp minimization problems, this minimization is non-convex for all p > 0 and thus requires different methods for its analysis. We show that if 0 < p <= 1, then for any fraction of outliers the most significant subspace can be recovered by lp minimization with overwhelming probability (which depends on the generating distribution and its parameters). We show that when adding small noise around the underlying subspaces, the most significant subspace can be nearly recovered by lp minimization for any 0 < p <= 1, with an error proportional to the noise level. On the other hand, if p > 1 and there is more than one underlying subspace, then with overwhelming probability the most significant subspace cannot be recovered or nearly recovered. This last result does not require spherically symmetric outliers. Comment: This is a revised version of the part of 1002.1994 that deals with single subspace recovery. V3: Improved estimates (in particular for Lemma 3.1 and for estimates relying on it), asymptotic dependence of probabilities and constants on D and d, and further clarifications; for simplicity it assumes uniform distributions on spheres. V4: Minor revision for the published version.
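
    The paper analyzes the global minimizer of the lp objective rather than an algorithm for finding it, and the problem is non-convex for every p > 0. One standard heuristic for searching for such a minimizer is iteratively reweighted least squares (IRLS), sketched below under that assumption: each step solves a weighted PCA problem whose weights dist^(p-2) downweight far-away points, with eps regularizing the weights when 0 < p <= 1. This is an illustration of the objective, not the paper's method, and IRLS may stop at a local minimum.

    import numpy as np

    def lp_subspace(X, d, p=1.0, n_iter=50, eps=1e-8):
        """IRLS sketch for min over d-dim linear subspaces L of sum_i dist(x_i, L)^p.
        X: (N, D), rows are points. Returns an orthonormal (d, D) basis."""
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        B = Vt[:d]                                # initialize with ordinary PCA
        for _ in range(n_iter):
            resid = X - (X @ B.T) @ B
            dist = np.linalg.norm(resid, axis=1)
            w = (dist + eps) ** (p - 2.0)         # small distances get large weight
            # Weighted PCA step: principal d-subspace of the reweighted data.
            _, _, Vt = np.linalg.svd(np.sqrt(w)[:, None] * X, full_matrices=False)
            B = Vt[:d]
        return B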