Randomized hybrid linear modeling by local best-fit flats
The hybrid linear modeling problem is to identify a set of d-dimensional
affine sets in a D-dimensional Euclidean space. It arises, for example, in
object tracking and structure from motion. The hybrid linear model can be
regarded as the second-simplest manifold model of data, after the purely
linear one. In this paper we present a simple geometric method for hybrid
linear modeling based on selecting a set of local best-fit flats that
minimize a global l1 error measure. The size of the local neighborhoods is
determined automatically by Jones' l2 beta numbers; it is proven under
certain geometric conditions that good local neighborhoods exist and are
found by our method. We also demonstrate how to use this algorithm for fast
determination of the number of affine subspaces. We give extensive
experimental evidence demonstrating the state-of-the-art accuracy and speed
of the algorithm on
synthetic and real hybrid linear data.
Comment: To appear in the proceedings of CVPR 2010.
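As a rough, hypothetical sketch (not the authors' implementation, and with the beta-number neighborhood selection omitted), the core step of fitting local flats by PCA and scoring a candidate set of flats by a global l1 error could look as follows; all function names are illustrative:

```python
import numpy as np

def best_fit_flat(points, d):
    """Least-squares d-dimensional affine flat through `points` via PCA."""
    center = points.mean(axis=0)
    # Rows of Vt are orthonormal; the top d span the flat's directions.
    _, _, Vt = np.linalg.svd(points - center, full_matrices=False)
    return center, Vt[:d]

def dist_to_flat(X, center, basis):
    """Euclidean distance from each row of X to the affine flat."""
    Y = X - center
    return np.linalg.norm(Y - (Y @ basis.T) @ basis, axis=1)

def global_l1_error(X, flats):
    """Sum over all points of the distance to the nearest candidate flat."""
    dists = np.stack([dist_to_flat(X, c, B) for c, B in flats])
    return dists.min(axis=0).sum()
```

In the paper's setting the candidate flats come from local neighborhoods of the data; the contribution is choosing neighborhood sizes automatically via the beta numbers and then selecting a subset of local flats with small global error.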
CUR Decompositions, Similarity Matrices, and Subspace Clustering
A general framework for solving the subspace clustering problem using the CUR
decomposition is presented. The CUR decomposition provides a natural way to
construct similarity matrices for data that come from a union of unknown
subspaces. The similarity
matrices thus constructed give the exact clustering in the noise-free case.
Additionally, this decomposition gives rise to many distinct similarity
matrices from a given set of data, which allow enough flexibility to perform
accurate clustering of noisy data. We also show that two known methods for
subspace clustering can be derived from the CUR decomposition. An algorithm
based on the theoretical construction of similarity matrices is presented, and
experiments on synthetic and real data are reported to test the method.
Additionally, an adaptation of our CUR-based similarity matrices is utilized
to provide a heuristic algorithm for subspace clustering; this algorithm yields
the best overall performance to date for clustering the Hopkins155 motion
segmentation dataset.
Comment: Approximately 30 pages. The current version contains an improved algorithm and numerical experiments relative to the previous version.
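As background, the shape interaction matrix of Costeira and Kanade is a standard similarity construction of this flavor, and plausibly among the "known methods" the abstract says can be derived from the CUR decomposition. A minimal sketch of that construction from the skinny SVD, assuming noise-free data of known rank r drawn from independent subspaces (the function name is illustrative):

```python
import numpy as np

def shape_interaction_similarity(W, r):
    """Similarity matrix |V_r V_r^T| from the rank-r skinny SVD of the
    D x N data matrix W. For noise-free data from independent subspaces
    this is block-diagonal up to a permutation of the points."""
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    Vr = Vt[:r].T                # N x r matrix of right singular vectors
    return np.abs(Vr @ Vr.T)

# Spectral clustering on the result then recovers the clusters, e.g. with
# sklearn.cluster.SpectralClustering(affinity="precomputed").
```

The paper's CUR-based constructions yield a whole family of such similarity matrices from one dataset; the exact correspondence is developed in the paper and not reproduced here.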
Reduced row echelon form and non-linear approximation for subspace segmentation and high-dimensional data clustering
Given a set of data W = {w_1, …, w_N} ⊂ R^D drawn from a union of subspaces, we focus on determining a nonlinear model of the form U = ⋃_{i∈I} S_i, where {S_i ⊂ R^D}_{i∈I} is a set of subspaces, that is nearest to W. The model is then used to classify W into clusters. Our approach is based on the binary reduced row echelon form of the data matrix, combined with an iterative scheme based on a non-linear approximation method. We prove that, in the absence of noise, our approach can find the number of subspaces, their dimensions, and an orthonormal basis for each subspace S_i. We provide a comprehensive analysis of our theory and determine its limitations and strengths in the presence of outliers and noise.
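For intuition, a minimal sketch of the noise-free core idea as stated in the abstract, namely row-reduce the data matrix, binarize, and group columns by their 0/1 pattern, might look as follows. Function names are illustrative; the iterative non-linear approximation scheme that handles noise and outliers is not shown, and the grouping rule assumes independent subspaces and points in general position:

```python
import numpy as np

def binary_rref(W, tol=1e-10):
    """Reduced row echelon form of W with entries binarized (nonzero -> 1)."""
    A = np.array(W, dtype=float)
    m, n = A.shape
    row = 0
    for col in range(n):
        if row == m:
            break
        p = row + np.argmax(np.abs(A[row:, col]))
        if abs(A[p, col]) < tol:
            continue                      # no pivot in this column
        A[[row, p]] = A[[p, row]]         # partial pivoting
        A[row] /= A[row, col]
        others = [i for i in range(m) if i != row]
        A[others] -= np.outer(A[others, col], A[row])
        row += 1
    return (np.abs(A) > tol).astype(int)

def group_columns(B):
    """Assign columns sharing the same binary pattern to one cluster."""
    groups = {}
    for j in range(B.shape[1]):
        groups.setdefault(tuple(B[:, j]), []).append(j)
    return list(groups.values())
```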
Non-Asymptotic Analysis of Tangent Space Perturbation
Constructing an efficient parameterization of a large, noisy data set of
points lying close to a smooth manifold in high dimension remains a fundamental
problem. One approach consists in recovering a local parameterization using the
local tangent plane. Principal component analysis (PCA) is often the tool of
choice, as it returns an optimal basis in the case of noise-free samples from a
linear subspace. To process noisy data samples from a nonlinear manifold, PCA
must be applied locally, at a scale small enough such that the manifold is
approximately linear, but at a scale large enough such that structure may be
discerned from noise. Using eigenspace perturbation theory and non-asymptotic
random matrix theory, we study the stability of the subspace estimated by PCA
as a function of scale, and bound (with high probability) the angle it forms
with the true tangent space. By adaptively selecting the scale that minimizes
this bound, our analysis reveals an appropriate scale for local tangent plane
recovery. We also introduce a geometric uncertainty principle quantifying the
limits of noise-curvature perturbation for stable recovery. With the purpose of
providing perturbation bounds that can be used in practice, we propose plug-in
estimates that make it possible to directly apply the theoretical results to
real data sets.
Comment: 53 pages. Revised manuscript with new content addressing application of results to real data sets.
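As a loose illustration of the scale-selection idea (not the paper's actual perturbation bound, which is derived from eigenspace perturbation and non-asymptotic random matrix theory), one could run local PCA over a range of neighborhood sizes and keep the scale that maximizes a simple spectral-gap proxy. The function names and the proxy criterion below are assumptions:

```python
import numpy as np

def local_pca(X, i, k, d):
    """PCA of the k nearest neighbors of X[i]; returns a d-dimensional
    tangent basis estimate and the local covariance eigenvalues."""
    idx = np.argsort(np.linalg.norm(X - X[i], axis=1))[:k]
    nbrs = X[idx] - X[idx].mean(axis=0)
    _, s, Vt = np.linalg.svd(nbrs, full_matrices=False)
    return Vt[:d], s ** 2 / k

def choose_scale(X, i, d, ks):
    """Pick the neighborhood size in `ks` maximizing the relative eigengap
    between the d-th and (d+1)-th local eigenvalues -- a crude proxy for
    the paper's angle bound. Assumes each k in ks and the ambient
    dimension both exceed d + 1."""
    best_k, best_gap, best_T = None, -np.inf, None
    for k in ks:
        T, lam = local_pca(X, i, k, d)
        gap = (lam[d - 1] - lam[d]) / max(lam[0], 1e-12)
        if gap > best_gap:
            best_k, best_gap, best_T = k, gap, T
    return best_k, best_T          # chosen scale and tangent estimate
```

Too small a scale lets noise dominate the local covariance; too large a scale lets curvature bend the neighborhood away from the tangent plane, which is exactly the trade-off the paper's bound quantifies.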
Riemannian Multi-Manifold Modeling
This paper advocates a novel framework for segmenting a dataset in a
Riemannian manifold M into clusters lying around low-dimensional submanifolds
of M. Important examples of M, for which the proposed clustering algorithm
is computationally efficient, are the sphere, the set of positive definite
matrices, and the Grassmannian. The clustering problem with these examples of
M is already useful for numerous application domains such as action
identification in video sequences, dynamic texture clustering, brain fiber
segmentation in medical imaging, and clustering of deformed images. The
proposed clustering algorithm constructs a data-affinity matrix by thoroughly
exploiting the intrinsic geometry and then applies spectral clustering. The
intrinsic local geometry is encoded by local sparse coding and more importantly
by directional information of local tangent spaces and geodesics. Theoretical
guarantees are established for a simplified variant of the algorithm even when
the clusters intersect. To avoid complication, these guarantees assume that the
underlying submanifolds are geodesic. Extensive validation on synthetic and
real data demonstrates the resiliency of the proposed method against deviations
from the theoretical model as well as its superior performance over
state-of-the-art techniques.
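A crude stand-in for the affinity construction, specialized to the sphere example, is sketched below; the paper's actual affinity also uses local sparse coding, and the functions here are illustrative, assuming unit-norm rows and k >= d:

```python
import numpy as np

def sphere_log(x, y):
    """Riemannian log map on the unit sphere: tangent vector at x toward y."""
    c = np.clip(x @ y, -1.0, 1.0)
    v = y - c * x
    n = np.linalg.norm(v)
    return (np.arccos(c) / n) * v if n > 1e-12 else np.zeros_like(x)

def tangent_affinity(X, d, k, sigma):
    """Affinity combining geodesic proximity with local tangent alignment
    for points X (rows, unit vectors) on the sphere."""
    N = len(X)
    G = np.arccos(np.clip(X @ X.T, -1.0, 1.0))    # geodesic distances
    T = []
    for i in range(N):
        nbrs = np.argsort(G[i])[1:k + 1]           # skip the point itself
        V = np.stack([sphere_log(X[i], X[j]) for j in nbrs])
        _, _, Vt = np.linalg.svd(V, full_matrices=False)
        T.append(Vt[:d])                           # d x D tangent basis
    A = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            align = np.linalg.norm(T[i] @ T[j].T)  # tangent-space alignment
            A[i, j] = np.exp(-(G[i, j] / sigma) ** 2) * align
    return (A + A.T) / 2
```

Spectral clustering is then applied to the resulting affinity matrix, as in the paper's pipeline.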
Endogenous Sparse Recovery
Sparsity has proven to be an essential ingredient in the development of efficient solutions to a number of problems in signal processing and machine learning. In all of these settings, sparse recovery methods are employed to recover signals that admit sparse representations in a pre-specified basis. Recently, sparse recovery methods have been employed in an entirely new way; instead of finding a sparse representation of a signal in a fixed basis, a sparse representation is formed "from within" the data. In this thesis, we study the utility of this endogenous sparse recovery procedure for learning unions of subspaces from collections of high-dimensional data. We provide new insights into the behavior of endogenous sparse recovery, develop sufficient conditions that describe when greedy methods will reveal local estimates of the subspaces in the ensemble, and introduce new methods to learn unions of overlapping subspaces from local subspace estimates.
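Since the thesis studies greedy methods for endogenous sparse recovery, a minimal sketch of the self-expressive idea via orthogonal matching pursuit, with the data itself serving as the dictionary, may help. This is an illustrative OMP variant under assumed names, not the thesis' exact algorithm:

```python
import numpy as np

def endogenous_omp(X, s):
    """Greedy (OMP) self-expression: each column of the D x N matrix X is
    approximated by an s-sparse combination of the *other* columns."""
    _, N = X.shape
    C = np.zeros((N, N))
    for j in range(N):
        y = X[:, j]
        residual = y.copy()
        support = []
        for _ in range(s):
            corr = np.abs(X.T @ residual)
            corr[[j] + support] = -np.inf    # never select the point itself
            support.append(int(np.argmax(corr)))
            coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
            residual = y - X[:, support] @ coef
        C[support, j] = coef
    return C

# A symmetric affinity for spectral clustering: np.abs(C) + np.abs(C).T
```

When each point's greedy representation draws only on points from its own subspace, the induced affinity is subspace-preserving; the thesis' sufficient conditions characterize when this happens.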