Scalable and Robust Sparse Subspace Clustering Using Randomized Clustering and Multilayer Graphs
Sparse subspace clustering (SSC) is one of the current state-of-the-art
methods for partitioning data points into the union of subspaces, with strong
theoretical guarantees. However, it is not practical for large data sets as it
requires solving a LASSO problem for each data point, where the number of
variables in each LASSO problem is the number of data points. To improve the
scalability of SSC, we propose to select a few sets of anchor points using a
randomized hierarchical clustering method, and, for each set of anchor points,
solve the LASSO problems for each data point allowing only anchor points to
have a non-zero weight (this drastically reduces the number of variables). This
generates a multilayer graph where each layer corresponds to a different set of
anchor points. Using the Grassmann manifold of orthogonal matrices, the shared
connectivity among the layers is summarized within a single subspace. Finally,
we use $k$-means clustering within that subspace to cluster the data points,
similarly to what is done by spectral clustering in SSC. We show on both synthetic and
real-world data sets that the proposed method not only allows SSC to scale to
large-scale data sets, but that it is also much more robust as it performs
significantly better on noisy data and on data with close subspaces and
outliers, while it is not prone to oversegmentation.
Comment: 25 pages, v2: typos corrected
Efficient Solvers for Sparse Subspace Clustering
Sparse subspace clustering (SSC) clusters $n$ points that lie near a union of
low-dimensional subspaces. The SSC model expresses each point as a linear or
affine combination of the other points, using either $\ell_1$ or $\ell_0$
regularization. Using $\ell_1$ regularization results in a convex problem but
requires $\mathcal{O}(n^2)$ storage, and is typically solved by the alternating direction
method of multipliers which takes $\mathcal{O}(n^3)$ flops. The $\ell_0$ model is
non-convex but only needs memory linear in $n$, is solved via orthogonal
matching pursuit, and cannot handle the case of affine subspaces. This paper
shows that a proximal gradient framework can solve SSC, covering both the $\ell_1$
and $\ell_0$ models, and both linear and affine constraints. For both $\ell_1$
and $\ell_0$, algorithms to compute the proximity operator in the presence of
affine constraints have not been presented in the SSC literature, so we derive
an exact and efficient algorithm that solves the $\ell_1$ case with just $\mathcal{O}(n \log n)$
flops. In the $\ell_0$ case, our algorithm retains the low-memory
overhead, and is the first algorithm to solve the SSC-$\ell_0$ model with
affine constraints. Experiments show our algorithms do not rely on sensitive
regularization parameters, and they are less sensitive to sparsity
misspecification and high noise.
Comment: This paper is accepted for publication in Signal Processing
Exactly Robust Kernel Principal Component Analysis
Robust principal component analysis (RPCA) can recover low-rank matrices that
are corrupted by sparse noise. In practice, however, many matrices are of high
rank and hence cannot be recovered by RPCA. We propose a novel method
called robust kernel principal component analysis (RKPCA) to decompose a
partially corrupted matrix as a sparse matrix plus a high- or full-rank matrix
with low latent dimensionality. RKPCA can be applied to many problems, such as
noise removal and subspace clustering, and is, to date, the only unsupervised
nonlinear method robust to sparse noise. Our theoretical analysis shows that,
with high probability, RKPCA can provide high recovery accuracy. The
optimization of RKPCA involves nonconvex, non-differentiable problems. We
propose two nonconvex optimization algorithms for RKPCA: the alternating
direction method of multipliers with backtracking line search, and proximal
linearized minimization with adaptive step size. Comparative studies in noise
removal and robust subspace clustering corroborate the effectiveness and
superiority of RKPCA.
Comment: The paper was accepted by IEEE Transactions on Neural Networks and
Learning Systems
On a minimum enclosing ball of a collection of linear subspaces
This paper concerns the minimax center of a collection of linear subspaces.
When the subspaces are $k$-dimensional subspaces of $\mathbb{R}^n$, this can be
cast as finding the center of a minimum enclosing ball on a Grassmann manifold,
$\mathrm{Gr}(k,n)$. For subspaces of different dimension, the setting becomes a disjoint
union of Grassmannians rather than a single manifold, and the problem is no
longer well-defined. However, natural geometric maps exist between these
manifolds with a well-defined notion of distance for the images of the
subspaces under the mappings. Solving the initial problem in this context leads
to a candidate minimax center on each of the constituent manifolds, but does
not inherently provide intuition about which candidate is the best
representation of the data. Additionally, the solutions of different rank are
generally not nested, so a deflationary approach will not suffice, and the
problem must be solved independently on each manifold. We propose and solve an
optimization problem parametrized by the rank of the minimax center. The
solution is computed using a subgradient algorithm on the dual. By scaling the
objective and penalizing the information lost by the rank-$k$ minimax center,
we jointly recover an optimal dimension, $k^*$, and a central subspace,
$U^* \in \mathrm{Gr}(k^*, n)$, at the center of the minimum enclosing ball that best
represents the data.
Comment: 26 pages
Beyond Linear Subspace Clustering: A Comparative Study of Nonlinear Manifold Clustering Algorithms
Subspace clustering is an important unsupervised clustering approach. It is
based on the assumption that the high-dimensional data points are approximately
distributed around several low-dimensional linear subspaces. The majority of
the prominent subspace clustering algorithms rely on the representation of the
data points as linear combinations of other data points, which is known as a
self-expressive representation. To overcome the restrictive linearity
assumption, numerous nonlinear approaches have been proposed to extend successful
subspace clustering methods to data on a union of nonlinear manifolds. In
this comparative study, we provide a comprehensive overview of nonlinear
subspace clustering approaches proposed in the last decade. We introduce a new
taxonomy to classify the state-of-the-art approaches into three categories,
namely locality preserving, kernel based, and neural network based. The major
representative algorithms within each category are extensively compared on
carefully designed synthetic and real-world data sets. The detailed analysis of
these approaches reveals potential research directions and unsolved challenges
in this field.
Comment: 55 pages