5 research outputs found

    Scalable and Robust Sparse Subspace Clustering Using Randomized Clustering and Multilayer Graphs

    Sparse subspace clustering (SSC) is one of the current state-of-the-art methods for partitioning data points into a union of subspaces, with strong theoretical guarantees. However, it is not practical for large data sets because it requires solving a LASSO problem for each data point, where the number of variables in each LASSO problem equals the number of data points. To improve the scalability of SSC, we propose to select a few sets of anchor points using a randomized hierarchical clustering method and, for each set of anchor points, solve the LASSO problems for each data point while allowing only anchor points to have a non-zero weight (this drastically reduces the number of variables). This generates a multilayer graph where each layer corresponds to a different set of anchor points. Using the Grassmann manifold of orthogonal matrices, the shared connectivity among the layers is summarized within a single subspace. Finally, we use $k$-means clustering within that subspace to cluster the data points, similarly to the spectral clustering step in SSC. We show on both synthetic and real-world data sets that the proposed method not only allows SSC to scale to large data sets, but is also much more robust, as it performs significantly better on noisy data and on data with close subspaces and outliers, while not being prone to oversegmentation. Comment: 25 pages, v2: typos corrected.
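A minimal single-layer sketch of the anchor-based idea summarized above, not the authors' implementation: anchors are picked by plain random sampling (standing in for the paper's randomized hierarchical clustering), only one layer is built, so the multilayer-graph/Grassmann summarization step is omitted, and the function name and default parameters are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso

def anchor_ssc_single_layer(X, n_anchors=50, alpha=0.01, n_clusters=5, seed=0):
    """X: (n_samples, n_features). Returns cluster labels for all points."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # 1) Choose anchor points (uniform random here; the paper uses a
    #    randomized hierarchical clustering to pick several such sets).
    anchors = rng.choice(n, size=min(n_anchors, n), replace=False)
    A = X[anchors].T                                   # (n_features, n_anchors)
    # 2) One LASSO per data point, restricted to the anchors only, so each
    #    problem has n_anchors variables instead of n.
    C = np.zeros((n, len(anchors)))
    for i in range(n):
        C[i] = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000).fit(A, X[i]).coef_
    # 3) Affinity from shared anchors, then spectral embedding + k-means,
    #    mirroring the spectral clustering step of SSC.
    W = np.abs(C) @ np.abs(C).T
    d = np.sqrt(np.clip(W.sum(axis=1), 1e-12, None))
    L = np.eye(n) - W / np.outer(d, d)                 # normalized graph Laplacian
    _, vecs = np.linalg.eigh(L)
    embedding = vecs[:, :n_clusters]                   # eigenvectors of the smallest eigenvalues
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(embedding)
```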

    Efficient Solvers for Sparse Subspace Clustering

    Sparse subspace clustering (SSC) clusters $n$ points that lie near a union of low-dimensional subspaces. The SSC model expresses each point as a linear or affine combination of the other points, using either $\ell_1$ or $\ell_0$ regularization. Using $\ell_1$ regularization results in a convex problem but requires $O(n^2)$ storage, and is typically solved by the alternating direction method of multipliers, which takes $O(n^3)$ flops. The $\ell_0$ model is non-convex but only needs memory linear in $n$; it is solved via orthogonal matching pursuit and cannot handle the case of affine subspaces. This paper shows that a proximal gradient framework can solve SSC, covering both the $\ell_1$ and $\ell_0$ models, and both linear and affine constraints. For both $\ell_1$ and $\ell_0$, algorithms to compute the proximity operator in the presence of affine constraints have not been presented in the SSC literature, so we derive an exact and efficient algorithm that solves the $\ell_1$ case with just $O(n^2)$ flops. In the $\ell_0$ case, our algorithm retains the low memory overhead and is the first algorithm to solve the SSC-$\ell_0$ model with affine constraints. Experiments show our algorithms do not rely on sensitive regularization parameters, and they are less sensitive to sparsity misspecification and high noise. Comment: This paper is accepted for publication in Signal Processing.
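A minimal proximal-gradient sketch of the linear SSC-$\ell_1$ model, $\min_C \tfrac{1}{2}\|X - XC\|_F^2 + \lambda\|C\|_1$ subject to $\mathrm{diag}(C) = 0$, to illustrate the framework described above. It is not the paper's code: the affine-constrained proximity operator the paper derives is omitted, and the step-size and iteration defaults are assumptions.

```python
import numpy as np

def ssc_l1_proxgrad(X, lam=0.1, n_iter=500):
    """X: (d, n) data matrix with points as columns. Returns the coefficient matrix C."""
    d, n = X.shape
    C = np.zeros((n, n))
    G = X.T @ X                              # Gram matrix used by the gradient
    step = 1.0 / np.linalg.norm(G, 2)        # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = G @ C - G                     # gradient of 0.5*||X - X C||_F^2 w.r.t. C
        Z = C - step * grad                  # forward (gradient) step
        # Backward step: prox of the l1 term is entrywise soft-thresholding,
        # and the diag(C) = 0 constraint is enforced exactly afterwards.
        C = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)
        np.fill_diagonal(C, 0.0)
    return C
```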

    Exactly Robust Kernel Principal Component Analysis

    Robust principal component analysis (RPCA) can recover low-rank matrices when they are corrupted by sparse noise. In practice, however, many matrices are of high rank and hence cannot be recovered by RPCA. We propose a novel method called robust kernel principal component analysis (RKPCA) to decompose a partially corrupted matrix into a sparse matrix plus a high- or full-rank matrix with low latent dimensionality. RKPCA can be applied to many problems such as noise removal and subspace clustering, and remains the only unsupervised nonlinear method robust to sparse noise. Our theoretical analysis shows that, with high probability, RKPCA can provide high recovery accuracy. The optimization of RKPCA involves nonconvex and nondifferentiable problems. We propose two nonconvex optimization algorithms for RKPCA: alternating direction method of multipliers with backtracking line search, and proximal linearized minimization with adaptive step size. Comparative studies in noise removal and robust subspace clustering corroborate the effectiveness and superiority of RKPCA. Comment: The paper was accepted by IEEE Transactions on Neural Networks and Learning Systems.
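For context, a minimal ADMM sketch of the classical linear RPCA decomposition that RKPCA generalizes (low-rank plus sparse split of an observed matrix); this is only the baseline named in the abstract, not the RKPCA algorithm itself, and the default penalty and iteration counts are assumptions.

```python
import numpy as np

def rpca_admm(M, lam=None, mu=1.0, n_iter=200):
    """Split M into L (low-rank) + S (sparse) via min ||L||_* + lam*||S||_1 s.t. L + S = M."""
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))   # common default choice
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(n_iter):
        # L-update: singular value thresholding of M - S + Y/mu
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = U @ np.diag(np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # S-update: entrywise soft-thresholding
        T = M - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        # Dual ascent on the constraint L + S = M
        Y = Y + mu * (M - L - S)
    return L, S
```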

    On a minimum enclosing ball of a collection of linear subspaces

    This paper concerns the minimax center of a collection of linear subspaces. When the subspaces are $k$-dimensional subspaces of $\mathbb{R}^n$, this can be cast as finding the center of a minimum enclosing ball on a Grassmann manifold, $\mathrm{Gr}(k,n)$. For subspaces of different dimension, the setting becomes a disjoint union of Grassmannians rather than a single manifold, and the problem is no longer well-defined. However, natural geometric maps exist between these manifolds with a well-defined notion of distance for the images of the subspaces under the mappings. Solving the initial problem in this context leads to a candidate minimax center on each of the constituent manifolds, but does not inherently provide intuition about which candidate is the best representation of the data. Additionally, the solutions of different rank are generally not nested, so a deflationary approach will not suffice, and the problem must be solved independently on each manifold. We propose and solve an optimization problem parametrized by the rank of the minimax center. The solution is computed using a subgradient algorithm on the dual. By scaling the objective and penalizing the information lost by the rank-$k$ minimax center, we jointly recover an optimal dimension, $k^*$, and a central subspace, $U^* \in \mathrm{Gr}(k^*,n)$, at the center of the minimum enclosing ball that best represents the data. Comment: 26 pages.
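A crude fixed-rank heuristic to illustrate the minimum-enclosing-ball setting: a primal subgradient-style loop under the chordal (projection) distance on $\mathrm{Gr}(k,n)$. The paper instead works with a subgradient algorithm on the dual and jointly selects the rank $k^*$; everything below, including the blended-projector retraction and the step schedule, is an assumption made for the sketch.

```python
import numpy as np

def chordal_dist(U, V):
    """Chordal distance ||U U^T - V V^T||_F / sqrt(2) between orthonormal bases U, V."""
    return np.linalg.norm(U @ U.T - V @ V.T, "fro") / np.sqrt(2.0)

def minimax_center(subspaces, k, n_iter=300, step0=0.5, seed=0):
    """subspaces: list of (n, k) orthonormal bases. Returns (center basis, ball radius)."""
    rng = np.random.default_rng(seed)
    n = subspaces[0].shape[0]
    V, _ = np.linalg.qr(rng.standard_normal((n, k)))    # random initial center on Gr(k, n)
    for t in range(1, n_iter + 1):
        dists = [chordal_dist(U, V) for U in subspaces]
        far = subspaces[int(np.argmax(dists))]          # currently farthest subspace
        # Pull the center toward the farthest subspace by blending projectors,
        # then retract to the Grassmannian via the top-k eigenvectors.
        step = step0 / np.sqrt(t)
        P = (1.0 - step) * (V @ V.T) + step * (far @ far.T)
        _, vecs = np.linalg.eigh(P)
        V = vecs[:, -k:]
    return V, max(chordal_dist(U, V) for U in subspaces)
```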

    Beyond Linear Subspace Clustering: A Comparative Study of Nonlinear Manifold Clustering Algorithms

    Subspace clustering is an important unsupervised clustering approach. It is based on the assumption that high-dimensional data points are approximately distributed around several low-dimensional linear subspaces. The majority of prominent subspace clustering algorithms rely on representing each data point as a linear combination of other data points, which is known as a self-expressive representation. To overcome this restrictive linearity assumption, numerous nonlinear approaches have been proposed to extend successful subspace clustering methods to data on a union of nonlinear manifolds. In this comparative study, we provide a comprehensive overview of nonlinear subspace clustering approaches proposed in the last decade. We introduce a new taxonomy that classifies the state-of-the-art approaches into three categories, namely locality preserving, kernel based, and neural network based. The major representative algorithms within each category are extensively compared on carefully designed synthetic and real-world data sets. The detailed analysis of these approaches uncovers potential research directions and unsolved challenges in this field. Comment: 55 pages.
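One illustrative representative of the kernel-based category mentioned above (not any specific surveyed algorithm): a kernelized least-squares self-expressive model, $\min_C \|\Phi(X) - \Phi(X)C\|_F^2 + \lambda\|C\|_F^2$, whose closed-form solution $C = (K + \lambda I)^{-1} K$ only needs the kernel matrix $K$, followed by spectral clustering on the resulting affinity. The RBF kernel and parameter defaults are assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

def kernel_self_expressive_clustering(X, n_clusters, gamma=1.0, lam=0.1):
    """X: (n_samples, n_features). Returns cluster labels."""
    K = rbf_kernel(X, gamma=gamma)                   # kernel matrix of the (implicitly mapped) data
    n = K.shape[0]
    C = np.linalg.solve(K + lam * np.eye(n), K)      # closed-form self-expressive coefficients
    W = 0.5 * (np.abs(C) + np.abs(C).T)              # symmetric affinity, as in self-expressive methods
    return SpectralClustering(n_clusters=n_clusters,
                              affinity="precomputed",
                              assign_labels="kmeans").fit_predict(W)
```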