473 research outputs found

    A Novel Efficient Approach with Data-Adaptive Capability for OMP-based Sparse Subspace Clustering

    Orthogonal Matching Pursuit (OMP) plays an important role in data science and in applications such as sparse subspace clustering and image processing. However, existing OMP-based approaches lack data adaptiveness, so the data may not be represented well and accuracy may suffer. This paper proposes a novel approach that enhances the data-adaptive capability of OMP-based sparse subspace clustering. In our method, a parameter selection process adjusts the parameters according to the data distribution for information representation. Our theoretical analysis indicates that the parameter selection process can efficiently coordinate with any OMP-based method to improve clustering performance. We also define a new Self-Expressive-Affinity (SEA) ratio metric to measure how efficiently the sparse representation is converted for spectral clustering to obtain data segmentations. Our experiments show that the proposed approach achieves better performance than other OMP-based sparse subspace clustering algorithms in terms of clustering accuracy, SEA ratio, and representation quality, while retaining time efficiency and noise robustness.

    Restricted Connection Orthogonal Matching Pursuit For Sparse Subspace Clustering

    Sparse Subspace Clustering (SSC) is one of the most popular methods for clustering data points into their underlying subspaces. However, SSC may suffer from a heavy computational burden. Applying Orthogonal Matching Pursuit (OMP) to SSC accelerates the computation, but at the cost of clustering accuracy. In this paper, we propose a noise-robust algorithm, Restricted Connection Orthogonal Matching Pursuit for Sparse Subspace Clustering (RCOMP-SSC), which improves clustering accuracy while keeping computation time low by restricting the number of connections of each data point during the OMP iterations. We also develop a control-matrix framework to realize RCOMP-SSC; the framework extends to other data point selection strategies. Our analysis and experiments on synthetic data and two real-world databases (EYaleB and USPS) demonstrate the superiority of our algorithm over other clustering methods in terms of accuracy and computational time.
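
    For orientation, the plain OMP step that methods like the one above build on can be sketched as follows. This is a minimal illustration of OMP-based self-expression only; it does not implement the connection restriction or control matrix of RCOMP-SSC. The names `omp_self_expression`, `affinity`, `k_max`, and `tol` are hypothetical and not taken from the paper.

```python
import numpy as np

def omp_self_expression(X, k_max, tol=1e-6):
    """Sparse self-expressive coefficients via plain OMP (illustrative sketch).

    X     : (d, n) array, one data point per column (assumed l2-normalized).
    k_max : maximum number of other points used per representation (>= 1).
    Returns C of shape (n, n); column j holds the coefficients that express
    point j in terms of the other points. The diagonal stays zero.
    """
    d, n = X.shape
    C = np.zeros((n, n))
    for j in range(n):
        residual = X[:, j].copy()
        support = []
        for _ in range(k_max):
            corr = np.abs(X.T @ residual)
            corr[j] = -np.inf              # a point may not represent itself
            if support:
                corr[support] = -np.inf    # never reselect a chosen point
            support.append(int(np.argmax(corr)))
            # Least-squares refit on the current support (the "orthogonal" step).
            coef, *_ = np.linalg.lstsq(X[:, support], X[:, j], rcond=None)
            residual = X[:, j] - X[:, support] @ coef
            if np.linalg.norm(residual) < tol:
                break
        C[support, j] = coef
    return C

def affinity(C):
    """Symmetric affinity matrix, as commonly passed on to spectral clustering."""
    return np.abs(C) + np.abs(C).T
```

    In this sketch, points drawn from two mutually orthogonal subspaces yield an affinity matrix with no cross-subspace entries, which is the property spectral clustering then exploits.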

    Evolutionary Self-Expressive Models for Subspace Clustering

    The problem of organizing data that evolves over time into clusters is encountered in a number of practical settings. We introduce evolutionary subspace clustering, a method whose objective is to cluster a collection of evolving data points that lie on a union of low-dimensional evolving subspaces. To learn the parsimonious representation of the data points at each time step, we propose a non-convex optimization framework that exploits the self-expressiveness property of the evolving data while taking into account the representation from the preceding time step. To find an approximate solution to this non-convex optimization problem, we develop a scheme based on alternating minimization that both learns the parsimonious representation and adaptively tunes and infers a smoothing parameter reflective of the rate of data evolution. The latter addresses a fundamental challenge in evolutionary clustering -- determining if and to what extent one should consider previous clustering solutions when analyzing an evolving data collection. Our experiments on both synthetic and real-world datasets demonstrate that the proposed framework outperforms state-of-the-art static subspace clustering algorithms and existing evolutionary clustering schemes in terms of both accuracy and running time, in a range of scenarios.

    A survey of dimensionality reduction techniques

    Experimental life sciences such as biology and chemistry have seen an explosion in recent decades in the data available from experiments. Laboratory instruments have become more and more complex and report hundreds or thousands of measurements for a single experiment, so statistical methods face challenging tasks when dealing with such high-dimensional data. However, much of the data is highly redundant and can be efficiently reduced to a much smaller number of variables without a significant loss of information. The mathematical procedures that make this reduction possible are called dimensionality reduction techniques; they have been widely developed in fields such as Statistics and Machine Learning, and are currently a hot research topic. In this review we categorize the plethora of dimension reduction techniques available and give the mathematical insight behind them.

    Spectral Sparse Representation for Clustering: Evolved from PCA, K-means, Laplacian Eigenmap, and Ratio Cut

    Dimensionality reduction, cluster analysis, and sparse representation are basic components in machine learning. However, their relationships have not yet been fully investigated. In this paper, we find that spectral graph theory underlies a series of these elementary methods and can unify them into a complete framework. The methods include PCA, K-means, Laplacian eigenmap (LE), ratio cut (Rcut), and a new sparse representation method developed by us, called spectral sparse representation (SSR). Further, extended relations to conventional over-complete sparse representations (e.g., method of optimal directions, KSVD), manifold learning (e.g., kernel PCA, multidimensional scaling, Isomap, locally linear embedding), and subspace clustering (e.g., sparse subspace clustering, low-rank representation) are incorporated. We show that, under an ideal condition from spectral graph theory, PCA, K-means, LE, and Rcut are unified. When the condition is relaxed, the unification evolves to SSR, which lies between PCA/LE and K-means/Rcut. An efficient algorithm, NSCrt, is developed to solve for the sparse codes of SSR. SSR combines the merits of both sides: its sparse codes reduce the dimensionality of the data while revealing cluster structure. Owing to its inherent relation to cluster analysis, the codes of SSR can be used directly for clustering. Scut, a clustering approach derived from SSR, reaches state-of-the-art performance in the spectral clustering family. The one-shot solution obtained by Scut is comparable to the best result of K-means over many runs. Experiments on various data sets demonstrate the properties and strengths of SSR, NSCrt, and Scut.

    Learning with $\ell^{0}$-Graph: $\ell^{0}$-Induced Sparse Subspace Clustering

    Sparse subspace clustering methods, such as Sparse Subspace Clustering (SSC) \cite{ElhamifarV13} and $\ell^{1}$-graph \cite{YanW09,ChengYYFH10}, are effective in partitioning data that lie in a union of subspaces. Most of those methods use the $\ell^{1}$-norm or the $\ell^{2}$-norm with thresholding to impose sparsity on the constructed similarity graph, and certain assumptions on the subspaces, e.g. independence or disjointness, are required to obtain the subspace-sparse representation, which is the key to their success. Such assumptions are not guaranteed to hold in practice, and they limit the application of sparse subspace clustering to subspaces in general position. In this paper, we propose a new sparse subspace clustering method named $\ell^{0}$-graph. In contrast to the assumptions on subspaces required by most existing sparse subspace clustering methods, it is proved that a subspace-sparse representation can be obtained by $\ell^{0}$-graph for arbitrary distinct underlying subspaces almost surely, under a mild i.i.d. assumption on the data generation. We develop a proximal method that obtains a sub-optimal solution to the optimization problem of $\ell^{0}$-graph with a proven guarantee of convergence. Moreover, we propose a regularized $\ell^{0}$-graph that encourages nearby data to have similar neighbors, so that the similarity graph is more aligned within each cluster and the graph connectivity issue is alleviated. Extensive experimental results on various data sets demonstrate the superiority of $\ell^{0}$-graph over other competing clustering methods, as well as the effectiveness of regularized $\ell^{0}$-graph.

    Learning Self-Expression Metrics for Scalable and Inductive Subspace Clustering

    Subspace clustering has established itself as a state-of-the-art approach to clustering high-dimensional data. In particular, methods relying on the self-expressiveness property have recently proved especially successful. However, they suffer from two major shortcomings. First, a quadratic-size coefficient matrix is learned directly, preventing these methods from scaling beyond small datasets. Second, the trained models are transductive and thus cannot be used to cluster out-of-sample data unseen during training. Instead of learning self-expression coefficients directly, we propose a novel metric learning approach that learns a subspace affinity function using a siamese neural network architecture. Consequently, our model benefits from a constant number of parameters and a constant-size memory footprint, allowing it to scale to considerably larger datasets. In addition, we can formally show that our model is still able to exactly recover subspace clusters given an independence assumption. The siamese architecture, in combination with a novel geometric classifier, further makes our model inductive, allowing it to cluster out-of-sample data. Additionally, non-linear clusters can be detected by simply adding an auto-encoder module to the architecture. The whole model can then be trained end-to-end in a self-supervised manner. This work in progress reports promising preliminary results on the MNIST dataset. In the spirit of reproducible research, we make all code publicly available. In future work we plan to investigate several extensions of our model and to expand the experimental evaluation.

    Machine Learning Techniques and Applications For Ground-based Image Analysis

    Ground-based whole sky cameras have opened up new opportunities for monitoring the Earth's atmosphere. These cameras are an important complement to satellite images, providing geoscientists with cheaper, faster, and more localized data. The images captured by whole sky imagers can have high spatial and temporal resolution, which is an important prerequisite for applications such as solar energy modeling, cloud attenuation analysis, and local weather prediction. Extracting valuable information from the huge amount of image data by detecting and analyzing the various entities in these images is challenging. However, powerful machine learning techniques have become available to aid with the image analysis. This article provides a detailed walk-through of recent developments in these techniques and their applications in ground-based imaging. We aim to bridge the gap between computer vision and remote sensing with the help of illustrative examples. We demonstrate the advantages of using machine learning techniques in ground-based image analysis via three primary applications -- segmentation, classification, and denoising.

    Discriminative Local Sparse Representations for Robust Face Recognition

    A key recent advance in face recognition models a test face image as a sparse linear combination of a set of training face images. The resulting sparse representations have been shown to be robust against a variety of distortions such as random pixel corruption, occlusion, and disguise. This approach, however, assumes that test faces are perfectly aligned (or registered) to the training data prior to classification, which is restrictive in many scenarios. In this paper, we propose a simple yet robust local block-based sparsity model, using adaptively constructed dictionaries from local features in the training data, to overcome this misalignment problem. Our approach is inspired by human perception: we analyze a series of local discriminative features and combine them to arrive at the final classification decision. We propose a probabilistic graphical model framework to explicitly mine the conditional dependencies between these distinct sparse local features. In particular, we learn discriminative graphs on sparse representations obtained from distinct local slices of a face. Conditional correlations between these sparse features are first discovered in the training phase and subsequently exploited to bring about significant improvements in recognition rates. Experimental results obtained on benchmark face databases demonstrate the effectiveness of the proposed algorithms in the presence of multiple registration errors (such as translation, rotation, and scaling) as well as under variations of pose and illumination.

    Dual Principal Component Pursuit

    We consider the problem of learning a linear subspace from data corrupted by outliers. Classical approaches are typically designed for the case in which the subspace dimension is small relative to the ambient dimension. Our approach works with a dual representation of the subspace and hence aims to find its orthogonal complement; as such, it is particularly suitable for subspaces whose dimension is close to the ambient dimension (subspaces of high relative dimension). We pose the problem of computing normal vectors to the inlier subspace as a non-convex $\ell_1$ minimization problem on the sphere, which we call the Dual Principal Component Pursuit (DPCP) problem. We provide theoretical guarantees under which every global solution to DPCP is a vector in the orthogonal complement of the inlier subspace. Moreover, we relax the non-convex DPCP problem to a recursion of linear programs whose solutions are shown to converge in a finite number of steps to a vector orthogonal to the subspace. In particular, when the inlier subspace is a hyperplane, the solutions to the recursion of linear programs converge to the global minimum of the non-convex DPCP problem in a finite number of steps. We also propose algorithms based on alternating minimization and iteratively re-weighted least squares, which are suitable for dealing with large-scale data. Experiments on synthetic data show that the proposed methods are able to handle more outliers and higher relative dimensions than current state-of-the-art methods, while experiments in the context of the three-view geometry problem in computer vision suggest that the proposed methods can be a useful or even superior alternative to traditional RANSAC-based approaches for computer vision and other applications.
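
    As a rough illustration of the iteratively re-weighted least squares idea mentioned above, the sketch below searches for a unit vector b that approximately minimizes the $\ell_1$ norm of X^T b. It is an assumption-laden sketch rather than the authors' implementation: the function name `dpcp_irls`, the spectral initialization, the fixed iteration count `n_iter`, and the weight floor `delta` are all choices made here for illustration.

```python
import numpy as np

def dpcp_irls(X, n_iter=50, delta=1e-9):
    """IRLS sketch for: minimize ||X^T b||_1 subject to ||b||_2 = 1.

    X : (D, N) array, one data point per column (inliers and outliers mixed).
    Returns a unit vector b, intended as a normal to the inlier subspace.
    """
    # Initialization (an assumption of this sketch): the direction of least
    # variance, i.e. the eigenvector of X X^T with the smallest eigenvalue.
    _, V = np.linalg.eigh(X @ X.T)      # eigh sorts eigenvalues ascending
    b = V[:, 0]
    for _ in range(n_iter):
        # Points nearly orthogonal to b get large weights; delta caps them.
        w = 1.0 / np.maximum(np.abs(X.T @ b), delta)
        # Weighted least squares on the sphere: the minimizer of
        # sum_i w_i (x_i^T b)^2 is the smallest eigenvector of X diag(w) X^T.
        _, V = np.linalg.eigh((X * w) @ X.T)
        b = V[:, 0]
    return b
```

    Each pass replaces the non-smooth $\ell_1$ objective with a weighted quadratic one, so every iteration reduces to a single symmetric eigenvalue problem of size D.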
