A Novel Efficient Approach with Data-Adaptive Capability for OMP-based Sparse Subspace Clustering
Orthogonal Matching Pursuit (OMP) plays an important role in data science and
its applications such as sparse subspace clustering and image processing.
However, existing OMP-based approaches lack data adaptiveness, so the data may
not be represented well enough and clustering accuracy may be lost. This
paper proposes a novel approach to enhance the data-adaptive capability for
OMP-based sparse subspace clustering. In our method, a parameter selection
process is developed to adjust the parameters based on the data distribution
for information representation. Our theoretical analysis indicates that the
parameter selection process can efficiently coordinate with any OMP-based
method to improve clustering performance. In addition, a new
Self-Expressive-Affinity (SEA) ratio metric is defined to measure how efficiently
the sparse representation is converted for spectral clustering to obtain data
segmentations. Our experiments show that the proposed approach achieves better
performance than other OMP-based sparse subspace clustering algorithms in terms
of clustering accuracy, SEA ratio, and representation quality, while retaining
time efficiency and robustness to noise.
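As background for this and the following entry, the OMP-based sparse subspace clustering pipeline writes each point as a sparse combination of the other points and feeds the coefficients to spectral clustering. Below is a minimal, illustrative NumPy sketch of that baseline; it is not the paper's data-adaptive algorithm, and the fixed sparsity level `k` is exactly the kind of parameter the abstract proposes to adapt to the data distribution.

```python
import numpy as np

def omp_self_expressive(X, k=5):
    """Represent each column of X (one data point) as a k-sparse combination
    of the other columns via Orthogonal Matching Pursuit, then build a
    symmetric affinity matrix for spectral clustering (illustrative sketch)."""
    n = X.shape[1]
    C = np.zeros((n, n))
    for j in range(n):
        y = X[:, j]
        residual = y.copy()
        support = []
        coef = np.zeros(0)
        for _ in range(k):
            corr = X.T @ residual
            corr[j] = 0.0                      # a point may not select itself
            i = int(np.argmax(np.abs(corr)))
            if i not in support:
                support.append(i)
            # least-squares fit on the current support
            coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
            residual = y - X[:, support] @ coef
        C[support, j] = coef
    W = np.abs(C) + np.abs(C).T                # symmetric affinity matrix
    return C, W
```

A data-adaptive variant would, for example, choose `k` (and related thresholds) per point from the data distribution rather than fixing it globally.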
Restricted Connection Orthogonal Matching Pursuit For Sparse Subspace Clustering
Sparse Subspace Clustering (SSC) is one of the most popular methods for
clustering data points into their underlying subspaces. However, SSC may suffer
from a heavy computational burden. Applying Orthogonal Matching Pursuit to SSC
accelerates the computation, but the trade-off is a loss of clustering
accuracy. In this paper, we propose a noise-robust algorithm, Restricted
Connection Orthogonal Matching Pursuit for Sparse Subspace Clustering
(RCOMP-SSC), to improve the clustering accuracy and maintain the low
computational time by restricting the number of connections of each data point
during the iterations of OMP. We also develop a control-matrix framework to
realize RCOMP-SSC, and the framework is scalable to other data point selection
strategies. Our analysis and experiments on synthetic data and two real-world
databases (EYaleB & USPS) demonstrate the superiority of our algorithm compared
with other clustering methods in terms of accuracy and computational time.
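For intuition only, here is one way the connection-restriction idea could be realized inside the OMP atom-selection step: a running count of how many points have already selected each column plays the role of the control matrix and masks out over-used columns. The budget `max_connections` and the masking rule are assumptions for illustration, not the exact RCOMP-SSC update.

```python
import numpy as np

def restricted_select(X, residual, j, counts, max_connections=10):
    """Pick the next OMP atom for point j while skipping the point's own
    column and any column whose connection budget is already exhausted
    (illustrative stand-in for the control-matrix restriction)."""
    corr = np.abs(X.T @ residual)
    corr[j] = 0.0                               # no self-connection
    corr[counts >= max_connections] = 0.0       # restrict over-connected points
    i = int(np.argmax(corr))
    counts[i] += 1                              # update the connection count
    return i
```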
Evolutionary Self-Expressive Models for Subspace Clustering
The problem of organizing data that evolves over time into clusters is
encountered in a number of practical settings. We introduce evolutionary
subspace clustering, a method whose objective is to cluster a collection of
evolving data points that lie on a union of low-dimensional evolving subspaces.
To learn the parsimonious representation of the data points at each time step,
we propose a non-convex optimization framework that exploits the
self-expressiveness property of the evolving data while taking into account
representation from the preceding time step. To find an approximate solution to
the aforementioned non-convex optimization problem, we develop a scheme based
on alternating minimization that both learns the parsimonious representation and
adaptively tunes and infers a smoothing parameter reflective of the
rate of data evolution. The latter addresses a fundamental challenge in
evolutionary clustering -- determining if and to what extent one should
consider previous clustering solutions when analyzing an evolving data
collection. Our experiments on both synthetic and real-world datasets
demonstrate that the proposed framework outperforms state-of-the-art static
subspace clustering algorithms and existing evolutionary clustering schemes in
terms of both accuracy and running time, across a range of scenarios.
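A much-simplified sketch of the alternating idea described above: at each time step the self-expressive coefficients are fit to the current data while being pulled toward the previous step's coefficients, and the smoothing weight is then re-estimated. The ridge-style closed form, the diagonal zeroing, and the heuristic re-tuning of `alpha` are assumptions made for illustration; the paper's actual alternating-minimization scheme differs.

```python
import numpy as np

def evolutionary_self_expressive(X_t, C_prev, alpha=0.5, n_iters=10):
    """One time step of a simplified evolutionary self-expressive update:
    fit C so that X_t ~ X_t C while staying close to the previous C, then
    re-estimate the smoothing weight alpha (illustrative sketch)."""
    n = X_t.shape[1]
    G = X_t.T @ X_t
    for _ in range(n_iters):
        # ridge-style closed form: (G + alpha*I) C = G + alpha*C_prev
        C = np.linalg.solve(G + alpha * np.eye(n), G + alpha * C_prev)
        np.fill_diagonal(C, 0.0)   # a point should not represent itself
        # heuristic re-tuning of alpha from the two fit terms
        fit_err = np.linalg.norm(X_t - X_t @ C)
        drift = np.linalg.norm(C - C_prev) + 1e-12
        alpha = fit_err / (fit_err + drift)
    return C, alpha
```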
A survey of dimensionality reduction techniques
Experimental life sciences like biology and chemistry have seen an explosion in
recent decades in the amount of data available from experiments. Laboratory
instruments have become more and more complex, reporting hundreds or thousands of
measurements for a single experiment, and statistical methods therefore face
challenging tasks when dealing with such high-dimensional data. However, much
of the data is highly redundant and can be efficiently brought down to a much
smaller number of variables without a significant loss of information. The
mathematical procedures that make this reduction possible are called
dimensionality reduction techniques; they have been widely developed in fields
like statistics and machine learning, and are currently a hot research topic. In
this review we categorize the plethora of dimensionality reduction techniques
available and give the mathematical insight behind them.
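As a concrete instance of the techniques such a survey covers, the canonical linear method, PCA, fits in a few lines of NumPy (rows are samples; the number of retained components is an arbitrary choice here):

```python
import numpy as np

def pca_reduce(X, n_components=2):
    """Project the rows of X onto the top principal components, the standard
    example of a linear dimensionality reduction technique."""
    Xc = X - X.mean(axis=0)                       # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T               # low-dimensional scores
```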
Spectral Sparse Representation for Clustering: Evolved from PCA, K-means, Laplacian Eigenmap, and Ratio Cut
Dimensionality reduction, cluster analysis, and sparse representation are
basic components in machine learning. However, their relationships have not yet
been fully investigated. In this paper, we find that spectral graph theory
underlies a series of these elementary methods and can unify them into a
complete framework. The methods include PCA, K-means, Laplacian eigenmap (LE),
ratio cut (Rcut), and a new sparse representation method developed by us,
called spectral sparse representation (SSR). Further, extended relations to
conventional over-complete sparse representations (e.g., method of optimal
directions, KSVD), manifold learning (e.g., kernel PCA, multidimensional
scaling, Isomap, locally linear embedding), and subspace clustering (e.g.,
sparse subspace clustering, low-rank representation) are incorporated. We show
that, under an ideal condition from spectral graph theory, PCA, K-means,
LE, and Rcut are unified together, and when the condition is relaxed, the
unification evolves to SSR, which lies intermediate between PCA/LE and
K-means/Rcut. An efficient algorithm, NSCrt, is developed to solve the sparse
codes of SSR. SSR combines the merits of both sides: its sparse codes reduce the
dimensionality of the data while revealing cluster structure. Owing to its inherent
relation to cluster analysis, the codes of SSR can be used directly for
clustering. Scut, a clustering approach derived from SSR, reaches
state-of-the-art performance in the spectral clustering family. The one-shot
solution obtained by Scut is comparable to the best result of K-means
run many times. Experiments on various data sets demonstrate the properties
and strengths of SSR, NSCrt, and Scut.
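For reference, the classical pipeline that the abstract relates to SSR and Scut, ratio-cut style spectral clustering (a Laplacian eigenmap embedding followed by K-means), can be sketched as below; this is the baseline, not the NSCrt/Scut algorithm itself, and the eigenvector count is simply set to the number of clusters.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import KMeans

def ratio_cut_clustering(W, n_clusters):
    """Ratio-cut style spectral clustering: embed with the eigenvectors of the
    unnormalized graph Laplacian of affinity W, then run K-means."""
    L = laplacian(np.asarray(W, dtype=float), normed=False)
    eigvals, eigvecs = np.linalg.eigh(L)
    embedding = eigvecs[:, :n_clusters]        # k smallest eigenvectors of L
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embedding)
```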
Learning with $\ell^0$-Graph: $\ell^0$-Induced Sparse Subspace Clustering
Sparse subspace clustering methods, such as Sparse Subspace Clustering (SSC)
\cite{ElhamifarV13} and $\ell^1$-graph \cite{YanW09,ChengYYFH10}, are
effective in partitioning data that lie in a union of subspaces. Most of
those methods use the $\ell^1$-norm or the $\ell^2$-norm with thresholding to
impose the sparsity of the constructed sparse similarity graph, and certain
assumptions, e.g. independence or disjointness, on the subspaces are required
to obtain the subspace-sparse representation, which is the key to their
success. Such assumptions are not guaranteed to hold in practice and they limit
the application of sparse subspace clustering on subspaces with general
location. In this paper, we propose a new sparse subspace clustering method
named $\ell^0$-graph. In contrast to the assumptions on subspaces required by
most existing sparse subspace clustering methods, it is proved that the
subspace-sparse representation can be obtained by $\ell^0$-graph for
arbitrary distinct underlying subspaces almost surely under the mild i.i.d.
assumption on the data generation. We develop a proximal method to obtain the
sub-optimal solution to the optimization problem of $\ell^0$-graph with a
proved guarantee of convergence. Moreover, we propose a regularized
$\ell^0$-graph that encourages nearby data to have similar neighbors so that
the similarity graph is more aligned within each cluster and the graph
connectivity issue is alleviated. Extensive experimental results on various
data sets demonstrate the superiority of $\ell^0$-graph compared to other
competing clustering methods, as well as the effectiveness of the regularized
$\ell^0$-graph.
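The key computational ingredient of an $\ell^0$-regularized self-expressive model is the proximal operator of the $\ell^0$ penalty, which is hard thresholding. Below is a generic proximal-gradient sketch for one column of the data; the step size, penalty weight, and iteration count are illustrative, and the paper's proximal algorithm and its convergence guarantee are not reproduced here.

```python
import numpy as np

def hard_threshold(v, lam):
    """Proximal operator of lam * ||.||_0: keep entries whose squared
    magnitude exceeds 2*lam, zero out the rest."""
    out = v.copy()
    out[v ** 2 <= 2.0 * lam] = 0.0
    return out

def l0_self_expressive(X, j, lam=0.1, n_iters=100):
    """Proximal gradient sketch for an l0-regularized self-expressive column:
    min_c 0.5*||x_j - X c||^2 + lam*||c||_0 with c_j forced to zero."""
    n = X.shape[1]
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant
    c = np.zeros(n)
    x_j = X[:, j]
    for _ in range(n_iters):
        grad = X.T @ (X @ c - x_j)
        c = hard_threshold(c - step * grad, lam * step)
        c[j] = 0.0                             # exclude the point itself
    return c
```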
Learning Self-Expression Metrics for Scalable and Inductive Subspace Clustering
Subspace clustering has established itself as a state-of-the-art approach to
clustering high-dimensional data. In particular, methods relying on the
self-expressiveness property have recently proved especially successful.
However, they suffer from two major shortcomings: First, a quadratic-size
coefficient matrix is learned directly, preventing these methods from scaling
beyond small datasets. Secondly, the trained models are transductive and thus
cannot be used to cluster out-of-sample data unseen during training. Instead of
learning self-expression coefficients directly, we propose a novel metric
learning approach that learns a subspace affinity function using a siamese
neural network architecture. Consequently, our model benefits from a constant
number of parameters and a constant-size memory footprint, allowing it to scale
to considerably larger datasets. In addition, we can formally show that our
model is still able to exactly recover subspace clusters given an independence
assumption. The siamese architecture in combination with a novel geometric
classifier further makes our model inductive, allowing it to cluster
out-of-sample data. Additionally, non-linear clusters can be detected by simply
adding an auto-encoder module to the architecture. The whole model can then be
trained end-to-end in a self-supervised manner. This work in progress reports
promising preliminary results on the MNIST dataset. In the spirit of
reproducible research, we make all code publicly available. In future work we
plan to investigate several extensions of our model and to expand the experimental
evaluation.
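To make the constant-parameter idea concrete, a minimal sketch of a shared ("siamese") embedding and a pairwise affinity derived from it is given below in plain NumPy; the two-layer architecture, the cosine-based affinity, and all shapes are assumptions for illustration, not the authors' network.

```python
import numpy as np

def embed(x, W1, W2):
    """Shared embedding applied to one data point; the same weights are used
    for both inputs of a pair, which is what makes the network siamese."""
    h = np.maximum(W1 @ x, 0.0)        # ReLU hidden layer
    return W2 @ h

def subspace_affinity(x, y, W1, W2):
    """Affinity between two points computed from their shared embeddings.
    The parameter count is fixed by the layer sizes, not by the number of
    data points, which is the scalability argument in the abstract."""
    fx, fy = embed(x, W1, W2), embed(y, W1, W2)
    cos = fx @ fy / (np.linalg.norm(fx) * np.linalg.norm(fy) + 1e-12)
    return 0.5 * (cos + 1.0)           # map cosine similarity into [0, 1]
```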
Machine Learning Techniques and Applications For Ground-based Image Analysis
Ground-based whole sky cameras have opened up new opportunities for
monitoring the earth's atmosphere. These cameras are an important complement to
satellite images by providing geoscientists with cheaper, faster, and more
localized data. The images captured by whole sky imagers can have high spatial
and temporal resolution, which is an important prerequisite for applications
such as solar energy modeling, cloud attenuation analysis, local weather
prediction, etc.
Extracting valuable information from the huge amount of image data by
detecting and analyzing the various entities in these images is challenging.
However, powerful machine learning techniques have become available to aid with
the image analysis. This article provides a detailed walk-through of recent
developments in these techniques and their applications in ground-based
imaging. We aim to bridge the gap between computer vision and remote sensing
with the help of illustrative examples. We demonstrate the advantages of using
machine learning techniques in ground-based image analysis via three primary
applications -- segmentation, classification, and denoising.
Discriminative Local Sparse Representations for Robust Face Recognition
A key recent advance in face recognition models a test face image as a sparse
linear combination of a set of training face images. The resulting sparse
representations have been shown to possess robustness against a variety of
distortions like random pixel corruption, occlusion, and disguise. However, this
approach makes the assumption, restrictive in many scenarios, that test faces
must be perfectly aligned (or registered) with the training data prior to
classification. In this paper, we propose a simple yet robust local block-based
sparsity model, using adaptively-constructed dictionaries from local features
in the training data, to overcome this misalignment problem. Our approach is
inspired by human perception: we analyze a series of local discriminative
features and combine them to arrive at the final classification decision. We
propose a probabilistic graphical model framework to explicitly mine the
conditional dependencies between these distinct sparse local features. In
particular, we learn discriminative graphs on sparse representations obtained
from distinct local slices of a face. Conditional correlations between these
sparse features are first discovered (in the training phase), and subsequently
exploited to bring about significant improvements in recognition rates.
Experimental results obtained on benchmark face databases demonstrate the
effectiveness of the proposed algorithms in the presence of multiple
registration errors (such as translation, rotation, and scaling) as well as
under variations of pose and illumination.
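A rough sketch of the local block-based sparsity idea in the abstract: the face is cut into a grid of blocks, each block is sparse-coded (here with plain OMP) against the corresponding blocks of the training faces, and per-block class decisions are fused. The grid size, sparsity level, and simple majority-vote fusion are assumptions for illustration; the paper instead fuses the local decisions with a learned discriminative graphical model.

```python
import numpy as np

def omp(D, y, k):
    """Plain Orthogonal Matching Pursuit returning a k-sparse coefficient vector."""
    residual, support = y.copy(), []
    coef_s = np.zeros(0)
    for _ in range(k):
        i = int(np.argmax(np.abs(D.T @ residual)))
        if i not in support:
            support.append(i)
        coef_s, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef_s
    coef = np.zeros(D.shape[1])
    coef[support] = coef_s
    return coef

def block_sparse_classify(test_img, train_imgs, train_labels, grid=(4, 4), k=5):
    """Classify a face image by sparse-coding each local block against the
    corresponding blocks of the training images and majority-voting over blocks."""
    H, W = test_img.shape
    bh, bw = H // grid[0], W // grid[1]
    labels = np.asarray(train_labels)
    votes = []
    for r in range(grid[0]):
        for c in range(grid[1]):
            sl = (slice(r * bh, (r + 1) * bh), slice(c * bw, (c + 1) * bw))
            y = test_img[sl].ravel().astype(float)
            # dictionary built adaptively from the same local slice of the training faces
            D = np.stack([img[sl].ravel().astype(float) for img in train_imgs], axis=1)
            coef = omp(D, y, k)
            # per-block decision: the class whose training blocks reconstruct best
            best = min(set(labels),
                       key=lambda lbl: np.linalg.norm(y - D[:, labels == lbl] @ coef[labels == lbl]))
            votes.append(best)
    return max(set(votes), key=votes.count)
```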
Dual Principal Component Pursuit
We consider the problem of learning a linear subspace from data corrupted by
outliers. Classical approaches are typically designed for the case in which the
subspace dimension is small relative to the ambient dimension. Our approach
works with a dual representation of the subspace and hence aims to find its
orthogonal complement; as such, it is particularly suitable for subspaces whose
dimension is close to the ambient dimension (subspaces of high relative
dimension). We pose the problem of computing normal vectors to the inlier
subspace as a non-convex $\ell^1$ minimization problem on the sphere, which we
call the Dual Principal Component Pursuit (DPCP) problem. We provide theoretical
guarantees under which every global solution to DPCP is a vector in the
orthogonal complement of the inlier subspace. Moreover, we relax the non-convex
DPCP problem to a recursion of linear programs whose solutions are shown to
converge in a finite number of steps to a vector orthogonal to the subspace. In
particular, when the inlier subspace is a hyperplane, the solutions to the
recursion of linear programs converge to the global minimum of the non-convex
DPCP problem in a finite number of steps. We also propose algorithms based on
alternating minimization and iteratively re-weighted least squares, which are
suitable for dealing with large-scale data. Experiments on synthetic data show
that the proposed methods are able to handle more outliers and higher relative
dimensions than current state-of-the-art methods, while experiments in the
context of the three-view geometry problem in computer vision suggest that the
proposed methods can be a useful or even superior alternative to traditional
RANSAC-based approaches for computer vision and other applications.
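As a flavor of the large-scale algorithms mentioned at the end, here is an iteratively re-weighted least squares sketch for the DPCP objective min over unit vectors b of ||X^T b||_1, where the columns of X are the data points and b is a candidate normal vector to the inlier subspace; the initialization, iteration count, and weight floor are illustrative choices rather than the paper's exact algorithm.

```python
import numpy as np

def dpcp_irls(X, n_iters=50, eps=1e-8):
    """IRLS sketch for Dual Principal Component Pursuit: find a unit vector b
    minimizing ||X^T b||_1, i.e. a normal vector to the inlier subspace."""
    # initialize with the direction of smallest variance
    _, _, Vt = np.linalg.svd(X.T, full_matrices=False)
    b = Vt[-1]
    for _ in range(n_iters):
        w = 1.0 / np.maximum(np.abs(X.T @ b), eps)    # IRLS weights per point
        Xw = X * w                                     # scale each column by its weight
        # minimize b^T (Xw X^T) b over unit vectors: smallest eigenvector
        eigvals, eigvecs = np.linalg.eigh(Xw @ X.T)
        b = eigvecs[:, 0]
    return b
```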
…