Probabilistic Subspace Clustering via Sparse Representations
We present a probabilistic subspace clustering approach that is capable of rapidly clustering very large signal collections. The signals are modeled as drawn from a union of subspaces, and each signal is represented by a sparse combination of basis elements (atoms), which form the columns of a learned dictionary. The set of sparse representations is used to derive the co-occurrence matrix of atoms and signals, which is modeled as emerging from a mixture model. The subspace of each signal is chosen as the one that maximizes the conditional probability of the signal given each subspace. This operation is carried out via non-negative matrix factorization (NNMF) of the co-occurrence matrix, which exposes the conditional probability distributions of all signals. Performance evaluations demonstrate clustering accuracies comparable to the state of the art at a fraction of the computational load.
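The pipeline this abstract describes (sparse coding over a learned dictionary, non-negative factorization of the atom/signal co-occurrence matrix, then assignment by maximum conditional weight) can be sketched as follows. The toy data, dictionary size, and scikit-learn solvers are illustrative assumptions, not the authors' implementation:

```python
# Sketch: probabilistic subspace clustering via NNMF of atom/signal co-occurrence.
import numpy as np
from sklearn.decomposition import DictionaryLearning, NMF

rng = np.random.default_rng(0)
# toy stand-in for a signal collection: two 2-D subspaces embedded in R^10
bases = [rng.standard_normal((10, 2)) for _ in range(2)]
X = np.vstack([rng.standard_normal((50, 2)) @ B.T for B in bases])  # 100 signals

# learn a dictionary and compute sparse representations (2 atoms per signal)
dico = DictionaryLearning(n_components=8, transform_algorithm="omp",
                          transform_n_nonzero_coefs=2, random_state=0)
codes = dico.fit_transform(X)              # shape (100 signals, 8 atoms)

C = np.abs(codes)                          # non-negative atom/signal co-occurrence
H = NMF(n_components=2, random_state=0, max_iter=500).fit_transform(C)
labels = H.argmax(axis=1)                  # subspace maximizing conditional weight
```

Here `H[i]` plays the role of an unnormalized conditional distribution of signal `i` over the latent subspaces; normalizing its rows would give proper probabilities.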
Probabilistic Sparse Subspace Clustering Using Delayed Association
Discovering and clustering subspaces in high-dimensional data is a
fundamental problem of machine learning with a wide range of applications in
data mining, computer vision, and pattern recognition. Earlier methods divided
the problem into two separate stages of finding the similarity matrix and
finding clusters. Similar to some recent works, we integrate these two steps
using a joint optimization approach. We make the following contributions: (i) We
estimate the reliability of the cluster assignment for each point before
assigning it to a subspace, grouping the data points into "certain" and
"uncertain" sets, with the assignment of the latter group delayed until their
subspace association certainty improves. (ii) We demonstrate that delayed
association is better suited for clustering subspaces that have ambiguities,
i.e., when subspaces intersect or data are contaminated with outliers/noise.
(iii) We demonstrate experimentally that such delayed probabilistic association
leads to a more accurate self-representation and final clusters. The proposed
method has higher accuracy both for points that lie exclusively in one subspace
and for those that lie on the intersection of subspaces. (iv) We show that
delayed association leads to a large reduction in computational cost, since it
allows for incremental spectral clustering.
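The core idea of delayed association can be illustrated with a toy nearest-subspace assignment in which points whose best and second-best residuals are close get deferred. The relative-margin criterion below is a hypothetical stand-in for the paper's certainty estimate, not its actual rule:

```python
import numpy as np

def delayed_assign(X, bases, margin=0.5):
    """Nearest-subspace assignment with delayed association (toy version).

    A point is assigned only when its best projection residual beats the
    second best by a relative margin; otherwise it stays 'uncertain' (-1)
    until a later round, once more of its neighbors are resolved."""
    labels = np.full(len(X), -1)
    for i, x in enumerate(X):
        # residual of projecting x onto each candidate subspace
        res = np.array([np.linalg.norm(x - B @ np.linalg.lstsq(B, x, rcond=None)[0])
                        for B in bases])
        best, second = np.partition(res, 1)[:2]
        if best < (1.0 - margin) * second:     # unambiguous: assign now
            labels[i] = int(res.argmin())
    return labels

B1 = np.array([[1.0], [0.0], [0.0]])           # x-axis subspace in R^3
B2 = np.array([[0.0], [1.0], [0.0]])           # y-axis subspace
pts = np.array([[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [1.0, 1.0, 0.0]])
labels = delayed_assign(pts, [B1, B2])         # third point sits on the
                                               # intersection and is deferred
```

The third point lies equally close to both subspaces (exactly the ambiguous case the abstract targets), so its assignment is postponed rather than guessed.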
Sparse Subspace Clustering: Algorithm, Theory, and Applications
In many real-world problems, we are dealing with collections of
high-dimensional data, such as images, videos, text and web documents, DNA
microarray data, and more. Often, high-dimensional data lie close to
low-dimensional structures corresponding to several classes or categories the
data belong to. In this paper, we propose and study an algorithm, called
Sparse Subspace Clustering (SSC), to cluster data points that lie in a union of
low-dimensional subspaces. The key idea is that, among infinitely many possible
representations of a data point in terms of other points, a sparse
representation corresponds to selecting a few points from the same subspace.
This motivates solving a sparse optimization program whose solution is used in
a spectral clustering framework to infer the clustering of data into subspaces.
Since solving the sparse optimization program is in general NP-hard, we
consider a convex relaxation and show that, under appropriate conditions on the
arrangement of subspaces and the distribution of data, the proposed
minimization program succeeds in recovering the desired sparse representations.
The proposed algorithm can be solved efficiently and can handle data points
near the intersections of subspaces. Another key advantage of the proposed
algorithm with respect to the state of the art is that it can deal with data
nuisances, such as noise, sparse outlying entries, and missing entries,
directly by incorporating the model of the data into the sparse optimization
program. We demonstrate the effectiveness of the proposed algorithm through
experiments on synthetic data as well as the two real-world problems of motion
segmentation and face clustering.
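The two-step structure described above (a convex sparse self-representation program, whose solution feeds spectral clustering) can be sketched compactly. The Lasso penalty, toy data, and scikit-learn solvers are illustrative choices, not the paper's exact formulation:

```python
# Sketch of the SSC pipeline: sparse self-representation + spectral clustering.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
# two random lines (1-D subspaces) in R^6, 20 points each (toy data)
bases = [rng.standard_normal((6, 1)) for B in range(2)]
X = np.vstack([
    (rng.uniform(1.0, 2.0, (20, 1)) * rng.choice([-1.0, 1.0], (20, 1))) @ B.T
    for B in bases
])

n = len(X)
C = np.zeros((n, n))
for i in range(n):                                # express each point sparsely
    others = np.delete(X, i, axis=0)              # in terms of the other points
    model = Lasso(alpha=0.01, fit_intercept=False,
                  max_iter=10000).fit(others.T, X[i])
    C[i, np.arange(n) != i] = model.coef_         # enforce zero self-weight
W = np.abs(C) + np.abs(C).T                       # symmetric affinity matrix
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(W)
```

Because the cheapest (in the l1 sense) way to reconstruct a point is from points on its own line, the affinity `W` is close to block diagonal, and spectral clustering recovers the two subspaces.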
A Study on Clustering for Clustering Based Image De-Noising
In this paper, the problem of de-noising an image contaminated with Additive
White Gaussian Noise (AWGN) is studied. This has been an open problem in signal
processing for more than 50 years. Local methods proposed in recent years have
obtained better results than global methods. However, by training more
intelligently, first so that important data contributes more to training, and
second by clustering so that training blocks lie in low-rank subspaces, we can
design a dictionary suitable for image de-noising and obtain results near the
state-of-the-art local methods. In the present paper, we suggest a method based
on global clustering of the blocks that constitute the image. As the type of
clustering plays an important role in clustering-based de-noising methods, we
address two questions about the clustering: first, which parts of the data
should be considered for clustering, and second, which clustering method is
suitable for de-noising? The clustering is then exploited to learn an
overcomplete dictionary, and the de-noised image is obtained from the sparse
decomposition of the noisy image blocks in terms of the dictionary atoms. In
addition to our framework, 7 popular dictionary learning methods are simulated
and compared based on two major factors: (1) de-noising performance and (2)
execution time. Experimental results show that our dictionary learning
framework outperforms its competitors in terms of both factors.

Comment: 9 pages, 8 figures, Journal of Information Systems and
Telecommunications (JIST)
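The general patch-based scheme this paper builds on (learn a dictionary from image blocks, sparse-code the noisy blocks, reconstruct) can be illustrated as follows. This is a generic sketch of dictionary-based de-noising, not the paper's clustering-driven variant; the image, patch size, and sparsity level are toy assumptions:

```python
# Illustrative patch-based AWGN de-noising with a learned dictionary.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import (extract_patches_2d,
                                              reconstruct_from_patches_2d)

rng = np.random.default_rng(0)
# smooth synthetic 32x32 image plus AWGN
clean = np.outer(np.sin(np.linspace(0, 3, 32)), np.cos(np.linspace(0, 3, 32)))
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

# collect overlapping 4x4 blocks and remove each block's DC component
patches = extract_patches_2d(noisy, (4, 4)).reshape(-1, 16)
mean = patches.mean(axis=1, keepdims=True)

# learn an overcomplete dictionary and sparse-code each block (2 atoms)
dico = MiniBatchDictionaryLearning(n_components=32, batch_size=64,
                                   transform_algorithm="omp",
                                   transform_n_nonzero_coefs=2, random_state=0)
codes = dico.fit_transform(patches - mean)

# reconstruct blocks from their sparse codes, add the DC back, and
# average the overlapping blocks into the de-noised image
recon = (codes @ dico.components_ + mean).reshape(-1, 4, 4)
denoised = reconstruct_from_patches_2d(recon, noisy.shape)
```

The sparse code keeps only a few dictionary atoms per block, which discards most of the noise energy; averaging the overlapping reconstructions suppresses it further.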
Distributed Low-rank Subspace Segmentation
Vision problems ranging from image clustering to motion segmentation to
semi-supervised learning can naturally be framed as subspace segmentation
problems, in which one aims to recover multiple low-dimensional subspaces from
noisy and corrupted input data. Low-Rank Representation (LRR), a convex
formulation of the subspace segmentation problem, is provably and empirically
accurate on small problems but does not scale to the massive sizes of modern
vision datasets. Moreover, past work aimed at scaling up low-rank matrix
factorization is not applicable to LRR given its non-decomposable constraints.
In this work, we propose a novel divide-and-conquer algorithm for large-scale
subspace segmentation that can cope with LRR's non-decomposable constraints and
maintains LRR's strong recovery guarantees. This has immediate implications for
the scalability of subspace segmentation, which we demonstrate on a benchmark
face recognition dataset and in simulations. We then introduce novel
applications of LRR-based subspace segmentation to large-scale semi-supervised
learning for multimedia event detection, concept detection, and image tagging.
In each case, we obtain state-of-the-art results and order-of-magnitude
speedups.
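The divide-and-conquer principle behind this line of work can be illustrated on a plain low-rank recovery problem: split the columns, factor each block cheaply, then project all blocks onto a column space recovered from one block. This toy sketch follows the generic divide-factor-combine pattern and is not the paper's LRR solver:

```python
# Toy divide-and-conquer low-rank recovery (divide / factor / combine).
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 200))  # rank-5 matrix
blocks = np.array_split(M, 4, axis=1)                             # divide columns

def low_rank(A, r):
    """Best rank-r approximation of A via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

factored = [low_rank(B, 5) for B in blocks]                       # factor blocks
U, _, _ = np.linalg.svd(factored[0], full_matrices=False)
basis = U[:, :5]                                                  # shared column space
combined = np.hstack([basis @ (basis.T @ F) for F in factored])   # combine
```

Each sub-factorization touches only a quarter of the columns, which is what makes the scheme parallelizable; the combine step reconciles the blocks through a common column space.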