Subspace clustering of dimensionality-reduced data
Subspace clustering refers to the problem of clustering unlabeled
high-dimensional data points into a union of low-dimensional linear subspaces,
assumed unknown. In practice one may have access to dimensionality-reduced
observations of the data only, resulting, e.g., from "undersampling" due to
complexity and speed constraints on the acquisition device. More pertinently,
even if one has access to the high-dimensional data set it is often desirable
to first project the data points into a lower-dimensional space and to perform
the clustering task there; this reduces storage requirements and computational
cost. The purpose of this paper is to quantify the impact of
dimensionality reduction through random projection on the performance of the
sparse subspace clustering (SSC) and the thresholding-based subspace clustering
(TSC) algorithms. We find that for both algorithms dimensionality reduction
down to the order of the subspace dimensions is possible without incurring
significant performance degradation. The mathematical engine behind our
theorems is a result quantifying how the affinities between subspaces change
under random dimensionality-reducing projections.
Comment: ISIT 201
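The setting described in this abstract can be illustrated with a minimal sketch: points are drawn from a union of low-dimensional subspaces and then mapped to a lower-dimensional space by a random projection. The Gaussian projection matrix, the function names, and all dimensions below are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def sample_union_of_subspaces(rng, ambient_dim, subspace_dim, n_subspaces, points_per_subspace):
    """Draw points from randomly oriented linear subspaces (illustrative helper)."""
    points, labels = [], []
    for k in range(n_subspaces):
        # Orthonormal basis of a random subspace via QR of a Gaussian matrix.
        basis, _ = np.linalg.qr(rng.standard_normal((ambient_dim, subspace_dim)))
        coeffs = rng.standard_normal((subspace_dim, points_per_subspace))
        points.append(basis @ coeffs)
        labels.extend([k] * points_per_subspace)
    return np.hstack(points), np.array(labels)

def random_project(rng, X, reduced_dim):
    """Apply a Gaussian random projection (one possible choice, assumed here)."""
    P = rng.standard_normal((reduced_dim, X.shape[0])) / np.sqrt(reduced_dim)
    return P @ X

rng = np.random.default_rng(0)
# 3 subspaces of dimension 5 in a 200-dimensional ambient space.
X, labels = sample_union_of_subspaces(rng, 200, 5, 3, 40)
# Reduce to 20 dimensions, a small multiple of the subspace dimension.
Y = random_project(rng, X, 20)
```

The columns of `Y` could then be passed to a subspace clustering routine in place of the columns of `X`; each projected cluster still spans a 5-dimensional subspace of the reduced space.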
Discriminative variable selection for clustering with the sparse Fisher-EM algorithm
The interest in variable selection for clustering has increased recently due
to the growing need for clustering high-dimensional data. In particular, variable
selection eases both the clustering and the interpretation of the
results. Existing approaches have demonstrated the efficiency of variable
selection for clustering but turn out to be either very time consuming or not
sparse enough in high-dimensional spaces. This work proposes to perform a
selection of the discriminative variables by introducing sparsity in the
loading matrix of the Fisher-EM algorithm. This clustering method has been
recently proposed for the simultaneous visualization and clustering of
high-dimensional data. It is based on a latent mixture model which fits the
data into a low-dimensional discriminative subspace. Three different approaches
are proposed in this work to introduce sparsity in the orientation matrix of
the discriminative subspace through ℓ1-type penalizations. Experimental
comparisons with existing approaches on simulated and real-world data sets
demonstrate the interest of the proposed methodology. An application to the
segmentation of hyperspectral images of the planet Mars is also presented.
Neighborhood Selection for Thresholding-based Subspace Clustering
Subspace clustering refers to the problem of clustering high-dimensional data
points into a union of low-dimensional linear subspaces, where the number of
subspaces, their dimensions and orientations are all unknown. In this paper, we
propose a variation of the recently introduced thresholding-based subspace
clustering (TSC) algorithm, which applies spectral clustering to an adjacency
matrix constructed from the nearest neighbors of each data point with respect
to the spherical distance measure. The new element resides in an individual and
data-driven choice of the number of nearest neighbors. Previous performance
results for TSC, as well as for other subspace clustering algorithms based on
spectral clustering, come in terms of an intermediate performance measure,
which does not address the clustering error directly. Our main analytical
contribution is a performance analysis of the modified TSC algorithm (as well
as the original TSC algorithm) in terms of the clustering error directly.
Comment: ICASSP 201
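The adjacency construction mentioned in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a fixed number of neighbors `q` for every point (not the data-driven choice the paper proposes), and the edge weight `exp(-2 * arccos(.))` is one weighting used with TSC that is assumed here. The function name `tsc_adjacency` and the toy data are hypothetical.

```python
import numpy as np

def tsc_adjacency(X, q):
    """Build a symmetric adjacency matrix from the q nearest neighbors of each
    column of X, where closeness is the absolute normalized inner product."""
    X = X / np.linalg.norm(X, axis=0, keepdims=True)  # unit-norm columns
    C = np.abs(X.T @ X)                               # |<x_i, x_j>|
    np.fill_diagonal(C, 0.0)                          # exclude self-matches
    n = C.shape[0]
    Z = np.zeros_like(C)
    for j in range(n):
        nbrs = np.argsort(C[:, j])[-q:]               # q largest similarities
        # Assumed weighting: decays with the spherical distance arccos(|<.,.>|).
        Z[nbrs, j] = np.exp(-2.0 * np.arccos(np.clip(C[nbrs, j], 0.0, 1.0)))
    return Z + Z.T

# Toy data: two orthogonal 2-dimensional subspaces in R^4, 6 points each.
rng = np.random.default_rng(0)
X = np.zeros((4, 12))
X[:2, :6] = rng.standard_normal((2, 6))
X[2:, 6:] = rng.standard_normal((2, 6))
A = tsc_adjacency(X, q=3)
```

The matrix `A` would then be handed to a spectral clustering routine, for example scikit-learn's `SpectralClustering` with `affinity="precomputed"`. In this toy case the two subspaces are orthogonal, so no edges connect points from different subspaces.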