543 research outputs found
CUR Decompositions, Similarity Matrices, and Subspace Clustering
A general framework for solving the subspace clustering problem using the CUR
decomposition is presented. The CUR decomposition provides a natural way to
construct similarity matrices for data that come from a union of unknown
subspaces . The similarity
matrices thus constructed give the exact clustering in the noise-free case.
Additionally, this decomposition gives rise to many distinct similarity
matrices from a given set of data, which allow enough flexibility to perform
accurate clustering of noisy data. We also show that two known methods for
subspace clustering can be derived from the CUR decomposition. An algorithm
based on the theoretical construction of similarity matrices is presented, and
experiments on synthetic and real data are presented to test the method.
Additionally, an adaptation of our CUR based similarity matrices is utilized
to provide a heuristic algorithm for subspace clustering; this algorithm yields
the best overall performance to date for clustering the Hopkins155 motion
segmentation dataset.Comment: Approximately 30 pages. Current version contains improved algorithm
and numerical experiments from the previous versio
Generalized Separable Nonnegative Matrix Factorization
Nonnegative matrix factorization (NMF) is a linear dimensionality technique
for nonnegative data with applications such as image analysis, text mining,
audio source separation and hyperspectral unmixing. Given a data matrix and
a factorization rank , NMF looks for a nonnegative matrix with
columns and a nonnegative matrix with rows such that .
NMF is NP-hard to solve in general. However, it can be computed efficiently
under the separability assumption which requires that the basis vectors appear
as data points, that is, that there exists an index set such that
. In this paper, we generalize the separability
assumption: We only require that for each rank-one factor for
, either for some or for
some . We refer to the corresponding problem as generalized separable NMF
(GS-NMF). We discuss some properties of GS-NMF and propose a convex
optimization model which we solve using a fast gradient method. We also propose
a heuristic algorithm inspired by the successive projection algorithm. To
verify the effectiveness of our methods, we compare them with several
state-of-the-art separable NMF algorithms on synthetic, document and image data
sets.Comment: 31 pages, 12 figures, 4 tables. We have added discussions about the
identifiability of the model, we have modified the first synthetic
experiment, we have clarified some aspects of the contributio
A Distance-preserving Matrix Sketch
Visualizing very large matrices involves many formidable problems. Various
popular solutions to these problems involve sampling, clustering, projection,
or feature selection to reduce the size and complexity of the original task. An
important aspect of these methods is how to preserve relative distances between
points in the higher-dimensional space after reducing rows and columns to fit
in a lower dimensional space. This aspect is important because conclusions
based on faulty visual reasoning can be harmful. Judging dissimilar points as
similar or similar points as dissimilar on the basis of a visualization can
lead to false conclusions. To ameliorate this bias and to make visualizations
of very large datasets feasible, we introduce two new algorithms that
respectively select a subset of rows and columns of a rectangular matrix. This
selection is designed to preserve relative distances as closely as possible. We
compare our matrix sketch to more traditional alternatives on a variety of
artificial and real datasets.Comment: 38 pages, 13 figure
- …