Robust Near-Separable Nonnegative Matrix Factorization Using Linear Optimization
Nonnegative matrix factorization (NMF) has been shown recently to be
tractable under the separability assumption, under which all the columns of the
input data matrix belong to the convex cone generated by only a few of these
columns. Bittorf, Recht, Ré and Tropp (`Factoring nonnegative matrices with
linear programs', NIPS 2012) proposed a linear programming (LP) model, referred
to as Hottopixx, which is robust under any small perturbation of the input
matrix. However, Hottopixx has two important drawbacks: (i) the input matrix
has to be normalized, and (ii) the factorization rank has to be known in
advance. In this paper, we generalize Hottopixx in order to resolve these two
drawbacks, that is, we propose a new LP model which does not require
normalization and detects the factorization rank automatically. Moreover, the
new LP model is more flexible, significantly more tolerant to noise, and can
easily be adapted to handle outliers and other noise models. Finally, we show
on several synthetic datasets that it outperforms Hottopixx while competing
favorably with two state-of-the-art methods.
Comment: 27 pages; 4 figures. New example; new experiment on the Swimmer data set.
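The separability assumption above can be made concrete with a tiny numerical sketch (the matrix sizes, the anchor index set, and the use of SciPy's `nnls` are illustrative choices, not the paper's LP model): when the first r columns generate the rest, every column of X is reproduced exactly by nonnegative regression on those columns.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
m, r, n = 6, 3, 10
W = rng.random((m, r))                  # the few "generating" columns
H = rng.random((r, n - r))
X = np.hstack([W, W @ H])               # separable: columns 0..r-1 generate the rest

# With the anchor set K known, every column of X lies in the cone of X[:, K],
# so per-column nonnegative least squares reconstructs X exactly.
K = [0, 1, 2]
coeffs = np.column_stack([nnls(X[:, K], X[:, j])[0] for j in range(n)])
print(np.linalg.norm(X - X[:, K] @ coeffs))  # essentially zero, up to solver tolerance
```

The hard part, which the LP models in these papers address, is identifying K from a noisy X rather than being given it.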
A Fast Gradient Method for Nonnegative Sparse Regression with Self Dictionary
A nonnegative matrix factorization (NMF) can be computed efficiently under
the separability assumption, which asserts that all the columns of the given
input data matrix belong to the cone generated by a (small) subset of them. The
provably most robust methods to identify these conic basis columns are based on
nonnegative sparse regression and self dictionaries, and require the solution
of large-scale convex optimization problems. In this paper we study a
particular nonnegative sparse regression model with self dictionary. As opposed
to previously proposed models, this model yields a smooth optimization problem
where the sparsity is enforced through linear constraints. We show that the
Euclidean projection on the polyhedron defined by these constraints can be
computed efficiently, and propose a fast gradient method to solve our model. We
compare our algorithm with several state-of-the-art methods on synthetic data
sets and real-world hyperspectral images.
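A minimal sketch of the algorithmic ingredients named above, with the caveat that it is not the paper's model: it runs a Nesterov-type fast (projected) gradient method on min_C 0.5*||X - XC||_F^2, using plain nonnegativity in place of the paper's linear sparsity constraints (whose polyhedron requires the dedicated projection the abstract refers to). The function name and iteration count are illustrative.

```python
import numpy as np

def fast_gradient_selfdict(X, iters=500):
    """Nesterov-accelerated projected gradient for min 0.5*||X - XC||_F^2, C >= 0.
    Simplified stand-in: the paper's model instead enforces sparsity through
    linear constraints and projects onto the resulting polyhedron."""
    n = X.shape[1]
    L = np.linalg.norm(X.T @ X, 2)          # Lipschitz constant of the gradient
    C = Y = np.zeros((n, n))
    t = 1.0
    for _ in range(iters):
        G = X.T @ (X @ Y - X)               # gradient at the extrapolated point Y
        C_new = np.maximum(Y - G / L, 0.0)  # projected gradient step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Y = C_new + ((t - 1.0) / t_new) * (C_new - C)
        C, t = C_new, t_new
    return C
```

On exact data the fit X ≈ XC becomes tight (C = I is always feasible here), which is precisely why the real model needs the sparsity constraints: they force C to use only a small self-dictionary of columns.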
Robustness Analysis of Hottopixx, a Linear Programming Model for Factoring Nonnegative Matrices
Although nonnegative matrix factorization (NMF) is NP-hard in general, it has
been shown very recently that it is tractable under the assumption that the
input nonnegative data matrix is close to being separable (separability
requires that all columns of the input matrix belong to the cone spanned by a
small subset of these columns). Since then, several algorithms have been
designed to handle this subclass of NMF problems. In particular, Bittorf,
Recht, Ré and Tropp (`Factoring nonnegative matrices with linear programs',
NIPS 2012) proposed a linear programming model, referred to as Hottopixx. In
this paper, we provide a new and more general robustness analysis of their
method. In particular, we design a provably more robust variant using a
post-processing strategy which allows us to deal with duplicates and near
duplicates in the dataset.
Comment: 23 pages; new numerical results; comparison with Arora et al.; accepted in SIAM J. Mat. Anal. Appl.
Factoring nonnegative matrices with linear programs
This paper describes a new approach, based on linear programming, for
computing nonnegative matrix factorizations (NMFs). The key idea is a
data-driven model for the factorization where the most salient features in the
data are used to express the remaining features. More precisely, given a data
matrix X, the algorithm identifies a matrix C such that X approximately equals
CX and C satisfies certain linear constraints. The constraints are chosen to ensure that the
matrix C selects features; these features can then be used to find a low-rank
NMF of X. A theoretical analysis demonstrates that this approach has guarantees
similar to those of the recent NMF algorithm of Arora et al. (2012). In
contrast with this earlier work, the proposed method extends to more general
noise models and leads to efficient, scalable algorithms. Experiments with
synthetic and real datasets provide evidence that the new approach is also
superior in practice. An optimized C++ implementation can factor a
multigigabyte matrix in a matter of minutes.
Comment: 17 pages, 10 figures. Modified theorem statement for robust recovery conditions. Revised proof techniques to make arguments more elementary. Results on robustness when rows are duplicated have been superseded by arxiv.org/1211.668
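The X ≈ CX idea admits a small exact-case illustration (the construction below is a hypothetical toy that assumes the mixing weights are known, not the paper's LP): when the salient features are rows, a feasible C reproduces the data and its diagonal flags which rows were selected.

```python
import numpy as np

rng = np.random.default_rng(1)
r, n, m = 3, 8, 5
anchors = rng.random((r, m))            # the r "salient" rows
mix = rng.random((n - r, r))
X = np.vstack([anchors, mix @ anchors]) # every other row is a combo of the anchors

# In the exact case a feasible C exists with X = CX; building it here uses
# the known mixing weights (a toy, not the output of the LP):
C = np.zeros((n, n))
C[:r, :r] = np.eye(r)
C[r:, :r] = mix
assert np.allclose(X, C @ X)
print(np.where(np.diag(C) > 0.5)[0])    # the selected rows: [0 1 2]
```

The LP in the paper searches for such a C directly from X, with the constraints chosen (as the abstract says) so that C selects features; those features then seed a low-rank NMF.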
Generalized Separable Nonnegative Matrix Factorization
Nonnegative matrix factorization (NMF) is a linear dimensionality reduction
technique for nonnegative data with applications such as image analysis, text
mining, audio source separation and hyperspectral unmixing. Given a data matrix
M and a factorization rank r, NMF looks for a nonnegative matrix W with r
columns and a nonnegative matrix H with r rows such that M ≈ WH.
NMF is NP-hard to solve in general. However, it can be computed efficiently
under the separability assumption which requires that the basis vectors appear
as data points, that is, that there exists an index set K such that
W = M(:,K). In this paper, we generalize the separability
assumption: We only require that for each rank-one factor W(:,k)H(k,:), for
k = 1, ..., r, either W(:,k) = M(:,j) for some j, or H(k,:) = M(i,:) for
some i. We refer to the corresponding problem as generalized separable NMF
(GS-NMF). We discuss some properties of GS-NMF and propose a convex
optimization model which we solve using a fast gradient method. We also propose
a heuristic algorithm inspired by the successive projection algorithm. To
verify the effectiveness of our methods, we compare them with several
state-of-the-art separable NMF algorithms on synthetic, document and image data
sets.
Comment: 31 pages, 12 figures, 4 tables. We have added discussions about the identifiability of the model, modified the first synthetic experiment, and clarified some aspects of the contribution.
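Since the abstract's heuristic is inspired by the successive projection algorithm (SPA), a minimal SPA sketch for the standard separable case may help (the data construction and scaling are illustrative; the paper's heuristic generalizes the idea to pick rows as well as columns):

```python
import numpy as np

def spa(X, r):
    """Successive projection algorithm: greedily select the column with the
    largest residual norm, then project all columns onto the orthogonal
    complement of the selected one."""
    R = np.array(X, dtype=float)
    K = []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))
        K.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)             # deflate the chosen direction
    return K

# Separable toy data: the first r columns generate the rest by convex
# combinations scaled strictly below 1, so SPA recovers the anchor indices.
rng = np.random.default_rng(2)
m, r, n = 5, 3, 12
W = rng.random((m, r))
H = 0.5 * rng.dirichlet(np.ones(r), size=n - r).T
X = np.hstack([W, W @ H])
print(sorted(spa(X, r)))  # [0, 1, 2]
```

Each mixed column has strictly smaller (residual) norm than some anchor at every step, which is why the greedy max-norm selection keeps landing on anchors in this exact, well-scaled setting.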