Factoring nonnegative matrices with linear programs
This paper describes a new approach, based on linear programming, for
computing nonnegative matrix factorizations (NMFs). The key idea is a
data-driven model for the factorization where the most salient features in the
data are used to express the remaining features. More precisely, given a data
matrix X, the algorithm identifies a matrix C such that X approximately equals
CX and C satisfies some linear constraints. The constraints are chosen to ensure that the
matrix C selects features; these features can then be used to find a low-rank
NMF of X. A theoretical analysis demonstrates that this approach has guarantees
similar to those of the recent NMF algorithm of Arora et al. (2012). In
contrast with this earlier work, the proposed method extends to more general
noise models and leads to efficient, scalable algorithms. Experiments with
synthetic and real datasets provide evidence that the new approach is also
superior in practice. An optimized C++ implementation can factor a
multigigabyte matrix in a matter of minutes.
Comment: 17 pages, 10 figures. Modified theorem statement for robust recovery
conditions. Revised proof techniques to make arguments more elementary.
Results on robustness when rows are duplicated have been superseded by
arxiv.org/1211.668
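To illustrate the X ≈ CX idea on exactly separable, noise-free data, the sketch below builds a data matrix whose rows are nonnegative combinations of r anchor rows. It then writes down a C supported only on the anchor columns and reads a rank-r NMF off the nonzero columns of C. NumPy is assumed, and the sizes, random data, and variable names are hypothetical. The sketch does not solve the paper's linear program; it only shows the kind of feature-selecting C that such a program is meant to recover when there is no noise.

```python
import numpy as np

rng = np.random.default_rng(0)

r, n = 3, 6                       # number of anchor rows (features), number of columns
W = rng.random((r, n))            # nonnegative anchor rows
H = rng.random((4, r))            # nonnegative weights for 4 non-anchor rows

# Anchors first, then the rows they generate: X is 7 x 6.
X = np.vstack([W, H @ W])
f = X.shape[0]
anchors = list(range(r))

# C expresses every row of X through the anchor rows only:
# each anchor row reproduces itself (C[j, j] = 1), and every other row
# places its mixing weights in the anchor columns.
C = np.zeros((f, f))
C[anchors, anchors] = 1.0
C[r:, anchors] = H

assert np.allclose(X, C @ X)          # X = CX holds exactly
assert np.isclose(np.trace(C), r)     # the trace counts the selected features

# The nonzero columns of C and the selected rows of X give a rank-r NMF.
F = C[:, anchors]                     # 7 x 3, nonnegative
W_hat = X[anchors, :]                 # 3 x 6, nonnegative
assert np.allclose(X, F @ W_hat)
```

With noise, the equality X = CX no longer holds exactly, and an optimization over C (such as the linear program described in the abstract) is needed to pick the anchors.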
Robustness Analysis of Hottopixx, a Linear Programming Model for Factoring Nonnegative Matrices
Although nonnegative matrix factorization (NMF) is NP-hard in general, it has
been shown very recently that it is tractable under the assumption that the
input nonnegative data matrix is close to being separable (separability
requires that all columns of the input matrix belong to the cone spanned by a
small subset of these columns). Since then, several algorithms have been
designed to handle this subclass of NMF problems. In particular, Bittorf,
Recht, Ré and Tropp ("Factoring nonnegative matrices with linear programs",
NIPS 2012) proposed a linear programming model, referred to as Hottopixx. In
this paper, we provide a new and more general robustness analysis of their
method. In particular, we design a provably more robust variant using a
post-processing strategy which allows us to deal with duplicates and near
duplicates in the dataset.
Comment: 23 pages; new numerical results; Comparison with Arora et al.;
Accepted in SIAM J. Mat. Anal. App
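The separability condition can be tested column by column with nonnegative least squares, given a candidate set of anchor columns. The sketch below is only a checking routine under our own assumptions: the `columns_in_cone` helper and its tolerance are hypothetical, and it uses SciPy's `scipy.optimize.nnls`. It is not Hottopixx and does not search for the anchors.

```python
import numpy as np
from scipy.optimize import nnls

def columns_in_cone(M, anchors, tol=1e-8):
    """Return True if every column of M is a nonnegative combination of the
    columns M[:, anchors], up to a residual tolerance."""
    M = np.asarray(M, dtype=float)
    A = M[:, anchors]
    for j in range(M.shape[1]):
        # nnls solves min ||A x - b||_2 subject to x >= 0 and returns (x, residual norm).
        _, residual = nnls(A, M[:, j])
        if residual > tol:
            return False
    return True

# Columns 0 and 1 generate the remaining columns with nonnegative weights.
a1 = np.array([1.0, 0.0, 1.0])
a2 = np.array([0.0, 1.0, 1.0])
M = np.column_stack([a1, a2, 0.5 * a1 + 0.5 * a2, 2.0 * a1 + 1.0 * a2])

print(columns_in_cone(M, [0, 1]))   # True: separable with anchors {0, 1}
print(columns_in_cone(M, [0]))      # False: a2 is not a multiple of a1
```

The check costs one small NNLS solve per column. It therefore illustrates the definition rather than offering a scalable method.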
Robust Near-Separable Nonnegative Matrix Factorization Using Linear Optimization
Nonnegative matrix factorization (NMF) has been shown recently to be
tractable under the separability assumption, under which all the columns of the
input data matrix belong to the convex cone generated by only a few of these
columns. Bittorf, Recht, Ré and Tropp ("Factoring nonnegative matrices with
linear programs", NIPS 2012) proposed a linear programming (LP) model, referred
to as Hottopixx, which is robust under any small perturbation of the input
matrix. However, Hottopixx has two important drawbacks: (i) the input matrix
has to be normalized, and (ii) the factorization rank has to be known in
advance. In this paper, we generalize Hottopixx in order to resolve these two
drawbacks, that is, we propose a new LP model which does not require
normalization and detects the factorization rank automatically. Moreover, the
new LP model is more flexible, significantly more tolerant to noise, and can
easily be adapted to handle outliers and other noise models. Finally, we show
on several synthetic datasets that it outperforms Hottopixx while competing
favorably with two state-of-the-art methods.
Comment: 27 pages; 4 figures. New example, new experiment on the Swimmer data
set
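Normalization is common in separable NMF for the following reason, which is background reasoning and not an argument taken from the abstract. If every data vector is a nonnegative combination of anchor vectors and all vectors are scaled to unit ℓ1 norm, the cone weights become convex weights that sum to one. The sketch below checks this numerically on hypothetical data, treating rows as the data vectors. The helper name `l1_normalize_rows` is hypothetical.

```python
import numpy as np

# Hypothetical separable data (rows are data vectors): two anchor rows W and
# two further rows generated from them with nonnegative weights H.
W = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])
H = np.array([[1.0, 1.0],
              [2.0, 0.5]])
X = np.vstack([W, H @ W])

def l1_normalize_rows(M):
    """Scale each nonnegative, nonzero row of M to unit l1 norm."""
    return M / M.sum(axis=1, keepdims=True)

Xn = l1_normalize_rows(X)
Wn = Xn[:2]                              # normalized anchor rows

# Cone weights H become convex weights lam:
# lam[i, j] = H[i, j] * ||w_j||_1 / ||x_i||_1.
w_norms = W.sum(axis=1)                  # l1 norms of the anchors (entries are >= 0)
x_norms = (H @ W).sum(axis=1)            # l1 norms of the generated rows
lam = H * w_norms / x_norms[:, None]

print(lam.sum(axis=1))                   # each row sums to 1 (up to rounding)
print(np.allclose(lam @ Wn, Xn[2:]))     # True: normalized rows are convex combinations
```

The identity works because, for nonnegative vectors, the ℓ1 norm of a nonnegative combination equals the same combination of the ℓ1 norms.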
An upper bound for nonnegative rank
We provide a nontrivial upper bound for the nonnegative rank of rank-three
matrices, which allows us to prove that ⌈6(n+1)/7⌉ linear inequalities suffice
to describe a convex n-gon up to a linear projection.
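The sketch below assumes the bracket in the bound is a ceiling, which is an assumption about the extracted notation, and computes it with integer arithmetic. A direct calculation shows that the count beats the trivial description by n inequalities only when ⌈6(n+1)/7⌉ ≤ n − 1, that is, when 6(n+1)/7 ≤ n − 1, or n ≥ 13. For small n it is weaker: n = 3 gives 4. The function name is hypothetical.

```python
def ngon_inequality_bound(n: int) -> int:
    """Return ceil(6(n+1)/7) using exact integer arithmetic (no floats)."""
    # -(-a // b) is the ceiling of a / b for positive b.
    return -(-6 * (n + 1) // 7)

for n in (3, 7, 12, 13, 100):
    print(n, ngon_inequality_bound(n))

# Smallest n for which the bound is below the trivial n inequalities.
print(min(n for n in range(3, 1000) if ngon_inequality_bound(n) < n))  # 13
```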
Dimension Reduction via Colour Refinement
Colour refinement is a basic algorithmic routine for graph isomorphism
testing, appearing as a subroutine in almost all practical isomorphism solvers.
It partitions the vertices of a graph into "colour classes" in such a way that
all vertices in the same colour class have the same number of neighbours in
every colour class. Tinhofer (Disc. App. Math., 1991), Ramana, Scheinerman, and
Ullman (Disc. Math., 1994) and Godsil (Lin. Alg. and its App., 1997)
established a tight correspondence between colour refinement and fractional
isomorphisms of graphs, which are solutions to the LP relaxation of a natural
ILP formulation of graph isomorphism.
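The refinement rule described above splits classes until every vertex in a class has the same number of neighbours in each class. A naive Python version is sketched below. It is a simple fixed-point loop that assumes an adjacency-list dictionary as input. It is not one of the quasilinear algorithms mentioned in the abstract, and the function name is hypothetical.

```python
from collections import Counter

def colour_refinement(adj):
    """Naive colour refinement for an undirected graph.

    `adj` maps each vertex to a list of its neighbours. Returns a dict mapping
    each vertex to an integer colour; vertices with the same colour have the
    same number of neighbours in every colour class.
    """
    colour = {v: 0 for v in adj}              # start with a single colour class
    while True:
        # Signature = current colour + count of neighbours in each colour class.
        signature = {
            v: (colour[v], tuple(sorted(Counter(colour[u] for u in adj[v]).items())))
            for v in adj
        }
        palette = {s: i for i, s in enumerate(sorted(set(signature.values())))}
        new_colour = {v: palette[signature[v]] for v in adj}
        # Classes only ever split, so an unchanged class count means a stable partition.
        if len(set(new_colour.values())) == len(set(colour.values())):
            return new_colour
        colour = new_colour

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(colour_refinement(path))               # {0: 0, 1: 1, 2: 1, 3: 0}
```

On a 6-cycle every vertex keeps the same colour. This is the classic case in which colour refinement cannot distinguish the graph from two disjoint triangles.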
We introduce a version of colour refinement for matrices and extend existing
quasilinear algorithms for computing the colour classes. Then we generalise the
correspondence between colour refinement and fractional automorphisms and
develop a theory of fractional automorphisms and isomorphisms of matrices.
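One natural analogue for matrices refines row classes by the sums of entries over each column class and column classes by the sums over each row class. This is an assumption here; the paper's exact definition may differ. The sketch below implements that sum-based variant and then checks a fractional-automorphism-style identity. If X and Y are the doubly stochastic matrices that average within row and column classes, then XA = AY. The helper names and the rounding tolerance are hypothetical.

```python
import numpy as np

def _relabel(signatures):
    """Map each signature tuple to a small integer colour, in sorted order."""
    palette = {s: i for i, s in enumerate(sorted(set(signatures)))}
    return np.array([palette[s] for s in signatures])

def matrix_colour_refinement(A, decimals=9):
    """Sum-based colour refinement of the rows and columns of a real matrix A.

    Returns (row_colours, col_colours) once neither partition changes.
    Sums are rounded to `decimals` places so float noise does not split classes.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    rows, cols = np.zeros(m, dtype=int), np.zeros(n, dtype=int)
    while True:
        # Refine rows: keep the old colour, append the sum over each column class.
        new_rows = _relabel([
            (int(rows[i]),) + tuple(round(float(A[i, cols == c].sum()), decimals)
                                    for c in range(cols.max() + 1))
            for i in range(m)])
        # Refine columns against the freshly refined row classes.
        new_cols = _relabel([
            (int(cols[j]),) + tuple(round(float(A[new_rows == r, j].sum()), decimals)
                                    for r in range(new_rows.max() + 1))
            for j in range(n)])
        # Refinement never merges classes, so equal class counts mean a fixed point.
        if new_rows.max() == rows.max() and new_cols.max() == cols.max():
            return new_rows, new_cols
        rows, cols = new_rows, new_cols

def averaging_matrix(colours):
    """Doubly stochastic matrix that averages entries within each colour class."""
    same = colours[:, None] == colours[None, :]
    return same / same.sum(axis=1, keepdims=True)

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 2.0],
              [3.0, 3.0, 0.0]])
rows, cols = matrix_colour_refinement(A)
print(rows, cols)                                  # [0 0 1] [0 0 1]
X, Y = averaging_matrix(rows), averaging_matrix(cols)
print(np.allclose(X @ A, A @ Y))                   # True
```

The identity holds for any partition that is stable in this sum-based sense. Each block of A then has constant row sums and constant column sums, so averaging over rows or over columns gives the same block value.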
We apply our results to reduce the dimensions of systems of linear equations
and linear programs. Specifically, we show that any given LP L can efficiently
be transformed into a (potentially) smaller LP L' whose number of variables and
constraints is the number of colour classes of the colour refinement algorithm,
applied to a matrix associated with the LP. The transformation is such that we
can easily (by a linear mapping) map both feasible and optimal solutions back
and forth between the two LPs. We demonstrate empirically that colour
refinement can indeed greatly reduce the cost of solving linear programs
- …