Coordinate Descent Methods for Symmetric Nonnegative Matrix Factorization
Given a symmetric nonnegative matrix $A$, symmetric nonnegative matrix
factorization (symNMF) is the problem of finding a nonnegative matrix $H$,
usually with far fewer columns than $A$, such that $A \approx HH^T$. SymNMF
can be used for data analysis and in particular for various clustering tasks.
In this paper, we propose simple and very efficient coordinate descent schemes
to solve this problem that can handle large and sparse input matrices. The
effectiveness of our methods is illustrated on synthetic and real-world data
sets, and we show that they perform favorably compared to recent
state-of-the-art methods.
Comment: 25 pages, 5 figures, 7 tables. Main changes: comparison with another
symNMF algorithm (namely, BetaSNMF), and correction of an error in the
convergence proof
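As a rough illustration of what a coordinate descent scheme for symNMF can look like, the sketch below minimizes $\frac{1}{4}\|A - HH^T\|_F^2$ one entry of $H$ at a time. With every other entry fixed, the objective is a quartic in the chosen entry, so its stationary points are roots of a cubic. The function names, the use of `numpy.roots`, and the naive residual recomputation are assumptions made for this example; this is not the authors' implementation, which is designed to be far more efficient on large sparse inputs.

```python
import numpy as np


def symnmf_objective(A, H):
    """F(H) = 1/4 * ||A - H H^T||_F^2."""
    return 0.25 * np.linalg.norm(A - H @ H.T, "fro") ** 2


def update_entry(A, H, i, j):
    """Exactly minimize F over the single entry H[i, j] >= 0 (in place)."""
    h = H[:, j]
    # Residual with the rank-one term of column j removed (naive, costly).
    R = A - H @ H.T + np.outer(h, h)
    s = h @ h - h[i] ** 2           # squared norm of column j without entry i
    p = R[i] @ h - R[i, i] * h[i]   # sum over l != i of R[i, l] * h[l]
    a = s - R[i, i]

    def quartic(x):
        # 4 * F as a function of x = H[i, j], up to an additive constant.
        return x**4 + 2.0 * a * x**2 - 4.0 * p * x

    # Stationary points solve x^3 + a*x - p = 0.
    roots = np.roots([1.0, 0.0, a, -p])
    # Keeping the current value as a candidate guards against root-finding error.
    candidates = [0.0, float(h[i])]
    candidates += [float(z.real) for z in roots
                   if abs(z.imag) < 1e-8 and z.real > 0.0]
    H[i, j] = min(candidates, key=quartic)


def symnmf_cd(A, r, sweeps=20, seed=0):
    """Cyclic coordinate descent from a random nonnegative start (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[0], r))
    history = [symnmf_objective(A, H)]
    for _ in range(sweeps):
        for j in range(r):
            for i in range(A.shape[0]):
                update_entry(A, H, i, j)
        history.append(symnmf_objective(A, H))
    return H, history
```

Because each update picks the best value among the boundary point, the current value, and the positive stationary points, the objective recorded in `history` cannot increase from one sweep to the next.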
Dropping Symmetry for Fast Symmetric Nonnegative Matrix Factorization
Symmetric nonnegative matrix factorization (NMF), a special but important
class of the general NMF, has been shown to be useful for data analysis and in
particular for various clustering tasks. Unfortunately, designing fast
algorithms for Symmetric NMF is not as easy as for the nonsymmetric
counterpart, the latter admitting the splitting property that allows efficient
alternating-type algorithms. To overcome this issue, we transform the symmetric
NMF problem into a nonsymmetric one, so that ideas from state-of-the-art
algorithms for nonsymmetric NMF can be used to design fast algorithms for symmetric
NMF. We rigorously establish that solving the nonsymmetric reformulation returns a
solution of symmetric NMF, and then apply fast alternating-based algorithms to
the reformulated problem. Furthermore, we show that these fast
algorithms admit a strong convergence guarantee, in the sense that the generated
sequence converges at least at a sublinear rate and converges globally
to a critical point of symmetric NMF. We conduct experiments on both
synthetic data and image clustering to support our result.
Comment: Accepted in NIPS 201
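The abstract does not spell out the nonsymmetric reformulation, so the sketch below assumes one common choice: two factors $U$ and $V$ are fitted to $A \approx UV^T$ while a penalty $\frac{\lambda}{2}\|U - V\|_F^2$ pulls them together. Each factor is then updated column by column in closed form, in the style of hierarchical alternating least squares. The penalty form, the parameter `lam`, and all function names are assumptions for illustration, not a statement of the paper's algorithm.

```python
import numpy as np


def penalized_objective(A, U, V, lam):
    """1/2 * ||A - U V^T||_F^2 + lam/2 * ||U - V||_F^2."""
    fit = 0.5 * np.linalg.norm(A - U @ V.T, "fro") ** 2
    gap = 0.5 * lam * np.linalg.norm(U - V, "fro") ** 2
    return fit + gap


def column_sweep(A, U, V, lam):
    """Update every column of U in place, with V held fixed."""
    for j in range(U.shape[1]):
        v = V[:, j]
        # (A - sum_{k != j} u_k v_k^T) v, without forming the residual matrix.
        Rv = A @ v - U @ (V.T @ v) + U[:, j] * (v @ v)
        # Closed-form minimizer of the column subproblem, projected onto u >= 0.
        U[:, j] = np.maximum(0.0, (Rv + lam * v) / (v @ v + lam))


def nonsymmetric_symnmf(A, r, lam=1.0, sweeps=100, seed=0):
    """Alternate column sweeps over U and V (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    U = rng.random((n, r))
    V = rng.random((n, r))
    history = [penalized_objective(A, U, V, lam)]
    for _ in range(sweeps):
        column_sweep(A, U, V, lam)
        column_sweep(A.T, V, U, lam)  # same update with the roles swapped
        history.append(penalized_objective(A, U, V, lam))
    return U, V, history
```

Splitting the variable this way makes each subproblem a convex nonnegative least-squares problem, which is the splitting property the symmetric formulation lacks.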
Algorithms for Positive Semidefinite Factorization
This paper considers the problem of positive semidefinite factorization (PSD
factorization), a generalization of exact nonnegative matrix factorization.
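Given an $m$-by-$n$ nonnegative matrix $X$ and an integer $k$, the PSD
factorization problem consists in finding, if possible, symmetric $k$-by-$k$
positive semidefinite matrices $\{A^1, \dots, A^m\}$ and $\{B^1, \dots, B^n\}$ such
that $X_{i,j} = \mathrm{trace}(A^i B^j)$ for $i = 1, \dots, m$ and $j = 1, \dots, n$. PSD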
factorization is NP-hard. In this work, we introduce several local optimization
schemes to tackle this problem: a fast projected gradient method and two
algorithms based on the coordinate descent framework. The main application of
PSD factorization is the computation of semidefinite extensions, that is, the
representations of polyhedrons as projections of spectrahedra, for which the
matrix to be factorized is the slack matrix of the polyhedron. We compare the
performance of our algorithms on this class of problems. In particular, we
compute PSD extensions of the
regular $n$-gons for several values of $n$. We also show how to generalize our
algorithms to compute the square root rank (which is the size of the factors in
a PSD factorization where all factor matrices $A^i$ and $B^j$ have rank one)
and completely PSD factorizations (which is the special case where the input
matrix is symmetric and the equality $A^i = B^i$ is required for all $i$).
Comment: 21 pages, 3 figures, 3 tables
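To make the definition concrete, the sketch below goes in the easy direction only: it draws random positive semidefinite factors and forms the matrix whose entries are the traces of their pairwise products, which is nonnegative by construction. It does not attempt the NP-hard inverse problem the paper addresses. The helper names are hypothetical.

```python
import numpy as np


def random_psd(k, rng):
    """Random k-by-k PSD matrix built as a Gram matrix G G^T."""
    G = rng.standard_normal((k, k))
    return G @ G.T


def psd_product_matrix(As, Bs):
    """Entry (i, j) is trace(As[i] @ Bs[j]) for symmetric factors."""
    A = np.stack(As)  # shape (m, k, k)
    B = np.stack(Bs)  # shape (n, k, k)
    # For symmetric matrices, trace(A B) is the sum of elementwise products.
    return np.einsum("ipq,jpq->ij", A, B)


def is_psd(M, tol=1e-10):
    """Check symmetry and nonnegative eigenvalues up to a tolerance."""
    return bool(np.allclose(M, M.T) and np.linalg.eigvalsh(M).min() >= -tol)


rng = np.random.default_rng(0)
As = [random_psd(3, rng) for _ in range(4)]   # m = 4 factors
Bs = [random_psd(3, rng) for _ in range(5)]   # n = 5 factors
X = psd_product_matrix(As, Bs)                # 4-by-5 nonnegative matrix
```

Any such `X` admits a PSD factorization of size 3 by construction; the algorithms in the paper start from `X` alone and search for the factors.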
Nonnegative Matrix Factorization for Signal and Data Analytics: Identifiability, Algorithms, and Applications
Nonnegative matrix factorization (NMF) has become a workhorse for signal and
data analytics, triggered by its model parsimony and interpretability. Perhaps
a bit surprisingly, the understanding of its model identifiability, the major
reason behind the interpretability in many applications such as topic mining
and hyperspectral imaging, had been rather limited until recent years.
Beginning from the 2010s, the identifiability research of NMF has progressed
considerably: Many interesting and important results have been discovered by
the signal processing (SP) and machine learning (ML) communities. NMF
identifiability has a great impact on many aspects in practice, such as
ill-posed formulation avoidance and performance-guaranteed algorithm design. On
the other hand, there is no tutorial paper that introduces NMF from an
identifiability viewpoint. In this paper, we aim at filling this gap by
offering a comprehensive and deep tutorial on model identifiability of NMF as
well as the connections to algorithms and applications. This tutorial will help
researchers and graduate students grasp the essence and insights of NMF,
thereby avoiding typical "pitfalls" that are oftentimes due to unidentifiable
NMF formulations. This paper will also help practitioners pick/design suitable
factorization tools for their own problems.
Comment: accepted version, IEEE Signal Processing Magazine; supplementary
materials added. Some minor revisions implemented
A Symmetric Rank-one Quasi Newton Method for Non-negative Matrix Factorization
As is well known, nonnegative matrix factorization (NMF) is a dimension
reduction method that has been widely used in image processing, text
compression, signal processing, etc. In this paper, an algorithm for
nonnegative matrix approximation is proposed. The method is mainly based on the
active set and quasi-Newton type algorithms, using the symmetric rank-one
and negative curvature direction techniques to approximate the Hessian
matrix. Our method improves on the recent results of the methods in [Pattern
Recognition, 45(2012)3557-3565; SIAM J. Sci. Comput., 33(6)(2011)3261-3281;
Neural Computation, 19(10)(2007)2756-2779, etc.]. Moreover, the objective function
decreases faster than with many other NMF methods. In addition, numerical
experiments are presented on synthetic data, image processing and text
clustering. In a comparison with six other nonnegative matrix approximation
methods, our experiments confirm our analysis.
Comment: 19 pages, 13 figures, Submitted to PP on Feb. 5, 201
A Nonconvex Splitting Method for Symmetric Nonnegative Matrix Factorization: Convergence Analysis and Optimality
Symmetric nonnegative matrix factorization (SymNMF) has important
applications in data analytics problems such as document clustering, community
detection and image segmentation. In this paper, we propose a novel nonconvex
variable splitting method for solving SymNMF. The proposed algorithm is
guaranteed to converge to the set of Karush-Kuhn-Tucker (KKT) points of the
nonconvex SymNMF problem. Furthermore, it achieves a global sublinear
convergence rate. We also show that the algorithm can be efficiently
implemented in parallel. Further, sufficient conditions are provided which
guarantee the global and local optimality of the obtained solutions. Extensive
numerical results performed on both synthetic and real data sets suggest that
the proposed algorithm converges quickly to a local minimum solution.
Comment: IEEE Transactions on Signal Processing (to appear)
The Why and How of Nonnegative Matrix Factorization
Nonnegative matrix factorization (NMF) has become a widely used tool for the
analysis of high-dimensional data as it automatically extracts sparse and
meaningful features from a set of nonnegative data vectors. We first illustrate
this property of NMF on three applications, in image processing, text mining
and hyperspectral imaging (this is the why). Then we address the problem of
solving NMF, which is NP-hard in general. We review some standard NMF
algorithms, and also present a recent subclass of NMF problems, referred to as
near-separable NMF, that can be solved efficiently (that is, in polynomial
time), even in the presence of noise (this is the how). Finally, we briefly
describe some problems in mathematics and computer science closely related to
NMF via the nonnegative rank.
Comment: 25 pages, 5 figures. Some typos and errors corrected, Section 3.2
reorganized
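For orientation, here is a minimal sketch of one widely used standard NMF algorithm, the multiplicative updates for the Frobenius-norm objective $\|X - WH\|_F$. Whether the survey presents this exact variant is an assumption; the function name, the small `eps` safeguard in the denominators, and the random initialization are choices made for this example.

```python
import numpy as np


def nmf_multiplicative(X, r, iters=200, eps=1e-12, seed=0):
    """Approximate a nonnegative X (m-by-n) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive start: entries that reach zero stay zero under these updates.
    W = rng.random((m, r)) + 0.1
    H = rng.random((r, n)) + 0.1
    errors = [np.linalg.norm(X - W @ H, "fro")]
    for _ in range(iters):
        # Elementwise rescaling keeps both factors nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        errors.append(np.linalg.norm(X - W @ H, "fro"))
    return W, H, errors
```

The updates need only matrix products, which is why they are a common baseline, although they typically converge more slowly than coordinate descent or alternating least squares schemes.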
Microbial community pattern detection in human body habitats via ensemble clustering framework
The human habitat is a host where microbial species evolve, function, and
continue to evolve. Elucidating how microbial communities respond to human
habitats is a fundamental and critical task, as establishing baselines of the human
microbiome is essential in understanding its role in human disease and health.
However, current studies usually overlook the complex and interconnected
landscape of the human microbiome and are limited to particular body habitats,
with learning models based on a specific criterion. Therefore, these methods cannot
capture the real-world underlying microbial patterns effectively. To obtain a
comprehensive view, we propose a novel ensemble clustering framework to mine
the structure of microbial community pattern on large-scale metagenomic data.
In particular, we first build a microbial similarity network by integrating
1920 metagenomic samples from three body habitats of healthy adults. Then a
novel symmetric Nonnegative Matrix Factorization (NMF) based ensemble model is
proposed and applied to the network to detect clustering patterns. Extensive
experiments are conducted to evaluate the effectiveness of our model on
deriving microbial communities with respect to body habitat and host gender. From
the clustering results, we observed that body habitat exhibits strongly bound but
non-unique microbial structural patterns. Meanwhile, the human microbiome reveals
different degrees of structural variation over body habitat and host gender. In
summary, our ensemble clustering framework can efficiently explore integrated
clustering results to accurately identify microbial communities, and provide a
comprehensive view of a set of microbial communities. Such trends depict an
integrated biography of microbial communities, which offers new insight
towards uncovering the pathogenic model of the human microbiome.
Comment: BMC Systems Biology 201
Accelerated Parallel and Distributed Algorithm using Limited Internal Memory for Nonnegative Matrix Factorization
Nonnegative matrix factorization (NMF) is a powerful technique for dimension
reduction, extracting latent factors and learning part-based representation.
For large datasets, NMF performance depends on some major issues: fast
algorithms, fully parallel distributed feasibility and limited internal memory.
This research aims to design a fast fully parallel and distributed algorithm
using limited internal memory to reach high NMF performance for large datasets.
In particular, we propose a flexible accelerated algorithm for NMF with all its
regularized variants based on full decomposition, which is a
combination of an anti-lopsided algorithm and a fast block coordinate descent
algorithm. The proposed algorithm takes advantage of both algorithms to
achieve a linear convergence rate, which depends on the number of latent
components, in optimizing each factor matrix while the other is fixed, in the
subspace of passive variables. In addition, the algorithm can exploit data
sparseness to run on large datasets with limited internal memory of machines.
Furthermore, our experimental results are highly competitive with 7
state-of-the-art methods in three significant aspects: convergence,
optimality, and average number of iterations. Therefore, the proposed
algorithm is superior to fast block coordinate descent methods and accelerated
methods.
Introduction to Nonnegative Matrix Factorization
In this paper, we introduce and provide a short overview of nonnegative
matrix factorization (NMF). Several aspects of NMF are discussed, namely, the
application in hyperspectral imaging, geometry and uniqueness of NMF solutions,
complexity, algorithms, and its link with extended formulations of polyhedra.
In order to put NMF into perspective, the more general problem class of
constrained low-rank matrix approximation problems is first briefly introduced.
Comment: 18 pages, 4 figures