Fast Conical Hull Algorithms for Near-separable Non-negative Matrix Factorization
The separability assumption (Donoho & Stodden, 2003; Arora et al., 2012)
turns non-negative matrix factorization (NMF) into a tractable problem.
Recently, a new class of provably-correct NMF algorithms has emerged under
this assumption. In this paper, we reformulate the separable NMF problem as
that of finding the extreme rays of the conical hull of a finite set of
vectors. From this geometric perspective, we derive new separable NMF
algorithms that are highly scalable and empirically noise robust, and have
several other favorable properties in relation to existing methods. A parallel
implementation of our algorithm demonstrates high scalability on shared- and
distributed-memory machines.
Comment: 15 pages, 6 figures
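To make the conical-hull view concrete, the following is a minimal NumPy sketch of a greedy anchor-column selection in that spirit: under separability, every column of the data matrix lies in the cone generated by a few anchor columns. The residual-norm selection rule and the NNLS-based cone projection here are illustrative assumptions, not the algorithm proposed in the paper.

```python
import numpy as np
from scipy.optimize import nnls

def greedy_anchor_columns(X, r):
    """Greedily pick r columns of X as candidate extreme rays of the
    conical hull of X's columns (illustration only, not the paper's
    selection rule).  X is assumed to be entry-wise non-negative."""
    anchors, residual, H = [], X.copy(), None
    for _ in range(r):
        # next anchor: the column farthest (in norm) from the current cone
        j = int(np.argmax(np.linalg.norm(residual, axis=0)))
        anchors.append(j)
        A = X[:, anchors]
        # express every column as a non-negative combination of the anchors
        H = np.column_stack([nnls(A, X[:, k])[0] for k in range(X.shape[1])])
        residual = X - A @ H
    return anchors, H

# toy near-separable example: the first three columns are the anchors
rng = np.random.default_rng(0)
W = rng.random((20, 3))
H_true = np.hstack([np.eye(3), rng.dirichlet(np.ones(3), 12).T])
anchors, _ = greedy_anchor_columns(W @ H_true, 3)
print(sorted(anchors))   # typically recovers columns 0, 1, 2
```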
Primal-Dual Algorithms for Non-negative Matrix Factorization with the Kullback-Leibler Divergence
Non-negative matrix factorization (NMF) approximates a given matrix as a
product of two non-negative matrices. Multiplicative algorithms deliver
reliable results, but they show slow convergence for high-dimensional data and
may get stuck far from local minima. Gradient descent methods have better
behavior, but only apply to smooth losses such as the least-squares loss. In
this article, we propose a first-order primal-dual algorithm for non-negative
decomposition problems (where one factor is fixed) with the KL divergence,
based on the Chambolle-Pock algorithm. All required computations may be
obtained in closed form and we provide an efficient heuristic way to select
step-sizes. By using alternating optimization, our algorithm readily extends to
NMF and, on synthetic examples, face recognition or music source separation
datasets, it is either faster than existing algorithms, or leads to improved
local optima, or both.
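For reference, the multiplicative baseline that the abstract contrasts with is easy to state. Below is a minimal NumPy sketch of the classic Lee-Seung multiplicative updates for the KL divergence; it is not the proposed primal-dual (Chambolle-Pock) algorithm, whose proximal steps are given in the paper.

```python
import numpy as np

def kl_nmf_multiplicative(V, r, n_iter=200, eps=1e-10, seed=0):
    """Lee-Seung multiplicative updates minimising the KL divergence
    D(V || WH); the slow-but-reliable baseline mentioned in the abstract."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
        W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H
```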
Evolutionary star-structured heterogeneous data co-clustering
A star-structured interrelationship, one of the more common types in real-world data, has a central object connected to the other types of objects. One of the key challenges in evolutionary clustering is the integration of historical data into current data. Traditionally, smoothness in data transitions over a period of time is achieved by means of cost functions defined over historical and current data. These functions provide a tunable tolerance for shifts in the current data of an instance, taking into account all of the historical information for that instance. Once historical data is integrated into current data using cost functions, co-clusters are obtained using co-clustering algorithms such as spectral clustering, non-negative matrix factorization, and information-theoretic clustering. Non-negative matrix factorization has proven efficient and scalable for large data and is less memory-intensive than other approaches. It tri-factorizes the original data matrix into a row-indicator matrix, a column-indicator matrix, and a matrix that captures the correlation between the row and column clusters. However, the challenges in clustering evolving heterogeneous data have not yet been addressed. In this thesis, I propose a new algorithm for clustering a specific case of this problem, viz. star-structured heterogeneous data. The proposed algorithm provides cost functions to integrate historical star-structured heterogeneous data into current data. I then use non-negative matrix factorization to cluster the instances and features at each time step. This contribution provides an avenue for further development of higher-order evolutionary co-clustering algorithms.
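As an illustration of the two ingredients the abstract describes, the sketch below blends each current relation matrix with its historical counterpart through a simple forgetting weight and then applies generic multiplicative updates for the tri-factorization X ≈ F S G^T. The blending weight alpha and the update rules are illustrative assumptions, not the thesis' cost functions; for the star structure, one such relation matrix would be handled per feature modality around the central object type.

```python
import numpy as np

def smooth_relation(X_curr, X_hist, alpha=0.2):
    """Blend the current relation matrix with its historical counterpart;
    alpha is a hypothetical smoothing weight, not the thesis' cost function."""
    return (1.0 - alpha) * X_curr + alpha * X_hist

def nmtf(X, k_row, k_col, n_iter=200, eps=1e-10, seed=0):
    """Generic multiplicative updates for non-negative tri-factorization
    X ~ F S G^T (row clusters F, column clusters G, correlation matrix S)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, k_row)) + eps
    S = rng.random((k_row, k_col)) + eps
    G = rng.random((n, k_col)) + eps
    for _ in range(n_iter):
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G
```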
The Diagonalized Newton Algorithm for Nonnegative Matrix Factorization
Non-negative matrix factorization (NMF) has become a popular machine learning
approach to many problems in text mining, speech and image processing,
bio-informatics and seismic data analysis to name a few. In NMF, a matrix of
non-negative data is approximated by the low-rank product of two matrices with
non-negative entries. In this paper, the approximation quality is measured by
the Kullback-Leibler divergence between the data and its low-rank
reconstruction. The existence of the simple multiplicative update (MU)
algorithm for computing the matrix factors has contributed to the success of
NMF. Despite the availability of algorithms showing faster convergence, MU
remains popular due to its simplicity. In this paper, a diagonalized Newton
algorithm (DNA) is proposed that shows faster convergence while the implementation
remains simple and suitable for high-rank problems. The DNA algorithm is
applied to various publicly available data sets, showing a substantial speed-up
on modern hardware.Comment: 8 pages + references; International Conference on Learning
Representations, 201
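The core idea admits a compact statement: for the KL divergence, both the gradient and the diagonal of the Hessian with respect to one factor have closed forms, so each entry can take an element-wise Newton step instead of a multiplicative one. The sketch below shows one such step on H; it is a simplified illustration and omits the safeguards of the published DNA algorithm.

```python
import numpy as np

def kl_diag_newton_step(V, W, H, eps=1e-12, floor=1e-12):
    """One diagonal-Hessian Newton step on H for D(V || WH), where
    grad[a, j] = sum_i W[i, a] * (1 - V[i, j] / (WH)[i, j])
    hess[a, j] = sum_i W[i, a]**2 * V[i, j] / (WH)[i, j]**2."""
    WH = W @ H + eps
    grad = W.T @ (1.0 - V / WH)
    hess = (W ** 2).T @ (V / WH ** 2) + eps
    return np.maximum(H - grad / hess, floor)   # keep entries non-negative
```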
New SVD based initialization strategy for Non-negative Matrix Factorization
Two problems need to be dealt with for Non-negative Matrix Factorization (NMF):
choosing a suitable rank for the factorization and providing a good
initialization for NMF algorithms. This paper aims to solve these two problems
using the Singular Value Decomposition (SVD). First, we extract the number of
main components as the rank; this method is inspired by [1, 2]. Second, we use
the singular values and their vectors to initialize the NMF algorithm. In 2008,
Boutsidis and Gallopoulos [3] proposed the method NNDSVD to enhance the
initialization of NMF algorithms. They extracted the positive section and the
respective singular triplet information of the unit matrices C(j), j = 1, ..., k,
obtained from singular vector pairs. This strategy uses the positive section to
cope with the negative elements of the singular vectors, but in experiments we
found that even simply replacing negative elements with their absolute values
can give better results than NNDSVD. Hence, we give another SVD-based method to
initialize NMF algorithms (SVD-NMF). Numerical experiments on two face
databases, ORL and YALE [16, 17], show that our method is better than NNDSVD.
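A minimal sketch of the two steps the abstract describes is given below: pick the rank from the dominant singular values and initialize the factors from the leading singular triplets with negative entries replaced by their absolute values. The 90% energy threshold and the square-root scaling are illustrative assumptions, not necessarily the exact choices of SVD-NMF.

```python
import numpy as np

def choose_rank(X, energy=0.90):
    """Smallest r whose leading singular values capture the given fraction
    of the spectral energy (illustrative rank-selection rule)."""
    s = np.linalg.svd(X, compute_uv=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cum, energy) + 1)

def svd_init(X, r):
    """Initialize W, H from the leading r singular triplets, replacing
    negative entries of the singular vectors by their absolute values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    W0 = np.abs(U[:, :r]) * np.sqrt(s[:r])
    H0 = np.sqrt(s[:r])[:, None] * np.abs(Vt[:r, :])
    return W0, H0
```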
Graph Regularized Non-negative Matrix Factorization By Maximizing Correntropy
Non-negative matrix factorization (NMF) has proved effective in many
clustering and classification tasks. The classic ways to measure the error
between the original and the reconstructed matrix are a distance metric or the
Kullback-Leibler (KL) divergence. However, nonlinear cases are not properly
handled when we use these error measures. As a consequence, alternative
measures based on nonlinear kernels, such as correntropy, are proposed.
However, current correntropy-based NMF only targets the low-level
features without considering the intrinsic geometrical distribution of data. In
this paper, we propose a new NMF algorithm that preserves local invariance by
adding graph regularization into the process of max-correntropy-based matrix
factorization. Meanwhile, each feature can learn its corresponding kernel from
the data. Experimental results on Caltech101 and Caltech256 show the benefits of
this combination over other NMF algorithms for unsupervised image clustering.
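The graph-regularization ingredient can be sketched independently of the correntropy objective: build a k-nearest-neighbour affinity graph over the data points, form its Laplacian L, and add a smoothness penalty Tr(H L H^T) on the coefficient matrix so that neighbouring points keep similar representations. The binary kNN weights below are an illustrative choice; the paper's full max-correntropy updates are not shown.

```python
import numpy as np

def knn_graph_laplacian(X, k=5):
    """Graph Laplacian L = D - A of a symmetric k-NN affinity graph built
    over the columns of X (one column per data point), with 0/1 weights."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)   # squared distances
    A = np.zeros((n, n))
    for i in range(n):
        A[i, np.argsort(dist[i])[1:k + 1]] = 1.0          # skip the point itself
    A = np.maximum(A, A.T)                                # symmetrise
    return np.diag(A.sum(axis=1)) - A

def graph_penalty(H, L):
    """Smoothness term Tr(H L H^T) added to the factorization objective."""
    return float(np.trace(H @ L @ H.T))
```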