Document Clustering Based On Max-Correntropy Non-Negative Matrix Factorization

Authors: Li Le; Qin Zhen; Xu Yang; Yang Jianjun; Zhang Honggang
Publication date: 03/10/2014

Nonnegative matrix factorization (NMF) has been successfully applied to many areas for classification and clustering. Commonly used NMF algorithms mainly minimize the $l_2$ distance or the Kullback-Leibler (KL) divergence, which may not be suitable for nonlinear cases. In this paper, we propose a new decomposition method for document clustering that maximizes the correntropy between the original matrix and the product of two low-rank matrices. This method also allows us to learn the new basis vectors of the semantic feature space from the data. To our knowledge, no previous work has maximized correntropy in NMF to cluster high-dimensional document data. Our experimental results show that the proposed method outperforms other variants of the NMF algorithm on the Reuters21578 and TDT2 datasets.

Comment: International Conference of Machine Learning and Cybernetics (ICMLC) 201

arXiv.org e-Print Archive

CiteSeerX
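
The contrast the abstract above draws between the $l_2$ distance and correntropy can be illustrated with a minimal sketch. It assumes the generic entrywise Gaussian-kernel form of correntropy with a single hypothetical bandwidth `sigma`; the paper's exact objective (normalization, kernel choice, update rules) may differ, and the function names here are illustrative only.

```python
import numpy as np

def l2_loss(X, W, H):
    """Squared l_2 (Frobenius) reconstruction error; lower is better."""
    return float(np.sum((X - W @ H) ** 2))

def correntropy(X, W, H, sigma=1.0):
    """Entrywise Gaussian-kernel correntropy between X and W @ H.

    Assumption: one shared bandwidth `sigma` for all entries.
    Higher is better; each entry contributes at most 1.
    """
    residual = X - W @ H
    return float(np.sum(np.exp(-residual ** 2 / (2.0 * sigma ** 2))))

rng = np.random.default_rng(0)
W = rng.random((6, 2))          # non-negative basis
H = rng.random((2, 5))          # non-negative coefficients
X = W @ H                       # data that the factors reconstruct exactly
X_outlier = X.copy()
X_outlier[0, 0] += 100.0        # one grossly corrupted entry

print("l2 loss     :", l2_loss(X, W, H), l2_loss(X_outlier, W, H))
print("correntropy :", correntropy(X, W, H), correntropy(X_outlier, W, H))
```

Because every entry's kernel value is bounded by 1, a single corrupted entry can lower the correntropy by at most 1, whereas its contribution to the $l_2$ loss grows with the square of the corruption.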

Graph Regularized Non-negative Matrix Factorization By Maximizing Correntropy

Authors: Fan Zhuoyi; Li Le; Xu Yang; Yang Jianjun; Zhang Honggang; Zhao Kaili
Publication date: 09/05/2014

Non-negative matrix factorization (NMF) has proved effective in many clustering and classification tasks. The classic ways to measure the error between the original and the reconstructed matrix are the $l_2$ distance and the Kullback-Leibler (KL) divergence. However, these error measures do not handle nonlinear cases properly. As a consequence, alternative measures based on nonlinear kernels, such as correntropy, have been proposed. However, current correntropy-based NMF targets only low-level features and does not consider the intrinsic geometrical distribution of the data. In this paper, we propose a new NMF algorithm that preserves local invariance by adding graph regularization to max-correntropy-based matrix factorization. Meanwhile, each feature can learn its corresponding kernel from the data. Experimental results on Caltech101 and Caltech256 show the benefits of this combination over other NMF algorithms for unsupervised image clustering.

arXiv.org e-Print Archive

CiteSeerX
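
The graph regularization mentioned in the abstract above can be sketched as a smoothness penalty on the low-dimensional codes. This is an illustration under assumptions that are not taken from the paper: a symmetric 0/1 k-nearest-neighbour affinity matrix `A`, the unnormalized Laplacian `L = D - A`, and the penalty `Tr(H L H^T)`. The helper names and the toy data are hypothetical.

```python
import numpy as np

def knn_affinity(X, k=2):
    """Symmetric 0/1 k-nearest-neighbour affinity between the columns of X."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)  # squared distances
    np.fill_diagonal(dist, np.inf)                      # exclude self-matches
    nearest = np.argsort(dist, axis=1)[:, :k]
    A = np.zeros((n, n))
    for i in range(n):
        A[i, nearest[i]] = 1.0
    return np.maximum(A, A.T)                           # symmetrize

def graph_penalty(H, A):
    """Tr(H L H^T) with L = D - A; small when neighbours share similar codes."""
    L = np.diag(A.sum(axis=1)) - A
    return float(np.trace(H @ L @ H.T))

# Toy data: six 2-D points (columns) forming two well-separated groups.
X = np.array([[0.0, 1.0, 0.0, 10.0, 11.0, 10.0],
              [0.0, 0.0, 1.0, 10.0, 10.0, 11.0]])
A = knn_affinity(X, k=2)

# Codes that respect the groups versus codes that ignore them.
H_smooth = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
H_rough = np.array([[1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
                    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]])

print(graph_penalty(H_smooth, A), graph_penalty(H_rough, A))
```

In a graph-regularized factorization, a term of this kind would be added to the reconstruction objective with a weight, so that nearby data points are pushed towards similar representations.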

Four algorithms to solve symmetric multi-type non-negative matrix tri-factorization problem

Authors: Hrga Timotej; Hribar Rok; Papa Gregor; Petelin Gašper; Povh Janez; Pržulj Nataša; Vukašinović Vida
Publication date: 10/12/2020

In this paper, we consider the symmetric multi-type non-negative matrix tri-factorization problem (SNMTF), which attempts to factorize several symmetric non-negative matrices simultaneously. This can be considered a generalization of the classical non-negative matrix tri-factorization problem; it has a non-convex objective function, which is a multivariate sixth-degree polynomial, and a convex feasibility set. It is of special importance in data science, since it serves as a mathematical model for the fusion of different data sources in data clustering. We develop four methods to solve the SNMTF. They are based on four theoretical approaches known from the literature: the fixed point method (FPM), the block-coordinate descent with projected gradient (BCD), the gradient method with exact line search (GM-ELS) and the adaptive moment estimation method (ADAM). For each of these methods we offer a software implementation: for the first two methods we use Matlab and for the last two Python with the TensorFlow library. We test these methods on three data-sets: one synthetic data-set that we generated and two that represent real-life similarities between different objects. Extensive numerical results show that with sufficient computing time all four methods perform satisfactorily and ADAM most often yields the best mean square error (MSE). However, if the computation time is limited, FPM gives the best MSE because it shows the fastest convergence at the beginning. All data-sets and codes are publicly available on our GitLab profile.

arXiv.org e-Print Archive

Repository of the University of Ljubljana
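
The kind of objective described in the abstract above can be sketched as follows. The formulation is an assumption made for illustration: each symmetric matrix `R_i` is approximated by `G @ S_i @ G.T` with one shared non-negative factor `G` and non-negative `S_i`, and the solver is a plain projected-gradient loop with step halving. It is not claimed to match FPM, BCD, GM-ELS or ADAM as implemented by the authors, and all function names are hypothetical.

```python
import numpy as np

def snmtf_objective(Rs, G, Ss):
    """Sum of squared errors ||R_i - G S_i G^T||_F^2 (degree six in G and S_i)."""
    return float(sum(np.sum((R - G @ S @ G.T) ** 2) for R, S in zip(Rs, Ss)))

def projected_gradient_step(Rs, G, Ss, step):
    """One gradient step on all factors, then projection onto the non-negative orthant."""
    grad_G = np.zeros_like(G)
    new_Ss = []
    for R, S in zip(Rs, Ss):
        E = G @ S @ G.T - R                              # residual
        grad_G += 2.0 * (E @ G @ S.T + E.T @ G @ S)      # d/dG of ||E||_F^2
        grad_S = 2.0 * (G.T @ E @ G)                     # d/dS of ||E||_F^2
        new_Ss.append(np.maximum(S - step * grad_S, 0.0))
    new_G = np.maximum(G - step * grad_G, 0.0)
    return new_G, new_Ss

def fit(Rs, G, Ss, iters=200, step=1e-2):
    """Accept a step only if it lowers the objective; otherwise halve the step."""
    best = snmtf_objective(Rs, G, Ss)
    for _ in range(iters):
        cand_G, cand_Ss = projected_gradient_step(Rs, G, Ss, step)
        value = snmtf_objective(Rs, cand_G, cand_Ss)
        if value < best:
            G, Ss, best = cand_G, cand_Ss, value
        else:
            step *= 0.5
    return G, Ss, best

rng = np.random.default_rng(0)
n, k = 8, 3
G_true = rng.random((n, k))
Ms = [rng.random((k, k)) for _ in range(2)]
Ss_true = [(M + M.T) / 2.0 for M in Ms]                  # symmetric middle factors
Rs = [G_true @ S @ G_true.T for S in Ss_true]            # symmetric non-negative data

G0 = rng.random((n, k))
Ss0 = [rng.random((k, k)) for _ in Rs]
initial = snmtf_objective(Rs, G0, Ss0)
G_fit, Ss_fit, final = fit(Rs, G0, Ss0)
print("objective before:", initial, "after:", final)
```

The feasible set here (entrywise non-negativity) is convex, so projection is a simple clipping at zero, while the objective itself remains non-convex, which is why different solvers can end in different local solutions.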

An Oracle Inequality for Quasi-Bayesian Non-Negative Matrix Factorization

Authors: Alquier Pierre; Guedj Benjamin
Publication venue: Allerton Press
Publication date: 01/01/2017

The aim of this paper is to provide some theoretical understanding of quasi-Bayesian aggregation methods for non-negative matrix factorization. We derive an oracle inequality for an aggregated estimator. This result holds for a very general class of prior distributions and shows how the prior affects the rate of convergence.

Comment: This is the corrected version of the published paper P. Alquier, B. Guedj, An Oracle Inequality for Quasi-Bayesian Non-negative Matrix Factorization, Mathematical Methods of Statistics, 2017, vol. 26, no. 1, pp. 55-67. Since then, Arnak Dalalyan (ENSAE) found a mistake in the proofs. We fixed the mistake at the price of a slightly different logarithmic term in the boun

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server