2,840 research outputs found
Integrating Document Clustering and Topic Modeling
Document clustering and topic modeling are two closely related tasks which
can mutually benefit each other. Topic modeling can project documents into a
topic space which facilitates effective document clustering. Cluster labels
discovered by document clustering can be incorporated into topic models to
extract local topics specific to each cluster and global topics shared by all
clusters. In this paper, we propose a multi-grain clustering topic model
(MGCTM) which integrates document clustering and topic modeling into a unified
framework and jointly performs the two tasks to achieve the overall best
performance. Our model tightly couples two components: a mixture component used
for discovering latent groups in document collection and a topic model
component used for mining multi-grain topics including local topics specific to
each cluster and global topics shared across clusters.We employ variational
inference to approximate the posterior of hidden variables and learn model
parameters. Experiments on two datasets demonstrate the effectiveness of our
model.Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty
in Artificial Intelligence (UAI2013
A Nonconvex Splitting Method for Symmetric Nonnegative Matrix Factorization: Convergence Analysis and Optimality
Symmetric nonnegative matrix factorization (SymNMF) has important
applications in data analytics problems such as document clustering, community
detection and image segmentation. In this paper, we propose a novel nonconvex
variable splitting method for solving SymNMF. The proposed algorithm is
guaranteed to converge to the set of Karush-Kuhn-Tucker (KKT) points of the
nonconvex SymNMF problem. Furthermore, it achieves a global sublinear
convergence rate. We also show that the algorithm can be efficiently
implemented in parallel. Further, sufficient conditions are provided which
guarantee the global and local optimality of the obtained solutions. Extensive
numerical results performed on both synthetic and real data sets suggest that
the proposed algorithm converges quickly to a local minimum solution.Comment: IEEE Transactions on Signal Processing (to appear
- …