Learning Topic Models and Latent Bayesian Networks Under Expansion Constraints
Unsupervised estimation of latent variable models is a fundamental problem
central to numerous applications of machine learning and statistics. This work
presents a principled approach for estimating broad classes of such models,
including probabilistic topic models and latent linear Bayesian networks, using
only second-order observed moments. The sufficient conditions for
identifiability of these models are primarily based on weak expansion
constraints on the topic-word matrix, for topic models, and on the directed
acyclic graph, for Bayesian networks. Because no assumptions are made on the
distribution among the latent variables, the approach can handle arbitrary
correlations among the topics or latent factors. In addition, a tractable
learning method via optimization is proposed and studied in numerical
experiments.
Comment: 38 pages, 6 figures, 2 tables, applications in topic models and
Bayesian networks are studied. Simulation section is adde
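To make "second-order observed moments" concrete, here is a minimal sketch of one common way to estimate a word-pair co-occurrence matrix from tokenized documents. It is an illustration only, not the estimator from the paper: the function name `second_order_moments`, the toy corpus, and the choice to average over ordered pairs of distinct token positions are all assumptions.

```python
from itertools import permutations


def second_order_moments(docs, vocab_size):
    """Estimate M[i][j], the probability that two distinct token positions
    drawn from the same document hold words i and j (hypothetical helper)."""
    moments = [[0.0] * vocab_size for _ in range(vocab_size)]
    used = 0  # number of documents long enough to contain a word pair
    for doc in docs:
        n = len(doc)
        if n < 2:
            continue
        used += 1
        weight = 1.0 / (n * (n - 1))  # each ordered pair gets equal weight
        for a, b in permutations(range(n), 2):
            moments[doc[a]][doc[b]] += weight
    if used == 0:
        return moments
    # Average over documents so that all entries sum to 1.
    return [[value / used for value in row] for row in moments]


# Toy corpus (assumed): documents are lists of word ids from a 3-word vocabulary.
docs = [[0, 1], [1, 1, 2]]
moments = second_order_moments(docs, vocab_size=3)
for row in moments:
    print(row)
```

The resulting matrix is symmetric and its entries sum to one, so it can be read as a joint distribution over word pairs.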
A Topic Modeling Toolbox Using Belief Propagation
Latent Dirichlet allocation (LDA) is an important hierarchical Bayesian model
for probabilistic topic modeling, which has attracted worldwide interest and
touches on many important applications in text mining, computer vision and
computational biology. This paper introduces a topic modeling toolbox (TMBP)
based on the belief propagation (BP) algorithms. The TMBP toolbox is implemented in
MEX C++/Matlab/Octave for either Windows 7 or Linux. Compared with existing
topic modeling packages, the novelty of this toolbox lies in the BP algorithms
for learning LDA-based topic models. The current version includes BP algorithms
for latent Dirichlet allocation (LDA), author-topic models (ATM), relational
topic models (RTM), and labeled LDA (LaLDA). This toolbox is an ongoing project
and more BP-based algorithms for various topic models will be added in the near
future. Interested users may also extend BP algorithms for learning more
complicated topic models. The source code is freely available under the GNU
General Public Licence, Version 1.0 at https://mloss.org/software/view/399/.
Comment: 4 page
Learning Topic Models - Going beyond SVD
Topic Modeling is an approach used for automatic comprehension and
classification of data in a variety of settings, and perhaps the canonical
application is in uncovering thematic structure in a corpus of documents. A
number of foundational works both in machine learning and in theory have
suggested a probabilistic model for documents, whereby documents arise as a
convex combination of (i.e. distribution on) a small number of topic vectors,
each topic vector being a distribution on words (i.e. a vector of
word-frequencies). Similar models have since been used in a variety of
application areas; the Latent Dirichlet Allocation or LDA model of Blei et al.
is especially popular.
Theoretical studies of topic modeling focus on learning the model's
parameters assuming the data is actually generated from it. Existing approaches
for the most part rely on Singular Value Decomposition (SVD), and consequently
have one of two limitations: these works need to either assume that each
document contains only one topic, or else can only recover the span of the
topic vectors instead of the topic vectors themselves.
This paper formally justifies Nonnegative Matrix Factorization (NMF) as a main
tool in this context, which is an analog of SVD where all vectors are
nonnegative. Using this tool we give the first polynomial-time algorithm for
learning topic models without the above two limitations. The algorithm uses a
fairly mild assumption about the underlying topic matrix called separability,
which is usually found to hold in real-life data. A compelling feature of our
algorithm is that it generalizes to models that incorporate topic-topic
correlations, such as the Correlated Topic Model and the Pachinko Allocation
Model.
We hope that this paper will motivate further theoretical results that use
NMF as a replacement for SVD - just as NMF has come to replace SVD in many
applications.
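As a rough illustration of how separability can be exploited, the sketch below finds "anchor" rows in a toy matrix. It is not the algorithm from the paper: it swaps in a greedy successive-projection heuristic and assumes a simplified setup in which every word is a row vector and each non-anchor row is a convex combination of the anchor rows. The function name `successive_projection` and the data are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def successive_projection(rows, k):
    """Greedily pick k rows that act as vertices of the rows' convex hull.

    Each step selects the row with the largest norm, then projects every
    row onto the orthogonal complement of the selected row.
    """
    residual = [list(r) for r in rows]
    anchors = []
    for _ in range(k):
        norms = [dot(r, r) for r in residual]
        j = max(range(len(residual)), key=norms.__getitem__)
        if norms[j] == 0.0:
            break  # nothing left to explain
        anchors.append(j)
        u, nu = residual[j], norms[j]
        residual = [
            [x - (dot(r, u) / nu) * ux for x, ux in zip(r, u)]
            for r in residual
        ]
    return anchors


# Toy data (assumed): rows 0, 3, and 5 are anchors; the rest are mixtures.
rows = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 1.0, 0.0],
    [0.4, 0.4, 0.2],
    [0.0, 0.0, 1.0],
]
anchors = successive_projection(rows, 3)
print(sorted(anchors))
```

The greedy step works in this toy setting because the squared norm is strictly convex, so its maximum over a convex hull is always attained at a vertex.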
Multidimensional Membership Mixture Models
We present the multidimensional membership mixture (M3) models where every
dimension of the membership represents an independent mixture model and each
data point is generated from the selected mixture components jointly. This is
helpful when the data has a certain shared structure. For example, three unique
means and three unique variances can effectively form a Gaussian mixture model
with nine components, while requiring only six parameters to fully describe it.
In this paper, we present three instantiations of M3 models (together with the
learning and inference algorithms): infinite, finite, and hybrid, depending on
whether the number of mixtures is fixed or not. They are built upon Dirichlet
process mixture models, latent Dirichlet allocation, and a combination
respectively. We then consider two applications: topic modeling and learning 3D
object arrangements. Our experiments show that our M3 models achieve better
performance using fewer topics than many classic topic models. We also observe
that topics from the different dimensions of M3 models are meaningful and
orthogonal to each other.
Comment: 9 pages, 7 figure
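The Gaussian example in the abstract above (three means and three variances yielding nine components from six parameters) can be checked with a short sketch. The specific values and the helper name `m3_components` are assumptions for illustration; the M3 learning and inference algorithms themselves are not shown.

```python
from itertools import product

# Assumed toy values: one membership dimension selects a mean,
# the other selects a variance.
means = [-5.0, 0.0, 5.0]
variances = [0.5, 1.0, 2.0]


def m3_components(means, variances):
    """Enumerate every (mean, variance) pair reachable by choosing one
    option per membership dimension (hypothetical helper)."""
    return list(product(means, variances))


components = m3_components(means, variances)
n_shared_params = len(means) + len(variances)

print(len(components))   # number of effective Gaussian components
print(n_shared_params)   # number of parameters actually stored
```

The component count grows as the product of the per-dimension choices, while the stored parameters grow only as their sum.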
Towards the TopMost: A Topic Modeling System Toolkit
Topic models have been studied for decades with various applications and
have recently been revitalized by neural variational inference. However, these topic
models adopt entirely different dataset, implementation, and evaluation settings,
which hinders their quick utilization and fair comparison and, in turn, slows
research progress on topic models. To address these issues, in this
paper we propose a Topic Modeling System Toolkit (TopMost). Compared to
existing toolkits, TopMost stands out by covering a wider range of topic
modeling scenarios including complete lifecycles with dataset pre-processing,
model training, testing, and evaluations. The highly cohesive and decoupled
modular design of TopMost enables quick utilization, fair comparisons, and
flexible extensions of different topic models. This can facilitate the research
and applications of topic models. Our code, tutorials, and documentation are
available at https://github.com/bobxwu/topmost