A Topic Modeling Toolbox Using Belief Propagation
Latent Dirichlet allocation (LDA) is an important hierarchical Bayesian model
for probabilistic topic modeling, which attracts worldwide interest and
touches on many important applications in text mining, computer vision and
computational biology. This paper introduces a topic modeling toolbox (TMBP)
based on belief propagation (BP) algorithms. The TMBP toolbox is implemented in
MEX C++/Matlab/Octave for either Windows 7 or Linux. Compared with existing
topic modeling packages, the novelty of this toolbox lies in the BP algorithms
for learning LDA-based topic models. The current version includes BP algorithms
for latent Dirichlet allocation (LDA), author-topic models (ATM), relational
topic models (RTM), and labeled LDA (LaLDA). This toolbox is an ongoing project
and more BP-based algorithms for various topic models will be added in the near
future. Interested users may also extend BP algorithms for learning more
complicated topic models. The source code is freely available under the GNU
General Public Licence, Version 1.0 at https://mloss.org/software/view/399/.
Comment: 4 pages
A Factor Graph Approach to Automated Design of Bayesian Signal Processing Algorithms
The benefits of automating design cycles for Bayesian inference-based
algorithms are becoming increasingly recognized by the machine learning
community. As a result, interest in probabilistic programming frameworks has
increased considerably over the past few years. This paper explores a specific
probabilistic programming paradigm, namely message passing in Forney-style
factor graphs (FFGs), in the context of automated design of efficient Bayesian
signal processing algorithms. To this end, we developed "ForneyLab"
(https://github.com/biaslab/ForneyLab.jl) as a Julia toolbox for message
passing-based inference in FFGs. We show by example how ForneyLab enables
automatic derivation of Bayesian signal processing algorithms, including
algorithms for parameter estimation and model comparison. Crucially, due to the
modular makeup of the FFG framework, both the model specification and inference
methods are readily extensible in ForneyLab. In order to test this framework,
we compared variational message passing as implemented by ForneyLab with
automatic differentiation variational inference (ADVI) and Monte Carlo methods
as implemented by state-of-the-art tools "Edward" and "Stan". In terms of
performance, extensibility and stability issues, ForneyLab appears to enjoy an
edge relative to its competitors for automated inference in state-space models.
Comment: Accepted for publication in the International Journal of Approximate
Reasoning
Memory-Efficient Topic Modeling
As one of the simplest probabilistic topic modeling techniques, latent
Dirichlet allocation (LDA) has found many important applications in text
mining, computer vision and computational biology. Recent training algorithms
for LDA can be interpreted within a unified message passing framework. However,
message passing requires storing previous messages with a large amount of
memory space, increasing linearly with the number of documents or the number of
topics. Therefore, the high memory usage is often a major problem for topic
modeling of massive corpora containing a large number of topics. To reduce the
space complexity, we propose a novel algorithm without storing previous
messages for training LDA: tiny belief propagation (TBP). The basic idea of TBP
relates the message passing algorithms with the non-negative matrix
factorization (NMF) algorithms, which absorb the message updating into the
message passing process, and thus avoid storing previous messages. Experimental
results on four large data sets confirm that TBP performs comparably to or
even better than current state-of-the-art training algorithms for LDA but with
much lower memory consumption. TBP can do topic modeling when massive corpora
cannot fit in the computer memory, for example, extracting thematic topics from
7 GB PUBMED corpora on a common desktop computer with 2 GB of memory.
Comment: 20 pages, 7 figures
A New Approach to Speeding Up Topic Modeling
Latent Dirichlet allocation (LDA) is a widely used probabilistic topic
modeling paradigm that has recently found many applications in computer vision and
computational biology. In this paper, we propose a fast and accurate batch
algorithm, active belief propagation (ABP), for training LDA. Usually batch LDA
algorithms require repeated scanning of the entire corpus and searching the
complete topic space. For massive corpora with a large number of
topics, each training iteration of batch LDA algorithms is often inefficient and
time-consuming. To accelerate training, ABP actively scans a subset
of the corpus and searches a subset of the topic space, thereby
saving enormous training time in each iteration. To ensure accuracy, ABP selects
only those documents and topics that contribute to the largest residuals within
the residual belief propagation (RBP) framework. On four real-world corpora,
ABP performs around to times faster than state-of-the-art batch LDA
algorithms with comparable topic modeling accuracy.
Comment: 14 pages, 12 figures
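
The selection step described in this abstract (keeping only the documents and topics with the largest residuals) can be illustrated with a short sketch. This is not the authors' implementation: the residual definition (count-weighted absolute change in messages between two iterations), the function name `select_active`, and the `doc_frac`/`topic_frac` parameters are all assumptions made for illustration.

```python
import numpy as np

def select_active(X, mu_old, mu_new, doc_frac=0.5, topic_frac=0.5):
    """Pick the documents and topics with the largest message residuals.

    X       : (D, W) document-word count matrix.
    mu_old  : (D, W, K) messages from the previous iteration.
    mu_new  : (D, W, K) messages from the current iteration.
    The *_frac arguments are hypothetical knobs: the fraction of documents
    and topics to keep active in the next iteration.
    """
    # Assumed residual: count-weighted absolute change of each message.
    res = X[:, :, None] * np.abs(mu_new - mu_old)      # (D, W, K)
    doc_res = res.sum(axis=(1, 2))                     # residual per document
    topic_res = res.sum(axis=(0, 1))                   # residual per topic

    n_docs = max(1, int(np.ceil(doc_frac * doc_res.size)))
    n_topics = max(1, int(np.ceil(topic_frac * topic_res.size)))

    # Indices of the largest residuals, returned in ascending index order.
    docs = np.sort(np.argsort(doc_res)[::-1][:n_docs])
    topics = np.sort(np.argsort(topic_res)[::-1][:n_topics])
    return docs, topics
```

Under these assumptions, a training loop would update messages only for the returned document and topic indices in the next sweep, which is where the per-iteration savings would come from.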
Learning Topic Models by Belief Propagation
Latent Dirichlet allocation (LDA) is an important hierarchical Bayesian model
for probabilistic topic modeling, which attracts worldwide interest and
touches on many important applications in text mining, computer vision and
computational biology. This paper represents LDA as a factor graph within the
Markov random field (MRF) framework, which enables the classic loopy belief
propagation (BP) algorithm for approximate inference and parameter estimation.
Although two commonly used approximate inference methods, variational
Bayes (VB) and collapsed Gibbs sampling (GS), have achieved great success in
learning LDA, the proposed BP is competitive in both speed and accuracy as
validated by encouraging experimental results on four large-scale document data
sets. Furthermore, the BP algorithm has the potential to become a generic
learning scheme for variants of LDA-based topic models. To this end, we show
how to learn two typical variants of LDA-based topic models, namely
author-topic models (ATM) and relational topic models (RTM), using BP based on
the factor graph representation.
Comment: 14 pages, 17 figures
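
To make the message passing idea in this abstract concrete, here is a minimal sketch of a synchronous BP-style update for LDA on a small dense count matrix. It is not the authors' code. It assumes the message for word w in document d and topic k is proportional to (document-topic mass excluding the current word + alpha) times (topic-word mass excluding the current document + beta), divided by the topic's total mass excluding the current entry plus W times beta. The function name `lda_bp` and all default hyperparameters are hypothetical.

```python
import numpy as np

def lda_bp(X, K, alpha=0.1, beta=0.01, n_iter=50, seed=0):
    """Sketch of synchronous belief propagation for LDA.

    X : (D, W) document-word count matrix (dense, for illustration only).
    Returns (doc_topic, topic_word), with rows normalized to sum to 1.
    """
    rng = np.random.default_rng(seed)
    D, W = X.shape

    # Random initial messages mu[d, w, :], normalized over topics.
    mu = rng.random((D, W, K))
    mu /= mu.sum(axis=2, keepdims=True)

    for _ in range(n_iter):
        weighted = X[:, :, None] * mu          # x_{w,d} * mu_{w,d}(k)
        theta = weighted.sum(axis=1)           # (D, K) document-topic mass
        phi = weighted.sum(axis=0)             # (W, K) topic-word mass
        phi_total = phi.sum(axis=0)            # (K,)  total mass per topic

        # Exclude the current (w, d) entry from each aggregate (assumed form).
        doc_part = theta[:, None, :] - weighted + alpha
        word_part = phi[None, :, :] - weighted + beta
        norm_part = phi_total[None, None, :] - weighted + W * beta

        mu = doc_part * word_part / norm_part
        mu /= mu.sum(axis=2, keepdims=True)    # renormalize over topics

    # Point estimates from the final messages, smoothed by the priors.
    weighted = X[:, :, None] * mu
    theta = weighted.sum(axis=1) + alpha
    phi = weighted.sum(axis=0) + beta
    doc_topic = theta / theta.sum(axis=1, keepdims=True)
    topic_word = (phi / phi.sum(axis=0, keepdims=True)).T
    return doc_topic, topic_word
```

The dense (D, W, K) message array is used only to keep the sketch short. A practical implementation would store messages for nonzero counts only.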