68 research outputs found
High-Order Stochastic Gradient Thermostats for Bayesian Learning of Deep Models
Learning in deep models using Bayesian methods has attracted significant
attention recently, largely because modern Bayesian methods can yield scalable
learning and inference while maintaining a measure of uncertainty in the model
parameters. Stochastic gradient MCMC (SG-MCMC) algorithms are a family of
diffusion-based sampling methods for large-scale Bayesian learning. In SG-MCMC,
multivariate stochastic gradient thermostats (mSGNHT) augment each parameter of
interest with a momentum and a thermostat variable so that the stationary
distribution of the diffusion matches the target posterior. As the number of
variables in a continuous-time diffusion increases, its numerical approximation
error becomes a practical bottleneck, so a more accurate numerical integrator
is desirable. To this end, we propose using an efficient symmetric splitting
integrator in mSGNHT in place of the traditional Euler integrator. We
demonstrate that the proposed scheme is more accurate, more robust, and faster
to converge, properties that are particularly desirable in Bayesian deep
learning. Extensive experiments on two canonical models and their deep
extensions demonstrate that the proposed scheme improves general Bayesian
posterior sampling, particularly for deep models.
Comment: AAAI 201
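To make the integrator comparison concrete, below is a minimal sketch of an mSGNHT update with a symmetric (A-B-O-B-A) splitting, in which the momentum-decay sub-step is solved exactly instead of being Euler-discretized. This is an illustrative reconstruction on a toy Gaussian target, not the paper's code; `grad_U`, the step size `h`, the noise scale, and the initialization are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_U(theta):
    # Stochastic gradient of the negative log-posterior; here a standard
    # Gaussian target with additive noise standing in for minibatching.
    return theta + 0.1 * rng.standard_normal(theta.shape)

def msgnht_aboba(theta, n_steps=20000, h=0.01, a=1.0):
    """mSGNHT with a symmetric splitting integrator (A-B-O-B-A order).

    Each dimension carries its own momentum p and thermostat xi; the B
    sub-step (momentum decay under the thermostat) is solved exactly,
    which is what the splitting buys over a plain Euler discretization.
    """
    p = rng.standard_normal(theta.shape)
    xi = np.full_like(theta, a)
    samples = np.empty((n_steps,) + theta.shape)
    for t in range(n_steps):
        # A half-step: positions and thermostats move under the momenta.
        theta += 0.5 * h * p
        xi += 0.5 * h * (p * p - 1.0)
        # B half-step: exact momentum decay, p <- exp(-xi h/2) p.
        p *= np.exp(-0.5 * h * xi)
        # O full step: stochastic-gradient kick plus injected noise.
        p += -h * grad_U(theta) + np.sqrt(2.0 * a * h) * rng.standard_normal(theta.shape)
        # Mirrored B and A half-steps.
        p *= np.exp(-0.5 * h * xi)
        xi += 0.5 * h * (p * p - 1.0)
        theta += 0.5 * h * p
        samples[t] = theta
    return samples

draws = msgnht_aboba(np.zeros(2))
print(draws.mean(axis=0), draws.var(axis=0))  # roughly zero mean, unit variance
```

An Euler baseline would instead advance theta, p, and xi with a single first-order step each; the abstract's claim is that, at the same step size, the symmetric scheme is more accurate and more robust.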
Scaling up Dynamic Edge Partition Models via Stochastic Gradient MCMC
The edge partition model (EPM) is a generative model for extracting an
overlapping community structure from static graph-structured data. In the EPM,
a gamma process (GaP) prior is adopted to infer the appropriate number of
latent communities, and each vertex is endowed with a gamma-distributed
positive membership vector. Despite its many attractive properties, inference
in the EPM is typically performed using Markov chain Monte Carlo (MCMC)
methods, which prevents it from being applied to massive network data. In this
paper, we generalize the EPM to account for dynamic environments by
representing each vertex with a positive membership vector constructed using a
Dirichlet prior specification, and by capturing the time-evolving behaviour of
vertices via a Dirichlet Markov chain construction. A simple-to-implement
Gibbs sampler is proposed to perform posterior computation using a
negative-binomial augmentation technique. For large network data, we propose a
stochastic gradient Markov chain Monte Carlo (SG-MCMC) algorithm for scalable
inference in the proposed model. The experimental results show that the novel
methods achieve competitive performance in terms of link prediction while
being much faster.
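The scalability argument rests on replacing full-conditional Gibbs updates with noisy gradient steps computed on minibatches of edges. The sketch below shows a generic SGLD-style update of that kind; `grad_log_post_minibatch` is a hypothetical stand-in for the model-specific gradient, and the log-space reparameterization for positive memberships is an assumption, not necessarily the paper's construction.

```python
import numpy as np

def sgld_step(log_phi, grad_log_post_minibatch, batch, n_total, step_size, rng):
    """One SGLD update on log-space membership parameters.

    grad_log_post_minibatch(log_phi, batch, n_total) is assumed to return
    an unbiased estimate of the full-data log-posterior gradient: the prior
    gradient plus (n_total / len(batch)) times the summed per-edge
    likelihood gradients from the minibatch.
    """
    g = grad_log_post_minibatch(log_phi, batch, n_total)
    # Langevin dynamics: half a step along the gradient, plus injected noise.
    noise = np.sqrt(step_size) * rng.standard_normal(log_phi.shape)
    return log_phi + 0.5 * step_size * g + noise
```

Iterating this step with a decreasing step size targets the posterior without ever touching the full edge list in any single update, which is where the reported speed-up over batch MCMC comes from.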
Dirichlet belief networks for topic structure learning
Recently, considerable research effort has been devoted to developing deep
architectures for topic models to learn topic structures. Although several deep
models have been proposed to learn better topic proportions of documents, how
to leverage the benefits of deep structures for learning word distributions of
topics has not yet been rigorously studied. Here we propose a new multi-layer
generative process on word distributions of topics, where each layer consists
of a set of topics and each topic is drawn from a mixture of the topics of the
layer above. As the topics in all layers can be directly interpreted by words,
the proposed model is able to discover interpretable topic hierarchies. As a
self-contained module, our model can be flexibly adapted to different kinds of
topic models to improve their modelling accuracy and interpretability.
Extensive experiments on text corpora demonstrate the advantages of the
proposed model.
Comment: accepted in NIPS 201
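The layered construction can be read as follows: topics at the top layer are Dirichlet draws over the vocabulary, and each lower-layer topic is a Dirichlet draw whose concentration is a weighted mixture of the topics one layer above, which is why every layer remains directly interpretable by words. A minimal generative sketch under that reading, assuming gamma-distributed mixture weights and illustrative hyperparameters:

```python
import numpy as np

def sample_topic_hierarchy(n_layers=3, n_topics=8, vocab=500, seed=0):
    """Layered topics: each lower-layer topic is a Dirichlet draw whose
    concentration is a weighted mixture of the topics one layer above,
    so topics at every layer remain distributions over words."""
    rng = np.random.default_rng(seed)
    # Top layer: topics drawn from a symmetric Dirichlet over the vocabulary.
    topics = rng.dirichlet(np.full(vocab, 0.1), size=n_topics)
    layers = [topics]
    for _ in range(n_layers - 1):
        # Assumed gamma-distributed weights connecting topics across layers.
        weights = rng.gamma(1.0, 1.0, size=(n_topics, n_topics))
        # Mixture of upper-layer topics serves as the Dirichlet concentration;
        # the small floor keeps every component strictly positive.
        conc = weights @ layers[-1] + 1e-8
        topics = np.array([rng.dirichlet(c) for c in conc])
        layers.append(topics)
    return layers  # the bottom layer gives the word distributions documents use

for i, layer in enumerate(sample_topic_hierarchy()):
    print(f"layer {i}: {layer.shape}")  # (n_topics, vocab) at every layer
```

Because every layer is itself a set of word distributions, the hierarchy can be inspected directly: listing the top words of an upper-layer topic alongside the lower-layer topics it feeds exhibits the interpretable topic hierarchies the abstract describes.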