Big Learning with Bayesian Methods
Explosive growth in data and availability of cheap computing resources have
sparked increasing interest in Big learning, an emerging subfield that studies
scalable machine learning algorithms, systems, and applications with Big Data.
Bayesian methods represent one important class of statistical methods for machine
learning, with substantial recent developments on adaptive, flexible and
scalable Bayesian learning. This article provides a survey of the recent
advances in Big learning with Bayesian methods, termed Big Bayesian Learning,
including nonparametric Bayesian methods for adaptively inferring model
complexity, regularized Bayesian inference for improving the flexibility via
posterior regularization, and scalable algorithms and systems based on
stochastic subsampling and distributed computing for dealing with large-scale
applications.
Comment: 21 pages, 6 figures
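The stochastic-subsampling idea surveyed here is exemplified by stochastic gradient Langevin dynamics (SGLD), which replaces the full-data gradient in a Langevin update with a rescaled minibatch gradient. A minimal sketch on a toy Gaussian-mean model (the step size, minibatch size, and prior are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: N observations from a Gaussian with unknown mean.
N = 10_000
data = rng.normal(loc=2.0, scale=1.0, size=N)

def stoch_grad_log_post(theta, batch):
    """Minibatch estimate of the log-posterior gradient for a
    N(theta, 1) likelihood with a N(0, 10^2) prior; the likelihood
    term is rescaled by N / |batch|."""
    grad_prior = -theta / 100.0
    grad_lik = (N / len(batch)) * np.sum(batch - theta)
    return grad_prior + grad_lik

# SGLD: a noisy gradient step plus injected Gaussian noise.
theta, step = 0.0, 1e-5
samples = []
for _ in range(2000):
    batch = rng.choice(data, size=100, replace=False)
    theta += (0.5 * step * stoch_grad_log_post(theta, batch)
              + np.sqrt(step) * rng.normal())
    samples.append(theta)

print(np.mean(samples[500:]))   # close to the posterior mean, about 2.0
```

Only a 100-point minibatch is touched per update, yet the chain targets (approximately) the full-data posterior.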
Generalizing Hamiltonian Monte Carlo with Neural Networks
We present a general-purpose method to train Markov chain Monte Carlo
kernels, parameterized by deep neural networks, that converge and mix quickly
to their target distribution. Our method generalizes Hamiltonian Monte Carlo
and is trained to maximize expected squared jumped distance, a proxy for mixing
speed. We demonstrate large empirical gains on a collection of simple but
challenging distributions, for instance achieving a 106x improvement in
effective sample size in one case, and mixing when standard HMC makes no
measurable progress in a second. Finally, we show quantitative and qualitative
gains on a real-world task: latent-variable generative modeling. We release an
open source TensorFlow implementation of the algorithm.
Comment: ICLR 201
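The training proxy mentioned above, expected squared jumped distance (ESJD), is simple to estimate from a chain. A sketch comparing ESJD across two kernels on a 2-D Gaussian; here plain random-walk Metropolis with different proposal scales stands in for the paper's neural-network kernel, which is an assumption for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)

def esjd(chain):
    """Expected squared jumped distance: mean squared displacement
    between successive states. Larger values indicate faster mixing."""
    diffs = np.diff(chain, axis=0)
    return np.mean(np.sum(diffs ** 2, axis=-1))

def rwm(scale, steps=5000):
    """Random-walk Metropolis on a standard 2-D Gaussian; the proposal
    scale plays the role of the trainable kernel parameters."""
    x = np.zeros(2)
    chain = [x]
    for _ in range(steps):
        prop = x + scale * rng.normal(size=2)
        if np.log(rng.uniform()) < 0.5 * (x @ x - prop @ prop):
            x = prop
        chain.append(x)
    return np.array(chain)

# A well-scaled proposal jumps much further per step than a timid one.
print(esjd(rwm(0.1)), esjd(rwm(2.0)))
```

Maximizing ESJD over the kernel's parameters (here, the proposal scale) is exactly the kind of objective the paper optimizes with gradients through a neural network.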
NeuTra-lizing Bad Geometry in Hamiltonian Monte Carlo Using Neural Transport
Hamiltonian Monte Carlo is a powerful algorithm for sampling from
difficult-to-normalize posterior distributions. However, when the geometry of
the posterior is unfavorable, it may take many expensive evaluations of the
target distribution and its gradient to converge and mix. We propose neural
transport (NeuTra) HMC, a technique for learning to correct this sort of
unfavorable geometry using inverse autoregressive flows (IAF), a powerful
neural variational inference technique. The IAF is trained to minimize the KL
divergence from an isotropic Gaussian to the warped posterior, and then HMC
sampling is performed in the warped space. We evaluate NeuTra HMC on a variety
of synthetic and real problems, and find that it significantly outperforms
vanilla HMC both in time to reach the stationary distribution and asymptotic
effective-sample-size rates.
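The mechanism can be sketched with a fixed affine map in place of the trained IAF (an illustrative assumption): pull an ill-conditioned Gaussian target back through the map so it becomes isotropic, run HMC in the warped space, and push samples forward again.

```python
import numpy as np

rng = np.random.default_rng(2)

# Ill-conditioned 2-D Gaussian target with scales 1 and 100.
scales = np.array([1.0, 100.0])

def logp(x):                      # unnormalized target log-density
    return -0.5 * np.sum((x / scales) ** 2)

# An affine map standing in for the trained IAF: x = f(z).
def f(z):
    return scales * z

def logp_warped(z):               # target pulled back through f
    return logp(f(z)) + np.sum(np.log(scales))   # + log|det df/dz|

def grad_warped(z):               # gradient of the (isotropic) warped density
    return -z

def hmc_step(z, eps=0.8, L=5):
    """One HMC transition with leapfrog integration in warped space."""
    p = rng.normal(size=z.shape)
    z_new, p_new = z.copy(), p.copy()
    for _ in range(L):
        p_new = p_new + 0.5 * eps * grad_warped(z_new)
        z_new = z_new + eps * p_new
        p_new = p_new + 0.5 * eps * grad_warped(z_new)
    log_accept = (logp_warped(z_new) - 0.5 * p_new @ p_new
                  - logp_warped(z) + 0.5 * p @ p)
    return z_new if np.log(rng.uniform()) < log_accept else z

z = np.zeros(2)
xs = []
for _ in range(2000):
    z = hmc_step(z)
    xs.append(f(z))               # push warped samples back to x-space
xs = np.array(xs)
print(xs.std(axis=0))             # roughly the target scales (1, 100)
```

In the warped space a single step size suits both directions; running the same HMC directly on the original target would force a step size set by the narrowest direction.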
Bounding the Test Log-Likelihood of Generative Models
Several interesting generative learning algorithms involve a complex
probability distribution over many random variables, with intractable
normalization constants or latent-variable marginalization. Some may not even
have an analytic expression for the unnormalized probability function, nor a
tractable approximation. This makes it difficult to estimate the quality of
these models, once they have been trained, or to monitor their quality (e.g.
for early stopping) while training. A previously proposed method is based on
constructing a non-parametric density estimator of the model's probability
function from samples generated by the model. We revisit this idea, propose a
more efficient estimator, and prove that it provides a lower bound on the true
test log-likelihood, and an unbiased estimator as the number of generated
samples goes to infinity, although one that incorporates the effect of poor
mixing. We further propose a biased variant of the estimator that can be used
reliably with a finite number of samples for the purpose of model comparison.
Comment: 10 pages, 1 figure, 2 tables. International Conference on Learning Representations (ICLR'2014, conference track)
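The sample-based density-estimator idea can be sketched with a Gaussian KDE built from model samples (the bandwidth is chosen by hand here, and the paper's estimator differs in detail): fit the KDE to samples the model generates, then evaluate held-out test points under it.

```python
import numpy as np

rng = np.random.default_rng(3)

def kde_log_likelihood(model_samples, test_points, bandwidth=0.2):
    """Average test log-likelihood under a Gaussian KDE built from
    samples generated by the model (a tractable surrogate density)."""
    diffs = test_points[:, None] - model_samples[None, :]      # (T, S)
    log_k = (-0.5 * (diffs / bandwidth) ** 2
             - 0.5 * np.log(2 * np.pi) - np.log(bandwidth))
    m = log_k.max(axis=1, keepdims=True)                       # stable log-mean-exp
    log_dens = m[:, 0] + np.log(np.mean(np.exp(log_k - m), axis=1))
    return np.mean(log_dens)

# "Model" we can only sample from: here secretly N(0, 1), so the true
# average test log-likelihood is known to be about -1.42.
model_samples = rng.normal(size=2000)
test_points = rng.normal(size=500)
est = kde_log_likelihood(model_samples, test_points)
print(est)   # a slightly pessimistic estimate of the test log-likelihood
```

Because the KDE smooths the model's samples, the estimate tends to sit below the true log-likelihood, matching the lower-bound character described in the abstract.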
A Contrastive Divergence for Combining Variational Inference and MCMC
We develop a method to combine Markov chain Monte Carlo (MCMC) and
variational inference (VI), leveraging the advantages of both inference
approaches. Specifically, we improve the variational distribution by running a
few MCMC steps. To make inference tractable, we introduce the variational
contrastive divergence (VCD), a new divergence that replaces the standard
Kullback-Leibler (KL) divergence used in VI. The VCD captures a notion of
discrepancy between the initial variational distribution and its improved
version (obtained after running the MCMC steps), and it converges
asymptotically to the symmetrized KL divergence between the variational
distribution and the posterior of interest. The VCD objective can be optimized
efficiently with respect to the variational parameters via stochastic
optimization. We show experimentally that optimizing the VCD leads to better
predictive performance on two latent variable models: logistic matrix
factorization and variational autoencoders (VAEs).
Comment: International Conference on Machine Learning (ICML 2019). 12 pages, 3 figures
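The core move, refining samples from the variational distribution with a few MCMC steps, can be sketched in one dimension. The toy posterior, the overdispersed variational fit, and random-walk Metropolis are illustrative assumptions; this shows the refinement the VCD is built around, not the VCD objective itself.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy posterior: standard 1-D Gaussian. Crude variational fit: N(1, 2^2).
log_post = lambda z: -0.5 * z ** 2

def refine(z, steps=10, scale=0.5):
    """Improve samples with a few random-walk Metropolis steps
    targeting the posterior."""
    for _ in range(steps):
        prop = z + scale * rng.normal(size=z.shape)
        accept = (np.log(rng.uniform(size=z.shape))
                  < log_post(prop) - log_post(z))
        z = np.where(accept, prop, z)
    return z

z0 = 1.0 + 2.0 * rng.normal(size=5000)   # samples from the variational q
zt = refine(z0)                          # improved samples
# The refined samples sit in higher-posterior regions on average.
print(np.mean(log_post(z0)), np.mean(log_post(zt)))
```

The VCD then measures the discrepancy between the distributions of `z0` and `zt`, giving a training signal for the variational parameters.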
ZhuSuan: A Library for Bayesian Deep Learning
In this paper we introduce ZhuSuan, a Python probabilistic programming
library for Bayesian deep learning, which conjoins the complementary advantages
of Bayesian methods and deep learning. ZhuSuan is built upon TensorFlow. Unlike
existing deep learning libraries, which are mainly designed for deterministic
neural networks and supervised tasks, ZhuSuan is distinguished by its deep roots
in Bayesian inference, and thus supports various kinds of probabilistic models,
including both traditional hierarchical Bayesian models and recent deep
generative models. We use running examples to illustrate the probabilistic
programming on ZhuSuan, including Bayesian logistic regression, variational
auto-encoders, deep sigmoid belief networks and Bayesian recurrent neural
networks.
Comment: The GitHub page is at https://github.com/thu-ml/zhusua
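The Bayesian logistic regression running example can be sketched without ZhuSuan at all; the following is plain NumPy with a random-walk Metropolis sampler (not ZhuSuan's API) on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic binary classification data.
X = rng.normal(size=(200, 2))
true_w = np.array([2.0, -1.0])
y = (X @ true_w + 0.5 * rng.normal(size=200)) > 0

def log_post(w):
    """Log posterior: standard-normal prior + Bernoulli likelihood."""
    logits = X @ w
    log_lik = np.sum(y * logits - np.log1p(np.exp(logits)))
    return log_lik - 0.5 * w @ w

# Random-walk Metropolis over the weight vector.
w = np.zeros(2)
samples = []
for _ in range(5000):
    prop = w + 0.2 * rng.normal(size=2)
    if np.log(rng.uniform()) < log_post(prop) - log_post(w):
        w = prop
    samples.append(w)

post_mean = np.mean(samples[1000:], axis=0)
print(post_mean)   # signs recover the true weights: (+, -)
```

A probabilistic programming library such as ZhuSuan automates exactly the part written by hand here: building `log_post` from a model description and supplying the inference algorithm.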
Approximate Inference with Amortised MCMC
We propose a novel approximate inference algorithm that approximates a target
distribution by amortising the dynamics of a user-selected MCMC sampler. The
idea is to initialise MCMC using samples from an approximation network, apply
the MCMC operator to improve these samples, and finally use the samples to
update the approximation network thereby improving its quality. This provides a
new generic framework for approximate inference, allowing us to deploy highly
complex, or implicitly defined approximation families with intractable
densities, including approximations produced by warping a source of randomness
through a deep neural network. Experiments consider image modelling with deep
generative models as a challenging test for the method. Deep models trained
using amortised MCMC are shown to generate realistic-looking samples as well as
to produce diverse imputations for images with regions of missing pixels.
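The initialise-improve-update loop reads directly as pseudocode. In this sketch the "approximation network" is just a two-parameter Gaussian updated by moment matching, a stand-in for gradient updates to a deep sampler network; the target and step sizes are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)

log_target = lambda x: -0.5 * ((x - 3.0) / 0.5) ** 2   # target: N(3, 0.5^2)

# "Approximation network": a Gaussian sampler with two learnable
# parameters, standing in for a deep network that warps noise.
mu, sigma = 0.0, 1.0

for _ in range(200):
    # 1. Initialise MCMC with samples from the approximation.
    x = mu + sigma * rng.normal(size=256)
    # 2. Improve them with a few Metropolis steps on the target.
    for _ in range(5):
        prop = x + 0.5 * rng.normal(size=x.shape)
        accept = (np.log(rng.uniform(size=x.shape))
                  < log_target(prop) - log_target(x))
        x = np.where(accept, prop, x)
    # 3. Update the approximation toward the improved samples
    #    (moment matching here; a gradient step in general).
    mu += 0.1 * (x.mean() - mu)
    sigma += 0.1 * (x.std() - sigma)

print(mu, sigma)   # drifts toward the target's mean 3.0 and scale 0.5
```

As the approximation improves, the MCMC steps have less work to do, which is the self-reinforcing dynamic the abstract describes.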
Learning Model Reparametrizations: Implicit Variational Inference by Fitting MCMC distributions
We introduce a new algorithm for approximate inference that combines
reparametrization, Markov chain Monte Carlo and variational methods. We
construct a very flexible implicit variational distribution synthesized by an
arbitrary Markov chain Monte Carlo operation and a deterministic transformation
that can be optimized using the reparametrization trick. Unlike current methods
for implicit variational inference, our method avoids the computation of log
density ratios and therefore it is easily applicable to arbitrary continuous
and differentiable models. We demonstrate the proposed algorithm for fitting
banana-shaped distributions and for training variational autoencoders.
Comment: 16 pages, 6 figures
Denoising Adversarial Autoencoders
Unsupervised learning is of growing interest because it unlocks the potential
held in vast amounts of unlabelled data to learn useful representations for
inference. Autoencoders, a form of generative model, may be trained by learning
to reconstruct unlabelled input data from a latent representation space. More
robust representations may be produced by an autoencoder if it learns to
recover clean input samples from corrupted ones. Representations may be further
improved by introducing regularisation during training to shape the
distribution of the encoded data in latent space. We suggest denoising
adversarial autoencoders, which combine denoising and regularisation, shaping
the distribution of latent space using adversarial training. We introduce a
novel analysis that shows how denoising may be incorporated into the training
and sampling of adversarial autoencoders. Experiments are performed to assess
the contributions that denoising makes to the learning of representations for
classification and sample synthesis. Our results suggest that autoencoders
trained using a denoising criterion achieve higher classification performance,
and can synthesise samples that are more consistent with the input data than
those trained without a corruption process.
Comment: submitted to journal
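The denoising criterion, reconstructing the clean input from a corrupted copy, fits in a few lines with a linear autoencoder; the adversarial regularisation of the latent distribution is omitted here, and the data, noise level, and learning rate are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy data on a 1-D line in 3-D, so a single latent unit suffices.
N = 500
t = rng.normal(size=(N, 1))
X = t @ np.array([[1.0, 2.0, -1.0]])

W_enc = 0.3 * rng.normal(size=(3, 1))
W_dec = 0.3 * rng.normal(size=(1, 3))
lr = 0.02
for _ in range(2000):
    X_noisy = X + 0.3 * rng.normal(size=X.shape)  # corrupt the input
    H = X_noisy @ W_enc                           # encode corrupted data
    err = H @ W_dec - X                           # compare to the *clean* data
    g_dec = H.T @ err / N
    g_enc = X_noisy.T @ err @ W_dec.T / N
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

# Clean-input reconstruction error after training.
print(np.mean((X @ W_enc @ W_dec - X) ** 2))
```

The key detail is the target of `err`: the network sees `X_noisy` but is penalized against clean `X`, forcing the representation to strip the corruption.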
Advances in Variational Inference
Many modern unsupervised or semi-supervised machine learning algorithms rely
on Bayesian probabilistic models. These models are usually intractable and thus
require approximate inference. Variational inference (VI) lets us approximate a
high-dimensional Bayesian posterior with a simpler variational distribution by
solving an optimization problem. This approach has been successfully used in
various models and large-scale applications. In this review, we give an
overview of recent trends in variational inference. We first introduce standard
mean field variational inference, then review recent advances focusing on the
following aspects: (a) scalable VI, which includes stochastic approximations,
(b) generic VI, which extends the applicability of VI to a large class of
otherwise intractable models, such as non-conjugate models, (c) accurate VI,
which includes variational models beyond the mean field approximation or with
atypical divergences, and (d) amortized VI, which implements the inference over
local latent variables with inference networks. Finally, we provide a summary
of promising future research directions.
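As a concrete baseline for the review's taxonomy, reparameterized stochastic VI can be shown fitting a Gaussian to a known 1-D Gaussian posterior (the target, sample size, and learning rate are arbitrary choices), so the recovered parameters can be checked against the truth:

```python
import numpy as np

rng = np.random.default_rng(8)

# Known 1-D "posterior" N(2, 0.5^2); variational family N(mu, s^2).
mu, log_s = 0.0, 0.0
lr = 0.05
for _ in range(2000):
    eps = rng.normal(size=64)
    s = np.exp(log_s)
    z = mu + s * eps                       # reparameterization trick
    dlogp = -(z - 2.0) / 0.25              # d/dz log p(z | data)
    grad_mu = np.mean(dlogp)               # Monte Carlo ELBO gradients
    grad_log_s = np.mean(dlogp * eps * s) + 1.0   # +1 from the entropy term
    mu += lr * grad_mu
    log_s += lr * grad_log_s

print(mu, np.exp(log_s))   # approaches the true 2.0 and 0.5
```

The stochastic approximations of item (a), the black-box gradients of item (b), and the amortization of item (d) are all variations on this basic reparameterized estimator.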