35,355 research outputs found
High-Order Stochastic Gradient Thermostats for Bayesian Learning of Deep Models
Learning in deep models using Bayesian methods has generated significant
attention recently. This is largely because modern Bayesian methods can
deliver scalable learning and inference while maintaining a measure of
uncertainty in the model parameters. Stochastic gradient MCMC
algorithms (SG-MCMC) are a family of diffusion-based sampling methods for
large-scale Bayesian learning. In SG-MCMC, multivariate stochastic gradient
thermostats (mSGNHT) augment each parameter of interest with a momentum and a
thermostat variable to maintain stationary distributions as target posterior
distributions. As the number of variables in a continuous-time diffusion
increases, its numerical approximation error becomes a practical bottleneck, so
a more accurate numerical integrator is desirable. To this end, we propose
using an efficient symmetric splitting integrator in mSGNHT instead of the
traditional Euler integrator. We show that the proposed scheme is more
accurate, more robust, and faster to converge. These properties are particularly
desirable in Bayesian deep learning. Extensive experiments on two canonical
models and their deep extensions demonstrate that the proposed scheme improves
general Bayesian posterior sampling, particularly for deep models.

Comment: AAAI 201
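As a rough illustration of the idea (not the authors' exact algorithm), the sketch below shows one mSGNHT update with a symmetric splitting of the dynamics into exactly solvable sub-steps. The function grad_log_post, the noise scale D, and the A-B-O-B-A sub-step ordering are assumptions made for this example.

import numpy as np

def msgnht_splitting_step(theta, p, xi, grad_log_post, h, D=1.0, rng=np.random):
    # One symmetric-splitting (A-B-O-B-A) mSGNHT update: theta are the
    # parameters, p the momenta, xi the per-dimension thermostat variables.
    # A: half step on theta and the thermostat (this sub-flow integrates exactly)
    theta = theta + 0.5 * h * p
    xi = xi + 0.5 * h * (p * p - 1.0)
    # B: half step of the friction dp = -xi * p dt, solved in closed form
    p = np.exp(-0.5 * h * xi) * p
    # O: full step using the stochastic gradient plus injected Gaussian noise
    p = p + h * grad_log_post(theta) + np.sqrt(2.0 * D * h) * rng.standard_normal(p.shape)
    # B and A again, mirroring the first half of the update
    p = np.exp(-0.5 * h * xi) * p
    theta = theta + 0.5 * h * p
    xi = xi + 0.5 * h * (p * p - 1.0)
    return theta, p, xi

A naive Euler integrator would instead take a single full-size step for p, theta, and xi, which is the discretization error the symmetric scheme is meant to reduce.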
Bayesian autoencoders for data-driven discovery of coordinates, governing equations and fundamental constants
Recent progress in autoencoder-based sparse identification of nonlinear
dynamics (SINDy) under constraints allows the joint discovery of
governing equations and latent coordinate systems from spatio-temporal data,
including simulated video frames. However, it is challenging for L1-based
sparse inference to identify the correct model from real data, owing to
noisy measurements and often limited sample sizes. To address the data-driven
discovery of physics in the low-data and high-noise regimes, we propose
Bayesian SINDy autoencoders, which incorporate a hierarchical Bayesian
sparsifying prior: the Spike-and-slab Gaussian Lasso. The Bayesian SINDy
autoencoder enables the joint discovery of governing equations and coordinate
systems with a theoretically guaranteed uncertainty estimate. To address the
computational intractability of the hierarchical Bayesian setting, we adopt an
adaptive empirical Bayesian method with stochastic gradient Langevin dynamics
(SGLD), which provides a computationally tractable way to perform Bayesian
posterior sampling within our framework. The Bayesian SINDy autoencoder
achieves better physics discovery with less data and fewer training epochs,
along with valid uncertainty quantification, as suggested by our experimental
studies. The Bayesian
SINDy autoencoder can be applied to real video data, where it correctly
identifies the governing equation and provides a close estimate of standard
physical constants such as the gravitational acceleration, for example in
videos of a pendulum.

Comment: 28 pages, 11 figures
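The adaptive empirical Bayesian procedure itself is more involved; as a minimal sketch of the SGLD component only, assuming user-supplied gradient functions (grad_log_prior and grad_log_lik_minibatch are hypothetical names), one update looks like this:

import numpy as np

def sgld_step(theta, grad_log_prior, grad_log_lik_minibatch, n_data, n_batch, eps, rng=np.random):
    # Stochastic gradient Langevin dynamics: a noisy gradient-ascent step on the
    # log posterior, with the minibatch likelihood gradient rescaled to estimate
    # the full-data gradient and Gaussian noise of variance eps injected.
    grad = grad_log_prior(theta) + (n_data / n_batch) * grad_log_lik_minibatch(theta)
    noise = np.sqrt(eps) * rng.standard_normal(theta.shape)
    return theta + 0.5 * eps * grad + noise

Iterating this step with a decaying step size eps yields approximate posterior samples, which is the kind of tractable posterior sampling the framework relies on.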
Variational Dropout and the Local Reparameterization Trick
We investigate a local reparameterization technique for greatly reducing the
variance of stochastic gradients for variational Bayesian inference (SGVB) of a
posterior over model parameters, while retaining parallelizability. This local
reparameterization translates uncertainty about global parameters into local
noise that is independent across datapoints in the minibatch. Such
parameterizations can be trivially parallelized and have variance that is
inversely proportional to the minibatch size, generally leading to much faster
convergence. Additionally, we explore a connection with dropout: Gaussian
dropout objectives correspond to SGVB with local reparameterization, a
scale-invariant prior and proportionally fixed posterior variance. Our method
allows inference of more flexibly parameterized posteriors; specifically, we
propose variational dropout, a generalization of Gaussian dropout where the
dropout rates are learned, often leading to better models. The method is
demonstrated through several experiments.
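As a minimal numpy sketch of the trick (assuming a fully factorized Gaussian posterior over a dense layer's weights; the names mu and log_sigma2 are ours), the pre-activations are sampled directly rather than the weights:

import numpy as np

def local_reparam_forward(A, mu, log_sigma2, rng=np.random):
    # A: (batch, in) minibatch of inputs; mu, log_sigma2: (in, out) posterior
    # means and log-variances of the weights. Instead of sampling a weight
    # matrix W and computing A @ W, sample each pre-activation from its
    # implied Gaussian, so the noise is independent across datapoints.
    gamma = A @ mu                          # pre-activation means
    delta = (A ** 2) @ np.exp(log_sigma2)   # pre-activation variances
    eps = rng.standard_normal(gamma.shape)
    return gamma + np.sqrt(delta) * eps

Because each row of the minibatch gets its own noise, the gradient variance shrinks roughly inversely with the minibatch size, which is the convergence benefit described above.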
…