Variational Dropout and the Local Reparameterization Trick
We investigate a local reparameterization technique for greatly reducing the
variance of stochastic gradients for variational Bayesian inference (SGVB) of a
posterior over model parameters, while retaining parallelizability. This local
reparameterization translates uncertainty about global parameters into local
noise that is independent across datapoints in the minibatch. Such
parameterizations can be trivially parallelized and have variance that is
inversely proportional to the minibatch size, generally leading to much faster
convergence. Additionally, we explore a connection with dropout: Gaussian
dropout objectives correspond to SGVB with local reparameterization, a
scale-invariant prior and proportionally fixed posterior variance. Our method
allows inference of more flexibly parameterized posteriors; specifically, we
propose variational dropout, a generalization of Gaussian dropout where the
dropout rates are learned, often leading to better models. The method is
demonstrated through several experiments.
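A minimal numpy sketch of the local reparameterization idea described in this abstract: rather than sampling a weight matrix W ~ N(mu, sigma^2) once per minibatch, the layer's pre-activations are sampled directly, so the noise is independent across datapoints. The shapes, function name, and toy data below are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_reparam_layer(X, mu, log_sigma2):
    """Sample pre-activations B = X @ W with independent W_ij ~ N(mu_ij, sigma2_ij).

    Each entry of B is a sum of independent Gaussians, so its marginal has
    mean X @ mu and variance X**2 @ sigma2; we draw one standard-normal
    sample per (datapoint, output unit) instead of sampling W itself.
    """
    sigma2 = np.exp(log_sigma2)
    gamma = X @ mu                           # mean of the pre-activations
    delta = (X ** 2) @ sigma2                # variance of the pre-activations
    eps = rng.standard_normal(gamma.shape)   # noise independent across datapoints
    return gamma + np.sqrt(delta) * eps

# Toy usage: a minibatch of 8 datapoints, 5 inputs, 3 output units.
X = rng.standard_normal((8, 5))
mu = rng.standard_normal((5, 3)) * 0.1
log_sigma2 = np.full((5, 3), -4.0)
B = local_reparam_layer(X, mu, log_sigma2)
print(B.shape)  # (8, 3)
```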
Reducing Reparameterization Gradient Variance
Optimization with noisy gradients has become ubiquitous in statistics and
machine learning. Reparameterization gradients, or gradient estimates computed
via the "reparameterization trick," represent a class of noisy gradients often
used in Monte Carlo variational inference (MCVI). However, when these gradient
estimators are too noisy, the optimization procedure can be slow or fail to
converge. One way to reduce noise is to use more samples for the gradient
estimate, but this can be computationally expensive. Instead, we view the noisy
gradient as a random variable, and form an inexpensive approximation of the
generating procedure for the gradient sample. This approximation has high
correlation with the noisy gradient by construction, making it a useful control
variate for variance reduction. We demonstrate our approach on non-conjugate
multi-level hierarchical models and a Bayesian neural net, where we observed
gradient variance reductions of multiple orders of magnitude (20-2,000x).
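A toy numpy sketch of the control-variate idea in this abstract, not the paper's exact construction: the reparameterization gradient sample g(eps) is paired with a cheap linearized approximation g_hat(eps) whose expectation is known in closed form, and g - (g_hat - E[g_hat]) is the reduced-variance, still unbiased estimator. The one-dimensional target and all names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

m, s = 0.5, 0.3          # variational parameters, q(z) = N(m, s^2)
f = np.sin               # integrand; we estimate d/dm E_q[f(z)]
fp = np.cos              # f'
fpp = lambda z: -np.sin(z)  # f''

eps = rng.standard_normal(100_000)
z = m + s * eps

g = fp(z)                            # naive reparameterization gradient samples
g_hat = fp(m) + s * fpp(m) * eps     # cheap linearized approximation of g(eps)
g_cv = g - (g_hat - fp(m))           # E[g_hat] = fp(m), so the estimator stays unbiased

print("naive variance:          ", g.var())
print("control-variate variance:", g_cv.var())
```

Because g_hat is highly correlated with g by construction, subtracting its centered value cancels most of the noise while leaving the expectation unchanged.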
…