5,389 research outputs found
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
Recurrent neural networks (RNNs) stand at the forefront of many recent
developments in deep learning. Yet a major difficulty with these models is
their tendency to overfit, with dropout shown to fail when applied to recurrent
layers. Recent results at the intersection of Bayesian modelling and deep
learning offer a Bayesian interpretation of common deep learning techniques
such as dropout. This grounding of dropout in approximate Bayesian inference
suggests an extension of the theoretical results, offering insights into the
use of dropout with RNN models. We apply this new variational inference based
dropout technique in LSTM and GRU models, assessing it on language modelling
and sentiment analysis tasks. The new approach outperforms existing techniques,
and to the best of our knowledge improves on the single model state-of-the-art
in language modelling with the Penn Treebank (73.4 test perplexity). This
extends our arsenal of variational tools in deep learning.
Comment: Added clarifications; Published in NIPS 2016.
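The core idea of this technique is to sample one dropout mask per sequence and reuse it at every timestep, on both the inputs and the recurrent connections, rather than resampling a fresh mask at each step. Below is a minimal PyTorch sketch of that idea under those assumptions; the class name `VariationalLSTM` and the step-loop structure are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class VariationalLSTM(nn.Module):
    """LSTM step loop with one dropout mask per sequence, reused at
    every timestep (unlike naive dropout, which resamples per step)."""

    def __init__(self, input_size, hidden_size, p=0.5):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.p = p

    def forward(self, x):  # x: (seq_len, batch, input_size)
        batch = x.size(1)
        h = x.new_zeros(batch, self.cell.hidden_size)
        c = x.new_zeros(batch, self.cell.hidden_size)
        keep = 1.0 - self.p
        if self.training:
            # Sample the input and recurrent masks ONCE per sequence,
            # with inverted-dropout rescaling by 1/keep.
            mask_x = torch.bernoulli(x.new_full((batch, x.size(2)), keep)) / keep
            mask_h = torch.bernoulli(
                x.new_full((batch, self.cell.hidden_size), keep)) / keep
        else:
            mask_x = mask_h = 1.0  # no dropout at evaluation time
        outputs = []
        for t in range(x.size(0)):
            # The *same* masks are applied at every timestep.
            h, c = self.cell(x[t] * mask_x, (h * mask_h, c))
            outputs.append(h)
        return torch.stack(outputs)
```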
Bayesian Dropout
Dropout has recently emerged as a powerful and simple method for training
neural networks, preventing co-adaptation by stochastically omitting neurons.
Dropout is currently not grounded in explicit modelling assumptions, which has
so far precluded its adoption in Bayesian modelling. Using Bayesian entropic
reasoning we show that dropout can be interpreted as optimal inference under
constraints. We demonstrate this on an analytically tractable regression model
providing a Bayesian interpretation of its mechanism for regularizing and
preventing co-adaptation as well as its connection to other Bayesian
techniques. We also discuss two general approximate techniques for applying
Bayesian dropout to general models, one based on an analytical approximation
and the other on stochastic variational techniques. These techniques are then
applied to a Bayesian logistic regression problem and are shown to improve
performance as the model becomes more misspecified. Our framework roots dropout
as a theoretically justified and practical tool for statistical modelling,
allowing Bayesians to tap into the benefits of dropout training.
Comment: 21 pages, 3 figures. Manuscript prepared in 2014 and awaiting submission.
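To make the stochastic flavour of this concrete: a common way to realize Bayesian dropout at prediction time is to average the model's output over many sampled dropout masks, approximating the posterior predictive by Monte Carlo. The NumPy sketch below does this for a fitted logistic regression; the function name and the feature-level masking are assumptions for illustration, not the paper's exact estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_bayesian_dropout(w, X, p=0.5, n_samples=100):
    """Monte Carlo posterior-predictive approximation for logistic
    regression under dropout: average the sigmoid output over many
    randomly dropped (and 1/keep-rescaled) copies of the inputs.

    w: (d,) fitted weights; X: (n, d) inputs; p: drop probability.
    """
    keep = 1.0 - p
    probs = np.zeros(X.shape[0])
    for _ in range(n_samples):
        mask = rng.binomial(1, keep, size=X.shape[1]) / keep
        logits = (X * mask) @ w
        probs += 1.0 / (1.0 + np.exp(-logits))
    return probs / n_samples  # averaged predictive probabilities
```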
Variational Dropout and the Local Reparameterization Trick
We investigate a local reparameterization technique for greatly reducing the
variance of stochastic gradients for variational Bayesian inference (SGVB) of a
posterior over model parameters, while retaining parallelizability. This local
reparameterization translates uncertainty about global parameters into local
noise that is independent across datapoints in the minibatch. Such
parameterizations can be trivially parallelized and have variance that is
inversely proportional to the minibatch size, generally leading to much faster
convergence. Additionally, we explore a connection with dropout: Gaussian
dropout objectives correspond to SGVB with local reparameterization, a
scale-invariant prior and proportionally fixed posterior variance. Our method
allows inference of more flexibly parameterized posteriors; specifically, we
propose variational dropout, a generalization of Gaussian dropout where the
dropout rates are learned, often leading to better models. The method is
demonstrated through several experiments.
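The trick itself is compact: under a factorized Gaussian posterior over the weights of a linear layer, the pre-activations are themselves Gaussian, so one can sample them directly per datapoint instead of sampling a shared weight matrix per minibatch. Here is a minimal sketch under that assumption (the function name and the fixed log-variance parameterization are illustrative; in the paper's variational dropout the noise scale, and hence the dropout rate, is learned by optimizing the variational objective).

```python
import torch

def local_reparam_linear(x, w_mu, w_logvar):
    """Linear layer under a factorized Gaussian posterior
    N(w_mu, exp(w_logvar)) over the weights. Rather than sampling a
    weight matrix, sample the pre-activations directly: their marginal
    is Gaussian with mean x @ w_mu and variance x^2 @ exp(w_logvar).
    The noise is then independent across datapoints in the minibatch,
    which is what reduces the gradient variance.
    """
    act_mu = x @ w_mu                         # (batch, out)
    act_var = (x * x) @ torch.exp(w_logvar)   # (batch, out)
    eps = torch.randn_like(act_mu)            # one noise draw per datapoint
    return act_mu + torch.sqrt(act_var + 1e-8) * eps
```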