BayesAdapter: Being Bayesian, Inexpensively and Reliably, via Bayesian Fine-tuning
Despite their theoretical appeal, Bayesian neural networks (BNNs) have seen limited real-world adoption due to persistent concerns about their scalability, accessibility, and reliability. In this work, we aim to relieve these concerns by developing the BayesAdapter framework for learning variational BNNs. In particular, we propose to adapt pre-trained deterministic NNs into BNNs via cost-effective Bayesian fine-tuning. To make BayesAdapter more practical, we contribute 1) a modularized, user-friendly implementation for learning variational BNNs under two representative variational distributions, 2) a generally applicable strategy for reducing gradient variance in stochastic variational inference, and 3) an explanation of why BNNs' uncertainty estimates can be unreliable, together with a corresponding prescription. Through extensive experiments on diverse benchmarks, we show that BayesAdapter consistently induces posteriors of higher quality than from-scratch variational inference and other competitive baselines, especially in large-scale settings, while significantly reducing training overhead.
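A minimal sketch of the general idea of Bayesian fine-tuning (not the authors' BayesAdapter code): a pre-trained deterministic linear layer is converted into a mean-field Gaussian variational layer whose mean is initialized at the pre-trained weights, and the model is then fine-tuned by minimizing a reparameterized negative ELBO. The class and function names (`BayesLinear`, `elbo_loss`) and the prior/initialization values are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesLinear(nn.Module):
    """Mean-field Gaussian variational linear layer.

    The variational mean is initialized from the pre-trained deterministic
    weights, so training amounts to Bayesian fine-tuning rather than
    from-scratch variational inference.
    """
    def __init__(self, pretrained: nn.Linear, prior_std: float = 1.0):
        super().__init__()
        self.mu = nn.Parameter(pretrained.weight.detach().clone())
        # Small initial variance: start close to the deterministic solution.
        self.log_sigma = nn.Parameter(torch.full_like(self.mu, -5.0))
        self.bias = nn.Parameter(pretrained.bias.detach().clone())
        self.prior_std = prior_std

    def forward(self, x):
        # Reparameterization trick: sample weights as mu + sigma * eps.
        sigma = self.log_sigma.exp()
        eps = torch.randn_like(self.mu)
        weight = self.mu + sigma * eps
        return F.linear(x, weight, self.bias)

    def kl(self):
        # KL( N(mu, sigma^2) || N(0, prior_std^2) ), summed over weights.
        sigma = self.log_sigma.exp()
        var_ratio = (sigma / self.prior_std) ** 2
        mean_term = (self.mu / self.prior_std) ** 2
        return 0.5 * (var_ratio + mean_term - 1.0 - var_ratio.log()).sum()

def elbo_loss(model, x, y, dataset_size):
    """Negative ELBO: data NLL plus KL term scaled by dataset size."""
    logits = model(x)
    nll = F.cross_entropy(logits, y)
    kl = sum(m.kl() for m in model.modules() if isinstance(m, BayesLinear))
    return nll + kl / dataset_size
```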
Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods
We formulate the problem of neural network optimization as Bayesian filtering, where the observations are the backpropagated gradients. While neural network optimization has previously been studied using natural-gradient methods, which are closely related to Bayesian inference, those methods were unable to recover standard optimizers such as Adam and RMSprop with a root-mean-square gradient normalizer, instead obtaining a mean-square normalizer. To recover the root-mean-square normalizer, we find it necessary to account for the temporal dynamics of all the other parameters as they are being optimized. The resulting optimizer, AdaBayes, adaptively transitions between SGD-like and Adam-like behaviour, automatically recovers AdamW, a state-of-the-art variant of Adam with decoupled weight decay, and has generalisation performance competitive with SGD.
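An illustrative contrast of the normalizers discussed above (this is not the AdaBayes algorithm itself): an Adam/RMSprop-style update divides the gradient by the root-mean-square of past gradients, whereas the mean-square normalizer mentioned in the abstract divides by the exponential moving average of squared gradients directly. The function name `adam_like_step` and its hyperparameters are assumptions for the sketch.

```python
import torch

def adam_like_step(param, grad, state, lr=1e-3, beta2=0.999, eps=1e-8,
                   root_mean_square=True):
    """One update step for a single parameter tensor."""
    # Exponential moving average of squared gradients.
    v = beta2 * state.get("v", torch.zeros_like(grad)) + (1 - beta2) * grad ** 2
    state["v"] = v
    if root_mean_square:
        # Adam / RMSprop: normalize by the root-mean-square of past gradients.
        denom = v.sqrt() + eps
    else:
        # Mean-square normalizer: what a naive Bayesian/natural-gradient
        # derivation yields, per the abstract above.
        denom = v + eps
    param = param - lr * grad / denom
    return param, state
```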
- …