
    BayesAdapter: Being Bayesian, Inexpensively and Reliably, via Bayesian Fine-tuning

    Despite their theoretical appeal, Bayesian neural networks (BNNs) lag behind in real-world adoption due to persistent concerns about their scalability, accessibility, and reliability. In this work, we aim to relieve these concerns by developing the BayesAdapter framework for learning variational BNNs. In particular, we propose to adapt pre-trained deterministic NNs into BNNs via cost-effective Bayesian fine-tuning. To make BayesAdapter more practical, we contribute 1) a modularized, user-friendly implementation for learning variational BNNs under two representative variational distributions, 2) a generally applicable strategy for reducing gradient variance in stochastic variational inference, and 3) an explanation of why BNNs' uncertainty estimates can be unreliable, together with a corresponding prescription. Through extensive experiments on diverse benchmarks, we show that BayesAdapter consistently induces higher-quality posteriors than from-scratch variational inference and other competitive baselines, especially in large-scale settings, while significantly reducing training overhead.
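    To make the fine-tuning idea concrete, here is a minimal, illustrative sketch (not the authors' BayesAdapter implementation): a pre-trained nn.Linear layer is wrapped into a mean-field Gaussian variational layer whose mean is initialised from the pre-trained weights, and training then minimises the negative ELBO. The names (VariationalLinear, negative_elbo) and all hyperparameter values are assumptions made for illustration only.

```python
# Illustrative sketch of Bayesian fine-tuning of a pre-trained layer.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VariationalLinear(nn.Module):
    """Mean-field Gaussian posterior over the weights of a pre-trained linear layer."""

    def __init__(self, pretrained: nn.Linear, prior_std: float = 1.0):
        super().__init__()
        self.prior_std = prior_std
        # Variational mean starts at the pre-trained point estimate (the "adapt" step).
        self.weight_mu = nn.Parameter(pretrained.weight.detach().clone())
        # Small initial scale so fine-tuning starts near the deterministic network.
        self.weight_rho = nn.Parameter(torch.full_like(pretrained.weight, -5.0))
        self.bias = nn.Parameter(pretrained.bias.detach().clone())

    def forward(self, x):
        std = F.softplus(self.weight_rho)
        # Reparameterised weight sample for stochastic variational inference.
        weight = self.weight_mu + std * torch.randn_like(std)
        return F.linear(x, weight, self.bias)

    def kl(self):
        # KL(q(w) || p(w)) for a factorised Gaussian posterior and Gaussian prior.
        std = F.softplus(self.weight_rho)
        q = torch.distributions.Normal(self.weight_mu, std)
        p = torch.distributions.Normal(0.0, self.prior_std)
        return torch.distributions.kl_divergence(q, p).sum()


def negative_elbo(layer, logits, targets, dataset_size):
    # Expected negative log-likelihood plus the KL term, scaled per data point.
    return F.cross_entropy(logits, targets) + layer.kl() / dataset_size
```

    In a training loop, this loss would be minimised with an ordinary optimiser over the variational parameters, which is what makes the fine-tuning cost comparable to standard training.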

    Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods

    We formulate neural network optimization as Bayesian filtering, where the observations are the backpropagated gradients. While neural network optimization has previously been studied using natural gradient methods, which are closely related to Bayesian inference, those approaches were unable to recover standard optimizers such as Adam and RMSprop with a root-mean-square gradient normalizer, obtaining a mean-square normalizer instead. To recover the root-mean-square normalizer, we find it necessary to account for the temporal dynamics of all the other parameters as they are being optimized. The resulting optimizer, AdaBayes, adaptively transitions between SGD-like and Adam-like behaviour, automatically recovers AdamW, a state-of-the-art variant of Adam with decoupled weight decay, and has generalisation performance competitive with SGD.
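    A toy sketch of the filtering view, under stated assumptions (this is not the paper's AdaBayes derivation): each parameter carries a Gaussian belief whose variance shrinks as gradient observations arrive and re-inflates through a per-step diffusion term, so the step size interpolates between a roughly constant (SGD-like) one and an adaptively normalised (Adam-like) one. All names and constants here (ToyFilterOptimizer, prior_var, obs_noise, diffusion) are hypothetical.

```python
# Toy per-parameter Bayesian-filtering optimizer, for illustration only.
import torch


class ToyFilterOptimizer:
    """Gaussian belief per parameter, updated by treating gradients as noisy observations."""

    def __init__(self, params, prior_var=1e-2, obs_noise=1.0, diffusion=1e-6):
        self.params = list(params)
        self.obs_noise = obs_noise
        self.diffusion = diffusion
        # Posterior variance per parameter, initialised at the prior variance.
        self.var = [torch.full_like(p, prior_var) for p in self.params]

    @torch.no_grad()
    def step(self):
        for p, v in zip(self.params, self.var):
            if p.grad is None:
                continue
            g = p.grad
            # Predict: diffusion re-inflates the variance, standing in for the
            # temporal dynamics induced by all the other moving parameters.
            v.add_(self.diffusion)
            # Update: the gradient observation adds precision, shrinking the variance.
            v.copy_(1.0 / (1.0 / v + g * g / self.obs_noise))
            # Move the posterior mean; the step size is the posterior variance,
            # so it is near-constant (SGD-like) while the belief is broad and
            # adaptively normalised (Adam-like) once gradient evidence accumulates.
            p.add_(-v * g)

    def zero_grad(self):
        for p in self.params:
            if p.grad is not None:
                p.grad.zero_()
```

    Such an optimizer drops into a standard training loop in place of torch.optim.SGD or Adam; the point of the sketch is only to show how a single variance state can produce both regimes.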