Small-variance asymptotics for Bayesian neural networks
Bayesian neural networks (BNNs) are a rich and flexible class of models that have several advantages over standard feedforward networks, but are typically expensive to train on large-scale data. In this thesis, we explore the use of small-variance asymptotics, an approach to deriving fast algorithms from probabilistic models, on various Bayesian neural network models. We first demonstrate how small-variance asymptotics reveals precise connections between standard neural networks and BNNs; for example, particular sampling algorithms for BNNs reduce to standard backpropagation in the small-variance limit. We then explore a more complex BNN where the number of hidden units is additionally treated as a random variable in the model. While standard sampling schemes would be too slow to be practical, our asymptotic approach yields a simple method for extending standard backpropagation to the case where the number of hidden units is not fixed. We show on several data sets that the resulting algorithm has benefits over backpropagation on networks with a fixed architecture.
2019-01-02
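As a rough illustration of what a small-variance limit does to a probabilistic model, the sketch below uses a textbook example that is not taken from the thesis: in an equal-weight Gaussian mixture with shared variance, the soft cluster responsibilities collapse to hard nearest-center assignments (as in k-means) when the variance shrinks toward zero. The function name `responsibilities` and the one-dimensional setup are assumptions made for this example only.

```python
import math

def responsibilities(x, centers, variance):
    """Posterior cluster probabilities for a 1-D point under an
    equal-weight Gaussian mixture with a shared variance.
    (Hypothetical helper, for illustration only.)"""
    # Log-density of x under each component, up to a shared constant.
    logits = [-(x - c) ** 2 / (2.0 * variance) for c in centers]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

centers = [0.0, 3.0]
# Large variance: the point is shared softly between both clusters.
soft = responsibilities(1.0, centers, variance=10.0)
# Small variance: almost all mass goes to the nearest center.
hard = responsibilities(1.0, centers, variance=1e-3)
```

With a large variance both clusters keep substantial probability, while with a tiny variance the assignment becomes effectively deterministic. The thesis applies the same kind of limit to BNN sampling algorithms rather than to mixtures.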
Analysis of Natural Gradient Descent for Multilayer Neural Networks
Natural gradient descent is a principled method for adapting the parameters
of a statistical model on-line using an underlying Riemannian parameter space
to redefine the direction of steepest descent. The algorithm is examined via
methods of statistical physics which accurately characterize both transient and
asymptotic behavior. A solution of the learning dynamics is obtained for the
case of multilayer neural network training in the limit of large input
dimension. We find that natural gradient learning leads to optimal asymptotic
performance and outperforms gradient descent in the transient, significantly
shortening or even removing plateaus in the transient generalization
performance which typically hamper gradient descent training.
Comment: 14 pages including figures. To appear in Physical Review
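The idea of redefining steepest descent through the geometry of the parameter space can be sketched on a toy problem. The example below is not the multilayer-network setting analysed in the paper, and it uses batch rather than on-line updates: it fits the mean and standard deviation of a one-dimensional Gaussian by preconditioning the ordinary gradient of the negative log-likelihood with the inverse Fisher information matrix, which for the parameters (mu, sigma) is diag(1/sigma^2, 2/sigma^2). The function name `natural_gradient_fit` and all default values are assumptions.

```python
def natural_gradient_fit(xs, mu=0.0, sigma=1.0, lr=0.5, steps=100):
    """Fit N(mu, sigma^2) to data by natural gradient descent on the
    average negative log-likelihood. (Illustrative sketch only.)"""
    n = len(xs)
    mean_x = sum(xs) / n
    for _ in range(steps):
        # Mean squared deviation around the current mu.
        s2 = sum((x - mu) ** 2 for x in xs) / n
        # Ordinary (Euclidean) gradient of the average NLL.
        g_mu = -(mean_x - mu) / sigma ** 2
        g_sigma = 1.0 / sigma - s2 / sigma ** 3
        # Multiply by the inverse Fisher matrix diag(sigma^2, sigma^2 / 2).
        nat_mu = g_mu * sigma ** 2
        nat_sigma = g_sigma * sigma ** 2 / 2.0
        mu -= lr * nat_mu
        sigma -= lr * nat_sigma
    return mu, sigma

mu_hat, sigma_hat = natural_gradient_fit([1.0, 2.0, 3.0, 4.0, 5.0])
```

In this toy model the preconditioned step for mu no longer depends on sigma, so its convergence rate is the same whatever the current scale. This gives a small-scale picture of how rescaling by the Fisher metric changes the learning dynamics.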
The Information Complexity of Learning Tasks, their Structure and their Distance
We introduce an asymmetric distance in the space of learning tasks, and a
framework to compute their complexity. These concepts are foundational for the
practice of transfer learning, whereby a parametric model is pre-trained for a
task, and then fine-tuned for another. The framework we develop is
non-asymptotic, captures the finite nature of the training dataset, and allows
distinguishing learning from memorization. It encompasses, as special cases,
classical notions from Kolmogorov complexity, Shannon, and Fisher Information.
However, unlike some of those frameworks, it can be applied to large-scale
models and real-world datasets. Our framework is the first to measure
complexity in a way that accounts for the effect of the optimization scheme,
which is critical in Deep Learning.