2,485 research outputs found
Practical Gauss-Newton Optimisation for Deep Learning
We present an efficient block-diagonal approximation to the Gauss-Newton
matrix for feedforward neural networks. Our resulting algorithm is
competitive against state-of-the-art first order optimisation methods, with
sometimes significant improvement in optimisation performance. Unlike
first-order methods, for which hyperparameter tuning of the optimisation
parameters is often a laborious process, our approach can provide good
performance even when used with default settings. A side result of our work
is that for piecewise linear transfer functions, the network objective
function can have no differentiable local maxima, which may partially explain
why such transfer functions facilitate effective optimisation.
Comment: ICML 201
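
For orientation, the Gauss-Newton matrix named in this abstract is usually defined as below. The notation is an assumption made here, not taken from the abstract: h(θ) is the network output, J its Jacobian with respect to the parameters θ, and H_L the Hessian of the loss L with respect to the output.

```latex
% Standard Gauss-Newton matrix (notation assumed for illustration)
\[
  G \;=\; J^{\top} H_{L}\, J ,
  \qquad
  J = \frac{\partial h(\theta)}{\partial \theta},
  \qquad
  H_{L} = \frac{\partial^{2} L}{\partial h\, \partial h^{\top}} .
\]
% The full Hessian adds a term involving second derivatives of the network output:
\[
  \nabla_{\theta}^{2} L \;=\; G \;+\; \sum_{k} \frac{\partial L}{\partial h_{k}}\, \nabla_{\theta}^{2} h_{k}(\theta) .
\]
```

When the loss is convex in the network output, H_L is positive semi-definite and so is G, which is what makes G a convenient curvature matrix for second-order updates.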
Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression
We propose a general algorithm for approximating nonstandard Bayesian
posterior distributions. The algorithm minimizes the Kullback-Leibler
divergence of an approximating distribution to the intractable posterior
distribution. Our method can be used to approximate any posterior distribution,
provided that it is given in closed form up to the proportionality constant.
The approximation can be any distribution in the exponential family or any
mixture of such distributions, which means that it can be made arbitrarily
precise. Several examples illustrate the speed and accuracy of our
approximation method in practice
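
The objective described in this abstract can be written out as follows, assuming the usual variational direction of the divergence. The symbols are chosen here for illustration: q_η is the approximation with parameters η, p̃ is the posterior known up to its normalising constant Z, and x denotes the unknowns.

```latex
% KL objective with an unnormalised posterior (symbols assumed for illustration)
\[
  \mathrm{KL}\!\left(q_{\eta} \,\|\, p\right)
  \;=\; \mathbb{E}_{q_{\eta}}\!\left[\log q_{\eta}(x) - \log \tilde{p}(x)\right] \;+\; \log Z ,
  \qquad
  p(x) = \frac{\tilde{p}(x)}{Z} .
\]
```

Because log Z does not depend on η, minimising the expectation alone is enough, which is why the posterior only needs to be available up to its proportionality constant.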
Importance Sampled Stochastic Optimization for Variational Inference
Variational inference approximates the posterior distribution of a probabilistic model with a parameterized density by maximizing a lower bound for the model evidence. Modern solutions fit a flexible approximation with stochastic gradient descent, using Monte Carlo approximation for the gradients. This enables variational inference for arbitrary differentiable probabilistic models, and consequently makes variational inference feasible for probabilistic programming languages. In this work we develop more efficient inference algorithms for the task by considering importance sampling estimates for the gradients. We show how the gradient with respect to the approximation parameters can often be evaluated efficiently without needing to re-compute gradients of the model itself, and then proceed to derive practical algorithms that use importance sampled estimates to speed up computation. We present importance sampled stochastic gradient descent that outperforms standard stochastic gradient descent by a clear margin for a range of models, and provide a justifiable variant of stochastic average gradients for variational inference.
Peer reviewe
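
The reuse idea mentioned in this abstract can be illustrated with a generic self-normalised importance sampling estimate. This is a minimal sketch under stated assumptions, not the algorithm of the paper: the approximation is a one-dimensional Gaussian, the cached quantity `f_values` stands in for expensive model evaluations, and all names are hypothetical.

```python
import math
import random


def normal_logpdf(z, mu, sigma):
    """Log density of a univariate Gaussian N(mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((z - mu) / sigma) ** 2


def importance_weighted_mean(samples, f_values, old, new):
    """Self-normalised estimate of E_{q_new}[f(z)] using draws from q_old.

    `f_values` are evaluations of f at `samples`, computed once and reused,
    so no new model evaluations are needed when the parameters change.
    """
    log_w = [normal_logpdf(z, *new) - normal_logpdf(z, *old) for z in samples]
    shift = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - shift) for lw in log_w]
    total = sum(weights)
    return sum(w * f for w, f in zip(weights, f_values)) / total


rng = random.Random(0)
old = (0.0, 1.0)                      # (mean, std) the samples were drawn from
samples = [rng.gauss(*old) for _ in range(20000)]
f_values = [z * z for z in samples]   # stand-in for cached model evaluations

new = (0.3, 0.9)                      # updated approximation parameters
estimate = importance_weighted_mean(samples, f_values, old, new)
exact = new[0] ** 2 + new[1] ** 2     # E[z^2] under N(0.3, 0.9^2)
print(estimate, exact)
```

The estimate degrades as the new parameters move away from the old ones, so in practice the samples would need to be refreshed once the weights become too uneven.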
Divide and conquer in ABC: Expectation-Propagation algorithms for likelihood-free inference
ABC algorithms are notoriously expensive in computing time, as they require
simulating many complete artificial datasets from the model. We advocate in
this paper a "divide and conquer" approach to ABC, where we split the
likelihood into n factors, and combine in some way n "local" ABC approximations
of each factor. This has two advantages: (a) such an approach is typically much
faster than standard ABC and (b) it makes it possible to use local summary
statistics (i.e. summary statistics that depend only on the data-points that
correspond to a single factor), rather than global summary statistics (that
depend on the complete dataset). This greatly alleviates the bias introduced by
summary statistics, and even removes it entirely in situations where local
summary statistics are simply the identity function.
We focus on EP (Expectation-Propagation), a convenient and powerful way to
combine n local approximations into a global approximation. Compared to the
EP-ABC approach of Barthelmé and Chopin (2014), we present two variations, one
based on the parallel EP algorithm of Cseke and Heskes (2011), which has the
advantage of being implementable on a parallel architecture, and one version
which bridges the gap between standard EP and parallel EP. We illustrate our
approach with an expensive application of ABC, namely inference on spatial
extremes.
Comment: To appear in the forthcoming Handbook of Approximate Bayesian
Computation (ABC), edited by S. Sisson, L. Fan, and M. Beaumon
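
The cost that motivates this abstract is visible in plain rejection ABC, sketched below. This is the standard baseline, not the EP-ABC variants the abstract proposes, and the model (Gaussian data with unknown mean), the uniform prior, the sample mean as global summary statistic, and the tolerance are all assumptions chosen for illustration.

```python
import random
import statistics


def rejection_abc(observed, n_draws, tolerance, rng):
    """Plain rejection ABC for the mean of unit-variance Gaussian data.

    Every proposal simulates a complete artificial dataset, which is the
    expense a divide and conquer scheme aims to reduce.
    """
    obs_summary = statistics.fmean(observed)       # global summary statistic
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)             # draw from the assumed prior
        fake = [rng.gauss(theta, 1.0) for _ in observed]
        if abs(statistics.fmean(fake) - obs_summary) < tolerance:
            accepted.append(theta)                 # keep parameters that reproduce the summary
    return accepted


rng = random.Random(1)
observed = [rng.gauss(2.0, 1.0) for _ in range(20)]
posterior_draws = rejection_abc(observed, n_draws=20000, tolerance=0.1, rng=rng)
print(len(posterior_draws), statistics.fmean(posterior_draws))
```

Only a small fraction of the simulated datasets is accepted here, even for a one-parameter model, which gives a sense of why splitting the likelihood into factors with local summaries is attractive.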