95 research outputs found
Deep Gaussian processes for regression using approximate expectation propagation
Deep Gaussian processes (DGPs) are multi-layer hierarchical generalisations
of Gaussian processes (GPs) and are formally equivalent to neural networks with
multiple, infinitely wide hidden layers. DGPs are nonparametric probabilistic
models and as such are arguably more flexible, have a greater capacity to
generalise, and provide better calibrated uncertainty estimates than
alternative deep models. This paper develops a new approximate Bayesian
learning scheme that enables DGPs to be applied to a range of medium to large
scale regression problems for the first time. The new method uses an
approximate Expectation Propagation procedure and a novel and efficient
extension of the probabilistic backpropagation algorithm for learning. We
evaluate the new method for non-linear regression on eleven real-world
datasets, showing that it always outperforms GP regression and is almost always
better than state-of-the-art deterministic and sampling-based approximate
inference methods for Bayesian neural networks. As a by-product, this work
provides a comprehensive analysis of six approximate Bayesian methods for
training neural networks.
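The layered construction described above can be illustrated by composing independent GP function draws, each layer taking the previous layer's outputs as its inputs. A minimal numpy sketch of sampling from a two-layer DGP prior (illustrative only; the kernel choice and jitter value are assumptions, and this is not the paper's EP-based inference scheme):

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of scalar inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_gp_layer(x, rng, jitter=1e-6):
    """Draw one function sample f ~ GP(0, k) evaluated at the inputs x."""
    K = rbf_kernel(x, x) + jitter * np.eye(len(x))
    return np.linalg.cholesky(K) @ rng.standard_normal(len(x))

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)
h = sample_gp_layer(x, rng)  # hidden layer: a GP warping of the inputs
y = sample_gp_layer(h, rng)  # output layer: a GP on the warped inputs
```

Stacking draws in this way is what gives DGPs their extra flexibility over a single GP; the inference problem the paper addresses is the much harder reverse direction, learning the layers from data.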
Black-Box α-divergence minimization
Black-box alpha (BB-α) is a new approximate inference method based on the minimization of α-divergences. BB-α scales to large datasets because it can be implemented using stochastic gradient descent. BB-α can be applied to complex probabilistic models with little effort since it only requires as input the likelihood function and its gradients. These gradients can be easily obtained using automatic differentiation. By changing the divergence parameter α, the method is able to interpolate between variational Bayes (VB) (α → 0) and an algorithm similar to expectation propagation (EP) (α = 1). Experiments on probit regression and neural network regression and classification problems show that BB-α with non-standard settings of α, such as α = 0.5, usually produces better predictions than with α → 0 (VB) or α = 1 (EP).

JMHL acknowledges support from the Rafael del Pino Foundation. YL thanks the Schlumberger Foundation Faculty for the Future fellowship for supporting her PhD study. MR acknowledges support from UK Engineering and Physical Sciences Research Council (EPSRC) grant EP/L016516/1 for the University of Cambridge Centre for Doctoral Training, the Cambridge Centre for Analysis. TDB thanks Google for funding his European Doctoral Fellowship. DHL acknowledges support from Plan Nacional I+D+i, Grants TIN2013-42351-P and TIN2015-70308-REDT, and from Comunidad de Madrid, Grant S2013/ICE-2845 CASI-CAM-CM. RET thanks EPSRC grants EP/L000776/1 and EP/M026957/1.
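The interpolation over α can be made concrete with the closely related Rényi α-divergence, which has a closed form for univariate Gaussians; a small sketch checking that it approaches the KL divergence in the α → 1 limit (conventions for where VB and EP sit on the α axis differ between papers, so this illustrates the divergence family rather than the BB-α objective itself):

```python
import numpy as np

def renyi_gauss(m1, s1, m2, s2, alpha):
    """Closed-form Renyi alpha-divergence D_alpha(N(m1, s1^2) || N(m2, s2^2)).
    Valid when the interpolated variance s_a2 below is positive."""
    s_a2 = (1 - alpha) * s1**2 + alpha * s2**2
    return (np.log(s2 / s1)
            + np.log(s2**2 / s_a2) / (2 * (alpha - 1))
            + alpha * (m1 - m2)**2 / (2 * s_a2))

def kl_gauss(m1, s1, m2, s2):
    """KL(N(m1, s1^2) || N(m2, s2^2)), the alpha -> 1 limit of the above."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5
```

At intermediate values such as α = 0.5 the divergence penalises the two tails differently from either limit, which is the regime the experiments above find to work well.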
Training Deep Gaussian Processes using Stochastic Expectation Propagation and Probabilistic Backpropagation
Deep Gaussian processes (DGPs) are multi-layer hierarchical generalisations
of Gaussian processes (GPs) and are formally equivalent to neural networks with
multiple, infinitely wide hidden layers. DGPs are probabilistic and
non-parametric and as such are arguably more flexible, have a greater capacity
to generalise, and provide better calibrated uncertainty estimates than
alternative deep models. The focus of this paper is scalable approximate
Bayesian learning of these networks. The paper develops a novel and efficient
extension of probabilistic backpropagation, a state-of-the-art method for
training Bayesian neural networks, that can be used to train DGPs. The new
method leverages a recently proposed method for scaling Expectation
Propagation, called stochastic Expectation Propagation. The method is able to
automatically discover useful input warping, expansion or compression, and it
is therefore a flexible form of Bayesian kernel design. We demonstrate the
success of the new method for supervised learning on several real-world
datasets, showing that it typically outperforms GP regression and is never much
worse.
On the impact of covariance functions in multi-objective Bayesian optimization for engineering design
This is the author accepted manuscript. The final version is available from the publisher via the DOI in this record.

Multi-objective Bayesian optimization (BO) is a highly useful class of methods that can effectively solve computationally expensive engineering design optimization problems with multiple objectives. However, the impact of the covariance function, which is an important part of multi-objective BO, is rarely studied in the context of engineering optimization. We aim to shed light on this issue by performing numerical experiments on engineering design optimization problems, primarily low-fidelity problems, so that we are able to statistically evaluate the performance of BO methods with various covariance functions. In this paper, we performed the study using a set of subsonic airfoil optimization cases as benchmark problems. Expected hypervolume improvement was used as the acquisition function to enrich the experimental design. Results show that the choice of covariance function has a notable impact on the performance of multi-objective BO. In this regard, Kriging models with the Matérn-3/2 covariance are the most robust in terms of diversity and convergence to the Pareto front, and can handle problems with various complexities.

Natural Environment Research Council (NERC)
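The Matérn-3/2 covariance singled out by these results is a standard, once-differentiable kernel; a minimal sketch of it for scalar inputs (the lengthscale and variance hyperparameters would normally be fitted, e.g. by maximum likelihood):

```python
import numpy as np

def matern32(x1, x2, lengthscale=1.0, variance=1.0):
    """Matern-3/2 kernel: k(r) = s^2 * (1 + sqrt(3)*r/l) * exp(-sqrt(3)*r/l),
    where r is the distance between inputs and l the lengthscale."""
    r = np.abs(x1[:, None] - x2[None, :])
    a = np.sqrt(3.0) * r / lengthscale
    return variance * (1.0 + a) * np.exp(-a)
```

Its sample paths are rougher than those of the squared-exponential kernel, a property sometimes credited for robustness on engineering response surfaces.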
Sequence tutor: Conservative fine-tuning of sequence generation models with KL-control
This paper proposes a general method for improving the structure and quality
of sequences generated by a recurrent neural network (RNN), while maintaining
information originally learned from data, as well as sample diversity. An RNN
is first pre-trained on data using maximum likelihood estimation (MLE), and the
probability distribution over the next token in the sequence learned by this
model is treated as a prior policy. Another RNN is then trained using
reinforcement learning (RL) to generate higher-quality outputs that account for
domain-specific incentives while retaining proximity to the prior policy of the
MLE RNN. To formalize this objective, we derive novel off-policy RL methods for
RNNs from KL-control. The effectiveness of the approach is demonstrated on two
applications: 1) generating novel musical melodies, and 2) computational
molecular generation. For both problems, we show that the proposed method
improves the desired properties and structure of the generated sequences, while
maintaining information learned from data.
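The KL-control objective admits a well-known closed-form solution in the single-step case: the optimal policy reweights the prior by exponentiated reward. A small sketch under that simplification (the sequential RNN setting in the paper requires the off-policy RL methods it derives; the function and parameter names here are illustrative):

```python
import numpy as np

def kl_control_policy(prior_probs, rewards, c=1.0):
    """Maximiser of E_pi[r(a)] - c * KL(pi || prior) over discrete actions:
    pi*(a) is proportional to prior(a) * exp(r(a) / c)."""
    w = np.asarray(prior_probs) * np.exp(np.asarray(rewards) / c)
    return w / w.sum()
```

As the KL weight c grows, the policy collapses back onto the prior, which is exactly the "retaining proximity to the prior policy" behaviour described above.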
A Geometric Variational Approach to Bayesian Inference
We propose a novel Riemannian geometric framework for variational inference
in Bayesian models based on the nonparametric Fisher-Rao metric on the manifold
of probability density functions. Under the square-root density representation,
the manifold can be identified with the positive orthant of the unit
hypersphere in L2, and the Fisher-Rao metric reduces to the standard L2 metric.
Exploiting such a Riemannian structure, we formulate the task of approximating
the posterior distribution as a variational problem on the hypersphere based on
the alpha-divergence. This provides a tighter lower bound on the marginal
likelihood when compared to, and a corresponding upper bound unavailable
with, approaches based on the Kullback-Leibler divergence. We propose a novel
gradient-based algorithm for the variational problem based on Frechet
derivative operators motivated by the geometry of the Hilbert sphere, and
examine its properties. Through simulations and real-data applications, we
demonstrate the utility of the proposed geometric framework and algorithm on
several Bayesian models.
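The square-root construction is easy to verify numerically for discrete distributions: ψ = √p has unit L2 norm, so it lies on the positive orthant of the unit sphere, and the geodesic distance between two such representations is the arccos of their inner product (up to a constant factor depending on the scaling convention chosen for the Fisher-Rao metric). A minimal sketch:

```python
import numpy as np

def sqrt_rep(p):
    """Square-root representation psi = sqrt(p); sum(psi**2) = sum(p) = 1,
    so psi lies on the positive orthant of the unit sphere."""
    return np.sqrt(np.asarray(p))

def sphere_distance(p, q):
    """Great-circle distance between the square-root representations."""
    inner = np.clip(np.dot(sqrt_rep(p), sqrt_rep(q)), -1.0, 1.0)
    return np.arccos(inner)
```

Variational inference in this geometry then becomes constrained optimisation on the sphere, which is what the Frechet-derivative-based algorithm above exploits.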
On the Use of Upper Trust Bounds in Constrained Bayesian Optimization Infill Criterion
In order to handle constrained optimization problems with a large number of design variables, a new approach has been proposed to address constraints in a surrogate-based optimization framework. This approach focuses on sequential enrichment using adaptive surrogate models, based on a Bayesian optimization approach with Gaussian process models. A constraint criterion using the uncertainty estimates of the Gaussian process models is introduced. Different variants of the algorithm, based on the accuracy of the constraint surrogate models, are used for selecting the infill sample points. The resulting algorithm has been tested on the well-known modified Branin optimization problem.
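One plausible form of such an uncertainty-based constraint criterion (an assumption for illustration, not necessarily the exact rule in the paper) is to relax a constraint g(x) <= 0 by its GP posterior standard deviation, keeping points as long as an optimistic trust bound remains feasible:

```python
def feasible_under_trust_bound(mu_g, sigma_g, k=2.0):
    """Relaxed feasibility check for a constraint g(x) <= 0 modelled by a GP
    with posterior mean mu_g and standard deviation sigma_g: keep the point
    if the optimistic bound mu_g - k * sigma_g is still non-positive."""
    return mu_g - k * sigma_g <= 0.0
```

Shrinking k as the constraint surrogates become more accurate recovers the kind of adaptive behaviour the abstract describes.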
The Variational Garrote
In this paper, we present a new variational method for sparse regression
using $l_0$ regularization. The variational parameters appear in the
approximate model in a way that is similar to Breiman's Garrote model. We refer
to this method as the variational Garrote (VG). We show that the combination of
the variational approximation and $l_0$ regularization has the effect of making
the problem effectively of maximal rank even when the number of samples is
small compared to the number of variables. The VG is compared numerically with
the Lasso method, ridge regression and the recently introduced paired mean
field method (PMF) (M. Titsias & M. L\'azaro-Gredilla., NIPS 2012). Numerical
results show that the VG and PMF yield more accurate predictions and more
accurately reconstruct the true model than the other methods. It is shown that
the VG finds correct solutions when the Lasso solution is inconsistent due to
large input correlations. Globally, VG is significantly faster than PMF and
tends to perform better as the problems become denser and in problems with
strongly correlated inputs. The naive implementation of the VG scales
cubically with the number of features. By introducing Lagrange multipliers we
obtain a dual formulation of the problem that scales cubically in the number
of samples, but close to linearly in the number of features.
Probabilistic machine learning and artificial intelligence.
How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

The author acknowledges an EPSRC grant EP/I036575/1, the DARPA PPAML programme, a Google Focused Research Award for the Automatic Statistician and support from Microsoft Research.

This is the author accepted manuscript. The final version is available from NPG at http://www.nature.com/nature/journal/v521/n7553/full/nature14541.html#abstract