95 research outputs found
Deep Gaussian processes for regression using approximate expectation propagation
Deep Gaussian processes (DGPs) are multi-layer hierarchical generalisations
of Gaussian processes (GPs) and are formally equivalent to neural networks with
multiple, infinitely wide hidden layers. DGPs are nonparametric probabilistic
models and as such are arguably more flexible, have a greater capacity to
generalise, and provide better calibrated uncertainty estimates than
alternative deep models. This paper develops a new approximate Bayesian
learning scheme that enables DGPs to be applied to a range of medium to large
scale regression problems for the first time. The new method uses an
approximate Expectation Propagation procedure and a novel and efficient
extension of the probabilistic backpropagation algorithm for learning. We
evaluate the new method for non-linear regression on eleven real-world
datasets, showing that it always outperforms GP regression and is almost always
better than state-of-the-art deterministic and sampling-based approximate
inference methods for Bayesian neural networks. As a by-product, this work
provides a comprehensive analysis of six approximate Bayesian methods for
training neural networks.
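The layered construction described above can be illustrated by composing independent GP function draws, each layer taking the previous layer's outputs as its inputs. A minimal numpy sketch of sampling from a two-layer DGP prior (illustrative only; the kernel choice and jitter value are assumptions, and this is not the paper's EP-based inference scheme):

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of scalar inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_gp_layer(x, rng, jitter=1e-6):
    """Draw one function sample f ~ GP(0, k) evaluated at the inputs x."""
    K = rbf_kernel(x, x) + jitter * np.eye(len(x))
    return np.linalg.cholesky(K) @ rng.standard_normal(len(x))

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)
h = sample_gp_layer(x, rng)  # hidden layer: a GP warping of the inputs
y = sample_gp_layer(h, rng)  # output layer: a GP on the warped inputs
```

Stacking draws in this way is what gives DGPs their extra flexibility over a single GP; the inference problem the paper addresses is the much harder reverse direction, learning the layers from data.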
Black-Box α-divergence minimization
Black-box alpha (BB-α) is a new approximate inference method based on the minimization of α-divergences. BB-α scales to large datasets because it can be implemented using stochastic gradient descent. BB-α can be applied to complex probabilistic models with little effort since it only requires as input the likelihood function and its gradients. These gradients can be easily obtained using automatic differentiation. By changing the divergence parameter α, the method is able to interpolate between variational Bayes (VB) (α → 0) and an algorithm similar to expectation propagation (EP) (α = 1). Experiments on probit regression and neural network regression and classification problems show that BB-α with non-standard settings of α, such as α = 0.5, usually produces better predictions than with α → 0 (VB) or α = 1 (EP).

JMHL acknowledges support from the Rafael del Pino Foundation. YL thanks the Schlumberger Foundation Faculty for the Future fellowship for supporting her PhD study. MR acknowledges support from UK Engineering and Physical Sciences Research Council (EPSRC) grant EP/L016516/1 for the University of Cambridge Centre for Doctoral Training, the Cambridge Centre for Analysis. TDB thanks Google for funding his European Doctoral Fellowship. DHL acknowledges support from Plan Nacional I+D+i, Grants TIN2013-42351-P and TIN2015-70308-REDT, and from Comunidad de Madrid, Grant S2013/ICE-2845 CASI-CAM-CM. RET thanks EPSRC grants EP/L000776/1 and EP/M026957/1.
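The interpolation over α can be made concrete with the closely related Rényi α-divergence, which has a closed form for univariate Gaussians; a small sketch checking that it approaches the KL divergence in the α → 1 limit (conventions for where VB and EP sit on the α axis differ between papers, so this illustrates the divergence family rather than the BB-α objective itself):

```python
import numpy as np

def renyi_gauss(m1, s1, m2, s2, alpha):
    """Closed-form Renyi alpha-divergence D_alpha(N(m1, s1^2) || N(m2, s2^2)).
    Valid when the interpolated variance s_a2 below is positive."""
    s_a2 = (1 - alpha) * s1**2 + alpha * s2**2
    return (np.log(s2 / s1)
            + np.log(s2**2 / s_a2) / (2 * (alpha - 1))
            + alpha * (m1 - m2)**2 / (2 * s_a2))

def kl_gauss(m1, s1, m2, s2):
    """KL(N(m1, s1^2) || N(m2, s2^2)), the alpha -> 1 limit of the above."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5
```

At intermediate values such as α = 0.5 the divergence penalises the two tails differently from either limit, which is the regime the experiments above find to work well.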
Training Deep Gaussian Processes using Stochastic Expectation Propagation and Probabilistic Backpropagation
Deep Gaussian processes (DGPs) are multi-layer hierarchical generalisations
of Gaussian processes (GPs) and are formally equivalent to neural networks with
multiple, infinitely wide hidden layers. DGPs are probabilistic and
non-parametric and as such are arguably more flexible, have a greater capacity
to generalise, and provide better calibrated uncertainty estimates than
alternative deep models. The focus of this paper is scalable approximate
Bayesian learning of these networks. The paper develops a novel and efficient
extension of probabilistic backpropagation, a state-of-the-art method for
training Bayesian neural networks, that can be used to train DGPs. The new
method leverages a recently proposed method for scaling Expectation
Propagation, called stochastic Expectation Propagation. The method is able to
automatically discover useful input warping, expansion or compression, and it
is therefore a flexible form of Bayesian kernel design. We demonstrate the
success of the new method for supervised learning on several real-world
datasets, showing that it typically outperforms GP regression and is never much
worse.
On the impact of covariance functions in multi-objective Bayesian optimization for engineering design
This is the author accepted manuscript. The final version is available from the publisher via the DOI in this record.

Multi-objective Bayesian optimization (BO) is a highly useful class of methods that can effectively solve computationally expensive engineering design optimization problems with multiple objectives. However, the impact of the covariance function, which is an important part of multi-objective BO, is rarely studied in the context of engineering optimization. We aim to shed light on this issue by performing numerical experiments on engineering design optimization problems, primarily low-fidelity problems, so that we are able to statistically evaluate the performance of BO methods with various covariance functions. In this paper, we performed the study using a set of subsonic airfoil optimization cases as benchmark problems. Expected hypervolume improvement was used as the acquisition function to enrich the experimental design. Results show that the choice of covariance function has a notable impact on the performance of multi-objective BO. In this regard, Kriging models with the Matérn-3/2 covariance are the most robust in terms of diversity and convergence to the Pareto front, and can handle problems with various complexities.

Natural Environment Research Council (NERC)
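The Matérn-3/2 covariance singled out by these results is a standard, once-differentiable kernel; a minimal sketch of it for scalar inputs (the lengthscale and variance hyperparameters would normally be fitted, e.g. by maximum likelihood):

```python
import numpy as np

def matern32(x1, x2, lengthscale=1.0, variance=1.0):
    """Matern-3/2 kernel: k(r) = s^2 * (1 + sqrt(3)*r/l) * exp(-sqrt(3)*r/l),
    where r is the distance between inputs and l the lengthscale."""
    r = np.abs(x1[:, None] - x2[None, :])
    a = np.sqrt(3.0) * r / lengthscale
    return variance * (1.0 + a) * np.exp(-a)
```

Its sample paths are rougher than those of the squared-exponential kernel, a property sometimes credited for robustness on engineering response surfaces.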
Sequence tutor: Conservative fine-tuning of sequence generation models with KL-control
This paper proposes a general method for improving the structure and quality
of sequences generated by a recurrent neural network (RNN), while maintaining
information originally learned from data, as well as sample diversity. An RNN
is first pre-trained on data using maximum likelihood estimation (MLE), and the
probability distribution over the next token in the sequence learned by this
model is treated as a prior policy. Another RNN is then trained using
reinforcement learning (RL) to generate higher-quality outputs that account for
domain-specific incentives while retaining proximity to the prior policy of the
MLE RNN. To formalize this objective, we derive novel off-policy RL methods for
RNNs from KL-control. The effectiveness of the approach is demonstrated on two
applications: 1) generating novel musical melodies, and 2) computational
molecular generation. For both problems, we show that the proposed method
improves the desired properties and structure of the generated sequences, while
maintaining information learned from data.
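The KL-control objective admits a well-known closed-form solution in the single-step case: the optimal policy reweights the prior by exponentiated reward. A small sketch under that simplification (the sequential RNN setting in the paper requires the off-policy RL methods it derives; the function and parameter names here are illustrative):

```python
import numpy as np

def kl_control_policy(prior_probs, rewards, c=1.0):
    """Maximiser of E_pi[r(a)] - c * KL(pi || prior) over discrete actions:
    pi*(a) is proportional to prior(a) * exp(r(a) / c)."""
    w = np.asarray(prior_probs) * np.exp(np.asarray(rewards) / c)
    return w / w.sum()
```

As the KL weight c grows, the policy collapses back onto the prior, which is exactly the "retaining proximity to the prior policy" behaviour described above.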
A Geometric Variational Approach to Bayesian Inference
We propose a novel Riemannian geometric framework for variational inference
in Bayesian models based on the nonparametric Fisher-Rao metric on the manifold
of probability density functions. Under the square-root density representation,
the manifold can be identified with the positive orthant of the unit
hypersphere in L2, and the Fisher-Rao metric reduces to the standard L2 metric.
Exploiting such a Riemannian structure, we formulate the task of approximating
the posterior distribution as a variational problem on the hypersphere based on
the alpha-divergence. This provides a tighter lower bound on the marginal
likelihood when compared to, and a corresponding upper bound unavailable
with, approaches based on the Kullback-Leibler divergence. We propose a novel
gradient-based algorithm for the variational problem based on Frechet
derivative operators motivated by the geometry of the Hilbert sphere, and
examine its properties. Through simulations and real-data applications, we
demonstrate the utility of the proposed geometric framework and algorithm on
several Bayesian models.
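The square-root construction is easy to verify numerically for discrete distributions: ψ = √p has unit L2 norm, so it lies on the positive orthant of the unit sphere, and the geodesic distance between two such representations is the arccos of their inner product (up to a constant factor depending on the scaling convention chosen for the Fisher-Rao metric). A minimal sketch:

```python
import numpy as np

def sqrt_rep(p):
    """Square-root representation psi = sqrt(p); sum(psi**2) = sum(p) = 1,
    so psi lies on the positive orthant of the unit sphere."""
    return np.sqrt(np.asarray(p))

def sphere_distance(p, q):
    """Great-circle distance between the square-root representations."""
    inner = np.clip(np.dot(sqrt_rep(p), sqrt_rep(q)), -1.0, 1.0)
    return np.arccos(inner)
```

Variational inference in this geometry then becomes constrained optimisation on the sphere, which is what the Frechet-derivative-based algorithm above exploits.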
On the Use of Upper Trust Bounds in Constrained Bayesian Optimization Infill Criterion
In order to handle constrained optimization problems with a large number of design variables, a new approach has been proposed to address constraints in a surrogate-based optimization framework. This approach focuses on sequential enrichment using adaptive surrogate models, based on a Bayesian optimization approach with Gaussian process models. A constraint criterion using the uncertainty estimates of the Gaussian process models is introduced. Different variants of the algorithm, based on the accuracy of the constraint surrogate models, are used for selecting the infill sample points. The resulting algorithm has been tested on the well-known modified Branin optimization problem.
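One plausible form of such an uncertainty-based constraint criterion (an assumption for illustration, not necessarily the exact rule in the paper) is to relax a constraint g(x) <= 0 by its GP posterior standard deviation, keeping points as long as an optimistic trust bound remains feasible:

```python
def feasible_under_trust_bound(mu_g, sigma_g, k=2.0):
    """Relaxed feasibility check for a constraint g(x) <= 0 modelled by a GP
    with posterior mean mu_g and standard deviation sigma_g: keep the point
    if the optimistic bound mu_g - k * sigma_g is still non-positive."""
    return mu_g - k * sigma_g <= 0.0
```

Shrinking k as the constraint surrogates become more accurate recovers the kind of adaptive behaviour the abstract describes.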
The Variational Garrote
In this paper, we present a new variational method for sparse regression
using $l_0$ regularization. The variational parameters appear in the
approximate model in a way that is similar to Breiman's Garrote model. We refer
to this method as the variational Garrote (VG). We show that the combination of
the variational approximation and $l_0$ regularization has the effect of making
the problem effectively of maximal rank even when the number of samples is
small compared to the number of variables. The VG is compared numerically with
the Lasso method, ridge regression and the recently introduced paired mean
field method (PMF) (M. Titsias & M. L\'azaro-Gredilla., NIPS 2012). Numerical
results show that the VG and PMF yield more accurate predictions and more
accurately reconstruct the true model than the other methods. It is shown that
the VG finds correct solutions when the Lasso solution is inconsistent due to
large input correlations. Globally, VG is significantly faster than PMF and
tends to perform better as the problems become denser and in problems with
strongly correlated inputs. The naive implementation of the VG scales
cubically with the number of features. By introducing Lagrange multipliers we
obtain a dual formulation of the problem that scales cubically in the number
of samples, but close to linearly in the number of features.
Probabilistic machine learning and artificial intelligence.
How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

The author acknowledges an EPSRC grant EP/I036575/1, the DARPA PPAML programme, a Google Focused Research Award for the Automatic Statistician and support from Microsoft Research.

This is the author accepted manuscript. The final version is available from NPG at http://www.nature.com/nature/journal/v521/n7553/full/nature14541.html#abstract