Bayesian quadrature for Gaussian process kernel learning, neural ensemble search, and high dimensional integrands

Abstract

The central challenge of performing inference in a model is the computation of marginalisation integrals over the model's parameters. In most cases of interest, these integrals are intractable, and evaluation of the integrand is expensive. A probabilistic approach to numerical integration offers a principled framework for allocating computation in such a setting. This is achieved by using a probabilistic surrogate to model the integrand, and selecting evaluations according to Bayesian decision theory. We offer Bayesian Quadrature (BQ) schemes that incorporate special structure in the model parameters for two widely used model classes: Gaussian Processes (for which we marginalise over a broad class of stationary kernels), and Neural Networks (for which we marginalise over a large space of architectures). We further investigate the use of scalable approximations of Gaussian Processes for scaling BQ to higher-dimensional (Euclidean) spaces for non-negative integrands. For GP kernel learning, our BQ framework makes use of the maximum mean discrepancies between distributions to define a kernel over kernels that captures invariances between Spectral Mixture (SM) Kernels. Kernel samples are then selected by generalising an information-theoretic acquisition function for warped BQ. By viewing ensembling as approximately marginalising over architectures, we bring the tools of BQ to bear upon Neural Ensemble Search. Additionally, the resulting ensembles consist of architectures weighted commensurately with their performance, unlike previous approaches that use equally weighted ensembles. The core challenge of scaling BQ to higher dimensions is the cubic complexity of GP regression in the number of integrand evaluations.
We explore the use of scalable approximations to GPs for BQ, particularly the recently proposed VISH model (a Variational GP whose inter-domain inducing variables are projections of the modelled function onto the spherical harmonics) and the Bézier GP model (defined by placing a Gaussian distribution over the control points of a Bézier curve).
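
As a point of reference for the surrogate-based approach described above, the following is a minimal sketch of plain (unwarped) Bayesian quadrature in one dimension. It assumes a zero-mean GP prior with a squared-exponential kernel and a Gaussian integration measure, for which the kernel mean has a closed form. The function name `bq_gaussian_measure`, its parameters, the fixed grid of nodes, and the jitter value are illustrative assumptions; it does not implement the warped BQ, kernel-over-kernels, or acquisition schemes of this work.

```python
import numpy as np


def bq_gaussian_measure(f, nodes, lengthscale=0.8, mu=0.0, sigma=1.0, jitter=1e-8):
    """Vanilla BQ estimate of the integral of f(x) against N(x; mu, sigma^2).

    Hypothetical helper for illustration only: zero-mean GP surrogate with
    kernel k(x, x') = exp(-(x - x')^2 / (2 * lengthscale^2)).
    Returns the posterior mean and variance of the integral.
    """
    x = np.asarray(nodes, dtype=float)
    y = f(x)
    l2, s2 = lengthscale**2, sigma**2

    # Gram matrix of the kernel at the evaluation nodes (jitter for stability).
    diff = x[:, None] - x[None, :]
    K = np.exp(-0.5 * diff**2 / l2) + jitter * np.eye(x.size)

    # Kernel mean z_i = integral of k(x_i, x') N(x'; mu, sigma^2) dx' (closed form).
    z = np.sqrt(l2 / (l2 + s2)) * np.exp(-0.5 * (x - mu) ** 2 / (l2 + s2))

    # Prior variance of the integral: double integral of k against the measure.
    prior_var = np.sqrt(l2 / (l2 + 2.0 * s2))

    # BQ weights w = K^{-1} z; the estimate is a weighted sum of evaluations.
    w = np.linalg.solve(K, z)
    mean = float(w @ y)
    var = max(float(prior_var - z @ w), 0.0)  # clip tiny negative round-off
    return mean, var


# Example: integrate exp(-x^2) against a standard normal measure.
estimate, variance = bq_gaussian_measure(lambda x: np.exp(-x**2), np.linspace(-3, 3, 15))
print(estimate, variance)
```

The estimate is a weighted sum of integrand evaluations, with weights determined by the surrogate's kernel rather than by a fixed rule; the linear solve against the Gram matrix is where the cubic cost in the number of evaluations arises.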
