Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent
Gaussian processes are a powerful framework for quantifying uncertainty and
for sequential decision-making but are limited by the requirement of solving
linear systems. In general, this has a cubic cost in dataset size and is
sensitive to conditioning. We explore stochastic gradient algorithms as a
computationally efficient method of approximately solving these linear systems:
we develop low-variance optimization objectives for sampling from the posterior
and extend these to inducing points. Counterintuitively, stochastic gradient
descent often produces accurate predictions, even in cases where it does not
converge quickly to the optimum. We explain this through a spectral
characterization of the implicit bias from non-convergence. We show that
stochastic gradient descent produces predictive distributions close to the true
posterior both in regions with sufficient data coverage, and in regions
sufficiently far away from the data. Experimentally, stochastic gradient
descent achieves state-of-the-art performance on sufficiently large-scale or
ill-conditioned regression tasks. Its uncertainty estimates match the
performance of significantly more expensive baselines on a large-scale Bayesian
optimization task.
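The core computation described above can be illustrated with a minimal numpy sketch: solving the regularised linear system (K + sigma^2 I) v = y by minibatch stochastic gradient descent on a quadratic objective, then forming the posterior mean as k(X_*, X) v. The data, kernel, and step sizes below are illustrative assumptions; the paper's low-variance sampling objectives and inducing-point extensions are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data (illustrative; not the paper's benchmarks).
n = 200
X = rng.uniform(-3.0, 3.0, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

def rbf(A, B, lengthscale=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

noise = 0.1**2
K = rbf(X, X) + noise * np.eye(n)  # K denotes K_xx + sigma^2 I here

# The GP posterior mean needs v = K^{-1} y.  Instead of a cubic-cost
# direct solve, run SGD on f(v) = 0.5 v^T K v - v^T y, whose unique
# minimiser is that v.  Each step uses an unbiased gradient estimate
# built from a random minibatch of rows of K.
v = np.zeros(n)
lr, batch = 0.002, 32
for _ in range(3000):
    idx = rng.choice(n, size=batch, replace=False)
    grad = np.zeros(n)
    grad[idx] = (K[idx] @ v - y[idx]) * (n / batch)
    v -= lr * grad

# Approximate posterior mean at new inputs.
Xs = np.linspace(-3.0, 3.0, 50)[:, None]
mean = rbf(Xs, X) @ v
```

Even with the loose step size and without full convergence, the residual of the linear system shrinks substantially, which is the regime the abstract's implicit-bias discussion concerns.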
Beyond Intuition, a Framework for Applying GPs to Real-World Data
Gaussian Processes (GPs) offer an attractive method for regression over
small, structured and correlated datasets. However, their deployment is
hindered by computational costs and limited guidelines on how to apply GPs
beyond simple low-dimensional datasets. We propose a framework to identify the
suitability of GPs to a given problem and how to set up a robust and
well-specified GP model. The guidelines formalise the decisions of experienced
GP practitioners, with an emphasis on kernel design and options for
computational scalability. The framework is then applied to a case study of
glacier elevation change, yielding more accurate results at test time.
Comment: Accepted at the 1st ICML Workshop on Structured Probabilistic Inference and Generative Modelling (2023).
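One guideline the abstract emphasises, kernel design from interpretable components, can be sketched in a few lines of numpy: an exact GP posterior under a composite kernel that sums a long-lengthscale RBF (trend) and a short-lengthscale RBF (local variation) plus observation noise. All data and hyperparameter values are illustrative assumptions, not the glacier case study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: smooth trend plus finer-scale variation.
n = 80
X = rng.uniform(0.0, 10.0, size=(n, 1))
y = (np.sin(0.3 * X[:, 0]) + 0.3 * np.sin(3.0 * X[:, 0])
     + 0.05 * rng.standard_normal(n))

def rbf(A, B, lengthscale):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def kernel(A, B):
    # Composite kernel built from interpretable parts:
    # long-lengthscale trend + short-lengthscale detail.
    return rbf(A, B, lengthscale=5.0) + 0.3 * rbf(A, B, lengthscale=0.5)

noise = 0.05**2
L = np.linalg.cholesky(kernel(X, X) + noise * np.eye(n))
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

# Posterior mean and variance at test inputs.
Xs = np.linspace(0.0, 10.0, 60)[:, None]
Ks = kernel(Xs, X)
mean = Ks @ alpha
w = np.linalg.solve(L, Ks.T)
var = np.diag(kernel(Xs, Xs)) - (w**2).sum(0) + noise
```

Each summand in the kernel encodes a distinct modelling assumption, which is the spirit of the framework's kernel-design step; swapping or re-weighting components changes the prior in an interpretable way.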
Stochastic Gradient Descent for Gaussian Processes Done Right
We study the optimisation problem associated with Gaussian process regression
using squared loss. The most common approach to this problem is to apply an
exact solver, such as conjugate gradient descent, either directly, or to a
reduced-order version of the problem. Recently, driven by successes in deep
learning, stochastic gradient descent has gained traction as an alternative. In
this paper, we show that when done right, by which we mean using specific
insights from the optimisation and kernel communities, this approach is
highly effective. We thus introduce a particular stochastic dual gradient
descent algorithm that may be implemented with a few lines of code using
any deep learning framework. We
explain our design decisions by illustrating their advantage against
alternatives with ablation studies and show that the new method is highly
competitive. Our evaluations on standard regression benchmarks and a Bayesian
optimisation task set our approach apart from preconditioned conjugate
gradients, variational Gaussian process approximations, and a previous version
of stochastic gradient descent for Gaussian processes. On a molecular binding
affinity prediction task, our method places Gaussian process regression on
par with state-of-the-art graph neural networks in terms of performance.
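The ingredients named in the abstract, a dual quadratic objective, stochastic minibatch gradients, and momentum, can be sketched in numpy as follows. This is a simplified illustration under assumed constants and toy data; it does not reproduce the paper's exact algorithm (for example, its iterate-averaging scheme is omitted).

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative regression data.
n = 150
X = rng.uniform(-2.0, 2.0, size=(n, 1))
y = np.cos(2.0 * X[:, 0]) + 0.1 * rng.standard_normal(n)

def rbf(A, B, lengthscale=0.8):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

K = rbf(X, X)
noise = 0.1**2

# Dual objective of GP regression with squared loss:
#   g(alpha) = 0.5 alpha^T (K + noise*I) alpha - alpha^T y.
# Minimise it with minibatch (stochastic dual) gradient steps plus
# heavy-ball momentum; a sketch of the ingredients, not the exact recipe.
alpha = np.zeros(n)
velocity = np.zeros(n)
lr, beta, batch = 0.2, 0.9, 16
for _ in range(3000):
    idx = rng.choice(n, size=batch, replace=False)
    grad = np.zeros(n)
    grad[idx] = K[idx] @ alpha + noise * alpha[idx] - y[idx]
    velocity = beta * velocity + grad
    alpha -= (lr / n) * velocity

# Predictive mean at a new point: k(X_*, X) @ alpha.
mean = rbf(np.array([[0.0]]), X) @ alpha
```

Working in the dual keeps every update a cheap kernel-row operation, which is why such a loop fits in a few lines of any deep learning framework.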
Latent Derivative Bayesian Last Layer Networks
Bayesian neural networks (BNNs) are powerful parametric models for nonlinear regression with uncertainty quantification. However, approximate inference techniques for weight-space priors suffer from several drawbacks. The 'Bayesian last layer' (BLL) is an alternative BNN approach that learns the feature space for an exact Bayesian linear model with explicit predictive distributions. However, its predictions outside of the data distribution (OOD) are typically overconfident, as the marginal likelihood objective results in a learned feature space that overfits to the data. We overcome this weakness by introducing a functional prior on the model's derivatives w.r.t. the inputs. Treating these Jacobians as latent variables, we incorporate the prior into the objective to influence the smoothness and diversity of the features, which enables greater predictive uncertainty. For the BLL, the Jacobians can be computed directly using forward-mode automatic differentiation, and the distribution over Jacobians may be obtained in closed form. We demonstrate that this method enhances the BLL to Gaussian-process-like performance on tasks where calibrated uncertainty is critical: OOD regression, Bayesian optimization and active learning, including high-dimensional real-world datasets.
Peer reviewed.
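The exact Bayesian last layer at the core of the abstract admits a compact numpy sketch: a closed-form linear-Gaussian posterior over last-layer weights on a fixed feature map, giving explicit predictive mean and variance. Here random Fourier features stand in for the learned network, and the paper's latent-Jacobian functional prior is not implemented; all values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative data; in the BLL the feature map below would be a learned
# neural network, with random Fourier features standing in for it here.
n, d = 100, 64
X = rng.uniform(-3.0, 3.0, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

W = 2.0 * rng.standard_normal((1, d))
b = rng.uniform(0.0, 2.0 * np.pi, d)

def features(x):
    return np.sqrt(2.0 / d) * np.cos(x @ W + b)

Phi = features(X)
noise, prior_var = 0.1**2, 1.0

# Exact Bayesian linear model on the last layer: Gaussian posterior over
# weights with precision A and mean m, yielding explicit predictive moments.
A = Phi.T @ Phi / noise + np.eye(d) / prior_var
m = np.linalg.solve(A, Phi.T @ y / noise)

Xs = np.linspace(-3.0, 3.0, 50)[:, None]
Ps = features(Xs)
pred_mean = Ps @ m
pred_var = noise + np.einsum("ij,ji->i", Ps, np.linalg.solve(A, Ps.T))
```

Because the last-layer posterior is exact, both moments come from a single linear solve against the precision matrix A; the functional Jacobian prior described in the abstract would enter as an extra term shaping the feature map itself.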