Matérn Gaussian Processes on Graphs
Gaussian processes are a versatile framework for learning unknown functions
in a manner that permits one to utilize prior information about their
properties. Although many different Gaussian process models are readily
available when the input space is Euclidean, the choice is much more limited
for Gaussian processes whose input space is an undirected graph. In this work,
we leverage the stochastic partial differential equation characterization of
Matérn Gaussian processes - a widely-used model class in the Euclidean
setting - to study their analog for undirected graphs. We show that the
resulting Gaussian processes inherit various attractive properties of their
Euclidean and Riemannian analogs and provide techniques that allow them to be
trained using standard methods, such as inducing points. This enables graph
Matérn Gaussian processes to be employed in mini-batch and non-conjugate
settings, thereby making them more accessible to practitioners and easier to
deploy within larger learning frameworks.
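To make the SPDE construction concrete, here is a minimal sketch of the spectral form such a kernel takes, assuming the (2ν/κ² + λ)^(−ν) spectral density of the Euclidean Matérn SPDE applied to the eigenvalues of a graph Laplacian; the variance normalisation and the path-graph example are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def graph_matern_kernel(L, nu=1.0, kappa=1.0, sigma2=1.0):
    """Matérn-type covariance on a graph from its Laplacian L.

    Spectral sketch: K = sigma2 * U f(Lam) U^T with
    f(lam) = (2*nu/kappa**2 + lam)**(-nu), normalised so the
    average marginal variance equals sigma2 (an assumed convention).
    """
    lam, U = np.linalg.eigh(L)               # Laplacian eigendecomposition
    f = (2.0 * nu / kappa**2 + lam) ** (-nu)
    K = (U * f) @ U.T                        # U diag(f) U^T
    K *= sigma2 * K.shape[0] / np.trace(K)   # normalise average variance
    return K

# Laplacian of a path graph on 4 nodes (toy example)
A = np.diag(np.ones(3), 1); A = A + A.T
L = np.diag(A.sum(1)) - A
K = graph_matern_kernel(L, nu=3 / 2, kappa=1.0)
```

Because the kernel matrix is an explicit positive-definite function of the Laplacian spectrum, it plugs into standard GP machinery (e.g. inducing-point approximations) unchanged.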
Overcoming mean-field approximations in recurrent Gaussian process models
We identify a new variational inference scheme for dynamical systems whose transition function is modelled by a Gaussian process. Inference in this setting has either employed computationally intensive MCMC methods, or relied on factorisations of the variational posterior. As we demonstrate in our experiments, the factorisation between latent system states and transition function can lead to a miscalibrated posterior and to learning unnecessarily large noise terms. We eliminate this factorisation by explicitly modelling the dependence between state trajectories and the Gaussian process posterior. Samples of the latent states can then be tractably generated by conditioning on this representation. The method we obtain (VCDT: variationally coupled dynamics and trajectories) gives better predictive performance and more calibrated estimates of the transition function, yet maintains the same time and space complexities as mean-field methods. Code is available at: github.com/ialong/GPt
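A toy sketch of the coupling idea: draw the transition function once (here via hypothetical inducing outputs u) and roll the state trajectory out conditioned on that same draw, instead of factorising states and function apart. The inducing inputs Z, the kernel, the noise level, and the use of the conditional mean only (dropping the conditional variance for brevity) are all illustrative assumptions, not the VCDT implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, ls=0.5):
    """Squared-exponential kernel on scalars (illustrative choice)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

Z = np.linspace(-2.0, 2.0, 10)                  # assumed inducing inputs
Kzz = rbf(Z, Z) + 1e-6 * np.eye(10)
u = np.linalg.cholesky(Kzz) @ rng.standard_normal(10)  # one function draw

x = np.zeros(20)
x[0] = 0.1
for t in range(19):
    kxz = rbf(np.array([x[t]]), Z)
    mean = kxz @ np.linalg.solve(Kzz, u)        # E[f(x_t) | u], same u each step
    x[t + 1] = mean[0] + 0.05 * rng.standard_normal()  # process noise
```

Because every transition conditions on the same sample u, the trajectory and the transition function are coupled by construction, which is the dependence a mean-field factorisation discards.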
Ultra-fast Deep Mixtures of Gaussian Process Experts
Mixtures of experts have become an indispensable tool for flexible modelling
in a supervised learning context, and sparse Gaussian processes (GP) have shown
promise as a leading candidate for the experts in such models. In the present
article, we propose to design the gating network for selecting the experts from
such mixtures of sparse GPs using a deep neural network (DNN). This combination
provides a flexible, robust, and efficient model which is able to significantly
outperform competing models. We furthermore consider efficient approaches to
computing maximum a posteriori (MAP) estimators of these models by iteratively
maximizing the distribution of experts given allocations and allocations given
experts. We also show that a recently introduced method called
Cluster-Classify-Regress (CCR) is capable of providing a good approximation of
the optimal solution extremely quickly. This approximation can then be further
refined with the iterative algorithm.
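The prediction step of such a mixture can be sketched as gating-weighted moment matching: softmax gating weights combine the experts' predictive means and variances. The softmax form and moment-matched variance are standard choices assumed here for illustration; the gating DNN and the sparse GP experts themselves are omitted.

```python
import numpy as np

def mixture_predict(gate_logits, means, vars_):
    """Combine per-expert GP predictions with gating weights.

    gate_logits: (N, K) gating-network outputs per input
    means, vars_: (N, K) each expert's predictive mean / variance
    Returns the moment-matched mixture mean and variance.
    """
    w = np.exp(gate_logits - gate_logits.max(1, keepdims=True))
    w /= w.sum(1, keepdims=True)                 # softmax gating weights
    mean = (w * means).sum(1)
    var = (w * (vars_ + means**2)).sum(1) - mean**2
    return mean, var
```

When one expert's gating weight dominates, the mixture prediction reduces to that expert's, which is the behaviour the MAP alternation (experts given allocations, allocations given experts) exploits.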
Pathwise Conditioning of Gaussian Processes
As Gaussian processes are used to answer increasingly complex questions,
analytic solutions become scarcer and scarcer. Monte Carlo methods act as a
convenient bridge for connecting intractable mathematical expressions with
actionable estimates via sampling. Conventional approaches for simulating
Gaussian process posteriors view samples as draws from marginal distributions
of process values at finite sets of input locations. This distribution-centric
characterization leads to generative strategies that scale cubically in the
size of the desired random vector. These methods are prohibitively expensive in
cases where we would, ideally, like to draw high-dimensional vectors or even
continuous sample paths. In this work, we investigate a different line of
reasoning: rather than focusing on distributions, we articulate Gaussian
conditionals at the level of random variables. We show how this pathwise
interpretation of conditioning gives rise to a general family of approximations
that lend themselves to efficiently sampling Gaussian process posteriors.
Starting from first principles, we derive these methods and analyze the
approximation errors they introduce. We then ground these results by
exploring the practical implications of pathwise conditioning in various
applied settings, such as global optimization and reinforcement learning.
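The random-variable view is Matheron's rule: a posterior sample is a prior sample plus a data-dependent pathwise update, (f | y)(·) = f(·) + K(·, X) K(X, X)⁻¹ (y − f(X)). A minimal sketch for noiseless regression follows; the squared-exponential kernel, jitter, and toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, ls=1.0):
    """Squared-exponential kernel (an illustrative choice)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

X = np.linspace(-1.5, 1.5, 5)[:, None]        # training inputs
y = np.sin(X).ravel()                         # noiseless observations
Xs = np.linspace(-2.0, 2.0, 20)[:, None]      # query locations

# One joint prior draw over training and query locations
Xall = np.vstack([X, Xs])
Kall = rbf(Xall, Xall) + 1e-8 * np.eye(len(Xall))
f = np.linalg.cholesky(Kall) @ rng.standard_normal(len(Xall))
fX, fs = f[:len(X)], f[len(X):]

# Pathwise update: correct the prior draw toward the data
Kxx = rbf(X, X) + 1e-8 * np.eye(len(X))
alpha = np.linalg.solve(Kxx, y - fX)
posterior_path = fs + rbf(Xs, X) @ alpha      # pathwise-conditioned sample
```

The solve against K(X, X) is done once per sample at the size of the training set, so the cost scales with the data rather than cubically in the (potentially much larger) number of query locations.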
Spatio-temporal variational Gaussian processes
We introduce a scalable approach to Gaussian process inference that combines spatio-temporal filtering with natural gradient variational inference, resulting in a non-conjugate GP method for multivariate data that scales linearly with respect to time. Our natural gradient approach enables application of parallel filtering and smoothing, further reducing the temporal span complexity to be logarithmic in the number of time steps. We derive a sparse approximation that constructs a state-space model over a reduced set of spatial inducing points, and show that for separable Markov kernels the full and sparse cases exactly recover the standard variational GP, whilst exhibiting favourable computational properties. To further improve the spatial scaling we propose a mean-field assumption of independence between spatial locations which, when coupled with sparsity and parallelisation, leads to an efficient and accurate method for large spatio-temporal problems.
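The linear-in-time scaling rests on the state-space view of Markov kernels. A minimal scalar sketch for the Matérn-1/2 (Ornstein-Uhlenbeck) case is below: its exact discretisation is known in closed form, so a plain Kalman filter performs exact GP regression in O(T). This shows only the conjugate filtering building block, not the paper's natural-gradient variational scheme or spatial treatment.

```python
import numpy as np

def ou_kalman_filter(t, y, ell=1.0, sigma2=1.0, noise=0.1):
    """Kalman filter under a Matérn-1/2 (OU) GP prior, scalar case.

    Exact discretisation: x_{k+1} = a x_k + q_k with a = exp(-dt/ell)
    and Var(q_k) = sigma2 * (1 - a**2); observations y_k = x_k + eps_k
    with Var(eps_k) = noise. Runs in O(T) instead of O(T^3).
    """
    m, P = 0.0, sigma2                         # stationary prior state
    means, variances = [], []
    for k in range(len(t)):
        if k > 0:                              # predict step
            a = np.exp(-(t[k] - t[k - 1]) / ell)
            m, P = a * m, a * a * P + sigma2 * (1.0 - a * a)
        S = P + noise                          # update step
        K = P / S
        m = m + K * (y[k] - m)
        P = (1.0 - K) * P
        means.append(m); variances.append(P)
    return np.array(means), np.array(variances)

# Small usage example on toy data
m, P = ou_kalman_filter(np.linspace(0.0, 3.0, 6), np.sin(np.linspace(0.0, 3.0, 6)))
```

At the final time step the filtering distribution coincides with the batch GP posterior under the OU kernel k(t, t') = σ² exp(−|t − t'|/ℓ), which is a convenient correctness check.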
Approximate inference methods in probabilistic machine learning and Bayesian statistics
This thesis develops new methods for efficient approximate inference in probabilistic models. Such models are routinely used in different fields, yet they remain computationally challenging as they involve high-dimensional integrals. We propose different approximate inference approaches addressing some challenges in probabilistic machine learning and Bayesian statistics. First, we present a Bayesian framework for genome-wide inference of DNA methylation levels and devise an efficient particle filtering and smoothing algorithm that can be used to identify differentially methylated regions between case and control groups. Second, we present a scalable inference approach for state space models by combining variational methods with sequential Monte Carlo sampling. The method is applied to self-exciting point process models that allow for flexible dynamics in the latent intensity function. Third, a new variational density motivated by copulas is developed. This new variational family can be beneficial compared with Gaussian approximations, as illustrated on examples with Bayesian neural networks. Lastly, we make some progress in a gradient-based adaptation of Hamiltonian Monte Carlo samplers by maximizing an approximation of the proposal entropy
Numerically Stable Sparse Gaussian Processes via Minimum Separation using Cover Trees
Gaussian processes are frequently deployed as part of larger machine learning
and decision-making systems, for instance in geospatial modeling, Bayesian
optimization, or in latent Gaussian models. Within a system, the Gaussian
process model needs to perform in a stable and reliable manner to ensure it
interacts correctly with other parts of the system. In this work, we study the
numerical stability of scalable sparse approximations based on inducing points.
To do so, we first review numerical stability, and illustrate typical
situations in which Gaussian process models can be unstable. Building on
stability theory originally developed in the interpolation literature, we
derive sufficient and in certain cases necessary conditions on the inducing
points for the computations performed to be numerically stable. For
low-dimensional tasks such as geospatial modeling, we propose an automated
method for computing inducing points satisfying these conditions. This is done
via a modification of the cover tree data structure, which is of independent
interest. We additionally propose an alternative sparse approximation for
regression with a Gaussian likelihood which trades off a small amount of
performance to further improve stability. We provide illustrative examples
showing the relationship between stability of calculations and predictive
performance of inducing point methods on spatial tasks.
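The minimum-separation intuition can be illustrated with a greedy one-dimensional stand-in for the cover-tree construction: clustered inducing points produce near-duplicate rows in K(Z, Z) and hence an ill-conditioned matrix, while enforcing a separation threshold improves conditioning. The greedy subset, lengthscale, and threshold here are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def rbf_gram(Z, ls=0.1):
    """Squared-exponential Gram matrix on scalar inputs."""
    return np.exp(-0.5 * (Z[:, None] - Z[None, :]) ** 2 / ls**2)

def min_separation_subset(Z, delta):
    """Greedily keep points at pairwise distance >= delta: a crude
    stand-in for a cover-tree-based minimum-separation construction."""
    kept = []
    for z in Z:
        if all(abs(z - k) >= delta for k in kept):
            kept.append(z)
    return np.array(kept)

rng = np.random.default_rng(1)
Z = np.sort(rng.uniform(0.0, 1.0, 200))      # clustered candidate points
Z_sep = min_separation_subset(Z, delta=0.05)

cond_all = np.linalg.cond(rbf_gram(Z))       # near-duplicate rows
cond_sep = np.linalg.cond(rbf_gram(Z_sep))   # well-separated points
```

Comparing `cond_all` and `cond_sep` makes the stability trade-off visible: the separated subset is smaller, so it sacrifices some flexibility, but its kernel matrix is far better conditioned.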