Derivative observations in Gaussian Process models of dynamic systems
Gaussian processes provide an approach to nonparametric modelling which allows a straightforward combination of function and derivative observations in an empirical model. This is of particular importance in the identification of nonlinear dynamic systems from experimental data. 1) It allows us to combine derivative information, and its associated uncertainty, with normal function observations in the learning and inference process. This derivative information can take the form of priors specified by an expert, or be identified from perturbation data close to equilibrium. 2) It allows a seamless fusion of multiple local linear models in a consistent manner, inferring consistent models and ensuring that integrability constraints are met. 3) It dramatically improves the computational efficiency of Gaussian process models for dynamic system identification, by summarising large quantities of near-equilibrium data with a handful of linearisations, reducing the training set size, traditionally a problem for Gaussian process models.
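A minimal NumPy sketch of point 1), illustrative only and not the authors' code: for a 1-D squared-exponential kernel the covariances between function values and derivatives have closed forms, so derivative observations enter GP regression simply by enlarging the joint Gram matrix. All data and hyperparameter values below are made up for illustration.

```python
import numpy as np

# Squared-exponential kernel k(x, x') = s2 * exp(-(x - x')^2 / (2 l^2))
# and its closed-form derivative cross-covariances (assumed hyperparameters).
s2, l = 1.0, 0.5

def k_ff(x, xp):   # cov(f(x), f(x'))
    r = x[:, None] - xp[None, :]
    return s2 * np.exp(-r**2 / (2 * l**2))

def k_df(x, xp):   # cov(f'(x), f(x')) = dk/dx
    r = x[:, None] - xp[None, :]
    return -s2 * r / l**2 * np.exp(-r**2 / (2 * l**2))

def k_dd(x, xp):   # cov(f'(x), f'(x')) = d^2 k / dx dx'
    r = x[:, None] - xp[None, :]
    return s2 / l**2 * (1 - r**2 / l**2) * np.exp(-r**2 / (2 * l**2))

# Toy data: two function observations plus one derivative observation
# (e.g. the slope of a local linearisation identified near equilibrium).
xf, yf = np.array([0.1, 0.9]), np.array([0.0, 0.8])
xd, yd = np.array([0.5]), np.array([1.0])

# Joint Gram matrix over [function observations, derivative observations].
K = np.block([[k_ff(xf, xf), k_df(xd, xf).T],
              [k_df(xd, xf), k_dd(xd, xd)]]) + 1e-6 * np.eye(3)
y = np.concatenate([yf, yd])

# Predict f at test points via the standard GP conditional mean.
xs = np.linspace(0, 1, 5)
Ks = np.hstack([k_ff(xs, xf), k_df(xd, xs).T])
print(Ks @ np.linalg.solve(K, y))
```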
Learning Deep Mixtures of Gaussian Process Experts Using Sum-Product Networks
While Gaussian processes (GPs) are the method of choice for regression tasks, they also come with practical difficulties, as inference cost scales cubically in time and quadratically in memory. In this paper, we introduce a natural and expressive way to tackle these problems, by incorporating GPs in sum-product networks (SPNs), a recently proposed tractable probabilistic model allowing exact and efficient inference. In particular, by using GPs as leaves of an SPN we obtain a novel flexible prior over functions, which implicitly represents an exponentially large mixture of local GPs. Exact and efficient posterior inference in this model can be done in a natural interplay of the inference mechanisms in GPs and SPNs. Each GP is then, similarly to a mixture-of-experts approach, responsible only for a subset of data points, which effectively reduces inference cost in a divide-and-conquer fashion. We show that integrating GPs into the SPN framework leads to a promising probabilistic regression model which (1) is computationally and memory efficient, (2) allows efficient and exact posterior inference, (3) is flexible enough to mix different kernel functions, and (4) naturally accounts for non-stationarities in time series. In a variety of experiments, we show that the SPN-GP model can learn input-dependent parameters and hyperparameters and is on par with or outperforms traditional GPs as well as state-of-the-art approximations on real-world data.
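A hedged toy sketch of the idea (assumed structure, not the paper's model or code): a single sum node mixes a few candidate split points of a 1-D input range; each candidate is a product node whose children are independent local GP leaves on either side of the split, and the exact local GP marginal likelihoods give the sum node's posterior weights.

```python
import numpy as np

def se_kernel(x, xp, s2=1.0, l=0.3):
    r = x[:, None] - xp[None, :]
    return s2 * np.exp(-r**2 / (2 * l**2))

def gp_log_ml(x, y, noise=0.01):
    # Exact GP log marginal likelihood of one leaf (Cholesky-based).
    K = se_kernel(x, x) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ a - np.log(np.diag(L)).sum() - 0.5 * len(x) * np.log(2 * np.pi)

def gp_predict(x, y, xs, noise=0.01):
    K = se_kernel(x, x) + noise * np.eye(len(x))
    return se_kernel(xs, x) @ np.linalg.solve(K, y)

# Toy non-stationary data: a bump that flattens out halfway through.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 60))
y = np.where(x < 0.5, np.sin(12 * x), 0.2) + 0.05 * rng.standard_normal(x.size)

splits = [0.3, 0.5, 0.7]          # children of the sum node: candidate partitions
xs = np.linspace(0, 1, 7)
log_w, preds = [], []
for s in splits:
    left, right = x < s, x >= s
    # Product node: independent local GP leaves on each side of the split.
    log_w.append(gp_log_ml(x[left], y[left]) + gp_log_ml(x[right], y[right]))
    preds.append(np.where(xs < s,
                          gp_predict(x[left], y[left], xs),
                          gp_predict(x[right], y[right], xs)))

log_w = np.array(log_w)
w = np.exp(log_w - log_w.max()); w /= w.sum()   # exact posterior over splits
print((w[:, None] * np.array(preds)).sum(axis=0))  # mixture predictive mean
```

Each leaf here conditions only on its own subset of points, so no leaf ever factorises a Gram matrix over the full dataset, which is the divide-and-conquer saving the abstract describes.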
Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models
Gaussian processes (GPs) are a powerful tool for probabilistic inference over
functions. They have been applied to both regression and non-linear
dimensionality reduction, and offer desirable properties such as uncertainty
estimates, robustness to over-fitting, and principled ways for tuning
hyper-parameters. However, the scalability of these models to big datasets
remains an active topic of research. We introduce a novel re-parametrisation of
variational inference for sparse GP regression and latent variable models that
allows for an efficient distributed algorithm. This is done by exploiting the
decoupling of the data given the inducing points to re-formulate the evidence
lower bound in a Map-Reduce setting. We show that the inference scales well
with data and computational resources, while preserving a balanced distribution
of the load among the nodes. We further demonstrate the utility in scaling
Gaussian processes to big data. We show that GP performance improves with
increasing amounts of data in regression (on flight data with 2 million
records) and latent variable modelling (on MNIST). The results show that GPs
perform better than many common models often used for big data.
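A minimal sketch of why this decomposition map-reduces, under assumed notation (the collapsed Titsias-style bound with fixed inducing inputs Z; not the paper's code): given the inducing points, each data point contributes additively to a few statistics of size at most M x M, so every node summarises its shard independently and the driver only sums the summaries before evaluating the bound once.

```python
import numpy as np

def se(x, xp, s2=1.0, l=0.5):
    r = x[:, None] - xp[None, :]
    return s2 * np.exp(-r**2 / (2 * l**2))

def map_stats(x, y, Z):
    # "Map" step: every statistic is a sum over the shard's points, so
    # shards can be processed independently and in parallel.
    Kmn = se(Z, x)
    return dict(phi=Kmn @ Kmn.T,   # sum_n k(Z, x_n) k(Z, x_n)^T, (M x M)
                r=Kmn @ y,         # sum_n k(Z, x_n) y_n, (M,)
                yy=y @ y,          # sum_n y_n^2
                tr=1.0 * len(x),   # sum_n k(x_n, x_n); equals N for s2 = 1
                n=len(x))

def reduce_elbo(stats, Z, noise=0.01):
    # "Reduce" step: sum shard statistics, then evaluate the collapsed
    # variational lower bound once at O(M^3) cost.
    phi = sum(s["phi"] for s in stats)
    r = sum(s["r"] for s in stats)
    yy = sum(s["yy"] for s in stats)
    tr = sum(s["tr"] for s in stats)
    n = sum(s["n"] for s in stats)
    M = len(Z)
    Kmm = se(Z, Z) + 1e-6 * np.eye(M)
    A = noise * Kmm + phi
    quad = (yy - r @ np.linalg.solve(A, r)) / noise
    logdet = ((n - M) * np.log(noise)
              + np.linalg.slogdet(A)[1] - np.linalg.slogdet(Kmm)[1])
    trace = (tr - np.trace(np.linalg.solve(Kmm, phi))) / noise
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad + trace)

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 2000)
y = np.sin(6 * x) + 0.1 * rng.standard_normal(x.size)
Z = np.linspace(0, 1, 15)                                    # fixed inducing inputs
shards = [map_stats(x[i::4], y[i::4], Z) for i in range(4)]  # map over 4 "nodes"
print(reduce_elbo(shards, Z))                                # single reduce
```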
Rates of Convergence for Sparse Variational Gaussian Process Regression
Excellent variational approximations to Gaussian process posteriors have been developed which avoid the O(N³) scaling with dataset size N. They reduce the computational cost to O(NM²), with M ≪ N being the number of inducing variables, which summarise the process. While the computational cost seems to be linear in N, the true complexity of the algorithm depends on how M must increase to ensure a certain quality of approximation. We address this by characterising the behaviour of an upper bound on the KL divergence to the posterior. We show that with high probability the KL divergence can be made arbitrarily small by growing M more slowly than N. A particular case of interest is that for regression with normally distributed inputs in D dimensions with the popular squared exponential kernel, M = O(log^D N) is sufficient. Our results show that as datasets grow, Gaussian process posteriors can truly be approximated cheaply, and provide a concrete rule for how to increase M in continual learning scenarios.
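A toy illustration of such a growth rule (the constant c and the exact form are assumptions for illustration, not values from the paper), showing how slowly M = O(log^D N) grows relative to N:

```python
import math

# Hypothetical schedule for the number of inducing points as data
# accumulates in a continual learning setting; c is an assumed constant.
def num_inducing(N, D, c=2.0):
    return max(1, math.ceil(c * math.log(N) ** D))

for N in [1_000, 100_000, 10_000_000]:
    print(N, num_inducing(N, D=2))   # M grows polylogarithmically in N
```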
Sparse Gaussian Process Hyperparameters: Optimize or Integrate?
The kernel function and its hyperparameters are the central model selection
choice in a Gaussian process (Rasmussen and Williams, 2006). Typically, the
hyperparameters of the kernel are chosen by maximising the marginal likelihood,
an approach known as Type-II maximum likelihood (ML-II). However, ML-II does
not account for hyperparameter uncertainty, and it is well-known that this can
lead to severely biased estimates and an underestimation of predictive
uncertainty. While there are several works which employ a fully Bayesian
characterisation of GPs, relatively few propose such approaches for the sparse
GP paradigm. In this work we propose an algorithm for sparse Gaussian process
regression which leverages MCMC to sample from the hyperparameter posterior
within the variational inducing point framework of Titsias (2009). This work is
closely related to Hensman et al. (2015b) but side-steps the need to sample the
inducing points, thereby significantly improving sampling efficiency in the
Gaussian likelihood case. We compare this scheme against natural baselines from the literature, including stochastic variational GPs (SVGPs), together with an extensive computational analysis.
Egalitarian justice and expected value
According to all-luck egalitarianism, the differential distributive effects of both brute luck, which defines the outcome of risks which are not deliberately taken, and option luck, which defines the outcome of deliberate gambles, are unjust. Exactly how to correct the effects of option luck is, however, a complex issue. This article argues that (a) option luck should be neutralized not just by correcting luck among gamblers, but among the community as a whole, because it would be unfair for gamblers as a group to be disadvantaged relative to non-gamblers by bad option luck; (b) individuals should receive the warranted expected results of their gambles, except insofar as individuals blamelessly lacked the ability to ascertain which expectations were warranted; and (c) where societal resources are insufficient to deliver expected results to gamblers, gamblers should receive a lesser distributive share which is in proportion to the expected results. Where all-luck egalitarianism is understood in this way, it allows risk-takers to impose externalities on non-risk-takers, which seems counterintuitive. This may, however, be an advantage as it provides a luck egalitarian rationale for assisting “negligent victims”.
- ā¦