
    Gaussian Processes for Big Data

    We introduce stochastic variational inference for Gaussian process models. This enables the application of Gaussian process (GP) models to data sets containing millions of data points. We show how GPs can be variationally decomposed to depend on a set of globally relevant inducing variables which factorize the model in the necessary manner to perform variational inference. Our approach is readily extended to models with non-Gaussian likelihoods and latent variable models based around Gaussian processes. We demonstrate the approach on a simple toy problem and two real-world data sets. Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI 2013).
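    The central idea is that, once a set of $M$ inducing variables $u$ is introduced, the variational bound becomes a sum over data points minus a single KL term, so the bound and its gradients can be estimated unbiasedly from minibatches. The following is a minimal NumPy sketch of such an uncollapsed bound for a Gaussian likelihood; the RBF kernel, the inducing inputs Z, and the variational parameters m_u, S_u are illustrative placeholders, not anything prescribed by the paper.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def minibatch_elbo(Xb, yb, N, Z, m_u, S_u, noise_var=0.1):
    """Unbiased minibatch estimate of the uncollapsed sparse-GP bound
    sum_i E_q[log p(y_i | f_i)] - KL[q(u) || p(u)],
    with q(u) = N(m_u, S_u) and a Gaussian likelihood."""
    M = Z.shape[0]
    Kmm = rbf(Z, Z) + 1e-6 * np.eye(M)
    Kmn = rbf(Z, Xb)                       # M x B
    Kmm_inv = np.linalg.inv(Kmm)
    A = Kmm_inv @ Kmn                      # M x B

    # Marginal q(f_i) for each point in the batch.
    mean_f = A.T @ m_u
    var_f = (rbf(Xb, Xb).diagonal()
             - np.einsum('mb,mb->b', Kmn, A)
             + np.einsum('mb,mk,kb->b', A, S_u, A))

    # Expected log-likelihood, rescaled to the full data set of size N.
    exp_ll = (-0.5 * np.log(2 * np.pi * noise_var)
              - 0.5 * ((yb - mean_f) ** 2 + var_f) / noise_var)
    exp_ll = N / len(yb) * exp_ll.sum()

    # KL[ N(m_u, S_u) || N(0, Kmm) ].
    kl = 0.5 * (np.trace(Kmm_inv @ S_u)
                + m_u @ Kmm_inv @ m_u
                - M
                + np.linalg.slogdet(Kmm)[1]
                - np.linalg.slogdet(S_u)[1])
    return exp_ll - kl

# Toy usage: one stochastic estimate of the bound on a random minibatch.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=1000)
Z = rng.uniform(-3, 3, size=(20, 1))     # inducing inputs
m_u, S_u = np.zeros(20), np.eye(20)      # variational parameters
idx = rng.choice(1000, size=64, replace=False)
print(minibatch_elbo(X[idx], y[idx], N=1000, Z=Z, m_u=m_u, S_u=S_u))
```

    In practice one would optimise m_u, S_u, Z, and the kernel hyper-parameters by stochastic gradient ascent on this estimate, which is what allows the method to scale to millions of points.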

    Gaussian Processes with Monotonicity Constraints for Big Data

    In this thesis, we combine recent advances in monotonicity constraints for Gaussian processes with Big Data inference for Gaussian processes. The new variational-inference-based method is developed and evaluated on several simulated and real-world data sets by comparing its predictive performance to expectation propagation and Markov chain Monte Carlo methods. The results indicate that the new method performs well and can be used when data sets grow too large for computationally demanding methods.

    String and Membrane Gaussian Processes

    In this paper we introduce a novel framework for making exact nonparametric Bayesian inference on latent functions that is particularly suitable for Big Data tasks. Firstly, we introduce a class of stochastic processes we refer to as string Gaussian processes (string GPs), which are not to be mistaken for Gaussian processes operating on text. We construct string GPs so that their finite-dimensional marginals exhibit suitable local conditional independence structures, which allow for scalable, distributed, and flexible nonparametric Bayesian inference, without resorting to approximations, and while ensuring some mild global regularity constraints. Furthermore, string GP priors naturally cope with heterogeneous input data, and the gradient of the learned latent function is readily available for explanatory analysis. Secondly, we provide some theoretical results relating our approach to the standard GP paradigm. In particular, we prove that some string GPs are Gaussian processes, which provides a complementary global perspective on our framework. Finally, we derive a scalable and distributed MCMC scheme for supervised learning tasks under string GP priors. The proposed MCMC scheme has computational time complexity $\mathcal{O}(N)$ and memory requirement $\mathcal{O}(dN)$, where $N$ is the data size and $d$ the dimension of the input space. We illustrate the efficacy of the proposed approach on several synthetic and real-world datasets, including a dataset with 6 million input points and 8 attributes. Comment: To appear in the Journal of Machine Learning Research (JMLR), Volume 1
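    A toy way to see where linear scaling can come from: if the input domain is cut into segments and each segment's GP is conditioned only on shared boundary information, sampling and inference proceed segment by segment at constant cost per segment. The sketch below chains ordinary GPs over consecutive intervals, conditioning each one on the value already drawn at its left boundary; it is only a crude illustration of the local conditional-independence idea (actual string GPs also condition on derivatives and carry regularity guarantees), and the kernel, segment count, and function names are arbitrary choices.

```python
import numpy as np

def rbf(x1, x2, ell=0.5, var=1.0):
    """Squared-exponential kernel between 1-D input arrays."""
    d = x1[:, None] - x2[None, :]
    return var * np.exp(-0.5 * (d / ell) ** 2)

def sample_local_gp_chain(boundaries, points_per_segment=50, seed=0):
    """Draw a sample from a chain of local GPs, each segment conditioned on
    the value already drawn at its left boundary (a crude stand-in for the
    boundary conditioning used by string GPs)."""
    rng = np.random.default_rng(seed)
    xs, fs = [], []
    left_x = np.array([boundaries[0]])
    left_f = rng.normal(0.0, 1.0, size=1)            # value at the first boundary
    for a, b in zip(boundaries[:-1], boundaries[1:]):
        x = np.linspace(a, b, points_per_segment)
        K_xx = rbf(x, x)
        K_xb = rbf(x, left_x)
        K_bb = rbf(left_x, left_x) + 1e-8 * np.eye(1)
        A = K_xb @ np.linalg.inv(K_bb)
        mean = A @ left_f
        cov = K_xx - A @ K_xb.T + 1e-8 * np.eye(points_per_segment)
        f = rng.multivariate_normal(mean, cov)
        xs.append(x)
        fs.append(f)
        left_x, left_f = x[-1:], f[-1:]              # hand the right boundary to the next segment
    return np.concatenate(xs), np.concatenate(fs)

x, f = sample_local_gp_chain(np.linspace(0.0, 4.0, 9))
print(x.shape, f.shape)
```

    Each loop iteration costs $\mathcal{O}(p^3)$ for $p$ points per segment, so the total cost grows linearly with the number of segments and hence with the total number of points.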

    Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models

    Gaussian processes (GPs) are a powerful tool for probabilistic inference over functions. They have been applied to both regression and non-linear dimensionality reduction, and offer desirable properties such as uncertainty estimates, robustness to over-fitting, and principled ways for tuning hyper-parameters. However, the scalability of these models to big datasets remains an active topic of research. We introduce a novel re-parametrisation of variational inference for sparse GP regression and latent variable models that allows for an efficient distributed algorithm. This is done by exploiting the decoupling of the data given the inducing points to re-formulate the evidence lower bound in a Map-Reduce setting. We show that the inference scales well with data and computational resources, while preserving a balanced distribution of the load among the nodes. We further demonstrate the utility in scaling Gaussian processes to big data. We show that GP performance improves with increasing amounts of data in regression (on flight data with 2 million records) and latent variable modelling (on MNIST). The results show that GPs perform better than many common models often used for big data. Comment: 9 pages, 8 figures.
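    The decoupling mentioned above means that, once the inducing inputs are fixed, the quantities entering the collapsed sparse-GP lower bound are plain sums over data points, so each node can accumulate its partial sums independently and a single reduce step adds them up. The sketch below illustrates this with data shards processed in a loop standing in for separate nodes; it evaluates a standard collapsed (Titsias-style) bound from the reduced statistics, and the function and variable names (rbf, map_stats, reduce_stats, ...) are placeholders rather than the paper's implementation.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def map_stats(X_shard, y_shard, Z):
    """Per-node ("map") statistics: everything the bound needs from this shard."""
    Kmn = rbf(Z, X_shard)                       # M x n_shard
    return {
        "phi1": Kmn @ Kmn.T,                    # sum_i k(Z, x_i) k(Z, x_i)^T
        "phi2": Kmn @ y_shard,                  # sum_i k(Z, x_i) y_i
        "yy":   y_shard @ y_shard,              # sum_i y_i^2
        "trK":  X_shard.shape[0] * 1.0,         # sum_i k(x_i, x_i) for a unit-variance RBF
        "n":    X_shard.shape[0],
    }

def reduce_stats(parts):
    """Combine ("reduce") the per-node sums."""
    return {k: sum(p[k] for p in parts) for k in parts[0]}

def collapsed_bound(stats, Z, noise_var=0.1):
    """Collapsed sparse-GP lower bound evaluated from the reduced statistics."""
    M, N = Z.shape[0], stats["n"]
    Kmm = rbf(Z, Z) + 1e-6 * np.eye(M)
    A = noise_var * Kmm + stats["phi1"]         # sigma^2 Kmm + Phi1
    quad = (stats["yy"] - stats["phi2"] @ np.linalg.solve(A, stats["phi2"])) / noise_var
    logdet = ((N - M) * np.log(noise_var)
              + np.linalg.slogdet(A)[1] - np.linalg.slogdet(Kmm)[1])
    trace_term = (stats["trK"] - np.trace(np.linalg.solve(Kmm, stats["phi1"]))) / (2 * noise_var)
    return -0.5 * (N * np.log(2 * np.pi) + logdet + quad) - trace_term

# Toy usage: four "nodes", each holding one shard of the data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=2000)
Z = rng.uniform(-3, 3, size=(30, 1))
shards = [(X[i::4], y[i::4]) for i in range(4)]
stats = reduce_stats([map_stats(Xs, ys, Z) for Xs, ys in shards])
print(collapsed_bound(stats, Z))
```

    Only the $M \times M$ (and $M$-dimensional) statistics need to travel over the network, so the communication cost is independent of how many data points each node holds.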