59,135 research outputs found

    Large-scale Heteroscedastic Regression via Gaussian Process

    Full text link
    Heteroscedastic regression considering the varying noises among observations has many applications in the fields like machine learning and statistics. Here we focus on the heteroscedastic Gaussian process (HGP) regression which integrates the latent function and the noise function together in a unified non-parametric Bayesian framework. Though showing remarkable performance, HGP suffers from the cubic time complexity, which strictly limits its application to big data. To improve the scalability, we first develop a variational sparse inference algorithm, named VSHGP, to handle large-scale datasets. Furthermore, two variants are developed to improve the scalability and capability of VSHGP. The first is stochastic VSHGP (SVSHGP) which derives a factorized evidence lower bound, thus enhancing efficient stochastic variational inference. The second is distributed VSHGP (DVSHGP) which (i) follows the Bayesian committee machine formalism to distribute computations over multiple local VSHGP experts with many inducing points; and (ii) adopts hybrid parameters for experts to guard against over-fitting and capture local variety. The superiority of DVSHGP and SVSHGP as compared to existing scalable heteroscedastic/homoscedastic GPs is then extensively verified on various datasets.Comment: 14 pages, 15 figure

    Understanding and Comparing Scalable Gaussian Process Regression for Big Data

    Full text link
    As a non-parametric Bayesian model which produces informative predictive distribution, Gaussian process (GP) has been widely used in various fields, like regression, classification and optimization. The cubic complexity of standard GP however leads to poor scalability, which poses challenges in the era of big data. Hence, various scalable GPs have been developed in the literature in order to improve the scalability while retaining desirable prediction accuracy. This paper devotes to investigating the methodological characteristics and performance of representative global and local scalable GPs including sparse approximations and local aggregations from four main perspectives: scalability, capability, controllability and robustness. The numerical experiments on two toy examples and five real-world datasets with up to 250K points offer the following findings. In terms of scalability, most of the scalable GPs own a time complexity that is linear to the training size. In terms of capability, the sparse approximations capture the long-term spatial correlations, the local aggregations capture the local patterns but suffer from over-fitting in some scenarios. In terms of controllability, we could improve the performance of sparse approximations by simply increasing the inducing size. But this is not the case for local aggregations. In terms of robustness, local aggregations are robust to various initializations of hyperparameters due to the local attention mechanism. Finally, we highlight that the proper hybrid of global and local scalable GPs may be a promising way to improve both the model capability and scalability for big data.Comment: 25 pages, 15 figures, preprint submitted to KB

    String and Membrane Gaussian Processes

    Full text link
    In this paper we introduce a novel framework for making exact nonparametric Bayesian inference on latent functions, that is particularly suitable for Big Data tasks. Firstly, we introduce a class of stochastic processes we refer to as string Gaussian processes (string GPs), which are not to be mistaken for Gaussian processes operating on text. We construct string GPs so that their finite-dimensional marginals exhibit suitable local conditional independence structures, which allow for scalable, distributed, and flexible nonparametric Bayesian inference, without resorting to approximations, and while ensuring some mild global regularity constraints. Furthermore, string GP priors naturally cope with heterogeneous input data, and the gradient of the learned latent function is readily available for explanatory analysis. Secondly, we provide some theoretical results relating our approach to the standard GP paradigm. In particular, we prove that some string GPs are Gaussian processes, which provides a complementary global perspective on our framework. Finally, we derive a scalable and distributed MCMC scheme for supervised learning tasks under string GP priors. The proposed MCMC scheme has computational time complexity O(N)\mathcal{O}(N) and memory requirement O(dN)\mathcal{O}(dN), where NN is the data size and dd the dimension of the input space. We illustrate the efficacy of the proposed approach on several synthetic and real-world datasets, including a dataset with 66 millions input points and 88 attributes.Comment: To appear in the Journal of Machine Learning Research (JMLR), Volume 1
    • …
    corecore