3,044 research outputs found

    Nonparametric Bayesian Mixed-effect Model: a Sparse Gaussian Process Approach

    Full text link
    Multi-task learning models using Gaussian processes (GP) have been developed and successfully applied in various applications. The main difficulty with this approach is the computational cost of inference using the union of examples from all tasks. Therefore sparse solutions, that avoid using the entire data directly and instead use a set of informative "representatives" are desirable. The paper investigates this problem for the grouped mixed-effect GP model where each individual response is given by a fixed-effect, taken from one of a set of unknown groups, plus a random individual effect function that captures variations among individuals. Such models have been widely used in previous work but no sparse solutions have been developed. The paper presents the first sparse solution for such problems, showing how the sparse approximation can be obtained by maximizing a variational lower bound on the marginal likelihood, generalizing ideas from single-task Gaussian processes to handle the mixed-effect model as well as grouping. Experiments using artificial and real data validate the approach showing that it can recover the performance of inference with the full sample, that it outperforms baseline methods, and that it outperforms state of the art sparse solutions for other multi-task GP formulations.Comment: Preliminary version appeared in ECML201

    Doubly Stochastic Variational Inference for Deep Gaussian Processes

    Get PDF
    Gaussian processes (GPs) are a good choice for function approximation as they are flexible, robust to over-fitting, and provide well-calibrated predictive uncertainty. Deep Gaussian processes (DGPs) are multi-layer generalisations of GPs, but inference in these models has proved challenging. Existing approaches to inference in DGP models assume approximate posteriors that force independence between the layers, and do not work well in practice. We present a doubly stochastic variational inference algorithm, which does not force independence between layers. With our method of inference we demonstrate that a DGP model can be used effectively on data ranging in size from hundreds to a billion points. We provide strong empirical evidence that our inference scheme for DGPs works well in practice in both classification and regression.Comment: NIPS 201

    Large-scale Heteroscedastic Regression via Gaussian Process

    Full text link
    Heteroscedastic regression considering the varying noises among observations has many applications in the fields like machine learning and statistics. Here we focus on the heteroscedastic Gaussian process (HGP) regression which integrates the latent function and the noise function together in a unified non-parametric Bayesian framework. Though showing remarkable performance, HGP suffers from the cubic time complexity, which strictly limits its application to big data. To improve the scalability, we first develop a variational sparse inference algorithm, named VSHGP, to handle large-scale datasets. Furthermore, two variants are developed to improve the scalability and capability of VSHGP. The first is stochastic VSHGP (SVSHGP) which derives a factorized evidence lower bound, thus enhancing efficient stochastic variational inference. The second is distributed VSHGP (DVSHGP) which (i) follows the Bayesian committee machine formalism to distribute computations over multiple local VSHGP experts with many inducing points; and (ii) adopts hybrid parameters for experts to guard against over-fitting and capture local variety. The superiority of DVSHGP and SVSHGP as compared to existing scalable heteroscedastic/homoscedastic GPs is then extensively verified on various datasets.Comment: 14 pages, 15 figure

    Inference for determinantal point processes without spectral knowledge

    Full text link
    Determinantal point processes (DPPs) are point process models that naturally encode diversity between the points of a given realization, through a positive definite kernel KK. DPPs possess desirable properties, such as exact sampling or analyticity of the moments, but learning the parameters of kernel KK through likelihood-based inference is not straightforward. First, the kernel that appears in the likelihood is not KK, but another kernel LL related to KK through an often intractable spectral decomposition. This issue is typically bypassed in machine learning by directly parametrizing the kernel LL, at the price of some interpretability of the model parameters. We follow this approach here. Second, the likelihood has an intractable normalizing constant, which takes the form of a large determinant in the case of a DPP over a finite set of objects, and the form of a Fredholm determinant in the case of a DPP over a continuous domain. Our main contribution is to derive bounds on the likelihood of a DPP, both for finite and continuous domains. Unlike previous work, our bounds are cheap to evaluate since they do not rely on approximating the spectrum of a large matrix or an operator. Through usual arguments, these bounds thus yield cheap variational inference and moderately expensive exact Markov chain Monte Carlo inference methods for DPPs

    Understanding and Comparing Scalable Gaussian Process Regression for Big Data

    Full text link
    As a non-parametric Bayesian model which produces informative predictive distribution, Gaussian process (GP) has been widely used in various fields, like regression, classification and optimization. The cubic complexity of standard GP however leads to poor scalability, which poses challenges in the era of big data. Hence, various scalable GPs have been developed in the literature in order to improve the scalability while retaining desirable prediction accuracy. This paper devotes to investigating the methodological characteristics and performance of representative global and local scalable GPs including sparse approximations and local aggregations from four main perspectives: scalability, capability, controllability and robustness. The numerical experiments on two toy examples and five real-world datasets with up to 250K points offer the following findings. In terms of scalability, most of the scalable GPs own a time complexity that is linear to the training size. In terms of capability, the sparse approximations capture the long-term spatial correlations, the local aggregations capture the local patterns but suffer from over-fitting in some scenarios. In terms of controllability, we could improve the performance of sparse approximations by simply increasing the inducing size. But this is not the case for local aggregations. In terms of robustness, local aggregations are robust to various initializations of hyperparameters due to the local attention mechanism. Finally, we highlight that the proper hybrid of global and local scalable GPs may be a promising way to improve both the model capability and scalability for big data.Comment: 25 pages, 15 figures, preprint submitted to KB
    corecore