
    Doubly Stochastic Variational Inference for Deep Gaussian Processes

    Gaussian processes (GPs) are a good choice for function approximation as they are flexible, robust to over-fitting, and provide well-calibrated predictive uncertainty. Deep Gaussian processes (DGPs) are multi-layer generalisations of GPs, but inference in these models has proved challenging. Existing approaches to inference in DGP models assume approximate posteriors that force independence between the layers, and do not work well in practice. We present a doubly stochastic variational inference algorithm, which does not force independence between layers. With our method of inference we demonstrate that a DGP model can be used effectively on data ranging in size from hundreds to a billion points. We provide strong empirical evidence that our inference scheme for DGPs works well in practice in both classification and regression.
    Comment: NIPS 2017
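    A minimal NumPy sketch (not the authors' code) of the core idea: the variational posterior is sampled layer by layer, so each layer's input is a sample drawn from the previous layer rather than an independent draw, and minibatching the data supplies the second source of stochasticity. The layer mean/variance functions below are illustrative stand-ins for the sparse-GP layer posteriors.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_layer(F_in, mean_fn, var_fn):
    """One reparameterised sample from a layer's Gaussian posterior."""
    mu, var = mean_fn(F_in), var_fn(F_in)
    eps = rng.standard_normal(mu.shape)
    return mu + np.sqrt(var) * eps          # f = mu + sqrt(var) * eps

def dgp_sample(X, layers):
    """Propagate one joint sample through all layers of the DGP,
    preserving the dependence between layers."""
    F = X
    for mean_fn, var_fn in layers:
        F = sample_layer(F, mean_fn, var_fn)
    return F

# Two toy layers: illustrative means with constant predictive variances.
layers = [
    (np.tanh,           lambda F: 0.10 * np.ones_like(F)),
    (lambda F: 0.5 * F, lambda F: 0.05 * np.ones_like(F)),
]
X = rng.standard_normal((8, 1))   # a minibatch of inputs
F_L = dgp_sample(X, layers)       # one Monte Carlo sample of the final layer
```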

    Orthogonally Decoupled Variational Gaussian Processes

    Gaussian processes (GPs) provide a powerful non-parametric framework for reasoning over functions. Despite appealing theory, their superlinear computational and memory complexities have presented a long-standing challenge. State-of-the-art sparse variational inference methods trade modeling accuracy against complexity. However, the complexities of these methods still scale superlinearly in the number of basis functions, implying that sparse GP methods are able to learn from large datasets only when a small model is used. Recently, a decoupled approach was proposed that removes the unnecessary coupling between the complexities of modeling the mean and the covariance functions of a GP. It achieves a linear complexity in the number of mean parameters, so an expressive posterior mean function can be modeled. While promising, this approach suffers from optimization difficulties due to ill-conditioning and non-convexity. In this work, we propose an alternative decoupled parametrization. It adopts an orthogonal basis in the mean function to model the residues that cannot be learned by the standard coupled approach. Therefore, our method extends, rather than replaces, the coupled approach to achieve strictly better performance. This construction admits a straightforward natural gradient update rule, so the structure of the information manifold that is lost during decoupling can be leveraged to speed up learning. Empirically, our algorithm demonstrates significantly faster convergence in multiple experiments.
    Comment: Appearing in NIPS 2018
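    A schematic NumPy sketch of the orthogonal-basis construction described above (the names and RBF kernel are illustrative assumptions, not the paper's code): the extra mean basis is projected orthogonally to the span of the coupled basis, so it models only the residue the coupled part cannot express.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """Squared-exponential kernel matrix k(A, B)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def decoupled_mean(x, Z_beta, Z_gamma, a_beta, a_gamma, jitter=1e-6):
    """Posterior mean with an orthogonally decoupled extra basis.

    Z_beta: coupled inducing inputs (shared with the covariance).
    Z_gamma: extra inputs used only for the mean."""
    K_bb = rbf(Z_beta, Z_beta) + jitter * np.eye(len(Z_beta))
    k_xb = rbf(x, Z_beta)                    # coupled basis at x
    k_xg = rbf(x, Z_gamma)
    K_bg = rbf(Z_beta, Z_gamma)
    # Residue basis: project the gamma basis off the span of the beta basis.
    phi = k_xg - k_xb @ np.linalg.solve(K_bb, K_bg)
    return k_xb @ a_beta + phi @ a_gamma

# Toy usage with random inputs and weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 1))
Zb, Zg = rng.normal(size=(10, 1)), rng.normal(size=(30, 1))
mean = decoupled_mean(x, Zb, Zg, rng.normal(size=10), rng.normal(size=30))
```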

    Practical Bayesian Modeling and Inference for Massive Spatial Datasets On Modest Computing Environments

    With continued advances in Geographic Information Systems and related computational technologies, statisticians are often required to analyze very large spatial datasets. This has generated substantial interest over the last decade, in a literature already too vast to be summarized here, in scalable methodologies for analyzing large spatial datasets. Scalable spatial process models have been found especially attractive due to their richness and flexibility and, particularly so in the Bayesian paradigm, due to their presence in hierarchical model settings. However, the vast majority of research articles in this domain have been geared toward innovative theory or more complex model development. Very limited attention has been accorded to approaches for easily implementable scalable hierarchical models for the practicing scientist or spatial analyst. This article is submitted to the Practice section of the journal with the aim of developing massively scalable Bayesian approaches that can rapidly deliver Bayesian inference on spatial processes that is practically indistinguishable from inference obtained using more expensive alternatives. A key emphasis is on implementation within very standard (modest) computing environments (e.g., a standard desktop or laptop) using easily available statistical software packages, without requiring message-passing interfaces or parallel programming paradigms. Key insights are offered regarding assumptions and approximations concerning practical efficiency.
    Comment: 20 pages, 4 figures, 2 tables

    Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP)

    We introduce a new structured kernel interpolation (SKI) framework, which generalises and unifies inducing point methods for scalable Gaussian processes (GPs). SKI methods produce kernel approximations for fast computations through kernel interpolation. The SKI framework clarifies how the quality of an inducing point approach depends on the number of inducing (aka interpolation) points, the interpolation strategy, and the GP covariance kernel. SKI also provides a mechanism to create new scalable kernel methods, through choosing different kernel interpolation strategies. Using SKI with local cubic kernel interpolation, we introduce KISS-GP, which 1) is more scalable than inducing point alternatives, 2) naturally enables Kronecker and Toeplitz algebra for substantial additional gains in scalability, without requiring any grid data, and 3) can be used for fast and expressive kernel learning. KISS-GP costs O(n) time and storage for GP inference. We evaluate KISS-GP for kernel matrix approximation, kernel learning, and natural sound modelling.
    Comment: 19 pages, 4 figures
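    A minimal 1-D sketch of the SKI approximation K ≈ W K_UU W^T (an illustration, not the paper's implementation): inducing points sit on a regular grid, each row of W holds sparse local interpolation weights, and matrix-vector products factor through the small grid kernel. Linear interpolation stands in here for brevity where KISS-GP uses local cubic interpolation, and the dense matrices formed below are exactly what the sparsity of W and the Toeplitz structure of K_UU let KISS-GP avoid.

```python
import numpy as np

def ski_weights(x, grid):
    """Sparse interpolation weights: row i is supported on the two grid
    points bracketing x[i] (KISS-GP uses four points via cubic weights)."""
    n, m = len(x), len(grid)
    h = grid[1] - grid[0]
    idx = np.clip(((x - grid[0]) // h).astype(int), 0, m - 2)
    frac = (x - grid[idx]) / h
    W = np.zeros((n, m))
    W[np.arange(n), idx] = 1.0 - frac
    W[np.arange(n), idx + 1] = frac
    return W

def rbf(a, b, ls=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 200))
grid = np.linspace(0.0, 1.0, 50)       # inducing (interpolation) points U
W = ski_weights(x, grid)               # n x m, two nonzeros per row
K_uu = rbf(grid, grid)                 # Toeplitz on a regular grid
K_ski = W @ K_uu @ W.T                 # SKI approximation of K(x, x)
v = rng.standard_normal(200)
mvm = W @ (K_uu @ (W.T @ v))           # fast MVM pattern for K_ski @ v
err = np.abs(K_ski - rbf(x, x)).max()  # interpolation error of this sketch
```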