Doubly Stochastic Variational Inference for Deep Gaussian Processes
Gaussian processes (GPs) are a good choice for function approximation as they
are flexible, robust to over-fitting, and provide well-calibrated predictive
uncertainty. Deep Gaussian processes (DGPs) are multi-layer generalisations of
GPs, but inference in these models has proved challenging. Existing approaches
to inference in DGP models assume approximate posteriors that force
independence between the layers, and do not work well in practice. We present a
doubly stochastic variational inference algorithm, which does not force
independence between layers. With our method of inference we demonstrate that a
DGP model can be used effectively on data ranging in size from hundreds to a
billion points. We provide strong empirical evidence that our inference scheme
for DGPs works well in practice in both classification and regression.
Comment: NIPS 201
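As a rough illustration of what "doubly stochastic" can mean here, the sketch below combines two sources of randomness: a random minibatch of inputs, and one reparameterised Monte Carlo sample propagated through the layers. It is a simplified sketch under stated assumptions, not the authors' implementation. It assumes each layer is a single-output sparse GP with a zero mean function, a unit-variance RBF kernel, inducing inputs `z`, and a Gaussian variational distribution with mean `m` and covariance `S`. All function and variable names are hypothetical.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    # Unit-variance RBF kernel between rows of a (n, d) and b (m, d).
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def layer_marginals(x, z, m, S, jitter=1e-6):
    # Marginal mean and variance of q(f(x)) for one sparse GP layer
    # (assumed zero mean function, single output).
    K_zz = rbf(z, z) + jitter * np.eye(z.shape[0])
    K_xz = rbf(x, z)
    A = np.linalg.solve(K_zz, K_xz.T).T          # K_xz K_zz^{-1}
    mean = A @ m
    # diag(K_xx) is 1 for this kernel; var = k_xx - A K_zx + A S A^T (diagonal only)
    var = 1.0 - np.sum(A * K_xz, axis=1) + np.sum((A @ S) * A, axis=1)
    return mean, np.maximum(var, 0.0)

def sample_through_layers(x, layers, rng):
    # Draw one sample per input at each layer and feed it to the next layer,
    # so later layers depend on the sampled outputs of earlier ones.
    h = x
    for z, m, S in layers:
        mean, var = layer_marginals(h, z, m, S)
        eps = rng.standard_normal(mean.shape)
        h = (mean + np.sqrt(var) * eps)[:, None]
    return h[:, 0]

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(1000, 1))       # toy inputs (made up)
M = 8                                            # inducing points per layer
layers = []
for _ in range(2):
    z = np.linspace(-1.0, 1.0, M)[:, None]
    layers.append((z, rng.standard_normal(M), 0.1 * np.eye(M)))

# First source of stochasticity: a random minibatch.
batch = X[rng.choice(X.shape[0], size=32, replace=False)]
# Second source: a Monte Carlo sample through the layers.
f_sample = sample_through_layers(batch, layers, rng)
```

In a full training loop, samples like `f_sample` would feed an expected log-likelihood term that is rescaled by the ratio of dataset size to batch size; that part is omitted here.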
Orthogonally Decoupled Variational Gaussian Processes
Gaussian processes (GPs) provide a powerful non-parametric framework for
reasoning over functions. Despite appealing theory, their superlinear
computational and memory complexities have presented a long-standing challenge.
State-of-the-art sparse variational inference methods trade modeling accuracy
against complexity. However, the complexities of these methods still scale
superlinearly in the number of basis functions, implying that sparse GP
methods are able to learn from large datasets only when a small model is used.
Recently, a decoupled approach was proposed that removes the unnecessary
coupling between the complexities of modeling the mean and the covariance
functions of a GP. It achieves a linear complexity in the number of mean
parameters, so an expressive posterior mean function can be modeled. While
promising, this approach suffers from optimization difficulties due to
ill-conditioning and non-convexity. In this work, we propose an alternative
decoupled parametrization. It adopts an orthogonal basis in the mean function
to model the residues that cannot be learned by the standard coupled approach.
Therefore, our method extends, rather than replaces, the coupled approach to
achieve strictly better performance. This construction admits a straightforward
natural gradient update rule, so the structure of the information manifold that
is lost during decoupling can be leveraged to speed up learning. Empirically,
our algorithm demonstrates significantly faster convergence in multiple
experiments.
Comment: Appearing NIPS 201
Practical Bayesian Modeling and Inference for Massive Spatial Datasets On Modest Computing Environments
With continued advances in Geographic Information Systems and related
computational technologies, statisticians are often required to analyze very
large spatial datasets. This has generated substantial interest over the last
decade, already too vast to be summarized here, in scalable methodologies for
analyzing large spatial datasets. Scalable spatial process models have been
found especially attractive due to their richness and flexibility and,
particularly so in the Bayesian paradigm, due to their presence in hierarchical
model settings. However, the vast majority of research articles present in this
domain have been geared toward innovative theory or more complex model
development. Very limited attention has been accorded to approaches for easily
implementable scalable hierarchical models for the practicing scientist or
spatial analyst. This article is submitted to the Practice section of the
journal with the aim of developing massively scalable Bayesian approaches that
can rapidly deliver Bayesian inference on spatial processes that is practically
indistinguishable from inference obtained using more expensive alternatives. A
key emphasis is on implementation within very standard (modest) computing
environments (e.g., a standard desktop or laptop) using easily available
statistical software packages without requiring message-passing interfaces or
parallel programming paradigms. Key insights are offered regarding assumptions
and approximations concerning practical efficiency.
Comment: 20 pages, 4 figures, 2 tables
Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP)
We introduce a new structured kernel interpolation (SKI) framework, which
generalises and unifies inducing point methods for scalable Gaussian processes
(GPs). SKI methods produce kernel approximations for fast computations through
kernel interpolation. The SKI framework clarifies how the quality of an
inducing point approach depends on the number of inducing (aka interpolation)
points, interpolation strategy, and GP covariance kernel. SKI also provides a
mechanism to create new scalable kernel methods, through choosing different
kernel interpolation strategies. Using SKI, with local cubic kernel
interpolation, we introduce KISS-GP, which is 1) more scalable than inducing
point alternatives, 2) naturally enables Kronecker and Toeplitz algebra for
substantial additional gains in scalability, without requiring any grid data,
and 3) can be used for fast and expressive kernel learning. KISS-GP costs O(n)
time and storage for GP inference. We evaluate KISS-GP for kernel matrix
approximation, kernel learning, and natural sound modelling.
Comment: 19 pages, 4 figures
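The kernel interpolation idea can be sketched as approximating the kernel matrix on the data by W K_UU W^T, where K_UU is the kernel on a grid of inducing points and W holds sparse interpolation weights. The example below is a minimal one-dimensional sketch with an RBF kernel. It uses linear interpolation rather than the local cubic interpolation named in the abstract, stores W densely for readability, and does not exploit Toeplitz or Kronecker structure. All names and parameter values are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    # Unit-variance RBF kernel between 1-D arrays a and b.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def linear_interp_weights(x, grid):
    # (n, m) weight matrix with two non-zeros per row: each x is expressed
    # as a convex combination of its two neighbouring grid points.
    m = grid.shape[0]
    h = grid[1] - grid[0]                         # regular grid spacing
    idx = np.clip(np.floor((x - grid[0]) / h).astype(int), 0, m - 2)
    frac = (x - grid[idx]) / h
    W = np.zeros((x.shape[0], m))
    rows = np.arange(x.shape[0])
    W[rows, idx] = 1.0 - frac
    W[rows, idx + 1] = frac
    return W

def ski_kernel(x, grid, lengthscale=1.0):
    # Interpolated approximation K_XX ~= W K_UU W^T.
    W = linear_interp_weights(x, grid)
    K_uu = rbf_kernel(grid, grid, lengthscale)
    return W @ K_uu @ W.T

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=50)                # toy inputs (made up)
grid = np.linspace(0.0, 1.0, 200)                 # inducing/interpolation points

K_exact = rbf_kernel(x, x, lengthscale=0.2)
K_ski = ski_kernel(x, grid, lengthscale=0.2)
max_err = np.max(np.abs(K_exact - K_ski))
print("max abs error:", max_err)
```

A practical implementation would store W as a sparse matrix and use the grid structure of K_UU for fast matrix-vector products, which is where the scalability gains described above would come from.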