Thoughts on Massively Scalable Gaussian Processes
We introduce a framework and early results for massively scalable Gaussian
processes (MSGP), significantly extending the KISS-GP approach of Wilson and
Nickisch (2015). The MSGP framework enables the use of Gaussian processes (GPs)
on billions of datapoints, without requiring distributed inference, or severe
assumptions. In particular, MSGP reduces the standard O(n^3) complexity of GP
learning and inference to O(n), and the standard O(n^2) complexity per test
point prediction to O(1). MSGP involves 1) decomposing covariance matrices as
Kronecker products of Toeplitz matrices approximated by circulant matrices.
This multi-level circulant approximation allows one to unify the orthogonal
computational benefits of fast Kronecker and Toeplitz approaches, and is
significantly faster than either approach in isolation; 2) local kernel
interpolation and inducing points to allow for arbitrarily located data inputs,
and test time predictions; 3) exploiting block-Toeplitz Toeplitz-block
structure (BTTB), which enables fast inference and learning when
multidimensional Kronecker structure is not present; and 4) projections of the
input space to flexibly model correlated inputs and high dimensional data. The
ability to handle many (m ≈ n) inducing points allows for near-exact
accuracy and large-scale kernel learning.
Comment: 25 pages, 9 figures
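The circulant trick behind point 1) can be sketched in a few lines: a symmetric Toeplitz covariance matrix embeds in a circulant matrix, whose matrix-vector product is a pointwise multiplication in Fourier space. A minimal sketch with a toy RBF-like Toeplitz column (sizes and data are illustrative, not from the paper):

```python
import numpy as np

def toeplitz_matvec_fft(first_col, v):
    """Multiply a symmetric Toeplitz matrix (given by its first column) by a
    vector in O(n log n): embed it in a circulant matrix, whose matvec is a
    pointwise product in Fourier space."""
    n = len(first_col)
    c = np.concatenate([first_col, first_col[-2:0:-1]])   # circulant embedding
    v_pad = np.concatenate([v, np.zeros(len(c) - n)])
    out = np.fft.ifft(np.fft.fft(c) * np.fft.fft(v_pad)).real
    return out[:n]

# Check against a dense O(n^2) matvec with an RBF-like Toeplitz column.
rng = np.random.default_rng(0)
n = 64
col = np.exp(-0.5 * (np.arange(n) / 8.0) ** 2)
T = col[np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])]
v = rng.standard_normal(n)
fast = toeplitz_matvec_fft(col, v)
dense = T @ v
```

The multi-level (Kronecker-of-circulant) structure in MSGP applies this same FFT trick per dimension.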
Deep Kernel Learning
We introduce scalable deep kernels, which combine the structural properties
of deep learning architectures with the non-parametric flexibility of kernel
methods. Specifically, we transform the inputs of a spectral mixture base
kernel with a deep architecture, using local kernel interpolation, inducing
points, and structure exploiting (Kronecker and Toeplitz) algebra for a
scalable kernel representation. These closed-form kernels can be used as
drop-in replacements for standard kernels, with benefits in expressive power
and scalability. We jointly learn the properties of these kernels through the
marginal likelihood of a Gaussian process. Inference and learning cost
O(n) for n training points, and predictions cost O(1) per test point. On a large
and diverse collection of applications, including a dataset with 2 million
examples, we show improved performance over scalable Gaussian processes with
flexible kernel learning models, and stand-alone deep architectures.
Comment: 19 pages, 6 figures
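Stripped of the scalability machinery, the construction is a base kernel applied to network-warped inputs. A minimal sketch assuming random (untrained) weights and an RBF base kernel in place of the spectral mixture kernel; in the paper the weights are trained jointly with the kernel hyperparameters through the GP marginal likelihood:

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp_features(X, W1, W2):
    """A tiny feed-forward warp g(x). Weights here are random placeholders;
    deep kernel learning fits them via the GP marginal likelihood."""
    return np.tanh(np.tanh(X @ W1) @ W2)

def deep_rbf_kernel(X, Z, W1, W2, lengthscale=1.0):
    """k_deep(x, z) = k_RBF(g(x), g(z)): a base kernel on warped inputs
    (the paper uses a spectral mixture base kernel; RBF keeps this short)."""
    Gx, Gz = mlp_features(X, W1, W2), mlp_features(Z, W1, W2)
    d2 = ((Gx[:, None, :] - Gz[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

X = rng.standard_normal((30, 4))
W1 = rng.standard_normal((4, 8)) / 2
W2 = rng.standard_normal((8, 2)) / 2
K = deep_rbf_kernel(X, X, W1, W2)
eigs = np.linalg.eigvalsh(K)   # a valid kernel matrix must be PSD
```

Because the warp feeds a standard kernel, the result is a drop-in closed-form kernel, as the abstract notes.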
Algorithmic Linearly Constrained Gaussian Processes
We algorithmically construct multi-output Gaussian process priors which
satisfy linear differential equations. Our approach attempts to parametrize all
solutions of the equations using Gröbner bases. If successful, a pushforward
Gaussian process along the parametrization is the desired prior. We consider
several examples from physics, geomathematics and control, among them the full
inhomogeneous system of Maxwell's equations. By bringing together stochastic
learning and computer algebra in a novel way, we combine noisy observations
with precise algebraic computations.
Comment: NIPS 201
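A finite-dimensional analogue may clarify the pushforward construction: if all solutions of a linear constraint L f = 0 are parametrized as f = B g, then pushing a Gaussian prior on g through B gives a prior whose samples satisfy the constraint exactly. The sketch below uses a random constraint matrix and an SVD null-space basis; the paper's actual step is symbolic, computing such parametrizations for differential operators via Gröbner bases:

```python
import numpy as np

rng = np.random.default_rng(2)

# Goal: a Gaussian prior whose samples satisfy L f = 0 exactly. Parametrize
# all solutions as f = B g (columns of B span the null space of L), then push
# a Gaussian prior on g forward through B.
L = rng.standard_normal((3, 8))       # 3 linear constraints on f in R^8
_, _, Vt = np.linalg.svd(L)
B = Vt[3:].T                          # 8 x 5 null-space parametrization
Kg = np.eye(B.shape[1])               # prior covariance of the parameter g
Kf = B @ Kg @ B.T                     # pushforward covariance of f = B g
f = B @ rng.multivariate_normal(np.zeros(B.shape[1]), Kg)
residual = np.abs(L @ f).max()        # the sample satisfies the constraints
```

Conditioning this pushforward prior on noisy observations keeps the algebraic constraints exact while fitting the data, which is the combination the abstract describes.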
Constant-Time Predictive Distributions for Gaussian Processes
One of the most compelling features of Gaussian process (GP) regression is
its ability to provide well-calibrated posterior distributions. Recent advances
in inducing point methods have sped up GP marginal likelihood and posterior
mean computations, leaving posterior covariance estimation and sampling as the
remaining computational bottlenecks. In this paper we address these
shortcomings by using the Lanczos algorithm to rapidly approximate the
predictive covariance matrix. Our approach, which we refer to as LOVE (LanczOs
Variance Estimates), substantially improves time and space complexity. In our
experiments, LOVE computes covariances up to 2,000 times faster and draws
samples 18,000 times faster than existing methods, all without sacrificing
accuracy.
Comment: ICML 201
Scalable Gaussian Processes for Predicting the Properties of Inorganic Glasses with Large Datasets
Gaussian process regression (GPR) is a useful technique to predict
composition–property relationships in glasses as the method inherently
provides the standard deviation of the predictions. However, the technique
remains restricted to small datasets due to the substantial computational cost
associated with it. Here, using a scalable GPR algorithm, namely, kernel
interpolation for scalable structured Gaussian processes (KISS-GP) along with
massively scalable GP (MSGP), we develop composition–property models for
inorganic glasses based on a large dataset with more than 100,000 glass
compositions, 37 components, and nine important properties, namely, density,
Young's, shear, and bulk moduli, thermal expansion coefficient, Vickers'
hardness, refractive index, glass transition temperature, and liquidus
temperature. Finally, to accelerate glass design, the models developed here are
shared publicly as part of a package, namely, Python for Glass Genomics
(PyGGi).
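For reference, the exact GPR predictive equations that supply a standard deviation alongside each prediction look like this. A minimal 1D sketch on a synthetic property curve (not the glass dataset; the kernel and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

def gpr_predict(Xtr, ytr, Xte, lengthscale=1.0, noise=0.1):
    """Exact GP regression: predictive mean and standard deviation. The
    per-prediction standard deviation is what makes GPR attractive for
    composition-property modelling."""
    def k(A, B):
        return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / lengthscale**2)
    Kn = k(Xtr, Xtr) + noise**2 * np.eye(len(Xtr))
    Ks = k(Xte, Xtr)
    mean = Ks @ np.linalg.solve(Kn, ytr)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(Kn, Ks.T))
    return mean, np.sqrt(np.maximum(var, 0.0))

# Synthetic "property" curve: dense data near x=2.5, none near x=10, so the
# predictive std should grow where data is absent.
Xtr = np.linspace(0, 5, 40)
ytr = np.sin(Xtr) + 0.1 * rng.standard_normal(40)
Xte = np.array([2.5, 10.0])
mean, std = gpr_predict(Xtr, ytr, Xte)
```

The O(n^3) solve in `gpr_predict` is exactly the cost that KISS-GP and MSGP avoid on the 100,000-composition dataset.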
When Gaussian Process Meets Big Data: A Review of Scalable GPs
The vast quantity of information brought by big data as well as the evolving
computer hardware encourages success stories in the machine learning community.
Meanwhile, it poses challenges for Gaussian process (GP) regression, a
well-known non-parametric and interpretable Bayesian model, which suffers
from cubic complexity in the data size. To improve scalability while retaining
desirable prediction quality, a variety of scalable GPs have been presented.
But they have not yet been comprehensively reviewed and analyzed so as to be
well understood by both academia and industry. A review of scalable GPs is
timely and important given the explosion of data size. To this end, this paper
reviews state-of-the-art scalable GPs in two main categories: global
approximations, which distill the entire dataset, and local approximations,
which divide the data for subspace learning. In particular, for global
approximations, we mainly focus on sparse
approximations comprising prior approximations which modify the prior but
perform exact inference, posterior approximations which retain exact prior but
perform approximate inference, and structured sparse approximations which
exploit specific structures in the kernel matrix; for local approximations, we
highlight mixture- and product-of-experts models, which average over multiple
local experts to boost predictions. To present a complete review,
recent advances for improving the scalability and capability of scalable GPs
are reviewed. Finally, the extensions and open issues regarding the
implementation of scalable GPs in various scenarios are reviewed and discussed
to inspire novel ideas for future research avenues.
Comment: 20 pages, 6 figures
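As a concrete instance of the global sparse family, a subset-of-regressors style approximation with m inducing points replaces the O(n^3) exact solve with an O(n m^2) one. A minimal sketch on synthetic 1D data (putting the inducing inputs on a grid is an illustrative choice, not a recommendation from the review):

```python
import numpy as np

rng = np.random.default_rng(5)

def rbf(A, B, ell=1.0):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ell**2)

# Subset-of-regressors: summarize n points by m << n inducing inputs,
# K ~= Knm Kmm^-1 Kmn, so the O(n^3) exact solve becomes O(n m^2).
n, m, noise = 500, 20, 0.1
X = np.sort(rng.uniform(0, 10, n))
y = np.sin(X) + noise * rng.standard_normal(n)
Z = np.linspace(0, 10, m)                  # inducing inputs (a grid here)
Kmm = rbf(Z, Z) + 1e-8 * np.eye(m)         # jitter for numerical stability
Knm = rbf(X, Z)
# SoR predictive mean at the training inputs:
# mean = Knm (noise^2 Kmm + Kmn Knm)^-1 Kmn y
A = noise**2 * Kmm + Knm.T @ Knm
mean = Knm @ np.linalg.solve(A, Knm.T @ y)
rmse = np.sqrt(np.mean((mean - np.sin(X)) ** 2))
```

Prior approximations, posterior (variational) approximations, and structured approximations all refine this basic low-rank idea in different ways.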
Polya Urn Latent Dirichlet Allocation: a doubly sparse massively parallel sampler
Latent Dirichlet Allocation (LDA) is a topic model widely used in natural
language processing and machine learning. Most approaches to training the model
rely on iterative algorithms, which makes it difficult to run LDA on big
corpora that are best analyzed in parallel and distributed computational
environments. Indeed, current approaches to parallel inference either don't
converge to the correct posterior or require storage of large dense matrices in
memory. We present a novel sampler that overcomes both problems, and we show
that this sampler is faster, both empirically and theoretically, than previous
Gibbs samplers for LDA. We do so by employing a novel Pólya-urn-based
approximation in the sparse partially collapsed sampler for LDA. We prove that
the approximation error vanishes with data size, making our algorithm
asymptotically exact, a property of importance for large-scale topic models. In
addition, we show, via an explicit example, that -- contrary to popular belief
in the topic modeling literature -- partially collapsed samplers can be more
efficient than fully collapsed samplers. We conclude by comparing the
performance of our algorithm with that of other approaches on well-known
corpora.
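For context, the classical fully collapsed Gibbs sampler that such work builds on fits in a few lines. A minimal sketch on a tiny synthetic corpus (the paper's sampler instead uses a partially collapsed chain with a sparse Pólya-urn approximation of the per-token conditional, which is what makes it parallelizable):

```python
import numpy as np

rng = np.random.default_rng(6)

def lda_gibbs(docs, V, K, iters=50, alpha=0.1, beta=0.01):
    """Minimal fully collapsed Gibbs sampler for LDA: resample each token's
    topic from its conditional given all other assignments."""
    ndk = np.zeros((len(docs), K))     # document-topic counts
    nkw = np.zeros((K, V))             # topic-word counts
    nk = np.zeros(K)                   # topic totals
    z = [rng.integers(K, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            ndk[d, z[d][i]] += 1; nkw[z[d][i], w] += 1; nk[z[d][i]] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # p(z=k | rest) ∝ (ndk + alpha)(nkw + beta)/(nk + V beta)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk, nkw

# Tiny synthetic corpus: two disjoint vocabularies in two groups of documents.
docs = [[0, 1, 0, 1, 1]] * 8 + [[2, 3, 2, 3, 3]] * 8
ndk, nkw = lda_gibbs(docs, V=4, K=2)
```

The sequential dependence between token updates in this loop is precisely the obstacle to parallel inference that the abstract discusses.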
Scaling up the Automatic Statistician: Scalable Structure Discovery using Gaussian Processes
Automating statistical modelling is a challenging problem in artificial
intelligence. The Automatic Statistician takes a first step in this direction,
by employing a kernel search algorithm with Gaussian Processes (GP) to provide
interpretable statistical models for regression problems. However, this does not
scale due to its O(n^3) running time for model selection. We propose
Scalable Kernel Composition (SKC), a scalable kernel search algorithm that
extends the Automatic Statistician to bigger data sets. In doing so, we derive
a cheap upper bound on the GP marginal likelihood that sandwiches the marginal
likelihood with the variational lower bound. We show that the upper bound is
significantly tighter than the lower bound and thus useful for model selection.
Comment: AISTATS 2018 (oral)
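The quantity being sandwiched is the GP log marginal likelihood, the standard kernel-selection score. A minimal sketch of the exact O(n^3) computation that SKC's cheap bounds avoid, comparing two candidate lengthscales on toy data:

```python
import numpy as np

rng = np.random.default_rng(7)

def log_marginal_likelihood(X, y, k, noise=0.1):
    """Exact GP log marginal likelihood log p(y|X): the model-selection score
    that kernel search maximizes. Computing it exactly costs O(n^3)."""
    K = k(X[:, None], X[None, :]) + noise**2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ a - np.log(np.diag(L)).sum()
            - 0.5 * len(X) * np.log(2 * np.pi))

def rbf(ell):
    return lambda a, b: np.exp(-0.5 * (a - b) ** 2 / ell**2)

X = np.linspace(0, 8, 60)
y = np.sin(X) + 0.1 * rng.standard_normal(60)
# A well-matched lengthscale should score higher than one that treats the
# smooth signal as noise.
lml_good = log_marginal_likelihood(X, y, rbf(1.0))
lml_bad = log_marginal_likelihood(X, y, rbf(0.05))
```

SKC's contribution is to rank candidate kernels like these using cheap lower and upper bounds on this score rather than the exact Cholesky-based computation.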
Lifelong Bayesian Optimization
Automatic Machine Learning (Auto-ML) systems tackle the problem of automating
the design of prediction models or pipelines for data science. In this paper,
we present Lifelong Bayesian Optimization (LBO), an online, multitask Bayesian
optimization (BO) algorithm designed to solve the problem of model selection
for datasets arriving and evolving over time. To be suitable for "lifelong"
Bayesian Optimization, an algorithm needs to scale with the ever-increasing
number of acquisitions and should be able to leverage past optimizations in
learning the current best model. We cast the problem of model selection as a
black-box function optimization problem. In LBO, we exploit the correlation
between functions by using components of previously learned functions to speed
up the learning process for newly arriving datasets. Experiments on real and
synthetic data show that LBO outperforms standard BO algorithms applied
repeatedly on the data.
Comment: 17 pages, 8 figures
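The single-task building block that LBO extends is a standard GP-surrogate BO loop with an acquisition function. A minimal sketch using expected improvement on a toy 1D objective over a finite candidate grid (all sizes, the kernel, and the objective are illustrative, not from the paper):

```python
import numpy as np
from math import erf, sqrt, pi

rng = np.random.default_rng(8)

def ei(mu, sigma, best):
    """Expected improvement (for minimization) under a Gaussian posterior."""
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    Phi = 0.5 * (1 + np.vectorize(erf)(z / sqrt(2)))
    phi = np.exp(-0.5 * z**2) / sqrt(2 * pi)
    return (best - mu) * Phi + sigma * phi

def bo_minimize(f, grid, n_init=3, n_iter=15, noise=1e-3):
    """One single-task BO loop: GP surrogate + EI over a candidate grid.
    LBO's setting is a growing sequence of such tasks sharing information."""
    X = list(rng.choice(grid, n_init, replace=False))
    y = [f(x) for x in X]
    k = lambda a, b: np.exp(-0.5 * (np.asarray(a)[:, None]
                                    - np.asarray(b)[None, :]) ** 2)
    for _ in range(n_iter):
        Kn = k(X, X) + noise * np.eye(len(X))
        Ks = k(grid, X)
        mu = Ks @ np.linalg.solve(Kn, np.array(y))
        var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(Kn, Ks.T))
        x_next = grid[np.argmax(ei(mu, np.sqrt(np.maximum(var, 0.0)), min(y)))]
        X.append(x_next); y.append(f(x_next))
    return X[int(np.argmin(y))], min(y)

f = lambda x: (x - 1.3) ** 2           # toy black-box objective
grid = np.linspace(-3, 3, 121)
x_best, y_best = bo_minimize(f, grid)
```

LBO's "lifelong" difficulty is visible here: the surrogate solve grows with every acquisition, and nothing in this loop carries information from one objective to the next.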
GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration
Despite advances in scalable models, the inference tools used for Gaussian
processes (GPs) have yet to fully capitalize on developments in computing
hardware. We present an efficient and general approach to GP inference based on
Blackbox Matrix-Matrix multiplication (BBMM). BBMM inference uses a modified
batched version of the conjugate gradients algorithm to derive all terms for
training and inference in a single call. BBMM reduces the asymptotic complexity
of exact GP inference from O(n^3) to O(n^2). Adapting this algorithm to
scalable approximations and complex GP models simply requires a routine for
efficient matrix-matrix multiplication with the kernel and its derivative. In
addition, BBMM uses a specialized preconditioner to substantially speed up
convergence. In experiments we show that BBMM effectively uses GPU hardware to
dramatically accelerate both exact GP inference and scalable approximations.
Additionally, we provide GPyTorch, a software platform for scalable GP
inference via BBMM, built on PyTorch.
Comment: NeurIPS 201
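The heart of BBMM, conjugate gradients run on a block of right-hand sides so that each iteration is a single matrix-matrix product with the kernel, can be sketched as follows (a dense toy kernel stands in for GPyTorch's blackbox multiplication routine):

```python
import numpy as np

rng = np.random.default_rng(9)

def batched_cg(matmul, B, iters=200, tol=1e-10):
    """Conjugate gradients on a block of right-hand sides at once: each
    iteration costs a single matrix-matrix product with K, which is the
    BBMM observation -- batched MVMs become MMMs that GPUs execute well."""
    X = np.zeros_like(B)
    R = B.copy()                       # residuals, one column per RHS
    P = R.copy()
    rs = (R * R).sum(0)
    for _ in range(iters):
        KP = matmul(P)                 # the one kernel MMM per iteration
        alpha = rs / (P * KP).sum(0)
        X += alpha * P
        R -= alpha * KP
        rs_new = (R * R).sum(0)
        if rs_new.max() < tol:
            break
        P = R + (rs_new / rs) * P
        rs = rs_new
    return X

n, t = 120, 5
pts = np.sort(rng.uniform(0, 10, n))
K = np.exp(-0.5 * (pts[:, None] - pts[None, :]) ** 2) + 0.1 * np.eye(n)
B = rng.standard_normal((n, t))        # e.g. targets plus probe vectors
X_cg = batched_cg(lambda M: K @ M, B)
err = np.abs(K @ X_cg - B).max()       # tiny after convergence
```

Swapping `matmul` for a structured or approximate kernel product is exactly the "routine for efficient matrix-matrix multiplication" the abstract says scalable approximations must supply.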