Know Your Boundaries: Constraining Gaussian Processes by Variational Harmonic Features
Gaussian processes (GPs) provide a powerful framework for extrapolation,
interpolation, and noise removal in regression and classification. This paper
considers constraining GPs to arbitrarily-shaped domains with boundary
conditions. We solve a Fourier-like generalised harmonic feature representation
of the GP prior in the domain of interest, which both constrains the GP and
attains a low-rank representation that is used for speeding up inference. The
method scales as O(nm^2) in prediction and O(m^3) in hyperparameter learning
for regression, where n is the number of data points and m the number of
features. Furthermore, we make use of the variational
approach to allow the method to deal with non-Gaussian likelihoods. The
experiments cover both simulated and empirical data in which the boundary
conditions allow for inclusion of additional physical information.
Comment: Appearing in Proceedings of AISTATS 2019
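A minimal numpy sketch of this construction (illustrative hyperparameters assumed; not the paper's code): on [0, L] with zero Dirichlet boundary conditions, the Laplacian eigenfunctions sqrt(2/L) sin(pi j x / L) weighted by the kernel's spectral density give a low-rank kernel approximation whose samples vanish at the boundary by construction.

```python
import numpy as np

def dirichlet_features(x, m, L):
    """Laplacian eigenfunctions on [0, L] with zero (Dirichlet) boundaries."""
    j = np.arange(1, m + 1)
    return np.sqrt(2.0 / L) * np.sin(np.pi * np.outer(x, j) / L)

def rbf_spectral_density(w, variance=1.0, lengthscale=0.2):
    """Spectral density of the squared-exponential (RBF) kernel in 1-D."""
    return variance * np.sqrt(2 * np.pi) * lengthscale * np.exp(-0.5 * (w * lengthscale) ** 2)

L, m = 1.0, 32
x = np.linspace(0, L, 200)
Phi = dirichlet_features(x, m, L)                    # (200, m) feature matrix
w = np.pi * np.arange(1, m + 1) / L                  # square roots of the eigenvalues
K = Phi @ np.diag(rbf_spectral_density(w)) @ Phi.T   # rank-m kernel approximation

# Every basis function is zero at x = 0 and x = L, so any sample f = Phi @ weights
# satisfies the boundary condition exactly; mid-domain the prior variance is intact.
```

Because the features diagonalize the prior, inference only ever inverts an m x m (here even diagonal) matrix, which is where the O(nm^2)/O(m^3) scaling comes from.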
Large-Scale Cox Process Inference using Variational Fourier Features
Gaussian process modulated Poisson processes provide a flexible framework for
modelling spatiotemporal point patterns. So far this has been restricted to one
dimension, binning to a pre-determined grid, or small data sets of up to a few
thousand data points. Here we introduce Cox process inference based on Fourier
features. This sparse representation induces global rather than local
constraints on the function space and is computationally efficient. This allows
us to formulate a grid-free approximation that scales well with the number of
data points and the size of the domain. We demonstrate that this allows MCMC
approximations to the non-Gaussian posterior. We also find that, in practice,
Fourier features have more consistent optimization behavior than previous
approaches. Our approximate Bayesian method can fit over 100,000 events with
complex spatiotemporal patterns in three dimensions on a single GPU.
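The grid-free idea can be illustrated with a plain Fourier basis on an interval. The weight variances below are a stand-in assumption (the paper derives them from the GP kernel); the point is that a global basis gives a log-Gaussian Cox intensity without binning:

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_basis(x, m, T):
    """Truncated Fourier basis on [0, T]: constant term, m cosines and m sines."""
    j = np.arange(1, m + 1)
    ang = 2 * np.pi * np.outer(x, j) / T
    return np.hstack([np.ones((len(x), 1)), np.cos(ang), np.sin(ang)])

T, m = 10.0, 8
x = np.linspace(0, T, 500)
Phi = fourier_basis(x, m, T)                         # (500, 2m+1) global features

# Decaying weight variances stand in for the GP spectral density (assumption).
freqs = np.concatenate([[0.0], np.arange(1, m + 1), np.arange(1, m + 1)])
scale = np.exp(-0.5 * (freqs / 3.0) ** 2)
f = Phi @ (scale * rng.standard_normal(2 * m + 1))   # one GP prior draw

lam = np.exp(f)                                      # Cox process intensity, positive
# Expected event count on [0, T] via a trapezoid rule, no grid binning of data needed.
expected_events = float(((lam[1:] + lam[:-1]) / 2 * np.diff(x)).sum())
```

The intensity can be evaluated at any location, so the representation scales with the number of features rather than with a pre-determined grid.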
Remote Sensing Image Classification with Large Scale Gaussian Processes
Current remote sensing image classification problems have to deal with an
unprecedented amount of heterogeneous and complex data sources. Upcoming
missions will soon provide large data streams that will make land cover/use
classification difficult. Machine learning classifiers can help at this, and
many methods are currently available. A popular kernel classifier is the
Gaussian process classifier (GPC), since it approaches the classification
problem with a solid probabilistic treatment, thus yielding confidence
intervals for the predictions as well as very competitive results to
state-of-the-art neural networks and support vector machines. However, its
computational cost is prohibitive for large scale applications, and constitutes
the main obstacle precluding wide adoption. This paper tackles this problem by
introducing two novel efficient methodologies for Gaussian Process (GP)
classification. We first include the standard random Fourier features
approximation into GPC, which largely decreases its computational cost and
permits large scale remote sensing image classification. In addition, we
propose a model which avoids randomly sampling a number of Fourier frequencies,
and alternatively learns the optimal ones within a variational Bayes approach.
The performance of the proposed methods is illustrated in complex problems of
cloud detection from multispectral imagery and infrared sounding data.
Excellent empirical results support the proposal in both computational cost and
accuracy.
Comment: 11 pages, 6 figures, Accepted for publication in IEEE Transactions on
Geoscience and Remote Sensing; added the IEEE copyright statement
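The standard random Fourier features approximation (the Rahimi-Recht construction, sketched generically here rather than as the paper's GPC integration) replaces the n x n RBF kernel matrix with an inner product of m random cosine features, turning a cubic-cost kernel method into a linear model:

```python
import numpy as np

rng = np.random.default_rng(42)

def rff(X, n_features=100, lengthscale=1.0, rng=rng):
    """Random Fourier features approximating the RBF kernel exp(-||x-y||^2 / 2l^2)."""
    d = X.shape[1]
    W = rng.standard_normal((d, n_features)) / lengthscale  # frequencies ~ kernel spectrum
    b = rng.uniform(0, 2 * np.pi, n_features)               # random phases
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = rng.standard_normal((200, 3))
Z = rff(X, n_features=500)
K_approx = Z @ Z.T                                          # low-rank kernel estimate
K_exact = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
err = np.abs(K_approx - K_exact).max()                      # shrinks as O(1/sqrt(m))
```

Fitting a classifier on Z costs O(n m^2) instead of O(n^3), which is what makes large-scale remote sensing classification feasible; the paper's second model replaces the random frequencies W with variationally learned ones.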
Walsh-Hadamard Variational Inference for Bayesian Deep Learning
Over-parameterized models, such as DeepNets and ConvNets, form a class of
models that are routinely adopted in a wide variety of applications, and for
which Bayesian inference is desirable but extremely challenging. Variational
inference offers the tools to tackle this challenge in a scalable way and with
some degree of flexibility on the approximation, but for over-parameterized
models this is challenging due to the over-regularization property of the
variational objective. Inspired by the literature on kernel methods, and in
particular on structured approximations of distributions of random matrices,
this paper proposes Walsh-Hadamard Variational Inference (WHVI), which uses
Walsh-Hadamard-based factorization strategies to reduce the parameterization
and accelerate computations, thus avoiding over-regularization issues with the
variational objective. Extensive theoretical and empirical analyses demonstrate
that WHVI yields considerable speedups and model reductions compared to other
techniques to carry out approximate inference for over-parameterized models,
and ultimately show how advances in kernel methods can be translated into
advances in approximate Bayesian inference.
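The factorization hinges on multiplying by a Hadamard matrix H in O(n log n) via the fast Walsh-Hadamard transform rather than forming H explicitly, which is what lets a structured weight matrix be both compact and cheap to apply. A minimal sketch of the transform itself:

```python
import numpy as np

def fwht(a):
    """Fast Walsh-Hadamard transform (Sylvester ordering), O(n log n).

    Equivalent to multiplying by the n x n Hadamard matrix, n a power of two,
    without ever materializing the O(n^2) matrix.
    """
    a = np.array(a, dtype=float)
    n = len(a)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            x = a[i:i + h].copy()
            y = a[i + h:i + 2 * h].copy()
            a[i:i + h] = x + y        # butterfly: sums
            a[i + h:i + 2 * h] = x - y  # butterfly: differences
        h *= 2
    return a

v = np.arange(8, dtype=float)
Hv = fwht(v)   # same result as multiplying by the 8x8 Hadamard matrix
```

In a WHVI-style parameterization a weight matrix is sketched as S1 H diag(g) H S2 with diagonal S1, S2, g, so storage drops from O(n^2) to O(n) and each matvec is two transforms plus elementwise scaling.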
Sparse Gaussian Processes with Spherical Harmonic Features
We introduce a new class of inter-domain variational Gaussian processes (GP)
where data is mapped onto the unit hypersphere in order to use spherical
harmonic representations. Our inference scheme is comparable to variational
Fourier features, but it does not suffer from the curse of dimensionality, and
leads to diagonal covariance matrices between inducing variables. This enables
a speed-up in inference, because it bypasses the need to invert large
covariance matrices. Our experiments show that our model is able to fit a
regression model for a dataset with 6 million entries two orders of magnitude
faster compared to standard sparse GPs, while retaining state of the art
accuracy. We also demonstrate competitive performance on classification with
non-conjugate likelihoods.
Comment: International Conference on Machine Learning, PMLR 119, 2020
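One common way to realize such a mapping (assumed here for illustration) is to append a constant bias coordinate and normalize, so every input lands on the unit hypersphere one dimension up, where spherical harmonic features apply:

```python
import numpy as np

def map_to_sphere(X, bias=1.0):
    """Append a constant bias coordinate, then project onto the unit hypersphere."""
    Xb = np.hstack([X, np.full((len(X), 1), bias)])
    return Xb / np.linalg.norm(Xb, axis=1, keepdims=True)

X = np.random.default_rng(0).standard_normal((5, 3))
S = map_to_sphere(X)   # points on the unit sphere in R^4

# The payoff of harmonic inducing features: when the inducing covariance Kuu is
# diagonal, "inverting" it is elementwise division, O(m) instead of O(m^3).
kuu_diag = np.linspace(1.0, 2.0, 6)
u = np.ones(6)
solved = u / kuu_diag   # stands in for Kuu^{-1} u in the variational bound
```

The diagonal-covariance property is exactly why this scheme avoids the large matrix inversions that dominate standard sparse GP inference.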
Scalable Training of Inference Networks for Gaussian-Process Models
Inference in Gaussian process (GP) models is computationally challenging for
large data, and often difficult to approximate with a small number of inducing
points. We explore an alternative approximation that employs stochastic
inference networks for flexible inference. Unfortunately, for such networks,
minibatch training makes it difficult to learn meaningful correlations
over function outputs for a large dataset. We propose an algorithm that enables
such training by tracking a stochastic, functional mirror-descent algorithm. At
each iteration, this only requires considering a finite number of input
locations, resulting in a scalable and easy-to-implement algorithm. Empirical
results show comparable and, sometimes, superior performance to existing sparse
variational GP methods.
Comment: ICML 2019. Updated results added in the camera-ready version
Variational description of statistical field theories using Daubechies' wavelets
We investigate the description of statistical field theories using
Daubechies' orthonormal compact wavelets on a lattice. A simple variational
approach is used to extend mean field theory and make predictions for the
fluctuation strengths of wavelet coefficients and thus for the correlation
function. The results are compared to Monte Carlo simulations. We find that
wavelets provide a reasonable description of critical phenomena with only a
small number of variational parameters. This lets us hope for an implementation
of the renormalization group in wavelet space.
Comment: 21pp, LaTeX with Postscript figures
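Daubechies' simplest member, the Haar (D2) wavelet, already shows the mechanics: an orthonormal split of a lattice configuration into coarse averages and detail coefficients, whose variances are the per-scale fluctuation strengths the variational approach predicts. A toy sketch on a 1-D lattice:

```python
import numpy as np

def haar_step(signal):
    """One level of the Haar (Daubechies D2) wavelet transform."""
    s = np.asarray(signal, dtype=float)
    avg = (s[0::2] + s[1::2]) / np.sqrt(2)   # scaling (coarse) coefficients
    det = (s[0::2] - s[1::2]) / np.sqrt(2)   # wavelet (detail) coefficients
    return avg, det

def haar_decompose(signal):
    """Full multilevel decomposition; length must be a power of two."""
    coeffs = []
    s = np.asarray(signal, dtype=float)
    while len(s) > 1:
        s, d = haar_step(s)
        coeffs.append(d)
    coeffs.append(s)                          # final coarse coefficient
    return coeffs

field = np.random.default_rng(3).standard_normal(16)  # toy lattice configuration
coeffs = haar_decompose(field)
# Mean squared detail coefficient per scale: the "fluctuation strengths"
# a variational ansatz would parameterize, one number per resolution level.
strengths = [float((c ** 2).mean()) for c in coeffs[:-1]]
```

Because the transform is orthonormal, total energy is preserved across scales, which is what makes a few per-scale variational parameters a sensible compression of the field's statistics.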
Scaling up the Automatic Statistician: Scalable Structure Discovery using Gaussian Processes
Automating statistical modelling is a challenging problem in artificial
intelligence. The Automatic Statistician takes a first step in this direction,
by employing a kernel search algorithm with Gaussian Processes (GP) to provide
interpretable statistical models for regression problems. However, this does not
scale due to its O(N^3) running time for the model selection. We propose
Scalable Kernel Composition (SKC), a scalable kernel search algorithm that
extends the Automatic Statistician to bigger data sets. In doing so, we derive
a cheap upper bound on the GP marginal likelihood that sandwiches the marginal
likelihood with the variational lower bound. We show that the upper bound is
significantly tighter than the lower bound and thus useful for model selection.
Comment: AISTATS 2018 (oral)
Constant-Time Predictive Distributions for Gaussian Processes
One of the most compelling features of Gaussian process (GP) regression is
its ability to provide well-calibrated posterior distributions. Recent advances
in inducing point methods have sped up GP marginal likelihood and posterior
mean computations, leaving posterior covariance estimation and sampling as the
remaining computational bottlenecks. In this paper we address these
shortcomings by using the Lanczos algorithm to rapidly approximate the
predictive covariance matrix. Our approach, which we refer to as LOVE (LanczOs
Variance Estimates), substantially improves time and space complexity. In our
experiments, LOVE computes covariances up to 2,000 times faster and draws
samples 18,000 times faster than existing methods, all without sacrificing
accuracy.
Comment: ICML 2018
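The core primitive here is Lanczos tridiagonalization, which touches the covariance only through matrix-vector products. A generic numpy sketch (LOVE itself layers caching and further structure on top of this, not shown):

```python
import numpy as np

def lanczos(A_mv, b, k):
    """k Lanczos steps: returns Q (n, k) and tridiagonal T (k, k) with Q^T A Q ~ T."""
    n = len(b)
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k - 1)
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        v = A_mv(Q[:, j])
        alpha[j] = Q[:, j] @ v
        v -= alpha[j] * Q[:, j]
        # full reorthogonalization keeps the Krylov basis numerically orthogonal
        v -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ v)
        if j < k - 1:
            beta[j] = np.linalg.norm(v)
            Q[:, j + 1] = v / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return Q, T

rng = np.random.default_rng(1)
n = 20
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)          # SPD stand-in for a kernel matrix
b = rng.standard_normal(n)
Q, T = lanczos(lambda v: K @ v, b, k=n)

# Approximate solve K^{-1} b from the small tridiagonal system: x ~ ||b|| Q T^{-1} e1.
e1 = np.zeros(n); e1[0] = np.linalg.norm(b)
x = Q @ np.linalg.solve(T, e1)
residual = np.linalg.norm(K @ x - b)
```

With k much smaller than n the same recipe gives cheap approximate solves and log-determinants, which is how predictive covariances become fast to evaluate.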
Efficient Learning of Harmonic Priors for Pitch Detection in Polyphonic Music
Automatic music transcription (AMT) aims to infer a latent symbolic
representation of a piece of music (piano-roll), given a corresponding observed
audio recording. Transcribing polyphonic music (when multiple notes are played
simultaneously) is a challenging problem, due to highly structured overlapping
between harmonics. We study whether the introduction of physically inspired
Gaussian process (GP) priors into audio content analysis models improves the
extraction of patterns required for AMT. Audio signals are described as a
linear combination of sources. Each source is decomposed into the product of an
amplitude-envelope, and a quasi-periodic component process. We introduce the
Matérn spectral mixture (MSM) kernel for describing the frequency content of
single notes. We consider two different regression approaches. In the sigmoid
model every pitch activation is independently non-linearly transformed. In the
softmax model several activation GPs are jointly non-linearly transformed. This
introduces cross-correlation between activations. We use variational Bayes for
approximate inference. We empirically evaluate how these models work in
practice when transcribing polyphonic music. We demonstrate that, rather than
encouraging dependency between activations, what matters for improving pitch
detection is to learn priors that fit the frequency content of the sound
events to be detected.
Comment: Updated version with appendix section about derivation of amplitude
modulated GP
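As an illustration of the idea (hypothetical parameters and a Matérn-1/2 envelope; not the paper's exact MSM form), each mixture component can be a Matérn envelope modulated by a cosine at one harmonic of the note, concentrating spectral mass around that partial:

```python
import numpy as np

def matern12_cosine_kernel(tau, weights, freqs, lengthscales):
    """Sum of Matern-1/2 envelopes modulated by cosines at given frequencies.

    A simplified stand-in for the Matern spectral mixture (MSM) idea: each
    component places a lump of spectral mass around one harmonic of a note.
    """
    tau = np.asarray(tau, dtype=float)
    k = np.zeros_like(tau)
    for w, f, ell in zip(weights, freqs, lengthscales):
        # exp(-|tau|/ell) is the Matern-1/2 kernel; cos shifts it in frequency.
        k += w * np.exp(-np.abs(tau) / ell) * np.cos(2 * np.pi * f * tau)
    return k

# Hypothetical note: fundamental f0 with two decaying overtones at 2*f0, 3*f0.
f0 = 440.0
tau = np.linspace(0, 0.01, 200)
k = matern12_cosine_kernel(tau, weights=[1.0, 0.5, 0.25],
                           freqs=[f0, 2 * f0, 3 * f0],
                           lengthscales=[0.05, 0.05, 0.05])
```

Each component is a product of two valid stationary kernels, so the sum is a valid GP prior whose draws are quasi-periodic with the harmonic structure of the modeled note.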