Massively parallel approximate Gaussian process regression
We explore how the big-three computing paradigms -- symmetric multiprocessing
(SMP), graphical processing units (GPUs), and cluster computing -- can together
be brought to bear on large-data Gaussian process (GP) regression problems
via a careful implementation of a newly developed local approximation scheme.
Our methodological contribution focuses primarily on GPU computation, as this
requires the most care and also provides the largest performance boost.
However, in our empirical work we study the relative merits of all three
paradigms to determine how best to combine them. The paper concludes with two
case studies. One is a real-data fluid-dynamics computer experiment which
benefits from the local nature of our approximation; the second is a synthetic
data example designed to find the largest design for which (accurate) GP
emulation can be performed on a commensurate predictive set in under an hour.
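To make the local approximation concrete, here is a minimal sketch of nearest-neighbour local GP prediction in Python; the Gaussian kernel, lengthscale, and nugget values are illustrative assumptions rather than the paper's settings, and the per-location independence noted in the comments is what makes the scheme amenable to SMP, GPU, and cluster parallelization.

    # Minimal local approximate GP prediction via nearest neighbours.
    # Kernel, lengthscale, and nugget values are illustrative assumptions.
    import numpy as np

    def gauss_kernel(A, B, ell=0.5):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * ell ** 2))

    def local_gp_predict(X, y, x_star, n_local=50, nugget=1e-6):
        # use only the n_local training points closest to x_star
        idx = np.argsort(((X - x_star) ** 2).sum(1))[:n_local]
        Xn, yn = X[idx], y[idx]
        K = gauss_kernel(Xn, Xn) + nugget * np.eye(n_local)
        kx = gauss_kernel(Xn, x_star[None, :]).ravel()
        mean = kx @ np.linalg.solve(K, yn)
        var = 1.0 + nugget - kx @ np.linalg.solve(K, kx)
        return mean, var

    # each predictive location is handled independently, so this loop is
    # trivially parallel (threads, GPU batching, or cluster nodes)
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(2000, 2))
    y = np.sin(5 * X[:, 0]) * np.cos(3 * X[:, 1])
    preds = [local_gp_predict(X, y, xs) for xs in rng.uniform(size=(5, 2))]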
Speeding up neighborhood search in local Gaussian process prediction
Recent implementations of local approximate Gaussian process models have
pushed computational boundaries for non-linear, non-parametric prediction
problems, particularly when deployed as emulators for computer experiments.
Their flavor of spatially independent computation accommodates massive
parallelization, meaning that they can handle designs two or more orders of
magnitude larger than previously possible. However, accomplishing that feat can still
require massive supercomputing resources. Here we aim to ease that burden. We
study how predictive variance is reduced as local designs are built up for
prediction. We then observe how the exhaustive and discrete nature of an
important search subroutine involved in building such local designs may be
overly conservative. Rather, we suggest that searching the space radially,
i.e., continuously along rays emanating from the predictive location of
interest, is a far thriftier alternative. Our empirical work demonstrates that
ray-based search yields predictors with accuracy comparable to exhaustive
search, but in a fraction of the time - bringing a supercomputer implementation
back onto the desktop.
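As a rough illustration of the idea, the sketch below contrasts exhaustive candidate scoring with a continuous ray search in two dimensions; the score function is a hypothetical stand-in for the variance-reduction criterion used when building a local design, not the paper's actual subroutine.

    # Exhaustive search vs. continuous ray search for the next local design
    # point (2-D illustration). score() is a hypothetical stand-in for the
    # variance-reduction criterion.
    import numpy as np
    from scipy.optimize import minimize_scalar

    def score(x, x_star):
        # stand-in criterion: peaks at unit distance from x_star
        r = np.linalg.norm(x - x_star)
        return r * np.exp(-r)

    def exhaustive_search(candidates, x_star):
        # scores every discrete candidate: accurate but expensive
        vals = [score(c, x_star) for c in candidates]
        return candidates[int(np.argmax(vals))]

    def ray_search(x_star, n_rays=8, r_max=2.0):
        # a handful of cheap 1-D solves along rays emanating from x_star
        best_val, best_x = -np.inf, None
        for theta in np.linspace(0.0, 2 * np.pi, n_rays, endpoint=False):
            u = np.array([np.cos(theta), np.sin(theta)])
            res = minimize_scalar(lambda r: -score(x_star + r * u, x_star),
                                  bounds=(0.0, r_max), method="bounded")
            if -res.fun > best_val:
                best_val, best_x = -res.fun, x_star + res.x * u
        return best_x

Each 1-D solve replaces a full sweep over the discrete candidate set, which is where the claimed savings come from.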
Quantifying uncertainties on excursion sets under a Gaussian random field prior
We focus on the problem of estimating and quantifying uncertainties on the
excursion set of a function under a limited evaluation budget. We adopt a
Bayesian approach where the objective function is assumed to be a realization
of a Gaussian random field. In this setting, the posterior distribution on the
objective function gives rise to a posterior distribution on excursion sets.
Several approaches exist to summarize the distribution of such sets based on
random closed set theory. While the recently proposed Vorob'ev approach
exploits analytical formulae, further notions of variability require Monte
Carlo estimators relying on Gaussian random field conditional simulations. In
the present work we propose a method to choose Monte Carlo simulation points
and obtain quasi-realizations of the conditional field at fine designs through
affine predictors. The points are chosen optimally in the sense that they
minimize the posterior expected distance in measure between the excursion set
and its reconstruction. The proposed method reduces the computational costs due
to Monte Carlo simulations and enables the computation of quasi-realizations on
fine designs in large dimensions. We apply this reconstruction approach to
obtain realizations of an excursion set on a fine grid which allow us to give a
new measure of uncertainty based on the distance transform of the excursion
set. Finally, we present a safety engineering test case where the simulation
method is employed to compute a Monte Carlo estimate of a contour line.
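For intuition, here is a minimal sketch of the Vorob'ev construction mentioned above, computed from the analytical coverage function on a fine grid; the posterior mean and standard deviation are illustrative stand-ins for quantities that would come from a conditioned Gaussian random field.

    # Vorob'ev expectation of the excursion set {x : f(x) >= T} from a GP
    # posterior on a fine grid. mu and sd below are illustrative stand-ins.
    import numpy as np
    from scipy.stats import norm

    T = 1.0                              # excursion threshold
    grid = np.linspace(0.0, 1.0, 500)
    mu = np.sin(6 * grid)                # stand-in posterior mean
    sd = 0.3 * np.ones_like(grid)        # stand-in posterior sd

    p = norm.sf((T - mu) / sd)           # coverage function P(f(x) >= T)
    expected_measure = p.mean()          # expected measure of the excursion set

    # Vorob'ev threshold: pick the level alpha whose plug-in set {p >= alpha}
    # matches the expected measure (discrete grid approximation)
    n_in = max(int(round(expected_measure * p.size)), 1)
    alpha = np.sort(p)[::-1][n_in - 1]
    vorobev_set = p >= alpha             # Vorob'ev expectation on the grid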
Sequential Design with Mutual Information for Computer Experiments (MICE): Emulation of a Tsunami Model
Computer simulators can be computationally intensive to run over a large
number of input values, as required for optimization and various uncertainty
quantification tasks. The standard paradigm for the design and analysis of
computer experiments is to employ Gaussian random fields to model computer
simulators. Gaussian process models are trained on input-output data obtained
from simulation runs at various input values. Following this approach, we
propose a sequential design algorithm, MICE (Mutual Information for Computer
Experiments), that adaptively selects the input values at which to run the
computer simulator, in order to maximize the expected information gain (mutual
information) over the input space. The superior computational efficiency of the
MICE algorithm over competing algorithms is demonstrated on test functions and
on a tsunami simulator, with overall gains of up to 20% in the latter case.
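A minimal sketch of one greedy MICE-style step is given below, assuming a fixed Gaussian kernel and unit prior variance; the ratio criterion follows the mutual-information idea described above, with a nugget on the candidate-set GP, but the kernel settings and the brute-force implementation are illustrative assumptions.

    # One greedy MICE-style step: pick the candidate maximizing the ratio of
    # predictive variance given the design to predictive variance given the
    # remaining candidates (the latter with a nugget). Illustrative settings.
    import numpy as np

    def k(A, B, ell=0.3):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * ell ** 2))

    def pred_var(X, x, nugget=1e-6):
        # GP predictive variance at x given points X (unit prior variance)
        K = k(X, X) + nugget * np.eye(len(X))
        kx = k(X, x[None, :]).ravel()
        return 1.0 - kx @ np.linalg.solve(K, kx)

    def mice_step(X_design, X_cand, nugget_cand=1.0):
        scores = []
        for i, x in enumerate(X_cand):
            rest = np.delete(X_cand, i, axis=0)
            scores.append(pred_var(X_design, x) / pred_var(rest, x, nugget_cand))
        return int(np.argmax(scores))

    rng = np.random.default_rng(1)
    X_design = rng.uniform(size=(10, 2))
    X_cand = rng.uniform(size=(40, 2))
    next_idx = mice_step(X_design, X_cand)  # run the simulator at X_cand[next_idx]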
An analytic comparison of regularization methods for Gaussian Processes
Gaussian Processes (GPs) are a popular approach to predict the output of a
parameterized experiment. They have many applications in the field of Computer
Experiments, in particular to perform sensitivity analysis, adaptive design of
experiments and global optimization. Nearly all of the applications of GPs
require the inversion of a covariance matrix that, in practice, is often
ill-conditioned. Regularization methodologies are then employed with
consequences for the GPs that need to be better understood. The two principal
methods for dealing with ill-conditioned covariance matrices are i) pseudoinverse (PI)
and ii) adding a positive constant to the diagonal (the so-called nugget
regularization). The first part of this paper provides an algebraic comparison
of PI and nugget regularizations. Redundant points, responsible for covariance
matrix singularity, are defined. It is proven that pseudoinverse
regularization, unlike nugget regularization, averages the output values
and makes the variance zero at redundant points. However, pseudoinverse and
nugget regularizations become equivalent as the nugget value vanishes. A
measure for data-model discrepancy is proposed which guides the choice of a
regularization technique. In the second part of the paper, a distribution-wise
GP is introduced that interpolates Gaussian distributions instead of data
points. The distribution-wise GP can be seen as an improved regularization method
for GPs.
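The contrast between the two regularizations can be seen in a small numerical sketch; the kernel and data below are illustrative, with one input duplicated to create the redundant-point singularity discussed above.

    # Pseudoinverse (PI) vs. nugget regularization on a covariance matrix
    # made singular by a redundant point (two identical inputs with
    # conflicting outputs). Kernel and data values are illustrative.
    import numpy as np

    def k(A, B, ell=0.4):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * ell ** 2))

    X = np.array([[0.0], [0.5], [0.5], [1.0]])   # x = 0.5 is duplicated
    y = np.array([0.0, 1.0, 2.0, 0.5])           # conflicting outputs at 0.5

    K = k(X, X)                                  # singular: two identical rows
    kx = k(X, np.array([[0.5]]))                 # predict at the redundant point

    # i) PI regularization: averages the conflicting outputs (~1.5 here)
    mean_pi = (kx.T @ np.linalg.pinv(K) @ y).item()

    # ii) nugget regularization: adds a positive constant to the diagonal
    eps = 1e-6
    mean_nug = (kx.T @ np.linalg.solve(K + eps * np.eye(len(X)), y)).item()

    print(mean_pi, mean_nug)  # the two agree as the nugget eps -> 0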