Emulating dynamic non-linear simulators using Gaussian processes
The dynamic emulation of non-linear deterministic computer codes where the
output is a time series, possibly multivariate, is examined. Such computer
models simulate the evolution of some real-world phenomenon over time, for
example models of the climate or the functioning of the human brain. The models
we are interested in are highly non-linear and exhibit tipping points,
bifurcations and chaotic behaviour. However, each simulation run could be too
time-consuming to perform analyses that require many runs, including
quantifying the variation in model output with respect to changes in the
inputs. Therefore, Gaussian process emulators are used to approximate the
output of the code. To do this, the flow map of the system under study is
emulated over a short time period. Then, it is used in an iterative way to
predict the whole time series. A number of ways are proposed to take into
account the uncertainty of inputs to the emulators, after fixed initial
conditions, and the correlation between them through the time series. The
methodology is illustrated with two examples: the highly non-linear dynamical
systems described by the Lorenz and Van der Pol equations. In both cases, the
predictive performance is relatively high and the measure of uncertainty
provided by the method reflects the extent of predictability in each system.
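The flow-map idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the step size `DT`, the grid design, the fixed squared-exponential kernel with unit length-scale, and the helper names (`simulator_step`, `emulator_step`, `emulate`) are all assumptions made for illustration, and only the GP posterior mean is propagated, so the input-uncertainty treatment described in the abstract is left out.

```python
import numpy as np

MU, DT = 1.0, 0.1  # Van der Pol parameter and short time step (assumed values)

def vdp_rhs(s):
    """Right-hand side of the Van der Pol equations for state s = (x, y)."""
    x, y = s
    return np.array([y, MU * (1.0 - x**2) * y - x])

def simulator_step(s, n_sub=10):
    """Stand-in for the expensive simulator: advance the state by DT using RK4."""
    h = DT / n_sub
    s = np.asarray(s, dtype=float)
    for _ in range(n_sub):
        k1 = vdp_rhs(s)
        k2 = vdp_rhs(s + 0.5 * h * k1)
        k3 = vdp_rhs(s + 0.5 * h * k2)
        k4 = vdp_rhs(s + h * k3)
        s = s + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return s

def sq_exp(A, B, ell=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / ell**2)

# Training design: a 13 x 13 grid of initial states (an assumed design).
grid = np.linspace(-3.0, 3.0, 13)
X_train = np.array([[a, b] for a in grid for b in grid])
# Emulate the increment over DT rather than the state itself.
Y_train = np.array([simulator_step(x) - x for x in X_train])

# Zero-mean GP with fixed hyperparameters; a small nugget keeps K well behaved.
K = sq_exp(X_train, X_train) + 1e-6 * np.eye(len(X_train))
ALPHA = np.linalg.solve(K, Y_train)

def emulator_step(s):
    """GP posterior mean of the flow map: state at t -> state at t + DT."""
    s = np.asarray(s, dtype=float)
    return s + (sq_exp(s[None, :], X_train) @ ALPHA)[0]

def emulate(s0, n_steps):
    """Apply the one-step emulator iteratively to predict a whole time series."""
    traj = [np.asarray(s0, dtype=float)]
    for _ in range(n_steps):
        traj.append(emulator_step(traj[-1]))
    return np.array(traj)

trajectory = emulate([1.0, 0.0], 30)  # 31 states covering 3 time units
```

In a fuller treatment the hyperparameters would be estimated from the training runs, and the predictive variance would be carried from one step to the next instead of being discarded.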
A generalized Gaussian process model for computer experiments with binary time series
Non-Gaussian observations such as binary responses are common in some
computer experiments. Motivated by the analysis of a class of cell adhesion
experiments, we introduce a generalized Gaussian process model for binary
responses, which shares some common features with standard GP models. In
addition, the proposed model incorporates a flexible mean function that can
capture different types of time series structures. Asymptotic properties of the
estimators are derived, and an optimal predictor as well as its predictive
distribution are constructed. Their performance is examined via two simulation
studies. The methodology is applied to study computer simulations for cell
adhesion experiments. The fitted model reveals important biological information
in repeated cell bindings, which is not directly observable in lab experiments. Comment: 49 pages, 4 figures
Statistical Methods for Large Spatial and Spatio-temporal Datasets
Classical statistical models encounter a computational bottleneck with large spatial/spatio-temporal datasets. This dissertation contains three articles describing computationally efficient approximation methods for applying Gaussian process models to large spatial and spatio-temporal datasets. The first article extends the FSA-Block approach in [60] by preserving more information about the residual covariance matrix. By using a block conditional likelihood approximation to the residual likelihood, the residual covariance of neighboring data blocks can be preserved, which relaxes the conditional independence assumption of the FSA-Block approach. We show that the likelihood approximated by the proposed method is Gaussian with an explicit form of covariance matrix, and that the computational complexity is linear in the sample size n. We also show that the proposed method can result in a valid Gaussian process, so that both parameter estimation and prediction are consistent within the same model framework. Since neighborhood information is incorporated in approximating the residual covariance function, simulation studies show that the proposed method can further alleviate the mismatch problems in predicting responses at block boundary locations. The second article is the spatio-temporal extension of the FSA-Block approach, where we model the space-time responses as realizations from a Gaussian process model with spatio-temporal covariance functions. Since the number and locations of knots are crucial to model performance, a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm is proposed to select knots automatically from a discrete set of spatio-temporal points. We show that the proposed knot selection algorithm can result in more robust predictions. The proposed method is then compared with the weighted composite likelihood method through simulation studies and an ozone dataset.
The third article applies a nonseparable auto-covariance function to model computer code outputs. It proposes a multi-output Gaussian process emulator with a nonseparable auto-covariance function to avoid the limitations of separable emulators. To facilitate the computation of the nonseparable emulator, we introduce the FSA-Block approach to approximate the proposed model. We then compare the proposed method with Gaussian process emulators with separable covariance models through simulated examples and a real computer code.
Multivariate emulation of computer simulators: model selection and diagnostics with application to a humanitarian relief model
We present a common framework for Bayesian emulation methodologies for multivariate-output simulators, or computer models, that employ either parametric linear models or nonparametric Gaussian processes. Novel diagnostics suitable for multivariate covariance-separable emulators are developed and techniques to improve the adequacy of an emulator are discussed and implemented. A variety of emulators are compared for a humanitarian relief simulator, modelling aid missions to Sicily after a volcanic eruption and earthquake, and a sensitivity analysis is conducted to determine the sensitivity of the simulator output to changes in the input variables. The results from parametric and nonparametric emulators are compared in terms of prediction accuracy, uncertainty quantification and scientific interpretability.
Compression and Conditional Emulation of Climate Model Output
Numerical climate model simulations run at high spatial and temporal
resolutions generate massive quantities of data. As our computing capabilities
continue to increase, storing all of the data is not sustainable, and thus it
is important to develop methods for representing the full datasets by smaller
compressed versions. We propose a statistical compression and decompression
algorithm based on storing a set of summary statistics as well as a statistical
model describing the conditional distribution of the full dataset given the
summary statistics. The statistical model can be used to generate realizations
representing the full dataset, along with characterizations of the
uncertainties in the generated data. Thus, the methods are capable of both
compression and conditional emulation of the climate models. Considerable
attention is paid to accurately modeling the original dataset--one year of
daily mean temperature data--particularly with regard to the inherent spatial
nonstationarity in global fields, and to determining the statistics to be
stored, so that the variation in the original data can be closely captured,
while allowing for fast decompression and conditional emulation on modest
computers.
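The compress-then-conditionally-emulate idea in the abstract above can be shown on a toy problem. The sketch below is an illustration under strong assumptions, not the paper's method: it uses a one-dimensional series, a stationary exponential covariance (the paper stresses nonstationary global fields), block means as the stored summary statistics, and a hypothetical helper named `decompress`. It relies on the standard result that a Gaussian vector conditioned on linear summaries s = A x can be sampled by correcting an unconditional draw.

```python
import numpy as np

rng = np.random.default_rng(0)
n, block = 120, 10  # series length and block size (assumed values)
t = np.arange(n)

# Assumed statistical model: zero-mean Gaussian with exponential covariance.
Sigma = np.exp(-np.abs(t[:, None] - t[None, :]) / 15.0)
L = np.linalg.cholesky(Sigma + 1e-10 * np.eye(n))

# "Original" dataset: here simply one draw from the assumed model.
x_full = L @ rng.standard_normal(n)

# Compression: keep only the block means, s = A x (12 numbers instead of 120).
A = np.kron(np.eye(n // block), np.full((1, block), 1.0 / block))
s = A @ x_full

# Gain matrix Sigma A^T (A Sigma A^T)^{-1}, computed with a linear solve.
GAIN = np.linalg.solve(A @ Sigma @ A.T, A @ Sigma).T

def decompress(summary, n_draws=1):
    """Draw realizations of the full series conditional on the stored summaries."""
    u = L @ rng.standard_normal((n, n_draws))          # unconditional draws
    x_cond = u + GAIN @ (summary[:, None] - A @ u)      # correct to match summaries
    return x_cond.T                                     # shape (n_draws, n)

draws = decompress(s, n_draws=5)
cond_mean = GAIN @ s  # conditional mean, a single "best guess" reconstruction
```

Each conditional draw reproduces the stored block means up to rounding error, and the spread across draws gives a rough picture of what the compression discarded.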
Minimizing Negative Transfer of Knowledge in Multivariate Gaussian Processes: A Scalable and Regularized Approach
Recently there has been an increasing interest in the multivariate Gaussian
process (MGP) which extends the Gaussian process (GP) to deal with multiple
outputs. One approach to construct the MGP and account for non-trivial
commonalities amongst outputs employs a convolution process (CP). The CP is
based on the idea of sharing latent functions across several convolutions.
Despite the elegance of the CP construction, it poses new challenges that
have yet to be tackled. First, even with a moderate number of outputs, model
building is prohibitive due to the huge increase in computational
demands and number of parameters to be estimated. Second, the negative transfer
of knowledge may occur when some outputs do not share commonalities. In this
paper we address these issues. We propose a regularized pairwise modeling
approach for the MGP established using CP. The key feature of our approach is
to distribute the estimation of the full multivariate model into a group of
bivariate GPs which are individually built. Interestingly, pairwise modeling
turns out to possess unique characteristics, which allow us to tackle the
challenge of negative transfer through penalizing the latent function that
facilitates information sharing in each bivariate model. Predictions are then
made through combining predictions from the bivariate models within a Bayesian
framework. The proposed method has excellent scalability when the number of
outputs is large and minimizes the negative transfer of knowledge between
uncorrelated outputs. Statistical guarantees for the proposed method are
studied and its advantageous features are demonstrated through numerical
studies.
Calibration of Computational Models with Categorical Parameters and Correlated Outputs via Bayesian Smoothing Spline ANOVA
It has become commonplace to use complex computer models to predict outcomes
in regions where data does not exist. Typically these models need to be
calibrated and validated using some experimental data, which often consists of
multiple correlated outcomes. In addition, some of the model parameters may be
categorical in nature, such as a pointer variable to alternate models (or
submodels) for some of the physics of the system. Here we present a general
approach for calibration in such situations where an emulator of the
computationally demanding models and a discrepancy term from the model to
reality are represented within a Bayesian Smoothing Spline (BSS) ANOVA
framework. The BSS-ANOVA framework has several advantages over the traditional
Gaussian Process, including ease of handling categorical inputs and correlated
outputs, and improved computational efficiency. Finally, this framework is
applied to the problem that motivated its design: a calibration of a
computational fluid dynamics model of a bubbling fluidized bed, which is used as an
absorber in a CO2 capture system.