75 research outputs found

    Emulating dynamic non-linear simulators using Gaussian processes

    The dynamic emulation of non-linear deterministic computer codes where the output is a time series, possibly multivariate, is examined. Such computer models simulate the evolution of some real-world phenomenon over time, for example models of the climate or the functioning of the human brain. The models we are interested in are highly non-linear and exhibit tipping points, bifurcations and chaotic behaviour. However, each simulation run could be too time-consuming to perform analyses that require many runs, including quantifying the variation in model output with respect to changes in the inputs. Therefore, Gaussian process emulators are used to approximate the output of the code. To do this, the flow map of the system under study is emulated over a short time period. Then, it is used in an iterative way to predict the whole time series. A number of ways are proposed to take into account the uncertainty of inputs to the emulators, after fixed initial conditions, and the correlation between them through the time series. The methodology is illustrated with two examples: the highly non-linear dynamical systems described by the Lorenz and Van der Pol equations. In both cases, the predictive performance is relatively high and the measure of uncertainty provided by the method reflects the extent of predictability in each system.
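The idea in the abstract above of emulating the flow map over a short time step and then iterating it can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: it assumes a squared-exponential kernel with a hand-picked length scale, a Van der Pol system with mu = 1, a step of 0.1, and 150 random training states, and it iterates the posterior mean only, so it does not propagate input uncertainty as the paper proposes. All function names (`vdp_rhs`, `flow_map`, `emulate_step`, `emulate_series`) are hypothetical.

```python
import numpy as np

def vdp_rhs(state, mu=1.0):
    """Right-hand side of the Van der Pol equations (mu = 1 is an assumption)."""
    x, y = state
    return np.array([y, mu * (1.0 - x**2) * y - x])

def flow_map(state, dt=0.1, substeps=10):
    """'Simulator': advance the state by dt using classical RK4 sub-steps."""
    h = dt / substeps
    s = np.asarray(state, dtype=float)
    for _ in range(substeps):
        k1 = vdp_rhs(s)
        k2 = vdp_rhs(s + 0.5 * h * k1)
        k3 = vdp_rhs(s + 0.5 * h * k2)
        k4 = vdp_rhs(s + h * k3)
        s = s + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return s

def rbf(A, B, length_scale=1.5):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

# Training design: random states in [-3, 3]^2 and their one-step increments.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(150, 2))
Y = np.array([flow_map(x) - x for x in X])

# GP posterior-mean weights; the small jitter keeps the solve well behaved.
K = rbf(X, X) + 1e-6 * np.eye(len(X))
alpha = np.linalg.solve(K, Y)

def emulate_step(state):
    """GP posterior-mean prediction of the state after one time step."""
    s = np.atleast_2d(np.asarray(state, dtype=float))
    return (s + rbf(s, X) @ alpha)[0]

def emulate_series(s0, n_steps):
    """Iterate the emulated flow map to predict the whole time series."""
    traj = [np.asarray(s0, dtype=float)]
    for _ in range(n_steps):
        traj.append(emulate_step(traj[-1]))
    return np.array(traj)

# Compare one emulated step with the simulator, then roll out 50 steps.
one_step_err = np.max(np.abs(emulate_step([1.0, 0.5]) - flow_map([1.0, 0.5])))
traj = emulate_series([1.0, 0.5], 50)
```

Emulating the increment rather than the next state keeps a zero-mean GP close to its prior where training points are sparse. A fuller treatment would also carry the predictive variance through each iteration.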

    A generalized Gaussian process model for computer experiments with binary time series

    Non-Gaussian observations such as binary responses are common in some computer experiments. Motivated by the analysis of a class of cell adhesion experiments, we introduce a generalized Gaussian process model for binary responses, which shares some common features with standard GP models. In addition, the proposed model incorporates a flexible mean function that can capture different types of time series structures. Asymptotic properties of the estimators are derived, and an optimal predictor as well as its predictive distribution are constructed. Their performance is examined via two simulation studies. The methodology is applied to study computer simulations for cell adhesion experiments. The fitted model reveals important biological information in repeated cell bindings, which is not directly observable in lab experiments. Comment: 49 pages, 4 figures

    Statistical Methods for Large Spatial and Spatio-temporal Datasets

    Classical statistical models encounter a computational bottleneck for large spatial/spatio-temporal datasets. This dissertation contains three articles describing computationally efficient approximation methods for applying Gaussian process models to large spatial and spatio-temporal datasets. The first article extends the FSA-Block approach in [60] in the sense of preserving more information of the residual covariance matrix. By using a block conditional likelihood approximation to the residual likelihood, the residual covariance of neighboring data blocks can be preserved, which relaxes the conditional independence assumption of the FSA-Block approach. We show that the approximated likelihood by the proposed method is Gaussian with an explicit form of covariance matrix, and the computational complexity is linear in the sample size n. We also show that the proposed method can result in a valid Gaussian process so that both the parameter estimation and prediction are consistent in the same model framework. Since neighborhood information is incorporated in approximating the residual covariance function, simulation studies show that the proposed method can further alleviate the mismatch problems in predicting responses on block boundary locations. The second article is the spatio-temporal extension of the FSA-Block approach, where we model the space-time responses as realizations from a Gaussian process model of spatio-temporal covariance functions. Since the knot number and locations are crucial to the model performance, a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm is proposed to select knots automatically from a discrete set of spatio-temporal points for the proposed method. We show that the proposed knot selection algorithm can result in more robust prediction results. Then the proposed method is compared with the weighted composite likelihood method through simulation studies and an ozone dataset.
The third article applies the nonseparable auto-covariance function to model the computer code outputs. It proposes a multi-output Gaussian process emulator with a nonseparable auto-covariance function to avoid limitations of using separable emulators. To facilitate the computation of the nonseparable emulator, we introduce the FSA-Block approach to approximate the proposed model. Then we compare the proposed method with Gaussian process emulators with separable covariance models through simulated examples and a real computer code.

    Multivariate emulation of computer simulators: model selection and diagnostics with application to a humanitarian relief model

    We present a common framework for Bayesian emulation methodologies for multivariate-output simulators, or computer models, that employ either parametric linear models or nonparametric Gaussian processes. Novel diagnostics suitable for multivariate covariance-separable emulators are developed and techniques to improve the adequacy of an emulator are discussed and implemented. A variety of emulators are compared for a humanitarian relief simulator, modelling aid missions to Sicily after a volcanic eruption and earthquake, and a sensitivity analysis is conducted to determine the sensitivity of the simulator output to changes in the input variables. The results from parametric and nonparametric emulators are compared in terms of prediction accuracy, uncertainty quantification and scientific interpretability.

    Compression and Conditional Emulation of Climate Model Output

    Numerical climate model simulations run at high spatial and temporal resolutions generate massive quantities of data. As our computing capabilities continue to increase, storing all of the data is not sustainable, and thus it is important to develop methods for representing the full datasets by smaller compressed versions. We propose a statistical compression and decompression algorithm based on storing a set of summary statistics as well as a statistical model describing the conditional distribution of the full dataset given the summary statistics. The statistical model can be used to generate realizations representing the full dataset, along with characterizations of the uncertainties in the generated data. Thus, the methods are capable of both compression and conditional emulation of the climate models. Considerable attention is paid to accurately modeling the original dataset--one year of daily mean temperature data--particularly with regard to the inherent spatial nonstationarity in global fields, and to determining the statistics to be stored, so that the variation in the original data can be closely captured, while allowing for fast decompression and conditional emulation on modest computers.

    Minimizing Negative Transfer of Knowledge in Multivariate Gaussian Processes: A Scalable and Regularized Approach

    Recently there has been an increasing interest in the multivariate Gaussian process (MGP), which extends the Gaussian process (GP) to deal with multiple outputs. One approach to construct the MGP and account for non-trivial commonalities amongst outputs employs a convolution process (CP). The CP is based on the idea of sharing latent functions across several convolutions. Despite the elegance of the CP construction, it presents new challenges that have yet to be tackled. First, even with a moderate number of outputs, model building is extremely prohibitive due to the huge increase in computational demands and number of parameters to be estimated. Second, the negative transfer of knowledge may occur when some outputs do not share commonalities. In this paper we address these issues. We propose a regularized pairwise modeling approach for the MGP established using CP. The key feature of our approach is to distribute the estimation of the full multivariate model into a group of bivariate GPs which are individually built. Interestingly, pairwise modeling turns out to possess unique characteristics, which allow us to tackle the challenge of negative transfer through penalizing the latent function that facilitates information sharing in each bivariate model. Predictions are then made through combining predictions from the bivariate models within a Bayesian framework. The proposed method has excellent scalability when the number of outputs is large and minimizes the negative transfer of knowledge between uncorrelated outputs. Statistical guarantees for the proposed method are studied and its advantageous features are demonstrated through numerical studies.

    Calibration of Computational Models with Categorical Parameters and Correlated Outputs via Bayesian Smoothing Spline ANOVA

    It has become commonplace to use complex computer models to predict outcomes in regions where data does not exist. Typically these models need to be calibrated and validated using some experimental data, which often consists of multiple correlated outcomes. In addition, some of the model parameters may be categorical in nature, such as a pointer variable to alternate models (or submodels) for some of the physics of the system. Here we present a general approach for calibration in such situations where an emulator of the computationally demanding models and a discrepancy term from the model to reality are represented within a Bayesian Smoothing Spline (BSS) ANOVA framework. The BSS-ANOVA framework has several advantages over the traditional Gaussian process, including ease of handling categorical inputs and correlated outputs, and improved computational efficiency. Finally, this framework is applied to the problem that motivated its design: a calibration of a computational fluid dynamics model of a bubbling fluidized bed which is used as an absorber in a CO2 capture system.