
    Statistical Methods for Large Spatial and Spatio-temporal Datasets

    Classical statistical models encounter a computational bottleneck for large spatial/spatio-temporal datasets. This dissertation contains three articles describing computationally efficient approximation methods for applying Gaussian process models to large spatial and spatio-temporal datasets. The first article extends the FSA-Block approach in [60] by preserving more information from the residual covariance matrix. By using a block conditional likelihood approximation to the residual likelihood, the residual covariance of neighboring data blocks can be preserved, which relaxes the conditional independence assumption of the FSA-Block approach. We show that the likelihood approximated by the proposed method is Gaussian with an explicit covariance matrix, and that the computational complexity is linear in the sample size n. We also show that the proposed method results in a valid Gaussian process, so that both parameter estimation and prediction are consistent within the same model framework. Since neighborhood information is incorporated in approximating the residual covariance function, simulation studies show that the proposed method further alleviates the mismatch problems in predicting responses at block boundary locations. The second article is the spatio-temporal extension of the FSA-Block approach, in which we model the space-time responses as realizations of a Gaussian process with a spatio-temporal covariance function. Since the number and locations of knots are crucial to model performance, a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm is proposed to select knots automatically from a discrete set of spatio-temporal points. We show that the proposed knot selection algorithm yields more robust prediction results. The proposed method is then compared with the weighted composite likelihood method through simulation studies and an ozone dataset.
    The third article applies a nonseparable auto-covariance function to model computer code outputs. It proposes a multi-output Gaussian process emulator with a nonseparable auto-covariance function to avoid the limitations of separable emulators. To facilitate computation for the nonseparable emulator, we introduce the FSA-Block approach to approximate the proposed model. We then compare the proposed method with Gaussian process emulators with separable covariance models through simulated examples and a real computer code.


    Emulating dynamic non-linear simulators using Gaussian processes

    The dynamic emulation of non-linear deterministic computer codes whose output is a time series, possibly multivariate, is examined. Such computer models simulate the evolution of some real-world phenomenon over time, for example models of the climate or the functioning of the human brain. The models we are interested in are highly non-linear and exhibit tipping points, bifurcations and chaotic behaviour. However, each simulation run can be too time-consuming to perform analyses that require many runs, such as quantifying the variation in model output with respect to changes in the inputs. Therefore, Gaussian process emulators are used to approximate the output of the code. To do this, the flow map of the system under study is emulated over a short time period and then used iteratively to predict the whole time series. A number of ways are proposed to take into account the uncertainty of the emulator inputs, after fixed initial conditions, and the correlation between them through the time series. The methodology is illustrated with two examples: the highly non-linear dynamical systems described by the Lorenz and Van der Pol equations. In both cases, the predictive performance is relatively high and the measure of uncertainty provided by the method reflects the extent of predictability in each system.
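A simplified illustration of the flow-map idea (a mean-only rollout, without the input-uncertainty propagation the work develops) is to train a GP on short-time input/output pairs of a simple one-dimensional system and iterate the fitted emulator. The toy dynamics, kernel, and hyperparameters below are illustrative assumptions, not the thesis's settings.

```python
import numpy as np

def flow_map(x, dt=0.1):
    """One short-time simulator step of dx/dt = x - x^3 (explicit Euler)."""
    return x + dt * (x - x**3)

def rbf(a, b, ls=0.5):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

# Training data: short-step input/output pairs from the "simulator"
X = np.linspace(-2.0, 2.0, 30)
y = flow_map(X)
alpha = np.linalg.solve(rbf(X, X) + 1e-6 * np.eye(len(X)), y)

def emulate_step(x):
    """GP posterior mean of the flow map at a scalar input."""
    return (rbf(np.array([x]), X) @ alpha).item()

# Iterate the one-step emulator to predict a whole trajectory
x_em = x_true = 0.1
for _ in range(100):
    x_em, x_true = emulate_step(x_em), flow_map(x_true)

# Both trajectories approach the stable fixed point at x = 1
assert abs(x_em - x_true) < 1e-2
```

Iterating the posterior mean like this ignores how emulator uncertainty compounds over steps, which is exactly the gap the proposed uncertainty-propagation schemes address.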

    Multivariate sensitivity analysis for a large-scale climate impact and adaptation model

    We develop a new efficient methodology for Bayesian global sensitivity analysis for large-scale multivariate data. The focus is on computationally demanding models with correlated variables. A multivariate Gaussian process is used as a surrogate model to replace the expensive computer model. To improve the computational efficiency and performance of the model, compactly supported correlation functions are used. The goal is to generate sparse matrices, which give crucial advantages when dealing with large datasets; cross-validation is used to determine the optimal degree of sparsity. This method is combined with a robust adaptive Metropolis algorithm and a parallel implementation to speed up convergence to the target distribution. The method is applied to a multivariate dataset from the IMPRESSIONS Integrated Assessment Platform (IAP2), an extension of the CLIMSAVE IAP, which has been widely applied in climate change impact, adaptation and vulnerability assessments. Our empirical results on synthetic and IAP2 data show that the proposed methods are efficient and accurate for the global sensitivity analysis of complex models.
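The role of compact support can be sketched as follows: replacing an exponential correlation with a Wendland-type correlation of similar range makes most entries of the correlation matrix exactly zero, so sparse storage and solvers apply. The point set, range, and Wendland C² form below are illustrative choices, not the paper's settings.

```python
import numpy as np

def wendland_c2(d, theta=0.2):
    """Wendland C^2 correlation: exactly zero beyond the range theta."""
    r = d / theta
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, (500, 2))                       # 500 sites in the unit square
d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

R_dense = np.exp(-d / 0.2)                            # exponential: no exact zeros
R_sparse = wendland_c2(d)                             # compactly supported

dense_frac = np.mean(R_dense > 0.0)                   # equals 1.0
sparse_frac = np.mean(R_sparse > 0.0)                 # small fraction of entries
assert dense_frac == 1.0 and sparse_frac < 0.2
```

In practice the zero pattern would be stored in a sparse format (e.g. `scipy.sparse`) so that matrix factorisations scale with the number of nonzeros rather than n²; shrinking the support range trades accuracy for sparsity, which is where cross-validation enters.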

    Gaussian process for ground-motion prediction and emulation of systems of computer models

    In this thesis, several challenges in both ground-motion modelling and surrogate modelling are addressed by developing methods based on Gaussian processes (GPs). The first chapter contains an overview of GPs and summarises the key findings of the rest of the thesis. In the second chapter, an estimation algorithm, called the Scoring estimation approach, is developed to train GP-based ground-motion models with spatial correlation. The Scoring estimation approach is introduced theoretically and numerically, and is proven to have desirable convergence and computational properties. It is a statistically robust method, producing consistent and statistically efficient estimators of the spatial correlation parameters. The predictive performance of the estimated ground-motion model is assessed in a simulation-based application, which has important implications for seismic risk assessment. In the third chapter, a GP-based surrogate model, called the integrated emulator, is introduced to emulate a system of multiple computer models. It generalises the state-of-the-art linked emulator for a system of two computer models and considers a variety of kernels (exponential, squared exponential, and two key Matérn kernels) that are essential in advanced applications. By learning the system structure, the integrated emulator outperforms the composite emulator, which emulates the entire system using only global inputs and outputs. Furthermore, its analytic expressions allow a fast and efficient design algorithm that can yield significant computational and predictive gains by allocating different runs to individual computer models based on their heterogeneous functional complexity. The benefits of the integrated emulator are demonstrated in a series of synthetic experiments and a feed-back coupled fire-detection satellite model.
    Finally, the method underlying the integrated emulator is used to construct a non-stationary Gaussian process model based on a deep Gaussian hierarchy.

    Linked Gaussian Process Emulation for Systems of Computer Models Using Matérn Kernels and Adaptive Design

    The state-of-the-art linked Gaussian process offers a way to build analytical emulators for systems of computer models. We generalize the closed-form expressions for the linked Gaussian process under the squared exponential kernel to a class of Matérn kernels that are essential in advanced applications. An iterative procedure to construct linked Gaussian processes as surrogate models for any feed-forward system of computer models is presented and illustrated on a feed-back coupled satellite system. We also introduce an adaptive design algorithm that can increase the approximation accuracy of linked Gaussian process surrogates while reducing the computational cost of running expensive computer systems, by allocating runs to and refining emulators of individual sub-models based on their heterogeneous functional complexity.
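A naive plug-in version of the idea, composing the posterior means of two independently trained emulators without the closed-form uncertainty propagation that the linked Gaussian process provides, can be sketched as follows. The toy models f1 and f2, designs, and kernel settings are assumptions for illustration only.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def gp_mean(X, y, jitter=1e-8):
    """Return the posterior-mean function of a noise-free GP fit."""
    alpha = np.linalg.solve(rbf(X, X) + jitter * np.eye(len(X)), y)
    return lambda x: (rbf(np.atleast_1d(x), X) @ alpha).item()

# A feed-forward system of two "computer models": g(x) = f2(f1(x))
f1 = np.sin
f2 = lambda z: z ** 2

X1 = np.linspace(-3.0, 3.0, 25)
em1 = gp_mean(X1, f1(X1))          # emulator of the first model
Z = np.linspace(-1.0, 1.0, 25)     # design on the intermediate variable
em2 = gp_mean(Z, f2(Z))            # emulator of the second model

x_new = 1.3
linked = em2(em1(x_new))           # plug-in composition of the two emulators
truth = f2(f1(x_new))
assert abs(linked - truth) < 1e-2
```

The plug-in composition discards the first emulator's predictive variance; the linked Gaussian process instead integrates the second emulator over that uncertain intermediate input in closed form, which is what the Matérn-kernel generalisation makes possible.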

    On Novel Approaches to Model-Based Structural Health Monitoring

    Structural health monitoring (SHM) strategies have classically fallen into two main categories of approach: model-driven and data-driven methods. The former utilises physics-based models and inverse techniques as a method for inferring the health state of a structure from changes to updated parameters, and is hence defined as the inverse model-driven approach. The latter frames SHM within a statistical pattern recognition paradigm. These methods require no physical modelling, instead inferring relationships between data and health states directly. Although successes with both approaches have been made, both suffer from significant drawbacks, namely parameter estimation and interpretation difficulties within the inverse model-driven framework, and a lack of available full-system damage state data for data-driven techniques. Consequently, this thesis seeks to outline and develop a framework for an alternative category of approach: forward model-driven SHM. This class of strategies utilises calibrated physics-based models, in a forward manner, to generate health state data (i.e. the undamaged condition and damage states of interest) for training machine learning or pattern recognition technologies. As a result, the framework seeks to provide potential solutions to these issues by removing the need to make health decisions from updated parameters and by providing a mechanism for obtaining health state data. In light of this objective, a framework for forward model-driven SHM is established, highlighting key challenges and technologies that are required for realising this category of approach. The framework is constructed from two main components: generating physics-based models that accurately predict outputs under various damage scenarios, and machine learning methods used to infer decision bounds. This thesis deals with the former, developing technologies and strategies for producing statistically representative predictions from physics-based models.
    Specifically, this work seeks to define validation within this context and propose a validation strategy, to develop technologies that infer uncertainties from various sources, including model discrepancy, and to offer a solution to the issue of validating full-system predictions when data are not available at this level. The first section defines validation within a forward model-driven context, offering a strategy of hypothesis testing, statistical distance metrics, visualisation tools such as the witness function, and deterministic metrics. The statistical distances field is shown to provide a wealth of potential validation metrics that consider whole probability distributions. Additionally, existing validation metrics can be categorised within this field's terminology, providing greater insight. In the second part of this study, emulator technologies, specifically Gaussian process (GP) methods, are discussed. Practical implementation considerations are examined, including the establishment of validation and diagnostic techniques. Various GP extensions are outlined, with particular focus on technologies for dealing with large datasets and their applicability as emulators. Utilising these technologies, two techniques for calibrating models whilst accounting for and inferring model discrepancies are demonstrated: Bayesian Calibration and Bias Correction (BCBC) and Bayesian History Matching (BHM). Both methods are applied to representative building structures in order to demonstrate their effectiveness within a forward model-driven SHM strategy. Sequential design heuristics are developed for BHM, along with an importance-sampling-based technique for inferring the functional model discrepancy uncertainties. The third body of work proposes a multi-level uncertainty integration strategy by developing a subfunction discrepancy approach.
    This technique seeks to construct a methodology for producing valid full-system predictions through a combination of validated sub-system models where uncertainties and model discrepancy have been quantified. The procedure is demonstrated on a numerical shear structure, where it is shown to be effective. Finally, conclusions about the aforementioned technologies are provided. In addition, a review of future directions for forward model-driven SHM is outlined, with the hope that this category receives wider investigation within the SHM community.
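As an example of the kind of statistical distance and witness function mentioned above, a biased maximum mean discrepancy (MMD) estimate under an RBF kernel can be computed directly from samples. The Gaussian samples, shift, and kernel length-scale below are illustrative assumptions rather than the thesis's case studies.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def mmd2(x, y, ls=1.0):
    """Biased squared maximum mean discrepancy between two 1-D samples."""
    return rbf(x, x, ls).mean() + rbf(y, y, ls).mean() - 2.0 * rbf(x, y, ls).mean()

rng = np.random.default_rng(2)
model_out = rng.normal(0.0, 1.0, 500)     # predictions from a calibrated model
obs_same = rng.normal(0.0, 1.0, 500)      # observations from the same distribution
obs_diff = rng.normal(1.5, 1.0, 500)      # observations from a shifted distribution

# The distance separates matched from mismatched predictions
assert mmd2(model_out, obs_same) < mmd2(model_out, obs_diff)

# The witness function shows *where* the two distributions differ most
grid = np.linspace(-4.0, 6.0, 200)
witness = rbf(grid, model_out).mean(axis=1) - rbf(grid, obs_diff).mean(axis=1)
peak = grid[np.abs(witness).argmax()]     # location of the largest mismatch
```

Comparing whole probability distributions in this way, rather than point summaries, is what makes such metrics attractive for validating stochastic model predictions, and the witness function turns the scalar distance into a diagnostic of where model and data disagree.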