1,024 research outputs found

    Gaussian process single-index models as emulators for computer experiments

    A single-index model (SIM) provides for parsimonious multi-dimensional nonlinear regression by combining parametric (linear) projection with univariate nonparametric (non-linear) regression models. We show that a particular Gaussian process (GP) formulation is simple to work with and ideal as an emulator for some types of computer experiment, as it can outperform the canonical separable GP regression model commonly used in this setting. Our contribution focuses on drastically simplifying, re-interpreting, and then generalizing a recently proposed fully Bayesian GP-SIM combination, and then illustrating its favorable performance on synthetic data and a real-data computer experiment. Two R packages, both released on CRAN, have been augmented to facilitate inference under our proposed model(s). Comment: 23 pages, 9 figures, 1 table

    Multivariate emulation of computer simulators: model selection and diagnostics with application to a humanitarian relief model

    We present a common framework for Bayesian emulation methodologies for multivariate-output simulators, or computer models, that employ either parametric linear models or nonparametric Gaussian processes. Novel diagnostics suitable for multivariate covariance-separable emulators are developed, and techniques to improve the adequacy of an emulator are discussed and implemented. A variety of emulators are compared for a humanitarian relief simulator, modelling aid missions to Sicily after a volcanic eruption and earthquake, and a sensitivity analysis is conducted to determine the sensitivity of the simulator output to changes in the input variables. The results from parametric and nonparametric emulators are compared in terms of prediction accuracy, uncertainty quantification and scientific interpretability.

    Emulation of multivariate simulators using thin-plate splines with application to atmospheric dispersion

    It is often desirable to build a statistical emulator of a complex computer simulator in order to perform analysis which would otherwise be computationally infeasible. We propose methodology to model multivariate output from a computer simulator, taking into account output structure in the responses. The utility of this approach is demonstrated by applying it to a chemical and biological hazard prediction model. Predicting the hazard area which results from an accidental or deliberate chemical or biological release is imperative in civil and military planning and also in emergency response. The hazard area resulting from such a release is highly structured in space, and we therefore propose the use of a thin-plate spline to capture the spatial structure and fit a Gaussian process emulator to the coefficients of the resultant basis functions. We compare and contrast four different techniques for emulating multivariate output: dimension-reduction using (i) a fully Bayesian approach with a principal component basis, (ii) a fully Bayesian approach with a thin-plate spline basis, assuming that the basis coefficients are independent, and (iii) a “plug-in” Bayesian approach with a thin-plate spline basis and a separable covariance structure; and (iv) a functional data modeling approach using a tensor-product (separable) Gaussian process. We develop methodology for the two thin-plate spline emulators and demonstrate that these emulators significantly outperform the principal component emulator. Further, the separable thin-plate spline emulator, which accounts for the dependence between basis coefficients, provides substantially more realistic quantification of uncertainty, and is also computationally more tractable, allowing fast emulation. For high resolution output data, it also offers substantial predictive and computational advantages over the tensor-product Gaussian process emulator.

    Gaussian process hyper-parameter estimation using parallel asymptotically independent Markov sampling

    Gaussian process emulators of computationally expensive computer codes provide fast statistical approximations to model physical processes. The training of these surrogates depends on the set of design points chosen to run the simulator. Due to computational cost, such a training set is bound to be limited, and quantifying the resulting uncertainty in the hyper-parameters of the emulator by uni-modal distributions is likely to induce bias. In order to quantify this uncertainty, this paper proposes a computationally efficient sampler based on an extension of Asymptotically Independent Markov Sampling, a recently developed algorithm for Bayesian inference. Structural uncertainty of the emulator is obtained as a by-product of the Bayesian treatment of the hyper-parameters. Additionally, the user can choose to perform stochastic optimisation to sample from a neighbourhood of the Maximum a Posteriori estimate, even in the presence of multimodality. Model uncertainty is also acknowledged through numerical stabilisation measures by including a nugget term in the formulation of the probability model. The efficiency of the proposed sampler is illustrated in examples where multi-modal distributions are encountered. For the purpose of reproducibility, further development, and use in other applications, the code used to generate the examples is freely available for download at https://github.com/agarbuno/paims_codes. Comment: Computational Statistics & Data Analysis, Volume 103, November 201

    Comparison of surrogate-based uncertainty quantification methods for computationally expensive simulators

    This version: arXiv:1511.00926v4 [math.ST]. Available from ArXiv.org via the link in this record. Polynomial chaos and Gaussian process emulation are methods for surrogate-based uncertainty quantification, and have been developed independently in their respective communities over the last 25 years. Despite tackling similar problems in the field, to our knowledge there has yet to be a critical comparison of the two approaches in the literature. We begin by providing a detailed description of polynomial chaos and Gaussian process approaches for building a surrogate model of a black-box function. The accuracy of each surrogate method is then tested and compared for two simulators used in industry: a land-surface model (adJULES) and a launch vehicle controller (VEGACONTROL). We analyse surrogates built on experimental designs of various sizes and types to investigate their performance in a range of modelling scenarios. Specifically, polynomial chaos and Gaussian process surrogates are built on Sobol sequence and tensor grid designs. Their accuracy is measured by their ability to estimate the mean, standard deviation, exceedance probabilities and probability density function of the simulator output, as well as a root mean square error metric, based on an independent validation design. We find that neither method outperforms the other across the board, but advantages can be gained in some cases, such that the preferred method depends on the modelling goals of the practitioner. Our conclusions are likely to depend somewhat on the modelling choices for the surrogates as well as the design strategy. We hope that this work will spark future comparisons of the two methods in their more advanced formulations and for different sampling strategies.