75 research outputs found

    Generalized Bayesian MARS: Tools for Emulating Stochastic Computer Models

    Full text link
    The multivariate adaptive regression spline (MARS) approach of Friedman (1991) and its Bayesian counterpart (Francom et al. 2018) are effective approaches for the emulation of computer models. The traditional assumption of Gaussian errors limits the usefulness of MARS, and of many popular alternatives, when dealing with stochastic computer models. We propose a generalized Bayesian MARS (GBMARS) framework which admits the broad class of generalized hyperbolic distributions as the induced likelihood function. This allows us to develop tools for the emulation of stochastic simulators which are parsimonious, scalable and interpretable, require minimal tuning, and provide powerful predictive and uncertainty quantification capabilities. GBMARS is capable of robust regression with t distributions, quantile regression with asymmetric Laplace distributions, and a general form of "Normal-Wald" regression in which the shape of the error distribution and the structure of the mean function are learned simultaneously. We demonstrate the effectiveness of GBMARS on various stochastic computer models and show that it compares favorably to several popular alternatives.
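Asymmetric Laplace errors and quantile regression are linked through the pinball (check) loss: maximizing an asymmetric Laplace likelihood is equivalent to minimizing the check loss at the corresponding quantile level. Below is a minimal numpy sketch of that equivalence on a toy linear problem; GBMARS itself uses adaptive spline bases and full Bayesian inference, so the linear model and data here are purely illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 1000)
y = 2.0 * x + rng.normal(0.0, 0.1, 1000)  # toy "simulator" output

def check_loss(beta, x, y, tau):
    # Pinball/check loss: its minimizer is the tau-th conditional quantile,
    # and it equals the negative log-likelihood of an asymmetric Laplace
    # distribution up to constants.
    r = y - (beta[0] + beta[1] * x)
    return np.sum(r * (tau - (r < 0.0)))

tau = 0.9
fit = minimize(check_loss, x0=np.array([0.0, 1.0]),
               args=(x, y, tau), method="Nelder-Mead")
intercept, slope = fit.x
coverage = np.mean(y < intercept + slope * x)  # should be close to tau
```

The fitted line sits near the 0.9 conditional quantile, so roughly 90% of the responses fall below it.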

    mlOSP: Towards a Unified Implementation of Regression Monte Carlo Algorithms

    Full text link
    We introduce mlOSP, a computational template for Machine Learning for Optimal Stopping Problems. The template is implemented in the R statistical environment and publicly available via a GitHub repository. mlOSP presents a unified numerical implementation of Regression Monte Carlo (RMC) approaches to optimal stopping, providing a state-of-the-art, open-source, reproducible and transparent platform. Highlighting its modular nature, we present multiple novel variants of RMC algorithms, especially in terms of constructing simulation designs for training the regressors, as well as in terms of machine learning regression modules. At the same time, mlOSP nests most of the existing RMC schemes, allowing for a consistent and verifiable benchmarking of extant algorithms. The article contains extensive R code snippets and figures, and serves the dual role of presenting new RMC features and as a vignette to the underlying software package.
    Comment: Package repository is at http://github.com/mludkov/mlOS
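The core RMC recursion that mlOSP implements (in many variants) is the Longstaff-Schwartz scheme: simulate paths forward, then step backward in time regressing realized continuation payoffs on the current state. A bare-bones Python sketch for a Bermudan put under geometric Brownian motion, with a quadratic polynomial basis; the parameter values and basis are illustrative, and mlOSP itself is an R package with pluggable regression modules.

```python
import numpy as np

def ls_put_price(S0=100.0, K=100.0, r=0.06, sigma=0.2, T=1.0,
                 n_steps=50, n_paths=20000, seed=1):
    """Bermudan put via Longstaff-Schwartz regression Monte Carlo."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Forward pass: simulate GBM paths at the n_steps exercise dates.
    z = rng.standard_normal((n_paths, n_steps))
    S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt
                              + sigma * np.sqrt(dt) * z, axis=1))
    # Backward pass: start from the payoff at maturity.
    cash = np.maximum(K - S[:, -1], 0.0)
    for t in range(n_steps - 2, -1, -1):
        cash *= np.exp(-r * dt)               # discount one step back
        itm = K - S[:, t] > 0.0               # regress on in-the-money paths
        if itm.sum() < 10:
            continue
        x = S[itm, t]
        coeffs = np.polyfit(x, cash[itm], 2)  # continuation-value regression
        cont = np.polyval(coeffs, x)
        exercise = K - x
        cash[itm] = np.where(exercise > cont, exercise, cash[itm])
    return np.exp(-r * dt) * cash.mean()

price = ls_put_price()
```

Note this evaluates the policy on the same paths used for the regression, which induces a small look-ahead bias; mlOSP's separation of simulation designs from regression modules is precisely what makes out-of-sample evaluation and such variants easy to express.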

    Genetic based optimisation of the design parameters for an array-on-device orbital motion wave energy converter

    Get PDF
    Optimisation of Wave Energy Converters (WECs) is a very important topic for obtaining competitive devices in the energy market. Wave energy is a renewable resource that could contribute significantly to a future sustainable world, and research is ongoing to reduce costs and increase the amount of energy captured. This work aims to optimise a WaveSub device made up of multiple floats in a line by investigating the influence of 6 different design parameters, such as the number of floats. Here we show that a multi-float configuration of 6 floats is more competitive in terms of Levelised Cost Of Energy (LCOE) than a single-float configuration, with an LCOE reduction of around 21%. We demonstrate that multi-float configurations of this device reduce the LCOE especially because of the reduction of grid connection, installation, control and mooring costs. From the power capture perspective, optimised multi-float configurations still have similar capacity factors to the single-float configuration. This research gives important indications for the further development of WECs from an optimisation perspective, and these promising results show that more complex, optimised, multi-float configurations could be investigated in future.
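The kind of search used in such studies can be sketched with a toy genetic algorithm over real-valued design vectors: truncation selection, blend crossover, Gaussian mutation and elitism. The cost function below is a stand-in quadratic with a known minimum, not the WaveSub LCOE model, and all parameter names and settings are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(11)

def lcoe_surrogate(params):
    """Hypothetical stand-in for an LCOE evaluation; minimum at (6, 2.5, 1)."""
    target = np.array([6.0, 2.5, 1.0])  # e.g. float count, spacing, radius
    return np.sum((params - target) ** 2, axis=-1) + 50.0

def genetic_minimise(f, lo, hi, pop_size=40, n_gen=80):
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    pop = rng.uniform(lo, hi, (pop_size, len(lo)))
    for _ in range(n_gen):
        order = np.argsort(f(pop))
        elite = pop[order[0]]
        parents = pop[order[: pop_size // 2]]             # truncation selection
        moms = parents[rng.integers(0, len(parents), pop_size)]
        dads = parents[rng.integers(0, len(parents), pop_size)]
        alpha = rng.uniform(0.0, 1.0, (pop_size, 1))
        children = alpha * moms + (1.0 - alpha) * dads    # blend crossover
        children += rng.normal(0.0, 0.1, children.shape)  # Gaussian mutation
        children = np.clip(children, lo, hi)
        children[0] = elite                               # elitism
        pop = children
    return pop[np.argmin(f(pop))]

best = genetic_minimise(lcoe_surrogate, lo=[1.0, 0.0, 0.0], hi=[10.0, 5.0, 3.0])
```

In a real WEC study each fitness evaluation involves a hydrodynamic and cost simulation, which is why the population size and generation count are the main computational levers.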

    Statistical Methods for Large Spatial and Spatio-temporal Datasets

    Get PDF
    Classical statistical models encounter a computational bottleneck for large spatial/spatio-temporal datasets. This dissertation contains three articles describing computationally efficient approximation methods for applying Gaussian process models to large spatial and spatio-temporal datasets. The first article extends the FSA-Block approach in [60] by preserving more information in the residual covariance matrix. By using a block conditional likelihood approximation to the residual likelihood, the residual covariance of neighboring data blocks can be preserved, which relaxes the conditional independence assumption of the FSA-Block approach. We show that the likelihood approximated by the proposed method is Gaussian with an explicit form of covariance matrix, and that the computational complexity is linear in the sample size n. We also show that the proposed method can result in a valid Gaussian process, so that both parameter estimation and prediction are consistent within the same model framework. Since neighborhood information is incorporated in approximating the residual covariance function, simulation studies show that the proposed method can further alleviate the mismatch problems in predicting responses at block boundary locations. The second article is the spatio-temporal extension of the FSA-Block approach, where we model the space-time responses as realizations from a Gaussian process model with spatio-temporal covariance functions. Since the knot number and locations are crucial to model performance, a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm is proposed to select knots automatically from a discrete set of spatio-temporal points. We show that the proposed knot selection algorithm results in more robust predictions. The proposed method is then compared with the weighted composite likelihood method through simulation studies and an ozone dataset.
The third article applies a nonseparable auto-covariance function to model computer code outputs. It proposes a multi-output Gaussian process emulator with a nonseparable auto-covariance function to avoid the limitations of separable emulators. To facilitate the computation of the nonseparable emulator, we introduce the FSA-Block approach to approximate the proposed model. We then compare the proposed method with Gaussian process emulators with separable covariance models through simulated examples and a real computer code.

    Generative AI for Bayesian Computation

    Full text link
    We develop Generative AI (Gen-AI) methods for Bayesian Computation. Gen-AI naturally applies to Bayesian models which are easily simulated. We generate a large training dataset and, together with deep neural networks, uncover the inverse Bayes map for inference and prediction. To do this, we require high-dimensional regression methods and dimensionality reduction (a.k.a. feature selection). The main advantage of Generative AI is that it is model-free and does not rely on densities. Bayesian computation is replaced by pattern recognition of an input-output map, which is learned from empirical model simulation. We show that Deep Quantile NNs provide a general framework for inference and decision making. To illustrate our methodology, we provide three examples: a stylized synthetic example, a traffic flow prediction problem, and an analysis of the well-known Ebola dataset. Finally, we conclude with directions for future research.
    Comment: arXiv admin note: text overlap with arXiv:2209.0216
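The "inverse Bayes map" idea is: simulate (parameter, data) pairs from the joint model, then regress parameters on data; the fitted map, evaluated at the observed data, approximates a posterior functional. The paper uses deep quantile neural networks; the sketch below substitutes ordinary least squares on a conjugate Gaussian toy model where the true posterior mean E[theta | y] = y/2 is known, so the learned map can be checked.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50000
theta = rng.normal(0.0, 1.0, n)      # draw parameters from the prior
y = theta + rng.normal(0.0, 1.0, n)  # simulate data given each parameter

# Regress theta on y across simulations: the least-squares fit of
# E[theta | y] plays the role of the learned inverse Bayes map.
X = np.column_stack([np.ones(n), y])
a, b = np.linalg.lstsq(X, theta, rcond=None)[0]

# For this conjugate model the exact posterior is theta | y ~ N(y/2, 1/2),
# so the learned map should have intercept ~ 0 and slope ~ 0.5.
posterior_mean_at_y1 = a + b * 1.0
```

Replacing the least-squares fit with a quantile regressor at several levels, as the paper does, recovers posterior uncertainty rather than just a point summary.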

    Analyzing stochastic computer models: A review with opportunities

    Get PDF
    This is the author accepted manuscript. The final version is available from the Institute of Mathematical Statistics via the DOI in this record.
    In modern science, computer models are often used to understand complex phenomena, and a thriving statistical community has grown around analyzing them. This review aims to bring a spotlight to the growing prevalence of stochastic computer models -- providing a catalogue of statistical methods for practitioners, an introductory view for statisticians (whether familiar with deterministic computer models or not), and an emphasis on open questions of relevance to practitioners and statisticians. Gaussian process surrogate models take center stage in this review, and these, along with several extensions needed for stochastic settings, are explained. The basic issues of designing a stochastic computer experiment and calibrating a stochastic computer model are prominent in the discussion. Instructive examples, with data and code, are used to describe the implementation of, and results from, various methods.
    Funding: European Union FP7; DOE LAB; National Science Foundation
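A central tool from this literature is a Gaussian process fit to replicated simulator output, where the sample variance at each design point feeds a heteroskedastic nugget (the "stochastic kriging" idea). Below is a self-contained sketch with an assumed toy simulator, a fixed squared-exponential kernel and no hyperparameter estimation, so every numeric setting is illustrative.

```python
import numpy as np

def sq_exp(x1, x2, ls=0.2):
    # Squared-exponential kernel with a fixed lengthscale.
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / ls**2)

rng = np.random.default_rng(3)
x_design = np.linspace(0.0, 1.0, 10)   # design points
n_rep = 50                             # replicates per design point
# Toy stochastic simulator: noisy evaluations of a smooth mean function.
Y = np.sin(2 * np.pi * x_design)[:, None] + rng.normal(0.0, 0.5,
                                                       (len(x_design), n_rep))
y_bar = Y.mean(axis=1)                 # replicate means
s2 = Y.var(axis=1, ddof=1)             # replicate variances

# GP on the means; the noise variance of each mean is s2 / n_rep,
# which gives a heteroskedastic nugget on the diagonal.
K = sq_exp(x_design, x_design) + np.diag(s2 / n_rep)
x_test = np.array([0.37])
k_star = sq_exp(x_test, x_design)
pred_mean = k_star @ np.linalg.solve(K, y_bar)
pred_var = sq_exp(x_test, x_test) - k_star @ np.linalg.solve(K, k_star.T)
```

Averaging replicates before fitting keeps the GP system at the number of design points rather than the number of simulator runs, which is what makes replication attractive for expensive stochastic codes.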

    Postglacial sea-level change: novel insights from physical and statistical modelling

    Get PDF
    Developing accurate projections of future sea-level change is a key challenge for the entire science community under the current warming climate. Because modern instrumental sea-level observations are only available from the 19th-20th centuries, sea-level projections based on them can only capture short-term effects, leaving underestimated the physical processes that dominate over longer timescales. An essential step towards accurate and robust long-term sea-level projections is therefore to investigate the physical processes that shape the spatio-temporal evolution of sea-level change over centennial to millennial timescales. Because palaeo sea-level observations are sometimes scarce and often noisy, the mechanisms of sea-level change over geological timescales are still not well understood, with many outstanding questions to be resolved. This thesis develops novel physical and statistical models to better understand the mechanisms behind postglacial sea-level change. Specifically, it focuses on three outstanding problems that are important not only for postglacial sea-level change but also for understanding past ice-sheet dynamics and palaeoclimate change. Firstly, a statistical framework is developed to invert the sources of meltwater pulse 1A, the largest and most rapid global sea-level rise event of the last deglaciation, with sophisticated treatment of the uncertainties associated with sea-level reconstructions and geophysical modelling. The results suggest there were contributions from North America, 12.0 m (5.6-15.4 m; 95% probability), Scandinavia, 4.6 m (3.2-6.4 m), and Antarctica, 1.3 m (0-5.9 m), giving a total global mean sea-level rise of 17.9 m (15.7-20.2 m) in 500 years.
Secondly, the missing ice problem (the distinctive imbalance between observed global mean sea-level rise and the reconstructed amount of ice-sheet melt) is revisited by including an additional physical process, sediment isostatic adjustment (SIA), which has not previously been considered in this problem. In particular, the thesis investigates the impact of SIA on local relative sea-level (RSL) variation across the Great Barrier Reef (GBR), the world's largest mixed carbonate-siliciclastic sediment system. Based on a Bayesian calibration method, SIA can contribute up to 1.1 m of relative sea-level rise on the outer shelf of the southern central GBR from 28 ka to present. Because the SIA-induced RSL rise is unrelated to ice mass loss, failing to correct for this signal leads to systematic overestimation of grounded ice volume. Incorporating the SIA process therefore reduces the global grounded ice volume estimate for the Last Glacial Maximum (LGM), which helps to mitigate the missing ice problem.
Lastly, robust global barystatic sea-level maps with minimal dependency on the detailed geometry of past ice-sheet change are reconstructed. Estimating such maps requires physical simulation of relative sea level corresponding to thousands of different ice histories, which is computationally prohibitive. To overcome this, the thesis develops a statistical emulator which mimics the behaviour of a physics-based model and is computationally much cheaper to evaluate. The results highlight the Seychelles as an exceptionally good place to map barystatic sea level throughout the last deglaciation, because RSL at this location departs only slightly from global barystatic sea level, with minor dependency on the assumed ice history.
Together, these physical and statistical models provide powerful tools for novel insights into the mechanisms of postglacial sea-level change, and hence have the potential to yield more robust, accurate and trustworthy sea-level projections.
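The source-inversion step can be caricatured as a linear fingerprint problem: each meltwater source produces a characteristic spatial pattern of RSL change, and observed changes at many sites are a noisy weighted sum of those patterns. The sketch below inverts synthetic data with ordinary least squares; the fingerprint matrix, noise level and site count are all hypothetical, and the thesis instead uses a Bayesian framework that also propagates reconstruction and GIA-model uncertainty.

```python
import numpy as np

rng = np.random.default_rng(7)
n_sites, n_sources = 30, 3
# Hypothetical sea-level "fingerprints": F[i, j] is the RSL change at site i
# per metre of barystatic rise from source j (negative near a melting ice
# sheet, positive far away). Real fingerprints come from GIA modelling.
F = rng.uniform(-0.5, 1.3, (n_sites, n_sources))
m_true = np.array([12.0, 4.6, 1.3])   # synthetic source magnitudes in metres
obs = F @ m_true + rng.normal(0.0, 0.5, n_sites)  # noisy RSL reconstructions

# Ordinary least-squares inversion of the source magnitudes.
m_hat, *_ = np.linalg.lstsq(F, obs, rcond=None)
```

The quality of such an inversion hinges on how distinct the fingerprints are across the observation sites, which is why site selection and uncertainty treatment dominate the real problem.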