86 research outputs found

    Max-and-Smooth: a two-step approach for approximate Bayesian inference in latent Gaussian models

    This is the final version. Available on open access from the International Society for Bayesian Analysis (ISBA) via the DOI in this record. With modern high-dimensional data, complex statistical models are necessary, requiring computationally feasible inference schemes. We introduce Max-and-Smooth, an approximate Bayesian inference scheme for a flexible class of latent Gaussian models (LGMs) where one or more of the likelihood parameters are modeled by latent additive Gaussian processes. Max-and-Smooth consists of two steps. In the first step (Max), the likelihood function is approximated by a Gaussian density with mean and covariance equal to either (a) the maximum likelihood estimate and the inverse observed information, respectively, or (b) the mean and covariance of the normalized likelihood function. In the second step (Smooth), the latent parameters and hyperparameters are inferred and smoothed with the approximated likelihood function. The proposed method ensures that the uncertainty from the first step is correctly propagated to the second step. Since the approximated likelihood function is Gaussian, the approximate posterior density of the latent parameters of the LGM (conditional on the hyperparameters) is also Gaussian, thus facilitating efficient posterior inference in high dimensions. Furthermore, the approximate marginal posterior distribution of the hyperparameters is tractable, and as a result, the hyperparameters can be sampled independently of the latent parameters. In the case of a large number of independent data replicates, sparse precision matrices, and high-dimensional latent vectors, the speedup is substantial in comparison to an MCMC scheme that infers the posterior density from the exact likelihood function. The proposed inference scheme is demonstrated on one spatially referenced real dataset and on simulated data mimicking spatial, temporal, and spatio-temporal inference problems. Our results show that Max-and-Smooth is accurate and fast.
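
As a rough illustration of option (a) of the Max step, the sketch below approximates a Poisson log-rate likelihood by a Gaussian centred at the maximum likelihood estimate, with variance equal to the inverse observed information. It then combines that approximation with a Gaussian prior as a scalar stand-in for the Smooth step. The Poisson model, the function names `max_step_poisson` and `smooth_step_scalar`, and the prior values are assumptions made for illustration; none of them is taken from the paper.

```python
import math

def max_step_poisson(counts):
    """Gaussian approximation to the likelihood of eta = log(rate).

    Returns (eta_hat, variance): the maximum likelihood estimate and
    the inverse of the observed information at that estimate.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one positive count")
    n = len(counts)
    eta_hat = math.log(total / n)
    # Log-likelihood is total * eta - n * exp(eta) + const, so the negative
    # second derivative is n * exp(eta), which equals `total` at eta_hat.
    obs_info = float(total)
    return eta_hat, 1.0 / obs_info

def smooth_step_scalar(eta_hat, var_hat, prior_mean, prior_var):
    """Conjugate Gaussian update treating eta_hat as a noisy observation of eta."""
    post_prec = 1.0 / prior_var + 1.0 / var_hat
    post_mean = (prior_mean / prior_var + eta_hat / var_hat) / post_prec
    return post_mean, 1.0 / post_prec

# Hypothetical counts and prior, chosen only to exercise the functions.
eta_hat, var_hat = max_step_poisson([3, 5, 4, 6, 2])
post_mean, post_var = smooth_step_scalar(eta_hat, var_hat, prior_mean=0.0, prior_var=10.0)
```

Because the approximated likelihood is Gaussian, the update in the second function is available in closed form, which is the property the abstract relies on for efficiency in high dimensions.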

    Approximate Bayesian inference for analysis of spatiotemporal flood frequency data

    This is the final version. Available from the Institute of Mathematical Statistics via the DOI in this record. Extreme floods cause casualties and widespread damage to property and vital civil infrastructure. Prediction of extreme floods, within gauged and ungauged catchments, is crucial to mitigate these disasters. In this paper a Bayesian framework is proposed for predicting extreme floods, using the generalized extreme-value (GEV) distribution. A major methodological challenge is to find a suitable parametrization for the GEV distribution when multiple covariates and/or latent spatial effects are involved and a time trend is present. Other challenges involve balancing model complexity and parsimony, using an appropriate model selection procedure and making inference based on a reliable and computationally efficient approach. We here propose a latent Gaussian modeling framework with a novel multivariate link function designed to separate the interpretation of the parameters at the latent level and to avoid unreasonable estimates of the shape and time trend parameters. Structured additive regression models, which include catchment descriptors as covariates and spatially correlated model components, are proposed for the four parameters at the latent level. To achieve computational efficiency with large datasets and richly parametrized models, we exploit a highly accurate and fast approximate Bayesian inference approach which can also be used to efficiently select models separately for each of the four regression models at the latent level. We applied our proposed methodology to annual peak river flow data from 554 catchments across the United Kingdom. The framework performed well in terms of flood predictions for both ungauged catchments and future observations at gauged catchments. The results show that the spatial model components for the transformed location and scale parameters as well as the time trend are all important, and none of these should be ignored. Posterior estimates of the time trend parameters correspond to an average increase of about 1.5% per decade with range 0.1% to 2.8% and reveal a spatial structure across the United Kingdom. When the interest lies in estimating return levels for spatial aggregates, we further develop a novel copula-based postprocessing approach of posterior predictive samples in order to mitigate the effect of the conditional independence assumption at the data level, and we demonstrate that our approach indeed provides accurate results. University of Iceland Research Fun
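
For readers unfamiliar with return levels, the sketch below computes the T-year return level of a GEV distribution in its standard (location, scale, shape) parametrization, together with the GEV distribution function used to check it. This is the textbook formula, not the transformed parametrization or link function proposed in the paper, and the parameter values are hypothetical.

```python
import math

def gev_return_level(mu, sigma, xi, period):
    """Level exceeded with probability 1/period in any one year under GEV(mu, sigma, xi)."""
    if sigma <= 0 or period <= 1:
        raise ValueError("need sigma > 0 and period > 1")
    y = -math.log(1.0 - 1.0 / period)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)  # Gumbel limit as xi -> 0
    return mu + sigma / xi * (y ** (-xi) - 1.0)

def gev_cdf(x, mu, sigma, xi):
    """Distribution function of GEV(mu, sigma, xi)."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0:
        # Outside the support: below the lower bound (xi > 0)
        # or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

# Hypothetical annual peak-flow parameters, for illustration only.
level_100 = gev_return_level(mu=100.0, sigma=30.0, xi=0.1, period=100)
```

Evaluating `gev_cdf` at `level_100` with the same parameters recovers 0.99, the non-exceedance probability of the 100-year level.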

    Approximate Bayesian inference for analysis of spatio-temporal flood frequency data

    This is the final version. Available from the Institute of Mathematical Statistics via the DOI in this record. Extreme floods cause casualties and widespread damage to property and vital civil infrastructure. We here propose a Bayesian approach for predicting extreme floods using the generalized extreme-value (GEV) distribution within gauged and ungauged catchments. A major methodological challenge is to find a suitable parametrization for the GEV distribution when covariates or latent spatial effects are involved. Other challenges involve balancing model complexity and parsimony using an appropriate model selection procedure, and making inference using a reliable and computationally efficient approach. Our approach relies on a latent Gaussian modeling framework with a novel multivariate link function designed to separate the interpretation of the parameters at the latent level and to avoid unreasonable estimates of the shape and time trend parameters. Structured additive regression models are proposed for the four parameters at the latent level. For computational efficiency with large datasets and richly parametrized models, we exploit an accurate and fast approximate Bayesian inference approach. We applied our proposed methodology to annual peak river flow data from 554 catchments across the United Kingdom (UK). Our model performed well in terms of flood predictions for both gauged and ungauged catchments. The results show that the spatial model components for the transformed location and scale parameters, and the time trend, are all important. Posterior estimates of the time trend parameters correspond to an average increase of about 1.5% per decade and reveal a spatial structure across the UK. To estimate return levels for spatial aggregates, we further develop a novel copula-based post-processing approach of posterior predictive samples, in order to mitigate the effect of the conditional independence assumption at the data level, and we show that our approach provides accurate results. University of Iceland Research Fun

    Flexible modelling of spatial variation in agricultural field trials with the R package INLA

    The objective of this paper was to fit different established spatial models for analysing agricultural field trials using the open-source R package INLA. Spatial variation is common in field trials, and accounting for it increases the accuracy of estimated genetic effects. However, this is still hindered by the lack of available software implementations. We compare some established spatial models and show possibilities for flexible modelling with respect to field trial design and joint modelling over multiple years and locations. We use a Bayesian framework and, for statistical inference, the integrated nested Laplace approximations (INLA) implemented in the R package INLA. The spatial models we use are the well-known independent row and column effects, separable first-order autoregressive (AR1⊗AR1) models and a Gaussian random field (Matérn) model that is approximated via the stochastic partial differential equation approach. The Matérn model can accommodate flexible field trial designs and yields interpretable parameters. We test the models in a simulation study imitating a wheat breeding programme with different levels of spatial variation, with and without genome-wide markers, and with data combined over two locations, modelling spatial and genetic effects jointly. The results show comparable predictive performance for both the AR1⊗AR1 and the Matérn models. We also present an example of fitting the models to real wheat breeding data and simulated tree breeding data with the Nelder wheel design to show the flexibility of the Matérn model and the R package INLA.
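
To make the separable AR1⊗AR1 structure concrete, the sketch below builds the plot-by-plot correlation matrix of a rectangular field as the Kronecker product of a row AR1 correlation matrix and a column AR1 correlation matrix. It is written in plain Python rather than with the R package INLA; the helper names, the field size, and the correlation values are hypothetical.

```python
def ar1_corr(n, rho):
    """n x n AR1 correlation matrix with entries rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]

# Hypothetical 3 x 4 field; the plot at (row r, column c) has index r * n_cols + c.
n_rows, n_cols = 3, 4
rho_row, rho_col = 0.5, 0.8
field_corr = kron(ar1_corr(n_rows, rho_row), ar1_corr(n_cols, rho_col))
```

Under this indexing, the correlation between plots (r, c) and (r', c') is `rho_row ** abs(r - r') * rho_col ** abs(c - c')`, so correlation decays separately along rows and along columns.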

    DNA Sequence Profiles of the Colorectal Cancer Critical Gene Set KRAS-BRAF-PIK3CA-PTEN-TP53 Related to Age at Disease Onset

    The incidence of colorectal cancer (CRC) increases with age and early onset indicates an increased likelihood for genetic predisposition for this disease. The somatic genetics of tumor development in relation to patient age remains mostly unknown. We have examined the mutation status of five known cancer critical genes in relation to age at diagnosis, and compared the genomic complexity of tumors from young patients without known CRC syndromes with those from elderly patients. Among 181 CRC patients, stratified by microsatellite instability status, DNA sequence changes were identified in KRAS (32%), BRAF (16%), PIK3CA (4%), PTEN (14%) and TP53 (51%). In patients younger than 50 years (n = 45), PIK3CA mutations were not observed and TP53 mutations were more frequent than in the older age groups. The total gene mutation index was lowest in tumors from the youngest patients. In contrast, the genome complexity, assessed as copy number aberrations, was highest in tumors from the youngest patients. A comparable number of tumors from young (<50 years) and old patients (>70 years) was quadruple negative for the four predictive gene markers (KRAS-BRAF-PIK3CA-PTEN); however, 16% of young versus only 1% of the old patients had tumor mutations in PTEN/PIK3CA exclusively. This implies that mutation testing for prediction of EGFR treatment response may be restricted to KRAS and BRAF in elderly (>70 years) patients. Distinct genetic differences found in tumors from young and elderly patients, who are comparable for known clinical and pathological variables, indicate that young patients have a different genetic risk profile for CRC development than older patients.

    Geostatistical modeling to capture seismic-shaking patterns from earthquake-induced landslides

    We investigate earthquake‐induced landslides using a geostatistical model featuring a latent spatial effect (LSE). The LSE represents the spatially structured residuals in the data, which remain after adjusting for covariate effects. To determine whether the LSE captures the residual signal from a given trigger, we test the LSE in reproducing the pattern of seismic shaking from the distribution of seismically induced landslides, without prior knowledge of the earthquake being included in the model. We assessed the landslide intensity, that is, the expected number of landslides per mapping unit, for the area in which landslides triggered by the Wenchuan and Lushan earthquakes overlap. We examined this area to test our method on landslide inventories located in near and far fields of the earthquake. We generated three models for both earthquakes: (i) seismic parameters only (proxy for the trigger); (ii) the LSE only; and (iii) both seismic parameters and the LSE. The three configurations share the same morphometric covariates. This allowed us to study the LSE pattern and assess whether it approximated the seismic effects. Our results show that the LSE reproduced the shaking patterns for both earthquakes. In addition, the models including the LSE perform better than conventional models featuring seismic parameters only. Due to computational limitations we carried out a detailed analysis for a relatively small area (2,112 km²), using a data set with higher spatial resolution. Results were consistent with those of a subsequent analysis for a larger area (14,648 km²) using coarser‐resolution data.

    Dealing with physical barriers in bottlenose dolphin (Tursiops truncatus) distribution

    Worldwide, cetacean species have started to be protected, but they are still very vulnerable to accidental damage from an expanding range of human activities at sea. To properly manage these potential threats we need a detailed understanding of the seasonal distributions of these highly mobile populations. To achieve this goal, a growing effort has been underway to develop species distribution models (SDMs) that correctly describe and predict preferred species areas. However, accuracy is not always easy to achieve when physical barriers, such as islands, are present. Indeed, SDMs assume, if only implicitly, that the spatial effect is stationary, and that correlation depends only on the distance between observations and not on the direction or the spatial coordinates. The application of stationary SDMs in these cases could lead to incorrect predictions and, consequently, to uninformed decision making. In this study, we identify vulnerable habitats for the bottlenose dolphin in the Archipelago de La Maddalena, Northern Sardinia (Italy) using Bayesian hierarchical SDMs that account for the physical barriers issue and provide a full specification of the associated uncertainty. The approach we propose constitutes a major step forward in the understanding of cetacean species in many ecosystems where physical, geographical and topographical barriers are present.