Cross-validatory Model Comparison and Divergent Regions Detection using iIS and iWAIC for Disease Mapping

Author: Qiu Shi
Publication venue: 'University of Saskatchewan Library'

The well-documented problems associated with mapping raw rates of disease have resulted in an increased use of Bayesian hierarchical models to produce maps of "smoothed" estimates of disease rates. Two statistical problems arise in using Bayesian hierarchical models for disease mapping. The first problem is in comparing the goodness of fit of various models, which can be used to test different hypotheses. The second problem is in identifying outliers/divergent regions with unusually high or low residual risk of disease, or those whose disease rates are not well fitted. The results of outlier detection may generate further hypotheses as to what additional covariates might be necessary for explaining the disease. Leave-one-out cross-validatory (LOOCV) model assessment has been used for these two problems. However, actual LOOCV is time-consuming. This thesis introduces two methods, namely iIS and iWAIC, for approximating LOOCV, using only Markov chain samples simulated from a posterior distribution based on a full data set. In iIS and iWAIC, we first integrate the latent variables without reference to the held-out observation, then apply IS and WAIC approximations to the integrated predictive density and evaluation function. We apply iIS and iWAIC to two real data sets. Our empirical results show that iIS and iWAIC can provide significantly better estimation of LOOCV model assessment than existing methods including DIC, Importance Sampling, WAIC, posterior checking and Ghosting methods.

eCommons@USASK

University of Saskatchewan Research Archive

Approximating Cross-validatory Predictive P-values with Integrated IS for Disease Mapping Models

Author: Feng Cindy X.
Li Longhai
Qiu Shi
Publication venue: 'Wiley'
Publication date: 24/03/2016

An important statistical task in disease mapping problems is to identify outlier/divergent regions with unusually high or low residual risk of disease. Leave-one-out cross-validatory (LOOCV) model assessment is a gold standard for computing predictive p-values that can flag such outliers. However, actual LOOCV is time-consuming because one needs to re-simulate a Markov chain for each posterior distribution in which an observation is held out as a test case. This paper introduces a new method, called iIS, for approximating LOOCV with only Markov chain samples simulated from a posterior based on a full data set. iIS is based on importance sampling (IS). iIS integrates the p-value and the likelihood of the test observation with respect to the distribution of the latent variable without reference to the actual observation. The predictive p-values computed with iIS can be proved to be equivalent to the LOOCV predictive p-values, following the general theory for IS. We compare iIS and three other existing methods in the literature with a lip cancer dataset collected in Scotland. Our empirical results show that iIS provides predictive p-values that are almost identical to the actual LOOCV predictive p-values and outperforms the existing three methods, including the recently proposed ghosting method by Marshall and Spiegelhalter (2007). Comment: 21 pages

arXiv.org e-Print Archive
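
As a rough illustration of the plain importance sampling (IS) approximation to LOOCV mentioned in the abstract above, the sketch below estimates leave-one-out predictive densities from posterior draws based on the full data set. It is not the iIS method itself, which additionally integrates over the latent variable before applying IS. The function name `is_loo_log_density` and the input layout (an S-by-n matrix of pointwise log-likelihoods) are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def is_loo_log_density(log_lik):
    """Plain IS estimate of log p(y_i | y_{-i}) for each observation i.

    log_lik: array of shape (S, n) holding log p(y_i | theta_s) for
    S posterior draws theta_s simulated from the full-data posterior.
    (Hypothetical input layout, assumed for this sketch.)
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    # IS weights for holding out y_i are proportional to 1 / p(y_i | theta_s),
    # which gives p(y_i | y_{-i}) ~= 1 / mean_s[1 / p(y_i | theta_s)].
    neg = -log_lik
    shift = neg.max(axis=0)  # log-sum-exp shift for numerical stability
    log_mean_inv = shift + np.log(np.exp(neg - shift).sum(axis=0)) - np.log(n_draws)
    return -log_mean_inv
```

Calling `is_loo_log_density(log_lik).sum()` would give an overall IS estimate of the LOOCV log predictive density. The harmonic-mean form can be unstable when a few draws dominate the weights, which is one motivation for refinements of plain IS.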

Bayesian comparison of latent variable models: Conditional vs marginal likelihoods

Author: Furr D.
Merkle E. C.
Rabe-Hesketh S.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/07/2019

Typical Bayesian methods for models with latent variables (or random effects) involve directly sampling the latent variables along with the model parameters. In high-level software code for model definitions (using, e.g., BUGS, JAGS, Stan), the likelihood is therefore specified as conditional on the latent variables. This can lead researchers to perform model comparisons via conditional likelihoods, where the latent variables are considered model parameters. In other settings, however, typical model comparisons involve marginal likelihoods where the latent variables are integrated out. This distinction is often overlooked despite the fact that it can have a large impact on the comparisons of interest. In this paper, we clarify and illustrate these issues, focusing on the comparison of conditional and marginal Deviance Information Criteria (DICs) and Watanabe-Akaike Information Criteria (WAICs) in psychometric modeling. The conditional/marginal distinction corresponds to whether the model should be predictive for the clusters that are in the data or for new clusters (where "clusters" typically correspond to higher-level units like people or schools). Correspondingly, we show that marginal WAIC corresponds to leave-one-cluster out (LOcO) cross-validation, whereas conditional WAIC corresponds to leave-one-unit out (LOuO). These results lead to recommendations on the general application of the criteria to models with latent variables. Comment: Manuscript in press at Psychometrika; 31 pages, 8 figures

arXiv.org e-Print Archive
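
To make the conditional/marginal distinction in the abstract above concrete, here is a minimal sketch of the standard WAIC computation from a matrix of pointwise log-likelihoods. The function name `waic` and the input layout are assumptions for illustration, not code from the paper. Under this sketch, the same function yields a conditional WAIC when each column is a unit-level log-likelihood evaluated conditional on the sampled latent variables, and a marginal WAIC when each column is a cluster-level log-likelihood with the latent variables already integrated out.

```python
import numpy as np

def waic(log_lik):
    """WAIC on the deviance scale from pointwise log-likelihoods.

    log_lik: array of shape (S, n); entry (s, i) is the log-likelihood of
    unit (or cluster) i under posterior draw s. (Hypothetical input layout.)
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    # Log pointwise predictive density: log of the posterior-mean likelihood,
    # computed with a log-sum-exp shift for numerical stability.
    shift = log_lik.max(axis=0)
    lppd = shift + np.log(np.exp(log_lik - shift).sum(axis=0)) - np.log(n_draws)
    # Effective number of parameters: posterior variance of the log-likelihood.
    p_waic = log_lik.var(axis=0, ddof=1)
    return -2.0 * (lppd - p_waic).sum()
```

Lower values indicate better estimated predictive performance. How the marginal log-likelihoods are obtained (for example by numerical integration over the latent variables) is left outside this sketch.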

Scoring Model Predictions using Cross-Validation

Author: Gelman Andrew
Smith Anna L.
Zheng Tian
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2018

We formalize a framework for quantitatively assessing agreement between two datasets that are assumed to come from two distinct data generating mechanisms. We propose a methodology for prediction scoring which provides a measure of the distance between two unobserved data generating mechanisms (DGMs), along the dimension of a particular model. The cross-validated scores can be used to evaluate preregistered hypotheses and to perform model validation in the face of complex statistical models. Using human behavior data from the Next Generation Social Science (NGS2) program, we demonstrate that prediction scores can be used as model assessment tools and that they can reveal insights based on data collected from different populations and across different settings. Our proposed cross-validated prediction scores are capable of quantifying true differences between data generating mechanisms, allow for the validation and assessment of complex models, and serve as valuable tools for reproducible research.

Columbia University Academic Commons

Improving the identification of antigenic sites in the H1N1 Influenza virus through accounting for the experimental structure in a sparse hierarchical Bayesian model

Author: Davies Vinny
Harvey William T.
Husmeier Dirk
Reeve Richard
Publication venue: 'Wiley'
Publication date: 01/08/2019

Understanding how genetic changes allow emerging virus strains to escape the protection afforded by vaccination is vital for the maintenance of effective vaccines. We use structural and phylogenetic differences between pairs of virus strains to identify important antigenic sites on the surface of the influenza A(H1N1) virus through the prediction of haemagglutination inhibition (HI) titre: pairwise measures of the antigenic similarity of virus strains. We propose a sparse hierarchical Bayesian model that can deal with the pairwise structure and inherent experimental variability in the H1N1 data through the introduction of latent variables. The latent variables represent the underlying HI titre measurement of any given pair of virus strains and help to account for the fact that, for any HI titre measurement between the same pair of virus strains, the difference in the viral sequence remains the same. Through accurately representing the structure of the H1N1 data, the model can select virus sites which are antigenic, while its latent structure achieves the computational efficiency that is required to deal with large virus sequence data, as typically available for the influenza virus. In addition to the latent variable model, we also propose a new method, the block‐integrated widely applicable information criterion (biWAIC), for selecting between competing models. We show how this enables us to select the random effects effectively when used with the model proposed and we apply both methods to an A(H1N1) data set.

Enlighten

Spatio temporal modeling of species distribution

Author: Rodríguez de Rivera Ortega Oscar
Publication date: 01/01/2019

The aim of this thesis is to study the spatial distribution of different groups from different perspectives and to analyse the different approaches to this problem. We move away from the classical approach, commonly used by ecologists, to more complex solutions already applied in several disciplines. We focus on applying advanced modelling techniques in order to understand species distribution, species behaviour and the relationships between them and environmental factors. We first used the most common models applied in ecology and then moved to more advanced and complex perspectives. From a general perspective, and comparing the different models applied during the process, from MaxEnt to spatio-temporal models with INLA, we can affirm that the models that we have developed show better results than those already built. It is difficult to compare the different approaches, but the Bayesian approach shows more flexibility, and the inclusion of a spatial field or the latent spatio-temporal process allows residuals to be included as a proxy for unmeasured variables. Compared with additive models with thin plate splines, probably considered one of the best methods to analyse species distribution models working with presence-absence data, comparable to MaxEnt, CART and MARS, our results show a better fit and more flexibility in the design. In the course of this work we have realised that the Bayesian approach could be a better solution, or at least a different approach for consideration. The main advantage of the Bayesian model formulation is the computational ease in model fit and prediction compared to classical geostatistical methods. To achieve this, instead of MCMC we have used the novel integrated nested Laplace approximation (INLA) through the Stochastic Partial Differential Equation (SPDE) approach. The SPDE approach can be easily implemented, providing results in reasonable computing time (compared with MCMC). We showed how SPDE is a useful tool in the analysis of species distribution. This modelling could be expanded to the spatio-temporal domain by incorporating an extra term for the temporal effect, using parametric or semiparametric constructions to reflect linear, nonlinear, autoregressive or more complex behaviours. We can conclude that spatial and spatio-temporal Bayesian models are a really interesting approach for the understanding of environmental dynamics, not only because of the possibility to develop and solve more complex problems but also for the easy understanding of the implementation processes.

Repositori d'Objectes Digitals per a l'Ensenyament la Recerca i la Cultura