
    Robust and efficient projection predictive inference

    The concepts of Bayesian prediction, model comparison, and model selection have developed significantly over the last decade. As a result, the Bayesian community has witnessed rapid growth in theoretical and applied contributions to building and selecting predictive models. Projection predictive inference in particular has shown promise to this end, finding application across a broad range of fields. It is less prone to over-fitting than naïve selection based purely on cross-validation or information-criterion performance metrics, and has been known to outperform other methods in terms of predictive performance. We survey the core concept and contemporary contributions to projection predictive inference, and present a safe, efficient, and modular workflow for prediction-oriented model selection. We also provide an interpretation of the projected posteriors achieved by projection predictive inference in terms of their limitations in causal settings.
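For intuition, in the Gaussian linear case the KL projection at the heart of projection predictive inference has a closed form: the submodel is fit by least squares to the reference model's fitted values rather than to the raw data. A minimal sketch with hypothetical numbers (a one-predictor, no-intercept submodel; not the paper's own workflow):

```python
def project_slope(x, mu_ref):
    """Gaussian-case KL projection onto a one-predictor, no-intercept
    submodel: ordinary least squares against the reference fit mu_ref,
    not against the observed responses."""
    num = sum(xi * mi for xi, mi in zip(x, mu_ref))
    den = sum(xi * xi for xi in x)
    return num / den

# hypothetical predictor values and reference-model fitted values
x = [1.0, 2.0, 3.0, 4.0]
mu_ref = [2.1, 3.9, 6.2, 7.8]
beta_proj = project_slope(x, mu_ref)
```

Because the projection targets the reference fit, noise already filtered out by the reference model does not re-enter the submodel, which is one source of the method's robustness to over-fitting.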

    The problem of model selection and scientific realism

    This thesis has two goals. Firstly, we consider the problem of model selection for the purposes of prediction. In modern science, predictive mathematical models are ubiquitous and can be found in fields as diverse as weather forecasting, economics, ecology, mathematical psychology, and sociology. It is often the case that for a given domain of inquiry there are several plausible models, and the issue is then how to discriminate between them; this is the problem of model selection. We consider approaches to model selection used in classical [also known as frequentist] statistics, as well as the Akaike Information Criterion [AIC] and the Bayesian Information Criterion [BIC], methods that have become fashionable in recent years, the latter being part of a broader Bayesian approach. We show the connection between AIC and BIC, and compare the performance of these methods. Secondly, we consider some philosophical arguments that arise within the setting of the model selection approaches investigated in the first part. These arguments aim to provide counterexamples to the epistemic thesis of scientific realism, viz., that predictively successful scientific theories are approximately true, and to the idea that truth and predictive accuracy go together. We argue for the following claims: 1) none of the criticisms brought forward in the philosophical literature against the AIC methodology is devastating, and AIC remains a viable method of model selection; 2) the BIC methodology likewise survives the numerous criticisms; 3) the counterexamples to scientific realism that ostensibly arise within the framework of model selection are flawed; 4) in general, the model selection methods discussed in this thesis are neutral with regard to the issue of scientific realism; 5) a plurality of methodologies should be applied to the problem of model selection, with full awareness of the foundational issues that each of these methodologies has.
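The AIC/BIC comparison discussed above can be made concrete: both criteria penalize the maximized log-likelihood for model complexity, but BIC's penalty grows with sample size, so the two criteria can disagree about the same pair of fits. A small sketch with hypothetical numbers:

```python
import math

def aic(log_lik, k):
    # Akaike Information Criterion: 2k - 2*ln(L_hat)
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    # Bayesian Information Criterion: k*ln(n) - 2*ln(L_hat)
    return k * math.log(n) - 2 * log_lik

# hypothetical fits: a 3-parameter and a 5-parameter model on n = 100 points
n = 100
simple_ll, simple_k = -150.0, 3
complex_ll, complex_k = -147.0, 5

# AIC: 306 vs 304 -> prefers the complex model
# BIC: ~313.8 vs ~317.0 -> prefers the simple model
aic_simple, aic_complex = aic(simple_ll, simple_k), aic(complex_ll, complex_k)
bic_simple, bic_complex = bic(simple_ll, simple_k, n), bic(complex_ll, complex_k, n)
```

With k*ln(100) ≈ 4.6k, BIC charges each extra parameter more than twice what AIC does, which is exactly the kind of divergence the thesis's performance comparison examines.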

    A stochastic model for critical illness insurance

    In this thesis, we present methods and results for the estimation of diagnosis inception rates for Critical Illness Insurance (CII) claims in the UK by cause. This is the first study to provide a stochastic model for the diagnosis inception rates for CII. The data are supplied by the UK Continuous Mortality Investigation and relate to claims settled in the years 1999-2005. First, we develop a model for the delay between dates of diagnosis and settlement of claims in CII using a generalised-linear-type model with Burr errors under both Bayesian and maximum likelihood approaches. Variable selection using Bayesian methodology, with different prior distribution set-ups for the parameters, is applied to obtain the best model. For comparison purposes, a lognormal model and frequency-based model selection techniques are also considered. Non-recorded dates of diagnosis and settlement are included in the analysis as missing values using their posterior predictive distribution and Markov chain Monte Carlo methodology. Missing dates of diagnosis are estimated using the parsimonious claim delay distribution. With this complete data set, diagnosis inception rates for all causes (combined) and for specific causes are estimated using an appropriate claim delay distribution, where the observed claim counts are assumed to have a Poisson distribution. To model the crude rates, a generalised linear model with Poisson errors and a log-link function is used.
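As a toy version of the final step, the crude-rates model can be illustrated with an intercept-only Poisson log-link fit, where the MLE of the log rate has a closed form. The counts and exposures below are hypothetical, not the CMI data:

```python
import math

def poisson_loglik(counts, exposures, log_rate):
    # log-likelihood of counts[i] ~ Poisson(exposures[i] * exp(log_rate)),
    # i.e. a Poisson GLM with log link and the exposure as an offset
    ll = 0.0
    for c, e in zip(counts, exposures):
        mu = e * math.exp(log_rate)
        ll += c * math.log(mu) - mu - math.lgamma(c + 1)
    return ll

# hypothetical claim counts and central exposures by age band
counts = [4, 7, 5, 6]
exposures = [1000.0, 1500.0, 1200.0, 1300.0]

# intercept-only model: the MLE of the log rate is
# log(total claims / total exposure)
log_rate_mle = math.log(sum(counts) / sum(exposures))
```

A cause-specific or age-graduated model adds covariates to the linear predictor, but the offset-plus-log-link structure is the same.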

    Bayesian Synthesis: Combining subjective analyses, with an application to ozone data

    Bayesian model averaging enables one to combine the disparate predictions of a number of models in a coherent fashion, leading to superior predictive performance. The improvement in performance arises from averaging models that make different predictions. In this work, we tap into perhaps the biggest driver of different predictions, namely different analysts, in order to gain the full benefits of model averaging. In a standard implementation of our method, several data analysts work independently on portions of a data set, eliciting separate models which are eventually updated and combined through a specific weighting method. We call this modeling procedure Bayesian Synthesis. The methodology helps to alleviate concerns about the sizable gap between the foundational underpinnings of the Bayesian paradigm and the practice of Bayesian statistics. In experimental work, we show that human modeling has predictive performance superior to that of many automatic modeling techniques, including AIC, BIC, smoothing splines, CART, bagged CART, Bayes CART, BMA, and LARS, and only slightly inferior to that of BART. We also show that Bayesian Synthesis further improves predictive performance. Additionally, we examine the predictive performance of a simple average across analysts, which we dub Convex Synthesis, and find that it also produces an improvement. Published in the Annals of Applied Statistics (http://dx.doi.org/10.1214/10-AOAS444) by the Institute of Mathematical Statistics (http://www.imstat.org).
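As a rough illustration of the combination step: Convex Synthesis is a simple average of analyst predictions, shown below alongside a softmax weighting by held-out log score. The weighting scheme here is illustrative only, not the paper's specific method, and the analyst predictions are hypothetical:

```python
import math

def convex_synthesis(predictions):
    # simple average across analysts, for each target point
    n = len(predictions)
    return [sum(p[i] for p in predictions) / n
            for i in range(len(predictions[0]))]

def weighted_synthesis(predictions, log_scores):
    # illustrative weighting: softmax of each analyst's held-out log score
    m = max(log_scores)
    w = [math.exp(s - m) for s in log_scores]
    total = sum(w)
    w = [wi / total for wi in w]
    return [sum(wi * p[i] for wi, p in zip(w, predictions))
            for i in range(len(predictions[0]))]

# three hypothetical analysts predicting the same two target points
preds = [[1.0, 2.0], [1.4, 2.2], [0.9, 1.9]]
avg = convex_synthesis(preds)
```

With equal log scores the softmax weights are uniform, so the weighted combination collapses back to the Convex Synthesis average.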

    Bayesian comparison of latent variable models: Conditional vs marginal likelihoods

    Typical Bayesian methods for models with latent variables (or random effects) involve directly sampling the latent variables along with the model parameters. In high-level software for model definition (e.g., BUGS, JAGS, Stan), the likelihood is therefore specified as conditional on the latent variables. This can lead researchers to perform model comparisons via conditional likelihoods, where the latent variables are treated as model parameters. In other settings, however, typical model comparisons involve marginal likelihoods, where the latent variables are integrated out. This distinction is often overlooked despite the fact that it can have a large impact on the comparisons of interest. In this paper, we clarify and illustrate these issues, focusing on the comparison of conditional and marginal Deviance Information Criteria (DICs) and Watanabe-Akaike Information Criteria (WAICs) in psychometric modeling. The conditional/marginal distinction corresponds to whether the model should be predictive for the clusters that are in the data or for new clusters (where "clusters" typically correspond to higher-level units such as people or schools). Correspondingly, we show that marginal WAIC corresponds to leave-one-cluster-out (LOcO) cross-validation, whereas conditional WAIC corresponds to leave-one-unit-out (LOuO) cross-validation. These results lead to recommendations on the general application of the criteria to models with latent variables. Manuscript in press at Psychometrika; 31 pages, 8 figures.
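The conditional/marginal distinction can be made concrete with a one-cluster normal random-intercept model: the conditional likelihood fixes the cluster's latent intercept, while the marginal likelihood integrates it out over its prior. A minimal numerical sketch (trapezoid-rule quadrature; all symbols and values are illustrative):

```python
import math

def norm_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def conditional_lik(y, b, sigma):
    # likelihood of one cluster's observations given its latent intercept b
    lik = 1.0
    for yi in y:
        lik *= norm_pdf(yi, b, sigma)
    return lik

def marginal_lik(y, tau, sigma, grid=2001, width=6.0):
    # integrate the latent intercept out over b ~ Normal(0, tau)
    lo, hi = -width * tau, width * tau
    h = (hi - lo) / (grid - 1)
    total = 0.0
    for i in range(grid):
        b = lo + i * h
        w = 0.5 if i in (0, grid - 1) else 1.0
        total += w * conditional_lik(y, b, sigma) * norm_pdf(b, 0.0, tau)
    return total * h
```

For a single observation the marginal has the closed form Normal(0, sqrt(sigma^2 + tau^2)), which the quadrature recovers; conditional WAIC is built from the first quantity per unit, marginal WAIC from the second per cluster.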

    What do Bayesian methods offer population forecasters?

    The Bayesian approach has a number of attractive properties for probabilistic forecasting. In this paper, we apply Bayesian time series models to obtain future population estimates with uncertainty for England and Wales. To account for heterogeneity found in the historical data, we add parameters to represent the stochastic volatility in the error terms. Uncertainty in model choice is incorporated through Bayesian model averaging techniques. The resulting predictive distributions from Bayesian forecasting models have two main advantages over those obtained using traditional stochastic models. First, data and uncertainties in the parameters and model choice are explicitly included using probability distributions; as a result, more realistic probabilistic population forecasts can be obtained. Second, Bayesian models formally allow the incorporation of expert opinion, including its uncertainty, into the forecast. Our results are discussed in relation to classical time series methods and existing cohort component projections. This paper demonstrates the flexibility of the Bayesian approach to simple population forecasting and provides insights into further developments of more complicated population models that include, for example, components of demographic change.
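A toy illustration of the error structure described above: a random-walk-with-drift forecast whose error standard deviation itself evolves over time. This is a sketch of the idea of stochastic volatility in the error terms, not the paper's model, and all numbers are hypothetical:

```python
import math
import random

def simulate_population(y0, drift, n_steps, vol_of_vol=0.1, seed=0):
    """Random walk with drift whose error s.d. follows a random walk on
    the log scale -- a toy stochastic-volatility error term."""
    rng = random.Random(seed)
    log_vol = 0.0
    path = [y0]
    for _ in range(n_steps):
        log_vol += rng.gauss(0.0, vol_of_vol)  # volatility evolves over time
        path.append(path[-1] + drift + rng.gauss(0.0, math.exp(log_vol)))
    return path

# hypothetical series: population (millions) over 25 forecast steps
path = simulate_population(y0=56.0, drift=0.2, n_steps=25)
```

Simulating many such paths (and averaging over candidate models) yields a predictive distribution whose spread reflects the time-varying volatility, rather than a single constant error variance.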