Robust and efficient projection predictive inference
The concepts of Bayesian prediction, model comparison, and model selection
have developed significantly over the last decade. As a result, the Bayesian
community has witnessed a rapid growth in theoretical and applied contributions
to building and selecting predictive models. Projection predictive inference in
particular has shown promise to this end, finding application across a broad
range of fields. It is less prone to over-fitting than naïve selection based
purely on cross-validation or information-criterion performance metrics, and has
been known to outperform other methods in terms of predictive performance. We
survey the core concept and contemporary contributions to projection predictive
inference, and present a safe, efficient, and modular workflow for
prediction-oriented model selection therein. We also provide an interpretation
of the projected posteriors achieved by projection predictive inference in
terms of their limitations in causal settings.
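The projection step at the heart of this approach can be illustrated in a simple Gaussian setting. The sketch below is a minimal illustration, not the authors' implementation: the data are simulated, an ordinary least squares fit stands in for the posterior mean of a full Bayesian reference model, and for Gaussian models the KL projection onto a submodel reduces to regressing the reference fit on the submodel's columns.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.0, 0.0, 0.0])  # only two active covariates
y = X @ beta_true + rng.normal(0.0, 1.0, n)

# Reference model fit on all covariates (an OLS point estimate stands in
# for the posterior mean of a Bayesian reference model in this sketch).
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
mu_ref = X @ beta_ref

def projection_discrepancy(cols):
    """Project the reference fit onto a submodel.

    For Gaussian models the KL projection reduces to regressing the
    reference fit mu_ref on the submodel's columns; the mean squared
    residual measures what the submodel loses relative to the reference.
    """
    Xs = X[:, cols]
    beta_proj, *_ = np.linalg.lstsq(Xs, mu_ref, rcond=None)
    return float(np.mean((mu_ref - Xs @ beta_proj) ** 2))

# The discrepancy drops sharply once both active covariates are included
for k in range(1, p + 1):
    print(k, round(projection_discrepancy(list(range(k))), 4))
```

Selecting the smallest submodel whose projection discrepancy is close to zero is the prediction-oriented criterion the abstract describes: the submodel is judged by how well it reproduces the reference model's predictions, not by refitting it to the raw data.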
The problem of model selection and scientific realism
This thesis has two goals. Firstly, we consider the problem of model selection for the
purposes of prediction. In modern science predictive mathematical models are
ubiquitous and can be found in such diverse fields as weather forecasting,
economics, ecology, mathematical psychology, sociology, etc. It is often the case
that for a given domain of inquiry there are several plausible models, and the issue
then is how to discriminate between them – this is the problem of model selection.
We consider approaches to model selection used in classical (also known as
frequentist) statistics, together with two methods that have become fashionable in
recent years: the Akaike Information Criterion (AIC) and the Bayesian Information
Criterion (BIC), the latter being part of a broader Bayesian approach. We show the
connection between AIC and BIC, and compare the performance of these methods.
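Both criteria are penalised maximised log-likelihoods that differ only in the penalty: AIC charges 2 per parameter, BIC charges log n. A minimal sketch of the comparison, using simulated polynomial-regression data of my own construction rather than anything from the thesis:

```python
import numpy as np

def gaussian_max_loglik(y, yhat):
    """Maximised Gaussian log-likelihood, plugging in the MLE of the variance."""
    n = len(y)
    sigma2 = np.mean((y - yhat) ** 2)
    return -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)

def aic(loglik, k):
    return 2.0 * k - 2.0 * loglik

def bic(loglik, k, n):
    return k * np.log(n) - 2.0 * loglik

rng = np.random.default_rng(1)
n = 100
x = np.linspace(-1.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3, n)  # the true model is degree 1

scores = {}
for degree in (1, 2, 5):
    coeffs = np.polyfit(x, y, degree)
    yhat = np.polyval(coeffs, x)
    k = degree + 2  # polynomial coefficients plus the error variance
    ll = gaussian_max_loglik(y, yhat)
    scores[degree] = (aic(ll, k), bic(ll, k, n))
    print(degree, round(scores[degree][0], 1), round(scores[degree][1], 1))
```

Since log n exceeds 2 once n > e² ≈ 7.4, BIC penalises extra parameters more heavily than AIC in all but tiny samples, which is why it tends to select smaller models.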
Secondly, we consider some philosophical arguments that arise within the setting of
the model selection approaches investigated in the first part. These arguments aim to
provide counterexamples to the epistemic thesis of scientific realism, viz., that
predictively successful scientific theories are approximately true, and to the idea that
truth and predictive accuracy go together.
We argue for the following claims: 1) that none of the criticisms brought forward in
the philosophical literature against the AIC methodology are devastating, and AIC
remains a viable method of model selection; 2) that the BIC methodology likewise
survives the numerous criticisms; 3) that the counterexamples to scientific realism
that ostensibly arise within the framework of model selection are flawed; 4) that in
general the model selection methods discussed in this thesis are neutral with regards
to the issue of scientific realism; 5) that a plurality of methodologies should be
applied to the problem of model selection with full awareness of the foundational
issues that each of these methodologies has.
A stochastic model for critical illness insurance
In this thesis, we present methods and results for the estimation of diagnosis inception
rates for Critical Illness Insurance (CII) claims in the UK by cause. This is the
first study to provide a stochastic model for diagnosis inception rates in
CII. The data are supplied by the UK Continuous Mortality Investigation and relate
to claims settled in the years 1999 - 2005. First, we develop a model for the delay
between dates of diagnosis and settlement of claims in CII using a generalised-linear-type
model with Burr errors under both Bayesian and maximum likelihood approaches.
Bayesian variable selection with different prior distribution setups for the
parameters is applied to obtain the best model. For comparison purposes,
a lognormal model and frequency-based model selection techniques are also considered.
The non-recorded dates of diagnosis and settlement have been included in the
analysis as missing values using their posterior predictive distribution and Markov
Chain Monte Carlo methodology. Missing dates of diagnosis are estimated using the
parsimonious claim delay distribution. With this complete data set, diagnosis inception
rates for all causes (combined) and for specific causes are estimated using an
appropriate claim delay distribution where the observed numbers of claim counts are
assumed to have a Poisson distribution. To model the crude rates, a generalised linear
model with Poisson errors and a log-link function is used.
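The crude-rate model in the last step (claim counts assumed Poisson, fitted with a log link) can be sketched in a few lines. The data below are invented for illustration, not the CMI data set, and the pure-numpy IRLS fitter is a generic textbook implementation rather than the thesis's own code; exposure enters as an offset so that the linear predictor models the log claim rate.

```python
import numpy as np

def poisson_glm_irls(X, y, offset, n_iter=50, tol=1e-10):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta + offset
        mu = np.exp(eta)
        W = mu                              # Poisson variance equals the mean
        z = eta - offset + (y - mu) / mu    # working response
        XtW = X.T * W
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta

# Hypothetical claim counts and exposures by age (illustrative, not CMI data)
age = np.array([30.0, 35.0, 40.0, 45.0, 50.0, 55.0, 60.0])
exposure = np.array([5000.0, 5200.0, 4800.0, 4500.0, 4000.0, 3500.0, 3000.0])
claims = np.array([4.0, 7.0, 12.0, 20.0, 31.0, 45.0, 60.0])

X = np.column_stack([np.ones_like(age), age])
beta = poisson_glm_irls(X, claims, offset=np.log(exposure))
rates = np.exp(X @ beta)  # fitted claim rates per unit of exposure
print(beta, rates)
```

A standard check on the fit is that, with an intercept and a log link, the fitted expected counts sum to the observed counts.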
Bayesian Synthesis: Combining subjective analyses, with an application to ozone data
Bayesian model averaging enables one to combine the disparate predictions of
a number of models in a coherent fashion, leading to superior predictive
performance. The improvement in performance arises from averaging models that
make different predictions. In this work, we tap into perhaps the biggest
driver of different predictions---different analysts---in order to gain the
full benefits of model averaging. In a standard implementation of our method,
several data analysts work independently on portions of a data set, eliciting
separate models which are eventually updated and combined through a specific
weighting method. We call this modeling procedure Bayesian Synthesis. The
methodology helps to alleviate concerns about the sizable gap between the
foundational underpinnings of the Bayesian paradigm and the practice of
Bayesian statistics. In experimental work we show that human modeling has
predictive performance superior to that of many automatic modeling techniques,
including AIC, BIC, Smoothing Splines, CART, Bagged CART, Bayes CART, BMA and
LARS, and only slightly inferior to that of BART. We also show that Bayesian
Synthesis further improves predictive performance. Additionally, we examine the
predictive performance of a simple average across analysts, which we dub Convex
Synthesis, and find that it also produces an improvement.
Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics; DOI: http://dx.doi.org/10.1214/10-AOAS444
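The Convex Synthesis baseline mentioned above, a simple average across analysts, is easy to sketch. Everything below is illustrative: three polynomial fits of different degrees stand in for independent analysts' models, and the data are simulated; none of it comes from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x = rng.uniform(-2.0, 2.0, n)
y = np.sin(x) + rng.normal(0.0, 0.2, n)

# Three hypothetical "analysts", each committing to a different model class
predictions = []
for degree in (1, 3, 7):
    coeffs = np.polyfit(x, y, degree)
    predictions.append(np.polyval(coeffs, x))

# Convex Synthesis as described in the abstract: an equal-weight average
# of the analysts' predictions
convex = np.mean(predictions, axis=0)

def mse(pred):
    return float(np.mean((y - pred) ** 2))

individual = [mse(p) for p in predictions]
print([round(m, 4) for m in individual], round(mse(convex), 4))
```

By Jensen's inequality the squared error of the averaged prediction never exceeds the average of the individual squared errors, which is one reason even this unweighted combination tends to produce an improvement.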
Bayesian comparison of latent variable models: Conditional vs marginal likelihoods
Typical Bayesian methods for models with latent variables (or random effects)
involve directly sampling the latent variables along with the model parameters.
In high-level software code for model definitions (using, e.g., BUGS, JAGS,
Stan), the likelihood is therefore specified as conditional on the latent
variables. This can lead researchers to perform model comparisons via
conditional likelihoods, where the latent variables are considered model
parameters. In other settings, however, typical model comparisons involve
marginal likelihoods where the latent variables are integrated out. This
distinction is often overlooked despite the fact that it can have a large
impact on the comparisons of interest. In this paper, we clarify and illustrate
these issues, focusing on the comparison of conditional and marginal Deviance
Information Criteria (DICs) and Watanabe-Akaike Information Criteria (WAICs) in
psychometric modeling. The conditional/marginal distinction corresponds to
whether the model should be predictive for the clusters that are in the data or
for new clusters (where "clusters" typically correspond to higher-level units
like people or schools). Correspondingly, we show that marginal WAIC
corresponds to leave-one-cluster out (LOcO) cross-validation, whereas
conditional WAIC corresponds to leave-one-unit out (LOuO). These results lead
to recommendations on the general application of the criteria to models with
latent variables.
Comment: Manuscript in press at Psychometrika; 31 pages, 8 figures
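The conditional/marginal distinction can be made concrete in a toy normal random-intercept model. The sketch below is mine, not the paper's code: the conditional likelihood of a cluster treats its latent intercept as known, while the marginal likelihood integrates it out, here by Monte Carlo against an exact Gaussian closed form used as a check.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy random-intercept model: y_ij ~ N(u_j, 1), latent u_j ~ N(0, tau^2)
tau = 1.5
J, n_j = 20, 5
u = rng.normal(0.0, tau, J)
y = u[:, None] + rng.normal(0.0, 1.0, (J, n_j))

def norm_logpdf(x, m, s):
    return -0.5 * np.log(2.0 * np.pi * s**2) - (x - m) ** 2 / (2.0 * s**2)

def cond_ll(y_j, u_j):
    """Conditional log-likelihood of one cluster given its latent intercept."""
    return norm_logpdf(y_j, u_j, 1.0).sum()

def marg_ll_mc(y_j, tau, n_draws=20000):
    """Marginal log-likelihood: integrate the latent intercept out by Monte Carlo."""
    draws = rng.normal(0.0, tau, n_draws)
    lls = norm_logpdf(y_j[None, :], draws[:, None], 1.0).sum(axis=1)
    m = lls.max()
    return m + np.log(np.mean(np.exp(lls - m)))  # log-sum-exp for stability

def marg_ll_exact(y_j, tau):
    """Closed form: y_j ~ N(0, I + tau^2 * 11^T), via Sherman-Morrison."""
    m = len(y_j)
    logdet = np.log(1.0 + m * tau**2)
    quad = y_j @ y_j - tau**2 * y_j.sum() ** 2 / (1.0 + m * tau**2)
    return -0.5 * (m * np.log(2.0 * np.pi) + logdet + quad)

print(round(cond_ll(y[0], u[0]), 2),
      round(marg_ll_mc(y[0], tau), 2),
      round(marg_ll_exact(y[0], tau), 2))
```

A criterion built from `cond_ll` evaluates prediction for the clusters already in the data (leave-one-unit-out), whereas one built from the marginal likelihood evaluates prediction for new clusters (leave-one-cluster-out), which is exactly the correspondence the paper establishes for WAIC.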
What do Bayesian methods offer population forecasters?
The Bayesian approach has a number of attractive properties for probabilistic forecasting. In this paper, we apply Bayesian time series models to obtain future population estimates with uncertainty for England and Wales. To account for heterogeneity found in the historical data, we add parameters to represent the stochastic volatility in the error terms. Uncertainty in model choice is incorporated through Bayesian model averaging techniques. The resulting predictive distributions from Bayesian forecasting models have two main advantages over those obtained using traditional stochastic models. First, data and uncertainties in the parameters and model choice are explicitly included using probability distributions. As a result, more realistic probabilistic population forecasts can be obtained. Second, Bayesian models formally allow the incorporation of expert opinion, including uncertainty, into the forecast. Our results are discussed in relation to classical time series methods and existing cohort component projections. This paper demonstrates the flexibility of the Bayesian approach to simple population forecasting and provides insights into further developments of more complicated population models that include, for example, components of demographic change.
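The model-averaging step can be sketched with a common BIC approximation to posterior model probabilities. The series, the candidate trend models, and the weighting scheme below are all illustrative assumptions, not the paper's England and Wales analysis or its stochastic-volatility models.

```python
import numpy as np

rng = np.random.default_rng(3)
years = np.arange(2000, 2021)
t = (years - 2000).astype(float)
pop = 50.0 + 0.3 * t + rng.normal(0.0, 0.15, len(t))  # hypothetical series (millions)

def fit_and_bic(degree):
    """Fit a polynomial trend and return its coefficients and BIC."""
    coeffs = np.polyfit(t, pop, degree)
    resid = pop - np.polyval(coeffs, t)
    n, k = len(pop), degree + 2  # trend coefficients plus the error variance
    sigma2 = np.mean(resid ** 2)
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return coeffs, k * np.log(n) - 2.0 * loglik

fits = {d: fit_and_bic(d) for d in (1, 2)}  # linear vs quadratic trend
bics = np.array([b for _, b in fits.values()])
weights = np.exp(-0.5 * (bics - bics.min()))
weights /= weights.sum()  # approximate posterior model probabilities

t_future = 25.0  # the year 2025
forecast = sum(w * np.polyval(c, t_future)
               for w, (c, _) in zip(weights, fits.values()))
print(dict(zip(fits, np.round(weights, 3))), round(float(forecast), 2))
```

Averaging the point forecasts with these weights is the simplest version of the idea; the paper's full approach averages entire predictive distributions, so that model uncertainty widens the forecast intervals as well.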