Approximate Bayesian Model Selection with the Deviance Statistic
Bayesian model selection poses two main challenges: the specification of
parameter priors for all models, and the computation of the resulting Bayes
factors between models. There is now a large literature on automatic and
objective parameter priors in the linear model. One important class is the
g-prior, which was recently extended from linear to generalized linear
models (GLMs). We show that the resulting Bayes factors can be approximated by
test-based Bayes factors (Johnson [Scand. J. Stat. 35 (2008) 354-368]) using
the deviance statistics of the models. To estimate the hyperparameter g, we
propose empirical and fully Bayes approaches and link the former to minimum
Bayes factors and shrinkage estimates from the literature. Furthermore, we
describe how to approximate the corresponding posterior distribution of the
regression coefficients based on the standard GLM output. We illustrate the
approach with the development of a clinical prediction model for 30-day
survival in the GUSTO-I trial using logistic regression.
Comment: Published at http://dx.doi.org/10.1214/14-STS510 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
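The deviance-to-Bayes-factor link described above can be illustrated with a rough sketch. The snippet below fits two nested logistic regressions by IRLS on synthetic data (the data, helper name, and setup are ours, not the paper's) and forms the classic BIC-type approximation 2*log BF ~= (drop in deviance) - (extra parameters)*log n, a cruder relative of the test-based Bayes factors the abstract derives:

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    """Fit a logistic regression by Newton/IRLS; return coefficients and deviance."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        W = p * (1.0 - p) + 1e-9                  # IRLS weights
        z = X @ beta + (y - p) / W                # working response
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    eps = 1e-12
    dev = -2.0 * np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return beta, dev

rng = np.random.default_rng(0)
n = 2000
x1, x2 = rng.normal(size=(2, n))
eta = -0.5 + 1.0 * x1                             # x2 plays no role in the truth
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

X0 = np.column_stack([np.ones(n), x1])            # smaller model
X1 = np.column_stack([np.ones(n), x1, x2])        # adds the noise covariate
_, dev0 = fit_logistic(X0, y)
_, dev1 = fit_logistic(X1, y)

# BIC-type approximation: 2*log BF_{1:0} ~= (dev0 - dev1) - (p1 - p0)*log(n);
# typically negative here, i.e. evidence against including the null covariate x2.
two_log_bf = (dev0 - dev1) - (3 - 2) * np.log(n)
print(two_log_bf)
```

This is only the Schwarz-style approximation built from standard GLM output; the paper's test-based Bayes factors refine it using the g-prior structure.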
Consistency of Bayesian procedures for variable selection
It has long been known that for the comparison of pairwise nested models, a
decision based on the Bayes factor produces a consistent model selector (in the
frequentist sense). Here we go beyond the usual consistency for nested pairwise
models, and show that for a wide class of prior distributions, including
intrinsic priors, the corresponding Bayesian procedure for variable selection
in normal regression is consistent in the entire class of normal linear models.
We find that the asymptotics of the Bayes factors for intrinsic priors are
equivalent to those of the Schwarz (BIC) criterion. Also, recall that the
Jeffreys--Lindley paradox refers to the well-known fact that a point null
hypothesis on the normal mean parameter is always accepted when the variance of
the conjugate prior goes to infinity. This implies that some limiting forms of
proper prior distributions are not necessarily suitable for testing problems.
Intrinsic priors are limits of proper prior distributions, and for finite
sample sizes they have been proved to behave extremely well for variable
selection in regression; a consequence of our results is that for intrinsic
priors Lindley's paradox does not arise.
Comment: Published at http://dx.doi.org/10.1214/08-AOS606 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
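Since the abstract above notes that intrinsic-prior Bayes factors are asymptotically equivalent to the Schwarz (BIC) criterion, the consistency claim can be seen in miniature with a hedged simulation (synthetic data and our own helper name, not anything from the paper): comparing BIC values of the true model and an overfitted rival should favor the true model with probability tending to one as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def bic(X, y):
    """Schwarz criterion for a Gaussian linear model with unknown variance
    (constants dropped; smaller is better)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + p * np.log(n)

picked_true = {}
for n in (100, 1000, 10000):
    x1, x2 = rng.normal(size=(2, n))
    y = 1.0 + 0.5 * x1 + rng.normal(size=n)           # x2 is irrelevant
    X_true = np.column_stack([np.ones(n), x1])
    X_full = np.column_stack([np.ones(n), x1, x2])
    picked_true[n] = bic(X_true, y) < bic(X_full, y)  # does BIC prefer the truth?
print(picked_true)
```

This only demonstrates the BIC side of the equivalence; the paper's contribution is proving the corresponding result for a wide class of priors, including intrinsic priors.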
Prior distributions for objective Bayesian analysis
We provide a review of prior distributions for objective Bayesian analysis. We start by examining some foundational issues and then organize our exposition into priors for: i) estimation or prediction; ii) model selection; iii) high-dimensional models. With regard to i), we present some basic notions, and then move to more recent contributions on discrete parameter spaces, hierarchical models, nonparametric models, and penalizing-complexity priors. Point ii) is the focus of this paper: it discusses principles for objective Bayesian model comparison, and singles out some major concepts for building priors, which are subsequently illustrated in some detail for the classic problem of variable selection in normal linear models. We also present some recent contributions in the area of objective priors on model space. With regard to point iii), we only provide a short summary of some default priors for high-dimensional models, a rapidly growing area of research.
Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models
In the context of the expected-posterior prior (EPP) approach to Bayesian
variable selection in linear models, we combine ideas from power-prior and
unit-information-prior methodologies to simultaneously produce a
minimally-informative prior and diminish the effect of training samples. The
result is that in practice our power-expected-posterior (PEP) methodology is
sufficiently insensitive to the size n* of the training sample, due to PEP's
unit-information construction, that one may take n* equal to the full-data
sample size n and dispense with training samples altogether. In this paper we
focus on Gaussian linear models and develop our method under two different
baseline prior choices: the independence Jeffreys (or reference) prior,
yielding the J-PEP posterior, and the Zellner g-prior, leading to Z-PEP. We
find that, under the reference baseline prior, the asymptotics of PEP Bayes
factors are equivalent to those of Schwarz's BIC criterion, ensuring
consistency of the PEP approach to model selection. We compare the performance
of our method, in simulation studies and a real example involving prediction of
air-pollutant concentrations from meteorological covariates, with that of a
variety of previously-defined variants on Bayes factors for objective variable
selection. Our prior, due to its unit-information structure, leads to a
variable-selection procedure that (1) is systematically more parsimonious than
the basic EPP with minimal training sample, while sacrificing no desirable
performance characteristics to achieve this parsimony; (2) is robust to the
size of the training sample, thus enjoying the advantages described above
arising from the avoidance of training samples altogether; and (3) identifies
maximum-a-posteriori models that achieve good out-of-sample predictive
performance.
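The Zellner g-prior that serves as one of the PEP baselines above has a well-known closed-form Bayes factor against the intercept-only null in Gaussian linear regression. The sketch below implements that standard fixed-g formula (the function name and data are ours; this is the baseline g-prior, not the PEP procedure itself), using the unit-information choice g = n that motivates the abstract's construction:

```python
import numpy as np

def log_bf_gprior(X, y, g):
    """log Bayes factor of the model (intercept + columns of X) against the
    intercept-only null, under Zellner's g-prior with fixed g:
    log BF = ((n-p-1)/2) log(1+g) - ((n-1)/2) log(1 + g(1 - R^2))."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)                  # intercept handled by centering
    yc = y - y.mean()
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    r2 = 1.0 - np.sum((yc - Xc @ beta) ** 2) / np.sum(yc ** 2)
    return 0.5 * (n - p - 1) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=(n, 1))
y_signal = 2.0 * x[:, 0] + 0.1 * rng.normal(size=n)   # strong signal
print(log_bf_gprior(x, y_signal, g=n))                # unit-information g = n; strongly positive
```

With g fixed at n the prior carries roughly one observation's worth of information, which is the insensitivity to the training-sample size that the PEP construction exploits.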
Consistency of objective Bayes factors as the model dimension grows
In the class of normal regression models with a finite number of regressors,
and for a wide class of prior distributions, a Bayesian model selection
procedure based on the Bayes factor is consistent [Casella and Moreno J. Amer.
Statist. Assoc. 104 (2009) 1261--1271]. However, in models where the number of
parameters increases as the sample size increases, properties of the Bayes
factor are not totally understood. Here we study consistency of the Bayes
factors for nested normal linear models when the number of regressors increases
with the sample size. We pay attention to two successful tools for model
selection: the Schwarz [Ann. Statist. 6 (1978) 461--464] approximation to the Bayes
factor, and the Bayes factor for intrinsic priors [Berger and Pericchi J. Amer.
Statist. Assoc. 91 (1996) 109--122, Moreno, Bertolino and Racugno J. Amer.
Statist. Assoc. 93 (1998) 1451--1460]. We find that the Schwarz
approximation and the Bayes factor for intrinsic priors are consistent when the
rate of growth of the dimension of the bigger model is O(n^b) for b < 1. When
b = 1, the Schwarz approximation is always inconsistent under the alternative,
while the Bayes factor for intrinsic priors is consistent except for a small
set of alternative models which is characterized.
Comment: Published at http://dx.doi.org/10.1214/09-AOS754 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
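A small simulation can mimic the regime the abstract studies. Under assumptions of our own choosing (Gaussian noise, extra regressors that are pure noise, bigger-model dimension k = n^b), the Schwarz approximation to the Bayes factor should favor the smaller true model when b < 1:

```python
import numpy as np

rng = np.random.default_rng(2)

def schwarz_2logbf(X0, X1, y):
    """2*log of the Schwarz (BIC) approximation to the Bayes factor of the
    bigger model M1 against the smaller nested model M0."""
    n = len(y)
    def rss(X):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        return float(np.sum((y - X @ b) ** 2))
    return n * (np.log(rss(X0)) - np.log(rss(X1))) - (X1.shape[1] - X0.shape[1]) * np.log(n)

n = 4000
evidence = {}
for b in (0.3, 0.7):
    k = int(n ** b)                          # dimension of the bigger model grows like n^b
    X0 = np.ones((n, 1))
    X1 = np.column_stack([X0, rng.normal(size=(n, k))])
    y = 1.0 + rng.normal(size=n)             # data come from the smaller model
    evidence[b] = schwarz_2logbf(X0, X1, y)
print(evidence)                              # negative values favor the true smaller model
```

The interesting boundary case b = 1, where the abstract shows the Schwarz approximation and the intrinsic-prior Bayes factor part ways, is a limiting regime that this finite sketch does not attempt to capture.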
Bayesian Model Selection in Complex Linear Systems, as Illustrated in Genetic Association Studies
Motivated by examples from genetic association studies, this paper considers
the model selection problem in a general complex linear model system and in a
Bayesian framework. We discuss formulating model selection problems and
incorporating context-dependent {\it a priori} information through different
levels of prior specifications. We also derive analytic Bayes factors and their
approximations to facilitate model selection and discuss their theoretical and
computational properties. We demonstrate our Bayesian approach based on an
implemented Markov Chain Monte Carlo (MCMC) algorithm in simulations and a real
data application of mapping tissue-specific eQTLs. Our novel results on Bayes
factors provide a general framework to perform efficient model comparisons in
complex linear model systems.
Bayesian Model Comparison in Genetic Association Analysis: Linear Mixed Modeling and SNP Set Testing
We consider the problems of hypothesis testing and model comparison under a
flexible Bayesian linear regression model whose formulation is closely
connected with the linear mixed effect model and the parametric models for SNP
set analysis in genetic association studies. We derive a class of analytic
approximate Bayes factors and illustrate their connections with a variety of
frequentist test statistics, including the Wald statistic and the variance
component score statistic. Taking advantage of Bayesian model averaging and
hierarchical modeling, we demonstrate some distinct advantages and
flexibilities in the approaches utilizing the derived Bayes factors in the
context of genetic association studies. We demonstrate our proposed methods
using real or simulated numerical examples in applications of single SNP
association testing, multi-locus fine-mapping and SNP set association testing.
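For the single-SNP testing setting, a standard analytic approximate Bayes factor in this literature is Wakefield's asymptotic form, built from nothing more than an effect estimate, its standard error, and a normal prior on the true effect. The sketch below is that generic textbook formula, not the specific Bayes factors derived in the abstracts above, and the numbers are purely illustrative:

```python
import numpy as np

def log10_abf(beta_hat, se, w):
    """log10 approximate Bayes factor (H1 : H0) for a single association test.
    beta_hat: effect estimate; se: its standard error; w: variance of the
    N(0, w) prior on the true effect (Wakefield-style asymptotic form)."""
    v = se ** 2                      # sampling variance of the estimate
    z2 = (beta_hat / se) ** 2        # squared Wald statistic
    shrink = w / (v + w)
    return 0.5 * np.log10(v / (v + w)) + (z2 / 2.0) * shrink / np.log(10.0)

print(log10_abf(0.50, 0.10, w=0.04))   # z = 5: strong evidence for association
print(log10_abf(0.05, 0.10, w=0.04))   # z = 0.5: evidence leans to the null
```

Because it depends only on summary statistics, this form is easy to average over priors on the effect variance w, which is the kind of hierarchical-modeling flexibility the abstract emphasizes.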