Bayesian correction for covariate measurement error: a frequentist evaluation and comparison with regression calibration
Bayesian approaches for handling covariate measurement error are well established, yet arguably remain relatively little used by researchers. For some, this is likely due to unfamiliarity or disagreement with the Bayesian inferential paradigm. For others, a contributing factor is that standard statistical packages cannot perform such Bayesian analyses. In this paper
we first give an overview of the Bayesian approach to handling covariate
measurement error, and contrast it with regression calibration (RC), arguably
the most commonly adopted approach. We then argue why the Bayesian approach has
a number of statistical advantages compared to RC, and demonstrate that
implementing the Bayesian approach is usually quite feasible for the analyst.
Next we describe the closely related maximum likelihood and multiple imputation
approaches, and explain why we believe the Bayesian approach is generally preferable. We then empirically compare the frequentist properties of RC and
the Bayesian approach through simulation studies. The flexibility of the
Bayesian approach to handle both measurement error and missing data is then
illustrated through an analysis of data from the Third National Health and
Nutrition Examination Survey.
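As a point of reference for the comparison described above, here is a minimal sketch of regression calibration itself for a linear outcome model, assuming classical measurement error and an internal validation subsample in which the true covariate is observed. The data, sample sizes and error variance are all invented for illustration; this is not the paper's Bayesian analysis or its NHANES application.

```python
# Minimal sketch of regression calibration (RC) for a linear outcome model,
# assuming classical measurement error W = X + U and an internal validation
# subsample where the true covariate X is observed. All names, sizes and
# parameter values are illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
n, n_val = 2000, 400                       # main sample and validation subsample
x = rng.normal(0.0, 1.0, n)                # true covariate
w = x + rng.normal(0.0, 0.7, n)            # error-prone measurement
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, n)

# Step 1: in the validation subsample, regress X on W (the calibration model).
val = rng.choice(n, n_val, replace=False)
A = np.column_stack([np.ones(n_val), w[val]])
gamma, *_ = np.linalg.lstsq(A, x[val], rcond=None)

# Step 2: replace W everywhere by its calibrated prediction E[X | W].
x_hat = gamma[0] + gamma[1] * w

# Step 3: fit the outcome model using the calibrated covariate.
B = np.column_stack([np.ones(n), x_hat])
beta_rc, *_ = np.linalg.lstsq(B, y, rcond=None)
beta_naive, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), w]), y, rcond=None)

print("naive slope:", beta_naive[1])   # attenuated towards zero
print("RC slope:   ", beta_rc[1])      # close to the true value 0.5
```

The naive fit illustrates the attenuation that measurement error induces; the Bayesian approach discussed in the abstract instead models the measurement error, the exposure and the outcome jointly.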
Bayesian semiparametric analysis for two-phase studies of gene-environment interaction
The two-phase sampling design is a cost-efficient way of collecting expensive
covariate information on a judiciously selected subsample. It is natural to
apply such a strategy for collecting genetic data in a subsample enriched for
exposure to environmental factors for gene-environment interaction (G x E)
analysis. In this paper, we consider two-phase studies of G x E interaction
where phase I data are available on exposure, covariates and disease status.
Stratified sampling is done to prioritize individuals for genotyping at phase
II conditional on disease and exposure. We consider a Bayesian analysis based
on the joint retrospective likelihood of phases I and II data. We address
several important statistical issues: (i) we consider a model with multiple
genes, environmental factors and their pairwise interactions. We employ a
Bayesian variable selection algorithm to reduce the dimensionality of this
potentially high-dimensional model; (ii) we use the assumption of gene-gene and
gene-environment independence to trade off between bias and efficiency for
estimating the interaction parameters through use of hierarchical priors
reflecting this assumption; (iii) we posit a flexible model for the joint
distribution of the phase I categorical variables using the nonparametric Bayes
construction of Dunson and Xing [J. Amer. Statist. Assoc. 104 (2009)
1042-1051].
Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics; http://dx.doi.org/10.1214/12-AOAS599
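To make the sampling design in this abstract concrete, the sketch below simulates phase I data on disease and exposure and then draws a stratified phase II subsample that oversamples exposed cases for genotyping. The stratum-specific sampling fractions and prevalences are made up; the sketch covers only the design, not the Bayesian retrospective-likelihood analysis or the variable selection algorithm.

```python
# Illustrative sketch of a two-phase design: phase I records disease status D
# and exposure E for everyone; phase II selects a stratified subsample
# (oversampling exposed cases) for genotyping. All numbers are invented.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
e = rng.binomial(1, 0.3, n)                                    # exposure
d = rng.binomial(1, 1 / (1 + np.exp(-(-2.0 + 0.8 * e))), n)    # disease status

# Phase II sampling fractions by (D, E) stratum: enrich for exposed cases.
frac = {(0, 0): 0.05, (0, 1): 0.10, (1, 0): 0.50, (1, 1): 1.00}

selected = np.zeros(n, dtype=bool)
for (dd, ee), f in frac.items():
    idx = np.flatnonzero((d == dd) & (e == ee))
    k = int(round(f * idx.size))
    selected[rng.choice(idx, k, replace=False)] = True

print("phase II sample size:", selected.sum())
for (dd, ee), f in frac.items():
    m = (d == dd) & (e == ee)
    print(f"stratum D={dd}, E={ee}: phase I n={m.sum()}, genotyped={(m & selected).sum()}")
```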
Bayesian modeling longitudinal dyadic data with nonignorable dropout, with application to a breast cancer study
Dyadic data are common in the social and behavioral sciences; members of a dyad are correlated because of the interdependence structure within the dyad. The analysis of longitudinal dyadic data becomes complex when
nonignorable dropouts occur. We propose a fully Bayesian selection-model-based
approach to analyze longitudinal dyadic data with nonignorable dropouts. We
model repeated measures on subjects by a transition model and account for
within-dyad correlations by random effects. In the model, we allow a subject's outcome to depend on his/her own characteristics and measurement history, as well as those of the other member of the dyad. We further account for the
nonignorable missing data mechanism using a selection model in which the
probability of dropout depends on the missing outcome. We propose a Gibbs
sampler algorithm to fit the model. Simulation studies show that the proposed
method effectively addresses the problem of nonignorable dropouts. We
illustrate our methodology using a longitudinal breast cancer study.
Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics; http://dx.doi.org/10.1214/11-AOAS515
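The selection-model idea in this abstract can be illustrated by simulating data in which the probability of dropping out at a visit depends on the outcome at that same, possibly unobserved, visit. The sketch below does only that; it is not the authors' dyadic transition model or their Gibbs sampler, and every parameter value is invented.

```python
# Minimal sketch of a nonignorable-dropout selection model: the probability of
# dropout at time t depends on the (possibly missing) outcome y_t. Outcomes
# follow a simple transition model with a subject-level random effect.
# All parameter values are made up for illustration.
import numpy as np

rng = np.random.default_rng(2)
n_subj, n_time = 500, 5
b = rng.normal(0.0, 0.5, n_subj)                     # random effect per subject

y = np.empty((n_subj, n_time))
observed = np.ones((n_subj, n_time), dtype=bool)
y[:, 0] = b + rng.normal(0.0, 1.0, n_subj)

for t in range(1, n_time):
    # Transition model: the current outcome depends on the previous one.
    y[:, t] = 0.6 * y[:, t - 1] + b + rng.normal(0.0, 1.0, n_subj)
    # Selection model: dropout probability increases with the current outcome.
    p_drop = 1 / (1 + np.exp(-(-2.0 + 0.8 * y[:, t])))
    drop = rng.random(n_subj) < p_drop
    observed[:, t] = observed[:, t - 1] & ~drop       # monotone dropout

print("complete-data means:", np.round(y.mean(axis=0), 2))
print("observed-data means:", [round(y[observed[:, t], t].mean(), 2)
                               for t in range(n_time)])
# The observed means drift below the complete-data means, which is exactly the
# bias a complete-case analysis would suffer under this dropout mechanism.
```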
Non-compliance and missing data in health economic evaluation
Health economic evaluations face the issues of non-compliance and missing
data. Here, non-compliance is defined as non-adherence to a specific treatment,
and occurs within randomised controlled trials (RCTs) when participants depart
from their random assignment. Missing data arises if, for example, there is
loss to follow-up, survey non-response, or the information available from
routine data sources is incomplete. Appropriate statistical methods for
handling non-compliance and missing data have been developed, but they have
rarely been applied in health economics studies. Here, we illustrate the issues
and outline some of the appropriate methods to handle these with an application
to a health economic evaluation that uses data from an RCT.
In an RCT the random assignment can be used as an instrument for treatment
receipt, to obtain consistent estimates of the complier average causal effect,
provided the underlying assumptions are met. Instrumental variable methods can
accommodate essential features of the health economic context such as the
correlation between individuals' costs and outcomes in cost-effectiveness
studies. Methodological guidance for handling missing data encourages
approaches such as multiple imputation or inverse probability weighting, which assume the data are Missing At Random, but also sensitivity analyses that recognise the data may be missing according to the true, unobserved values,
that is, Missing Not at Random.
Future studies should subject the assumptions behind methods for handling
non-compliance and missing data to thorough sensitivity analyses. Modern
machine learning methods can help reduce reliance on correct model
specification. Further research is required to develop flexible methods for
handling more complex forms of non-compliance and missing data.
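As a small illustration of the instrumental variable point made above, the sketch below uses random assignment as an instrument for treatment received and computes the complier average causal effect with the Wald estimator. The trial data are simulated under one-sided non-compliance; none of the numbers come from an actual study.

```python
# Minimal sketch of the complier average causal effect (CACE) estimated with
# random assignment Z as an instrument for treatment received T, using the
# Wald estimator. Simulated data with one-sided non-compliance; compliance is
# related to an unobserved confounder, so the naive comparison is biased.
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
z = rng.binomial(1, 0.5, n)                       # random assignment
u = rng.normal(0.0, 1.0, n)                       # unobserved confounder
complier = rng.random(n) < 1 / (1 + np.exp(-u))   # compliance depends on u
t = np.where(complier, z, 0)                      # treatment actually received
y = 2.0 * t + u + rng.normal(0.0, 1.0, n)         # true effect of treatment = 2

# Wald estimator: ITT effect on the outcome divided by ITT effect on receipt.
itt_y = y[z == 1].mean() - y[z == 0].mean()
itt_t = t[z == 1].mean() - t[z == 0].mean()
print("CACE (IV) estimate:       ", itt_y / itt_t)                        # ~2.0
print("naive treated - untreated:", y[t == 1].mean() - y[t == 0].mean())  # biased
```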
Objective Bayes and Conditional Frequentist Inference
Objective Bayesian methods have garnered considerable interest and support among statisticians,
particularly over the past two decades. It has often been ignored, however, that in
some cases the appropriate frequentist inference to match is a conditional one. We present
various methods for extending the probability matching prior (PMP) methods to conditional
settings. A method based on saddlepoint approximations is found to be the most
tractable and we demonstrate its use in the most common exact ancillary statistic models.
As part of this analysis, we give a proof of an exactness property of a particular PMP in
location-scale models. We use the proposed matching methods to investigate the relationships
between conditional and unconditional PMPs. A key component of our analysis is a
numerical study of the performance of probability matching priors from both a conditional
and unconditional perspective in exact ancillary models. In concluding remarks we propose
many routes for future research.
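For readers less familiar with the matching idea, the sketch below checks numerically what "probability matching" means in the simplest possible case: in a normal location model with known variance, the flat prior is exactly matching, so posterior credible intervals have frequentist coverage equal to their credible level. This is only an illustration of the unconditional matching property, not of the conditional or saddlepoint methods the abstract develops.

```python
# Numerical check of exact probability matching: normal location model with
# known variance and a flat prior. The 95% posterior credible interval should
# cover the true mean in ~95% of repeated samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
mu_true, sigma, n, level = 1.3, 2.0, 10, 0.95
reps, covered = 20_000, 0

for _ in range(reps):
    x = rng.normal(mu_true, sigma, n)
    # Posterior under a flat prior: mu | x ~ N(xbar, sigma^2 / n).
    post = stats.norm(x.mean(), sigma / np.sqrt(n))
    lo, hi = post.ppf(0.5 * (1 - level)), post.ppf(1 - 0.5 * (1 - level))
    covered += (lo <= mu_true <= hi)

print("empirical coverage of the 95% credible interval:", covered / reps)
```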
Quantifying the Fraction of Missing Information for Hypothesis Testing in Statistical and Genetic Studies
Many practical studies rely on hypothesis testing procedures applied to data
sets with missing information. An important part of the analysis is to
determine the impact of the missing data on the performance of the test, and
this can be done by properly quantifying the relative (to complete data) amount
of available information. The problem is directly motivated by applications to
studies, such as linkage analyses and haplotype-based association projects,
designed to identify genetic contributions to complex diseases. In the genetic
studies the relative information measures are needed for the experimental
design, technology comparison, interpretation of the data, and for
understanding the behavior of some of the inference tools. The central
difficulties in constructing such information measures arise from the multiple,
and sometimes conflicting, aims in practice. For large samples, we show that a
satisfactory, likelihood-based general solution exists by using appropriate
forms of the relative Kullback-Leibler information, and that the proposed
measures are computationally inexpensive given the maximized likelihoods with
the observed data. Two measures are introduced, under the null and alternative
hypotheses, respectively. We illustrate the measures with data from mapping studies of inflammatory bowel disease and diabetes. For small-sample
problems, which appear rather frequently in practice and sometimes in disguised
forms (e.g., measuring individual contributions to a large study), the robust
Bayesian approach holds great promise, though the choice of a general-purpose
"default prior" is a very challenging problem.Comment: Published in at http://dx.doi.org/10.1214/07-STS244 the Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
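A toy version of the relative information idea is easy to compute directly: for a normal mean with known variance and values missing completely at random, the observed-data Fisher information is n_obs/sigma^2 against n/sigma^2 for the complete data, so the relative information is simply the fraction observed. The sketch below shows this baseline case only; it is not the likelihood-based Kullback-Leibler measures the abstract proposes for hypothesis testing.

```python
# Toy illustration of relative (to complete data) information: normal mean,
# known variance, values missing completely at random. The fraction of missing
# information then equals the fraction of missing observations.
import numpy as np

rng = np.random.default_rng(5)
n, sigma = 200, 1.5
y = rng.normal(0.0, sigma, n)
y[rng.random(n) < 0.3] = np.nan            # ~30% missing completely at random

n_obs = np.sum(~np.isnan(y))
info_complete = n / sigma**2               # complete-data Fisher information
info_observed = n_obs / sigma**2           # observed-data Fisher information

relative_info = info_observed / info_complete
print("relative information:           ", relative_info)
print("fraction of missing information:", 1 - relative_info)
```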
A dynamic nonstationary spatio-temporal model for short term prediction of precipitation
Precipitation is a complex physical process that varies in space and time.
Predictions and interpolations at unobserved times and/or locations help to
solve important problems in many areas. In this paper, we present a
hierarchical Bayesian model for spatio-temporal data and apply it to obtain
short term predictions of rainfall. The model incorporates physical knowledge
about the underlying processes that determine rainfall, such as advection,
diffusion and convection. It is based on a temporal autoregressive convolution
with spatially colored and temporally white innovations. Linking the advection parameter of the convolution kernel to an external wind vector makes the model temporally nonstationary. Further, it allows for nonseparable and anisotropic covariance structures. With the help of a Voronoi tessellation, we construct a natural parametrization, consistent across space and time resolutions, for data lying on irregular grid points. In the
application, the statistical model combines forecasts of three other
meteorological variables obtained from a numerical weather prediction model
with past precipitation observations. The model is then used to predict
three-hourly precipitation over 24 hours. It performs better than a separable, stationary and isotropic version, performs comparably to a deterministic numerical weather prediction model for precipitation, and has the advantage that it quantifies prediction uncertainty.
Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics; http://dx.doi.org/10.1214/12-AOAS564
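The core building block of the model described above, a temporal autoregression whose convolution kernel is displaced by a wind vector, can be sketched on a regular grid as follows. The paper's model additionally handles irregular grid points via a Voronoi tessellation and uses spatially colored innovations; none of that is reproduced here, and the kernel width, wind and grid size are made-up values.

```python
# Minimal sketch of a temporal autoregressive convolution with advection: the
# field at time t is a kernel-weighted average of the field at time t-1, with
# the kernel centre displaced by a wind vector (Gaussian kernel, regular grid,
# white innovations). All parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(6)
nx, ny, n_time = 30, 30, 5
xs, ys = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)

wind = np.array([1.5, -0.5])     # advection, in grid cells per time step
length = 2.0                     # kernel length scale
rho = 0.9                        # autoregressive weight

# Kernel weight between target s_i and source s_j: large when s_j + wind ~ s_i,
# i.e. mass is transported downwind from one time step to the next.
d2 = ((coords[:, None, :] - (coords[None, :, :] + wind)) ** 2).sum(-1)
K = np.exp(-0.5 * d2 / length**2)
K /= K.sum(axis=1, keepdims=True)          # normalise each row

field = rng.normal(0.0, 1.0, coords.shape[0])
for _ in range(1, n_time):
    field = rho * (K @ field) + rng.normal(0.0, 0.3, field.shape)

print("final field mean/sd:", field.mean(), field.std())
```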