
    Bayesian correction for covariate measurement error: a frequentist evaluation and comparison with regression calibration

    Bayesian approaches for handling covariate measurement error are well established, yet arguably still relatively little used by researchers. For some this is likely due to unfamiliarity or disagreement with the Bayesian inferential paradigm. For others a contributory factor is the inability of standard statistical packages to perform such Bayesian analyses. In this paper we first give an overview of the Bayesian approach to handling covariate measurement error and contrast it with regression calibration (RC), arguably the most commonly adopted approach. We then argue that the Bayesian approach has a number of statistical advantages over RC, and demonstrate that implementing it is usually quite feasible for the analyst. Next we describe the closely related maximum likelihood and multiple imputation approaches, and explain why we believe the Bayesian approach is generally preferable. We then empirically compare the frequentist properties of RC and the Bayesian approach through simulation studies. The flexibility of the Bayesian approach in handling both measurement error and missing data is then illustrated through an analysis of data from the Third National Health and Nutrition Examination Survey.
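
    Regression calibration itself is not spelled out in the abstract; as a point of reference, the following is a minimal sketch of the standard RC recipe under an assumed validation-subsample design (all variable names and numbers are illustrative, not taken from the paper): estimate E[X | W] from a subsample in which the true covariate X is observed alongside its error-prone measurement W, then substitute the fitted values into the outcome regression.

        import numpy as np

        rng = np.random.default_rng(0)

        # Simulated data (assumed setup for illustration only)
        n = 2000
        x = rng.normal(0.0, 1.0, n)                    # true covariate
        w = x + rng.normal(0.0, 0.5, n)                # error-prone measurement of x
        y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)    # outcome; true slope is 2

        val = np.arange(200)                           # validation subsample where x is observed

        # Naive analysis: regress y directly on w; the slope is attenuated
        naive_slope = np.polyfit(w, y, 1)[0]

        # Regression calibration: model E[X | W] on the validation data,
        # then plug the predicted values into the outcome regression
        cal_slope, cal_intercept = np.polyfit(w[val], x[val], 1)
        x_hat = cal_intercept + cal_slope * w
        rc_slope = np.polyfit(x_hat, y, 1)[0]

        print(f"naive slope {naive_slope:.2f}, regression-calibrated slope {rc_slope:.2f}")

    The Bayesian alternative discussed in the paper would typically place a model on the joint distribution of the outcome, the true covariate and its measurement, and propagate the measurement-error uncertainty through the posterior rather than through a single plug-in correction.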

    Bayesian semiparametric analysis for two-phase studies of gene-environment interaction

    The two-phase sampling design is a cost-efficient way of collecting expensive covariate information on a judiciously selected subsample. It is natural to apply such a strategy for collecting genetic data in a subsample enriched for exposure to environmental factors for gene-environment interaction (G x E) analysis. In this paper, we consider two-phase studies of G x E interaction where phase I data are available on exposure, covariates and disease status. Stratified sampling is done to prioritize individuals for genotyping at phase II conditional on disease and exposure. We consider a Bayesian analysis based on the joint retrospective likelihood of phase I and phase II data. We address several important statistical issues: (i) we consider a model with multiple genes, environmental factors and their pairwise interactions, and employ a Bayesian variable selection algorithm to reduce the dimensionality of this potentially high-dimensional model; (ii) we use the assumption of gene-gene and gene-environment independence to trade off between bias and efficiency for estimating the interaction parameters, through hierarchical priors reflecting this assumption; (iii) we posit a flexible model for the joint distribution of the phase I categorical variables using the nonparametric Bayes construction of Dunson and Xing [J. Amer. Statist. Assoc. 104 (2009) 1042-1051]. Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/, http://dx.doi.org/10.1214/12-AOAS599) by the Institute of Mathematical Statistics (http://www.imstat.org).
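
    The phase II design step can be pictured with a toy stratified sampler (hypothetical data and quotas, not the study analyzed in the paper): phase I records disease status D and exposure E for everyone, and a fixed number of subjects per (D, E) cell is then drawn for genotyping, oversampling the rare exposed cases.

        import numpy as np
        import pandas as pd

        rng = np.random.default_rng(1)

        # Hypothetical phase I data: disease status D and binary exposure E
        phase1 = pd.DataFrame({
            "D": rng.binomial(1, 0.2, 5000),
            "E": rng.binomial(1, 0.3, 5000),
        })

        # Phase II quotas per (D, E) stratum; equal quotas heavily oversample
        # the rare exposed-case cell relative to its population share
        quota = {(1, 1): 200, (1, 0): 200, (0, 1): 200, (0, 0): 200}
        phase2_idx = []
        for (d, e), n_cell in quota.items():
            cell = phase1.index[(phase1.D == d) & (phase1.E == e)].to_numpy()
            phase2_idx.extend(rng.choice(cell, size=min(n_cell, len(cell)), replace=False))

        phase2 = phase1.loc[phase2_idx]    # subsample selected for genotyping
        print(phase2.groupby(["D", "E"]).size())

    The joint retrospective likelihood described in the abstract is what accounts for this disease- and exposure-dependent selection in the subsequent analysis.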

    Bayesian modeling longitudinal dyadic data with nonignorable dropout, with application to a breast cancer study

    Dyadic data are common in the social and behavioral sciences; members of a dyad are correlated due to the interdependence structure within dyads. The analysis of longitudinal dyadic data becomes complex when nonignorable dropouts occur. We propose a fully Bayesian selection-model-based approach to analyze longitudinal dyadic data with nonignorable dropouts. We model repeated measures on subjects by a transition model and account for within-dyad correlations by random effects. In the model, we allow a subject's outcome to depend on his/her own characteristics and measurement history, as well as those of the other member in the dyad. We further account for the nonignorable missing data mechanism using a selection model in which the probability of dropout depends on the missing outcome. We propose a Gibbs sampler algorithm to fit the model. Simulation studies show that the proposed method effectively addresses the problem of nonignorable dropouts. We illustrate our methodology using a longitudinal breast cancer study. Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/, http://dx.doi.org/10.1214/11-AOAS515) by the Institute of Mathematical Statistics (http://www.imstat.org).
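
    A toy forward simulation (illustrative parameter values, not the fitted model from the paper) shows the three ingredients the abstract combines: a transition model for each member's repeated measures, dependence on the partner's history plus a shared dyad random effect, and selection-model dropout whose probability depends on the value that would have been observed.

        import numpy as np

        rng = np.random.default_rng(2)
        n_dyads, n_times = 100, 5
        b = rng.normal(0.0, 0.5, n_dyads)              # shared random effect per dyad

        def logistic(u):
            return 1.0 / (1.0 + np.exp(-u))

        y = np.empty((n_dyads, 2, n_times))            # two members per dyad
        y[:, :, 0] = b[:, None] + rng.normal(0.0, 1.0, (n_dyads, 2))
        for t in range(1, n_times):
            prev_self = y[:, :, t - 1]
            prev_partner = prev_self[:, ::-1]          # the other member's previous value
            # Transition model: own history, partner's history, dyad random effect
            mean_t = 0.5 * prev_self + 0.2 * prev_partner + b[:, None]
            y[:, :, t] = mean_t + rng.normal(0.0, 1.0, (n_dyads, 2))
            # Nonignorable (MNAR) dropout: the dropout probability depends on the
            # value that would have been observed; NaNs propagate forward, so a
            # subject who drops out stays missing at later visits
            p_drop = logistic(-2.0 + y[:, :, t])
            y[:, :, t][rng.uniform(size=(n_dyads, 2)) < p_drop] = np.nan

    Fitting such a model requires handling the missing outcomes inside the dropout model, which is why a Gibbs sampler that treats them as latent quantities is a natural estimation strategy.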

    Non-compliance and missing data in health economic evaluation

    Health economic evaluations face the issues of non-compliance and missing data. Here, non-compliance is defined as non-adherence to a specific treatment, and occurs within randomised controlled trials (RCTs) when participants depart from their random assignment. Missing data arise if, for example, there is loss to follow-up, survey non-response, or the information available from routine data sources is incomplete. Appropriate statistical methods for handling non-compliance and missing data have been developed, but they have rarely been applied in health economics studies. Here, we illustrate the issues and outline appropriate methods for handling them, with an application to a health economic evaluation that uses data from an RCT. In an RCT the random assignment can be used as an instrument for treatment receipt, to obtain consistent estimates of the complier average causal effect, provided the underlying assumptions are met. Instrumental variable methods can accommodate essential features of the health economic context, such as the correlation between individuals' costs and outcomes in cost-effectiveness studies. Methodological guidance for handling missing data encourages approaches such as multiple imputation or inverse probability weighting, which assume the data are Missing At Random, but also sensitivity analyses that recognise the data may be missing according to the true, unobserved values, that is, Missing Not At Random. Future studies should subject the assumptions behind methods for handling non-compliance and missing data to thorough sensitivity analyses. Modern machine learning methods can help reduce reliance on correct model specification. Further research is required to develop flexible methods for handling more complex forms of non-compliance and missing data.
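
    As a concrete (and deliberately simplified) illustration of the instrumental-variable idea, the complier average causal effect can be estimated with the Wald ratio, using random assignment as the instrument for treatment receipt. The sketch below uses simulated data with one-sided non-compliance; a real health economic evaluation would additionally model costs jointly with outcomes and adjust for covariates, as the abstract discusses.

        import numpy as np

        def wald_cace(y, d, z):
            """Complier average causal effect via the Wald estimator: the
            intention-to-treat effect on the outcome divided by the effect of
            assignment on treatment receipt (assumes the usual IV conditions,
            including a non-zero first stage and monotonicity)."""
            y, d, z = map(np.asarray, (y, d, z))
            itt_y = y[z == 1].mean() - y[z == 0].mean()
            itt_d = d[z == 1].mean() - d[z == 0].mean()
            return itt_y / itt_d

        rng = np.random.default_rng(3)
        z = rng.binomial(1, 0.5, 10_000)                       # random assignment
        d = np.where(z == 1, rng.binomial(1, 0.7, 10_000), 0)  # ~70% uptake, no control access
        y = 2.0 * d + rng.normal(0.0, 1.0, 10_000)             # true effect of receipt = 2
        print(wald_cace(y, d, z))                              # close to 2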

    Objective Bayes and Conditional Frequentist Inference

    Objective Bayesian methods have garnered considerable interest and support among statisticians, particularly over the past two decades. It is often overlooked, however, that in some cases the appropriate frequentist inference to match is a conditional one. We present several methods for extending probability matching prior (PMP) techniques to conditional settings. A method based on saddlepoint approximations is found to be the most tractable, and we demonstrate its use in the most common exact ancillary statistic models. As part of this analysis, we give a proof of an exactness property of a particular PMP in location-scale models. We use the proposed matching methods to investigate the relationships between conditional and unconditional PMPs. A key component of our analysis is a numerical study of the performance of probability matching priors, from both a conditional and an unconditional perspective, in exact ancillary models. In concluding remarks we propose several routes for future research.
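
    The matching condition itself is not restated in the abstract; for orientation, the classical unconditional first-order requirement on a probability matching prior \pi, together with the scalar-parameter solution due to Welch and Peers (the Jeffreys prior), can be written as

        P_\theta\!\left\{ \theta \le \theta^{\pi}_{1-\alpha}(X_1,\dots,X_n) \right\}
            = 1 - \alpha + o\!\left(n^{-1/2}\right),
        \qquad
        \pi(\theta) \propto \sqrt{I(\theta)},

    where \theta^{\pi}_{1-\alpha} is the (1 - \alpha) posterior quantile under \pi and I(\theta) is the Fisher information. The paper's contribution concerns the conditional analogue of this requirement, in which coverage is assessed given an exact ancillary statistic.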

    Quantifying the Fraction of Missing Information for Hypothesis Testing in Statistical and Genetic Studies

    Many practical studies rely on hypothesis testing procedures applied to data sets with missing information. An important part of the analysis is to determine the impact of the missing data on the performance of the test, and this can be done by properly quantifying the relative (to complete data) amount of available information. The problem is directly motivated by applications to studies, such as linkage analyses and haplotype-based association projects, designed to identify genetic contributions to complex diseases. In genetic studies the relative information measures are needed for experimental design, technology comparison, interpretation of the data, and for understanding the behavior of some of the inference tools. The central difficulties in constructing such information measures arise from the multiple, and sometimes conflicting, aims in practice. For large samples, we show that a satisfactory, likelihood-based general solution exists by using appropriate forms of the relative Kullback-Leibler information, and that the proposed measures are computationally inexpensive given the maximized likelihoods with the observed data. Two measures are introduced, under the null and alternative hypotheses respectively. We exemplify the measures on data from mapping studies of inflammatory bowel disease and diabetes. For small-sample problems, which appear rather frequently in practice and sometimes in disguised forms (e.g., measuring individual contributions to a large study), the robust Bayesian approach holds great promise, though the choice of a general-purpose "default prior" is a very challenging problem. Published in Statistical Science (http://www.imstat.org/sts/, http://dx.doi.org/10.1214/07-STS244) by the Institute of Mathematical Statistics (http://www.imstat.org).
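
    The proposed Kullback-Leibler measures are not reproduced in the abstract; as background, the familiar Fisher-information version of the "missing information principle" quantifies relative information for estimation as

        I_{\mathrm{obs}}(\theta) = I_{\mathrm{com}}(\theta) - I_{\mathrm{mis}}(\theta),
        \qquad
        \gamma = 1 - \frac{I_{\mathrm{obs}}(\hat{\theta})}{I_{\mathrm{com}}(\hat{\theta})},

    where I_{\mathrm{com}} is the expected complete-data information, I_{\mathrm{obs}} the observed-data information, and \gamma the fraction of missing information. The measures introduced in the paper instead use relative Kullback-Leibler information evaluated under the null and under the alternative, which is better suited to the hypothesis-testing setting that motivates the work.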

    A dynamic nonstationary spatio-temporal model for short term prediction of precipitation

    Precipitation is a complex physical process that varies in space and time. Predictions and interpolations at unobserved times and/or locations help to solve important problems in many areas. In this paper, we present a hierarchical Bayesian model for spatio-temporal data and apply it to obtain short term predictions of rainfall. The model incorporates physical knowledge about the underlying processes that determine rainfall, such as advection, diffusion and convection. It is based on a temporal autoregressive convolution with spatially colored and temporally white innovations. By linking the advection parameter of the convolution kernel to an external wind vector, the model is temporally nonstationary. Further, it allows for nonseparable and anisotropic covariance structures. With the help of a Voronoi tessellation, we construct a natural parametrization that is consistent across spatial and temporal resolutions for data lying on irregular grid points. In the application, the statistical model combines forecasts of three other meteorological variables obtained from a numerical weather prediction model with past precipitation observations. The model is then used to predict three-hourly precipitation over 24 hours. It performs better than a separable, stationary and isotropic version, performs comparably to a deterministic numerical weather prediction model for precipitation, and has the advantage that it quantifies prediction uncertainty. Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/, http://dx.doi.org/10.1214/12-AOAS564) by the Institute of Mathematical Statistics (http://www.imstat.org).
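
    The autoregressive-convolution mechanism can be illustrated on a regular grid (the paper itself works on irregular grid points via a Voronoi tessellation and fits the model in a Bayesian way; the sketch below only mimics the forward dynamics with made-up parameter values): each new field is the previous field displaced along the wind vector, smoothed by a convolution kernel, plus spatially colored, temporally white innovations.

        import numpy as np
        from scipy.ndimage import gaussian_filter, shift

        rng = np.random.default_rng(4)
        ny, nx, n_times = 50, 50, 8
        wind = (1.5, -0.5)                          # assumed advection, in pixels per step

        states = [rng.normal(0.0, 1.0, (ny, nx))]   # initial latent field
        for t in range(1, n_times):
            # Temporal autoregressive convolution: advect the previous field along
            # the wind vector, apply a smoothing (diffusion-like) kernel, and add
            # spatially colored but temporally white innovations
            advected = shift(states[-1], wind, order=1, mode="nearest")
            convolved = gaussian_filter(advected, sigma=2.0)
            innovation = gaussian_filter(rng.normal(0.0, 1.0, (ny, nx)), sigma=3.0)
            states.append(0.9 * convolved + innovation)

    Because the kernel displacement is tied to the wind, the implied covariance changes over time and is nonseparable and anisotropic, which is the flexibility the abstract highlights.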