    Bayesian Inference under Cluster Sampling with Probability Proportional to Size

    Cluster sampling is common in survey practice, and the corresponding inference has been predominantly design-based. We develop a Bayesian framework for cluster sampling and account for the design effect in the outcome modeling. We consider a two-stage cluster sampling design where the clusters are first selected with probability proportional to cluster size, and then units are randomly sampled inside selected clusters. Challenges arise when the sizes of nonsampled clusters are unknown. We propose nonparametric and parametric Bayesian approaches for predicting the unknown cluster sizes, with this inference performed simultaneously with the model for the survey outcome. Simulation studies show that the integrated Bayesian approach outperforms classical methods, yielding efficiency gains. We use Stan for computing and apply the proposal to the Fragile Families and Child Wellbeing study as an illustration of complex survey inference in health surveys.
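    To make the design concrete, here is a minimal NumPy sketch of the classical two-stage PPS design paired with the design-based Hansen-Hurwitz estimator. Several assumptions are not in the abstract: clusters are drawn with replacement (the simplest PPS scheme), every cluster size is known when the sample is drawn, and the name `two_stage_pps_mean` is hypothetical. The sketch illustrates the design-based baseline only. It is not the authors' Bayesian model or their Stan code, and it does not address unknown sizes of nonsampled clusters.

```python
import numpy as np


def two_stage_pps_mean(populations, m, n, rng):
    """Draw a two-stage sample and return the Hansen-Hurwitz estimate of the mean.

    Stage 1: m clusters drawn with replacement, with probability proportional
    to cluster size (number of units in the cluster).
    Stage 2: simple random sample of n units, without replacement, inside
    each selected cluster.
    populations: list of 1-D NumPy arrays, one array of unit values per cluster.
    """
    sizes = np.array([len(p) for p in populations], dtype=float)
    probs = sizes / sizes.sum()  # PPS selection probabilities
    chosen = rng.choice(len(populations), size=m, replace=True, p=probs)

    weighted_totals = []
    for i in chosen:
        k = min(n, len(populations[i]))
        units = rng.choice(populations[i], size=k, replace=False)
        total_hat = sizes[i] * units.mean()           # estimated cluster total
        weighted_totals.append(total_hat / probs[i])  # inverse-probability weight
    population_total_hat = np.mean(weighted_totals)
    return population_total_hat / sizes.sum()


rng = np.random.default_rng(42)
pops = [np.arange(k, dtype=float) for k in (10, 20, 30, 40)]
est = two_stage_pps_mean(pops, m=3, n=4, rng=rng)
```

    Each weighted cluster total equals the population size times that cluster's sample mean. The estimate therefore reduces to the plain average of the sampled clusters' means. This is the self-weighting property of PPS designs with a fixed within-cluster sample size.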

    An Analysis of Nonignorable Nonresponse to Income in a Survey with a Rotating Panel Design

    In a rotating panel survey, individuals are interviewed in some waves of the survey but are not interviewed in others. We consider the treatment of missing income data in the labor force survey of the Municipality of Florence in Italy. This survey has a rotating panel design: recipiency and amount of income are missing for waves where individuals are not interviewed, and amount of income is missing for waves where individuals are interviewed but refuse to answer the income amount question. It is thus a multivariate missing-data problem with two missing-data mechanisms, one by design and one by refusal, and with sets of covariates for imputation that vary with the wave of the survey. Existing methods for multivariate imputation such as sequential regression multiple imputation (SRMI) can be applied, but they assume that the missing income values are missing at random (MAR). This assumption is reasonable when missing data arise from the rotating panel design. It is less reasonable when the missing data arise from refusal to answer the income question, since in this case missingness of income is generally thought to be related to the value of income itself, even after conditioning on available covariates. In this article we describe a sensitivity analysis, based on SRMI for a pattern-mixture model, to assess the impact of departures from MAR for refusals. The sensitivity analysis avoids the well-known problems of underidentification of parameters of missing not at random models, is easy to carry out using existing sequential multiple imputation software, and takes into account the different mechanisms that lead to missing data.
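    A minimal sketch of the pattern-mixture offset idea follows. It assumes a single covariate, a normal linear model for log income, and one imputation draw. A proper SRMI run would instead cycle over variables, draw the regression parameters from their posterior, and repeat M times. The function name and the `status` labels are hypothetical. The point of the sketch is that the MNAR offset delta shifts only the values missing by refusal, while values missing by the rotating design keep their MAR imputations.

```python
import numpy as np


def impute_with_refusal_delta(x, y, status, delta, rng):
    """Impute missing log income under MAR, then shift refusals by delta.

    x: 1-D covariate array; y: log income, np.nan where missing.
    status: array of "observed", "design" (not interviewed in that wave) or
    "refusal" (interviewed but declined the income question).
    delta: pattern-mixture offset added only to refusal imputations
    (delta = 0 reproduces the MAR analysis).
    """
    obs = status == "observed"
    X = np.column_stack([np.ones_like(x), x])
    # Linear regression fitted on respondents only
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    resid = y[obs] - X[obs] @ beta
    sigma = np.sqrt(resid @ resid / (obs.sum() - X.shape[1]))

    y_imp = y.copy()
    miss = ~obs
    # MAR draw for every missing value: prediction plus residual noise
    y_imp[miss] = X[miss] @ beta + rng.normal(0.0, sigma, miss.sum())
    # MNAR departure for refusals only; design-missing values stay MAR
    y_imp[status == "refusal"] += delta
    return y_imp
```

    Rerunning with a grid of delta values, for example negative offsets if refusers are thought to earn more or less than comparable respondents, traces how the income estimates move away from the MAR answer.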

    Stereotyping and the treatment of missing data for drug and alcohol clinical trials

    Stigma and stereotyping of marginalized groups are often insidious and show up in unlikely places, for instance in how clinical trials handle dropouts in treatment research. A surprising number of studies presume that people who do not complete the study protocol have relapsed, and code their missing data as if relapse had been observed. There is no good statistical rationale for this treatment of missing data, and numerous, more defensible alternative methods are available. We need to be mindful of our attitudes and preconceptions about the people we intend to help. There is no good reason to continue to support science built on this scientifically indefensible stereotyping, however unintentional.

    Investigating the missing data mechanism in quality of life outcomes: a comparison of approaches

    Background: Missing data are classified as missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR). Knowing the mechanism is useful in identifying the most appropriate analysis. The first aim was to compare different methods for identifying the missing data mechanism, to determine whether they gave consistent conclusions. The second aim was to investigate whether reminder-response data can be used to help identify the missing data mechanism. Methods: Five clinical trial datasets that employed a reminder system at follow-up were used. Some quality of life questionnaires were initially missing but were later recovered through reminders. Four methods of determining the missing data mechanism were applied. Two response-data scenarios were considered: first, immediate responses only; second, all observed responses (including reminder responses). Results: In three of five trials the hypothesis tests found evidence against the MCAR assumption. Logistic regression suggested MAR, but was able to use the reminder-collected data to highlight potential MNAR data in two trials. Conclusion: The four methods were consistent in determining the missingness mechanism. One hypothesis test was preferred as it is applicable with intermittent missingness. Some inconsistencies between the two data scenarios were found. Ignoring the reminder data could potentially give a distorted view of the missingness mechanism. Using reminder data allowed the possibility of MNAR to be considered.
    The Chief Scientist Office of the Scottish Government Health Directorate. Research Training Fellowship (CZF/1/31
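    One generic check in this spirit regresses a missingness indicator on observed covariates. A clear association is evidence against MCAR, while the observed data alone cannot separate MAR from MNAR. The sketch below fits that logistic regression by Newton-Raphson (iteratively reweighted least squares) in NumPy. It is an illustrative assumption, not necessarily one of the four methods the study compared, and `logistic_irls` is a hypothetical name.

```python
import numpy as np


def logistic_irls(X, r, n_iter=25):
    """Fit P(missing = 1 | X) by iteratively reweighted least squares.

    X: (n, p) design matrix including an intercept column.
    r: 0/1 missingness indicator (1 = questionnaire missing).
    Returns the coefficient vector and model-based standard errors.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        H = X.T @ (X * W[:, None])                   # Fisher information
        beta = beta + np.linalg.solve(H, X.T @ (r - p))  # Newton step
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * W[:, None]))
    return beta, np.sqrt(np.diag(cov))
```

    A large ratio `beta[j] / se[j]` for a baseline covariate indicates that missingness depends on that covariate. Such a result rules out MCAR, but it remains consistent with both MAR and MNAR.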

    Parameter estimation for load-sharing system subject to Wiener degradation process using the expectation-maximization algorithm

    In practice, many systems exhibit load-sharing behavior, where the surviving components share the total load imposed on the system. Unlike general systems, the components of load-sharing systems are inherently interdependent: when one component fails, the system load must be shared by the remaining components, which increases their failure or degradation rates. Because of this load-sharing mechanism, parameter estimation and reliability assessment are usually complicated for load-sharing systems. Load-sharing systems whose components are subject to sudden failures have been studied intensively in the literature, with detailed estimation and analysis approaches. By contrast, systems whose components are subject to degradation have rarely been investigated. In this paper, we propose a parameter estimation method for load-sharing systems subject to continuous degradation under a constant load. As a first step, a likelihood function based on the components' degradation data is established. Because this likelihood function has no closed form, the maximum likelihood estimators of the unknown parameters are obtained via the expectation-maximization (EM) algorithm. Numerical examples are used to illustrate the effectiveness of the proposed method.
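    The building block of such a likelihood is the Wiener degradation model X(t) = mu*t + sigma*B(t). Under this model, increments dX_j over intervals dt_j are independent normal with mean mu*dt_j and variance sigma^2*dt_j. With fully observed paths, the MLEs are therefore closed-form: mu_hat = sum_j dX_j / sum_j dt_j and sigma2_hat = (1/n) sum_j (dX_j - mu_hat*dt_j)^2 / dt_j. The sketch below extends this to one drift per load regime. It assumes failure times are known and that each inspection interval lies within a single regime. Under those assumptions no EM step is needed, so this is a simplified illustration rather than the paper's estimator. `wiener_mle` is a hypothetical name.

```python
import numpy as np


def wiener_mle(times, levels, stage):
    """Closed-form MLEs for X(t) = mu_k * t + sigma * B(t), one drift per regime.

    times, levels: inspection times and degradation readings of one component.
    stage: one label per increment (len(times) - 1 entries) giving the load
    regime, e.g. the number of surviving components sharing the load.
    Assumes each increment lies entirely within one regime.
    Returns ({stage: drift estimate}, common diffusion variance sigma^2).
    """
    dt = np.diff(times)
    dx = np.diff(levels)
    stage = np.asarray(stage)
    drift = {}
    fitted = np.empty_like(dx)
    for k in np.unique(stage):
        idx = stage == k
        drift[k.item()] = dx[idx].sum() / dt[idx].sum()  # MLE of mu_k
        fitted[idx] = drift[k.item()] * dt[idx]
    sigma2 = np.mean((dx - fitted) ** 2 / dt)            # MLE of sigma^2
    return drift, sigma2
```

    A higher drift in regimes with fewer surviving components is the load-sharing effect. The paper's EM approach targets the case where the likelihood is not available in this closed form.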

    The relative contribution of genes and environment to alcohol use in early adolescents: Are similar factors related to initiation of alcohol use and frequency of drinking?

    Background: The present study assessed the relative contribution of genes and environment to individual differences in initiation of alcohol use and frequency of drinking among early adolescents. It also examined the extent to which the same genetic and environmental factors influence individual differences in both initiation of alcohol use and frequency of drinking. Methods: Questionnaire data collected by the Netherlands Twin Register were available for 694 twin pairs aged 12 to 15 years. Bivariate genetic model-fitting analyses were conducted in Mx. We modeled the variance of initiation of alcohol use and frequency of drinking as a function of three influences: genetic effects, common environmental effects, and unique environmental effects. Analyses were performed conditional on sex. Results: Genetic factors were most important for variation in early initiation of alcohol use (83% of the variance explained in males and 70% in females). There was a small contribution of common environment (2% in males, 19% in females). In contrast, common environmental factors explained most of the variation in frequency of drinking (82% in males and females). In males, the association between initiation and frequency was explained by common environmental factors influencing both phenotypes. In females, there was a large contribution of common environmental factors that influenced frequency of drinking only. There was no evidence that different genetic or common environmental factors operated in males and females. Conclusion: Different factors were involved in individual differences in early initiation of alcohol use and in frequency of drinking once adolescents had started to use alcohol.
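    The study estimated these variance shares by fitting bivariate genetic models in Mx. As a back-of-the-envelope univariate substitute (a different technique, named here as such), Falconer's formulas recover the same three components from monozygotic (MZ) and dizygotic (DZ) twin correlations: a2 = 2(rMZ - rDZ), c2 = 2rDZ - rMZ, e2 = 1 - rMZ. The correlations in the usage example are made-up illustration values, not results from the study, and `falconer_ace` is a hypothetical name.

```python
def falconer_ace(r_mz, r_dz):
    """Rough ACE variance shares from MZ and DZ twin correlations (Falconer).

    a2: additive genetic share, c2: common environment, e2: unique environment.
    The three shares sum to 1 by construction. Unlike constrained model
    fitting, these formulas can return negative (inadmissible) values.
    """
    a2 = 2 * (r_mz - r_dz)
    c2 = 2 * r_dz - r_mz
    e2 = 1 - r_mz
    return a2, c2, e2


shares = falconer_ace(0.8, 0.5)  # hypothetical correlations
```

    With these inputs the shares are approximately (0.6, 0.2, 0.2), up to floating-point rounding.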

    Sensitivity Analysis for Not-at-Random Missing Data in Trial-Based Cost-Effectiveness Analysis : A Tutorial

    Cost-effectiveness analyses (CEA) of randomised controlled trials are a key source of information for health care decision makers. Missing data are, however, a common issue that can seriously undermine their validity. A major concern is that the chance of data being missing may be directly linked to the unobserved value itself [missing not at random (MNAR)]. For example, patients with poorer health may be less likely to complete quality-of-life questionnaires. However, the extent to which this occurs cannot be ascertained from the data at hand. Guidelines recommend conducting sensitivity analyses to assess the robustness of conclusions to plausible MNAR assumptions, but this is rarely done in practice, possibly because of a lack of practical guidance. This tutorial aims to address this by presenting an accessible framework and practical guidance for conducting sensitivity analysis for MNAR data in trial-based CEA. We review some of the methods for conducting sensitivity analysis, but focus on one particularly accessible approach, where the data are multiply imputed and then modified to reflect plausible MNAR scenarios. We illustrate the implementation of this approach on a weight-loss trial, providing the software code. We then explore further issues around its use in practice.
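    A minimal sketch of the "impute, then modify" idea for a two-arm CEA follows. It assumes the multiply imputed datasets already exist and that the MNAR departure is an additive shift delta on imputed QALYs in arm 1 only. The pooled point estimate is the mean over imputations, as in Rubin's rules. Variances are omitted for brevity. The function names, the 0/1 arm coding, and the willingness-to-pay argument `wtp` are hypothetical, not taken from the tutorial's code.

```python
import numpy as np


def incremental_net_benefit(cost, qaly, arm, wtp):
    """INB = wtp * (mean QALY difference) - (mean cost difference), arm 1 vs arm 0."""
    d_qaly = qaly[arm == 1].mean() - qaly[arm == 0].mean()
    d_cost = cost[arm == 1].mean() - cost[arm == 0].mean()
    return wtp * d_qaly - d_cost


def delta_scan(imputed, arm, imputed_mask, wtp, deltas):
    """Pooled INB for each delta, shifting imputed QALYs in arm 1 only.

    imputed: list of (cost, qaly) array pairs, one per completed dataset.
    imputed_mask: True where the QALY value was imputed rather than observed.
    Returns a list of (delta, pooled INB), pooling by the mean over imputations.
    """
    shift_here = imputed_mask & (arm == 1)
    results = []
    for delta in deltas:
        inbs = []
        for cost, qaly in imputed:
            qaly_mnar = qaly + delta * shift_here  # MNAR-modified QALYs
            inbs.append(incremental_net_benefit(cost, qaly_mnar, arm, wtp))
        results.append((delta, float(np.mean(inbs))))
    return results
```

    Scanning a grid of delta values and noting where the pooled INB changes sign gives a tipping point. Whether such a delta is clinically plausible is a judgment the data cannot settle.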

    Seven-Year Neurodevelopmental Scores and Prenatal Exposure to Chlorpyrifos, a Common Agricultural Pesticide

    Background: In a longitudinal birth cohort study of inner-city mothers and children (Columbia Center for Children’s Environmental Health), we have previously reported that prenatal exposure to chlorpyrifos (CPF) was associated with neurodevelopmental problems at 3 years of age

    Sensitivity analysis for clinical trials with missing continuous outcome data using controlled multiple imputation: a practical guide

    Missing data due to loss to follow-up or intercurrent events are unintended, but unfortunately inevitable, in clinical trials. Since the true values of missing data are never known, it is necessary to assess the impact of untestable and unavoidable assumptions about any unobserved data in sensitivity analysis. This tutorial provides an overview of controlled multiple imputation (MI) techniques and a practical guide to their use for sensitivity analysis of trials with missing continuous outcome data. These include δ-based and reference-based MI procedures. In δ-based imputation, an offset term, δ, is typically added to the expected value of the missing data to assess the impact of unobserved participants having a worse or better response than those observed. Reference-based imputation draws imputed values with some reference to observed data in other groups of the trial, typically in other treatment arms. We illustrate the accessibility of these methods using data from a pediatric eczema trial and a chronic headache trial and provide Stata code to facilitate adoption. We discuss issues surrounding the choice of δ in δ-based sensitivity analysis. We also review the debate on variance estimation within reference-based analysis and justify the use of Rubin's variance estimator in this setting, since, as we elaborate within, it provides information-anchored inference.
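    For reference, Rubin's rules pool the m completed-data analyses as follows. The pooled estimate is the mean of the per-imputation estimates. The total variance is the mean within-imputation variance W plus (1 + 1/m) times the between-imputation variance B. Below is a minimal NumPy sketch with a hypothetical function name. The degrees of freedom use Rubin's original large-sample formula rather than later small-sample refinements.

```python
import numpy as np


def rubins_rules(estimates, variances):
    """Pool m completed-data estimates with Rubin's rules.

    estimates, variances: per-imputation point estimates and squared SEs.
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()
    w = u.mean()                 # within-imputation variance W
    b = q.var(ddof=1)            # between-imputation variance B
    t = w + (1 + 1 / m) * b      # total variance
    if b == 0:
        return q_bar, t, np.inf  # no between-imputation variability
    r = (1 + 1 / m) * b / w      # relative increase in variance
    df = (m - 1) * (1 + 1 / r) ** 2
    return q_bar, t, df
```

    In a δ-based or reference-based analysis, the same pooling is applied to the treatment-effect estimates from each controlled imputation.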