
    Hierarchical models for semi-competing risks data with application to quality of end-of-life care for pancreatic cancer

    Readmission following discharge from an initial hospitalization is a key marker of quality of health care in the United States. For the most part, readmission has been used to study quality of care for patients with acute health conditions, such as pneumonia and heart failure, with analyses typically based on a logistic-Normal generalized linear mixed model. Applying this model to the study of readmission among patients with increasingly prevalent advanced health conditions such as pancreatic cancer is problematic, however, because it ignores death as a competing risk. A more appropriate analysis is to embed such studies within the semi-competing risks framework. To our knowledge, however, no comprehensive statistical methods have been developed for cluster-correlated semi-competing risks data. In this paper we propose a novel hierarchical modeling framework for the analysis of cluster-correlated semi-competing risks data. The framework permits parametric or non-parametric specifications for a range of model components, including baseline hazard functions and distributions for key random effects, giving analysts substantial flexibility as they consider their own analyses. Estimation and inference are performed within the Bayesian paradigm since it facilitates the straightforward characterization of (posterior) uncertainty for all model parameters, including hospital-specific random effects. The proposed framework is used to study the risk of readmission among 5,298 Medicare beneficiaries diagnosed with pancreatic cancer at 112 hospitals in the six New England states between 2000 and 2009, specifically to investigate the role of patient-level risk factors and to characterize variation in risk across hospitals that is not explained by differences in patient case-mix.
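
    As a concrete illustration of the framework described above, one common specification is a shared-frailty illness-death model with hospital-specific random effects; the notation below is ours, and the particular parametric choices are assumptions rather than the paper's exact formulation. For patient i in hospital j, with transition-specific baseline hazards h_{0g} and regression coefficients \beta_g:

        h_1(t | x_{ji}) = \gamma_{ji} h_{01}(t) \exp(x_{ji}^T \beta_1 + V_{j1})                    (readmission)
        h_2(t | x_{ji}) = \gamma_{ji} h_{02}(t) \exp(x_{ji}^T \beta_2 + V_{j2})                    (death without readmission)
        h_3(t | t_1, x_{ji}) = \gamma_{ji} h_{03}(t) \exp(x_{ji}^T \beta_3 + V_{j3}),  t > t_1     (death following readmission)

    Here \gamma_{ji} ~ Gamma(\theta^{-1}, \theta^{-1}) is a patient-level frailty and V_j = (V_{j1}, V_{j2}, V_{j3})^T a vector of hospital-level random effects; the baseline hazards and the distribution of V_j are the components that may be given parametric (e.g., Weibull, multivariate Normal) or nonparametric specifications.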

    The combination of ecological and case-control data

    Ecological studies, in which data are available at the level of the group rather than at the level of the individual, are susceptible to a range of biases due to their inability to characterize within-group variability in exposures and confounders. In order to overcome these biases, we propose a hybrid design in which ecological data are supplemented with a sample of individual-level case-control data. We develop the likelihood for this design and illustrate its benefits via simulation, both in bias reduction when compared to an ecological study and in efficiency gains relative to a conventional case-control study. An interesting special case of the proposed design is the situation where ecological data are supplemented with case-only data. The design is illustrated using a dataset of county-specific lung cancer mortality rates in the state of Ohio from 1988.
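
    To make the hybrid likelihood concrete, consider a binary exposure with a logistic individual-level model; this is a sketch in our own notation, not necessarily the authors' exact formulation. With individual-level risk \mathrm{expit}(\beta_0 + \beta_1 x) and F_k the exposure distribution in group k, the ecological data contribute binomial terms in the group-level marginal risk,

        q_k(\beta) = \int \mathrm{expit}(\beta_0 + \beta_1 x) \, dF_k(x), \qquad Y_k \sim \mathrm{Binomial}(N_k, q_k(\beta)),

    and the full likelihood multiplies these across groups with the usual likelihood contributions of the individually sampled cases and controls. The individual-level terms depend on the within-group joint distribution of outcome and exposure, which is exactly the information the ecological margins lack.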

    Hierarchical Models for Combining Ecological and Case-control Data

    The ecological study design suffers from a broad range of biases that result from the loss of information regarding the joint distribution of individual-level outcomes, exposures and confounders. The consequent non-identifiability of individual-level models cannot be overcome without additional information; we combine ecological data with a sample of individual-level case-control data. The focus of this paper is hierarchical models to account for between-group heterogeneity. Estimation and inference pose serious computational challenges. We present a Bayesian implementation, based on a data augmentation scheme in which the unobserved data are treated as auxiliary variables. The methods are illustrated with a dataset of county-specific infant mortality data from the state of North Carolina.
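
    A minimal sketch of the kind of hierarchy and augmentation scheme described, in our own notation and with illustrative distributional choices:

        Y_{ik} | x_{ik} \sim \mathrm{Bernoulli}\{ \mathrm{expit}(\beta_{0k} + \beta_{1k} x_{ik}) \}, \qquad (\beta_{0k}, \beta_{1k})^T \sim N_2(\mu, \Sigma),

    where k indexes groups. The unobserved individual-level exposures x_{ik} underlying the ecological margins serve as the auxiliary variables: each MCMC iteration alternates between imputing the x_{ik} consistently with the observed group-level totals and updating the group-specific coefficients and the hyperparameters (\mu, \Sigma) given the completed data.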

    Simulation of Semicompeting Risk Survival Data and Estimation Based on Multistate Frailty Model

    We develop a procedure to simulate semicompeting risks survival data. In addition, we introduce an EM algorithm and a B-spline based estimation procedure to evaluate and implement the nonparametric likelihood estimation approach of Xu et al. (2010). The simulation procedure provides a route to simulate samples from the likelihood introduced in Xu et al. (2010). Further, the EM algorithm and the B-spline methods stabilize the estimation and give accurate estimation results. We illustrate the simulation and estimation procedures with simulation examples and a real data analysis.
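
    The abstract does not reproduce the algorithm itself; the following Python sketch shows one standard way to simulate semicompeting risks data from a gamma-frailty illness-death model with Weibull baseline hazards and a semi-Markov sojourn after the nonterminal event. The parameter values, and the use of Python rather than the authors' own implementation, are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(2024)

        def sim_illness_death(n, theta=0.5,
                              shape=(1.2, 1.0, 1.5), scale=(1.0, 1.5, 0.8),
                              cmax=3.0):
            """Simulate semicompeting risks data from an illness-death
            model with a shared gamma frailty (illustrative values)."""
            # Shared frailty with mean 1 and variance theta.
            g = rng.gamma(1.0 / theta, theta, size=n)

            # For a Weibull hazard scaled by g, S(t) = exp(-g * (t/b)**a),
            # so inversion gives T = b * (E / g)**(1/a) with E ~ Exp(1).
            def draw(a, b):
                return b * (rng.exponential(size=n) / g) ** (1.0 / a)

            t1 = draw(shape[0], scale[0])   # latent nonterminal event time
            t2 = draw(shape[1], scale[1])   # latent terminal event time
            gap = draw(shape[2], scale[2])  # post-nonterminal sojourn
            c = rng.uniform(0.0, cmax, n)   # independent right censoring

            ill = t1 < t2                   # nonterminal event occurs first?
            death = np.where(ill, t1 + gap, t2)

            y1 = np.minimum(np.where(ill, t1, death), c)  # observed nonterminal time
            d1 = (ill & (t1 <= c)).astype(int)            # nonterminal indicator
            y2 = np.minimum(death, c)                     # observed terminal time
            d2 = (death <= c).astype(int)                 # terminal indicator
            return y1, d1, y2, d2

        y1, d1, y2, d2 = sim_illness_death(5000)

    Note that the nonterminal event is censored by death (d1 = 0 when death occurs first), which is the defining observation scheme of semicompeting risks.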

    osDesign: An R Package for the Analysis, Evaluation, and Design of Two-Phase and Case-Control Studies

    The two-phase design has recently received attention in the statistical literature as an extension to the traditional case-control study for settings where a predictor of interest is rare or subject to misclassification. Despite a thorough methodological treatment and the potential for substantial efficiency gains, the two-phase design has not been widely adopted. This may be due, in part, to a lack of general-purpose, readily available software. The osDesign package for R provides a suite of functions for analyzing data from a two-phase and/or case-control design, as well as for evaluating operating characteristics, including bias, efficiency and power. The evaluation is simulation-based, permitting flexible application of the package to a broad range of scientific settings. Using lung cancer mortality data from Ohio, the package is illustrated with a detailed case study in which two statistical goals are considered: (i) the evaluation of small-sample operating characteristics for two-phase and case-control designs and (ii) the planning and design of a future two-phase study.
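
    osDesign itself is implemented in R; as a language-neutral illustration of the simulation-based evaluation the abstract describes, the following hypothetical Python sketch computes Monte Carlo power for a case-control design with a rare binary exposure. None of the names below belong to the osDesign API.

        import numpy as np

        rng = np.random.default_rng(7)

        def cc_power_sim(beta1, n_cases=200, n_controls=200,
                         p_exposed=0.1, beta0=-4.0, pop=50_000, n_sim=500):
            """Monte Carlo power to detect a log odds ratio beta1 with a
            case-control sample (hypothetical helper, not osDesign)."""
            rejections = 0
            for _ in range(n_sim):
                # Source population: rare exposure, logistic disease model.
                x = rng.binomial(1, p_exposed, size=pop)
                p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
                y = rng.binomial(1, p)
                # Case-control sampling from the population.
                cases = rng.choice(np.flatnonzero(y == 1), n_cases, replace=False)
                ctrls = rng.choice(np.flatnonzero(y == 0), n_controls, replace=False)
                # 2x2 table and Wald test of the log odds ratio.
                a = x[cases].sum(); b = n_cases - a
                c = x[ctrls].sum(); d = n_controls - c
                if min(a, b, c, d) == 0:
                    continue  # degenerate table counted as a non-rejection
                log_or = np.log((a * d) / (b * c))
                se = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
                rejections += abs(log_or / se) > 1.96  # two-sided, alpha = 0.05
            return rejections / n_sim

        # e.g., power to detect an odds ratio of 2:
        print(cc_power_sim(np.log(2)))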

    Estimating weighted quantile treatment effects with missing outcome data by double sampling

    Causal weighted quantile treatment effects (WQTE) are a useful complement to standard causal contrasts that focus on the mean when interest lies in the tails of the counterfactual distribution. To date, however, methods for estimation and inference regarding causal WQTEs have assumed complete data on all relevant factors. Missing or incomplete data are a widespread challenge in practical settings, particularly when the data are not collected for research purposes, as with electronic health records and disease registries. Furthermore, such settings may be particularly susceptible to the outcome data being missing not at random (MNAR). In this paper, we consider the use of double sampling, through which the otherwise missing data are ascertained on a sub-sample of study units, as a strategy to mitigate bias due to MNAR data in the estimation of causal WQTEs. With the additional data in hand, we present identifying conditions that do not require assumptions regarding missingness in the original data. We then propose a novel inverse-probability weighted estimator and derive its asymptotic properties, both pointwise at specific quantiles and uniformly across a range of quantiles in (0,1), when the propensity score and double-sampling probabilities are estimated. For practical inference, we develop a bootstrap method that can be used for both pointwise and uniform inference. A simulation study is conducted to examine the finite sample performance of the proposed estimators.
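
    As a rough sketch in our own notation (not necessarily the authors' exact estimator): let A denote treatment, R the indicator that the outcome Y was initially observed, S the indicator of selection into the double sample among those with R = 0, \pi_a(X) = P(A = a | X) the propensity score, and \eta_i the double-sampling probability. An inverse-probability weighted estimator of the \tau-th counterfactual quantile q_a(\tau) then solves

        \sum_{i=1}^{n} \frac{1\{A_i = a\}}{\hat{\pi}_a(X_i)} \left\{ R_i + \frac{(1 - R_i) S_i}{\hat{\eta}_i} \right\} \left\{ 1(Y_i \le q) - \tau \right\} = 0,

    with the WQTE estimated by the contrast \hat{q}_1(\tau) - \hat{q}_0(\tau) (a user-specified covariate weight may additionally multiply each term). The factor R_i + (1 - R_i) S_i / \hat{\eta}_i is what removes the need for assumptions about the original missingness mechanism: among initially missing units, only those double-sampled contribute, upweighted by the inverse of their selection probability.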

    Double sampling and semiparametric methods for informatively missing data

    Missing data arise almost ubiquitously in applied settings and can pose a substantial threat to the validity of statistical analyses. In the context of comparative effectiveness research, such as in large observational databases (e.g., those derived from electronic health records), outcomes may be missing not at random with respect to measured covariates. In this setting, we propose a double sampling method, in which outcomes are obtained via intensive follow-up on a subsample of subjects for whom data were initially missing. We describe assumptions under which the joint distribution of confounders, treatment, and outcome is identified under this design, and derive efficient estimators of the average treatment effect under a nonparametric model, as well as under a model assuming outcomes were initially missing at random. We compare these in simulations to an approach that adaptively selects an estimator based on evidence of violation of the missing at random assumption. We also show that the proposed double sampling design can be extended to handle arbitrary coarsening mechanisms, and derive consistent, asymptotically normal, and nonparametrically efficient estimators of any smooth full-data functional of interest, and prove that these estimators are often multiply robust.
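
    To sketch the key identification step in our own notation: let R indicate that the outcome Y was observed initially and S indicate selection, with probability \eta(V) given observables V, into intensive follow-up among those with R = 0. If follow-up always recovers Y on the double sample, then

        E[Y | A, X] = E\left[ \left( R + \frac{(1 - R) S}{\eta(V)} \right) Y \,\middle|\, A, X \right],

    so the outcome regression, and hence the average treatment effect E[ E(Y | A = 1, X) - E(Y | A = 0, X) ] under the usual no-unmeasured-confounding conditions, is identified without any model for the initial missingness mechanism.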

    Group lasso priors for Bayesian accelerated failure time models with left-truncated and interval-censored data

    An important task in health research is to characterize time-to-event outcomes such as disease onset or mortality in terms of a potentially high-dimensional set of risk factors. For example, prospective cohort studies of Alzheimer's disease typically enroll older adults for observation over several decades to assess the long-term impact of genetic and other factors on cognitive decline and mortality. The accelerated failure time model is particularly well-suited to such studies, structuring covariate effects as 'horizontal' changes to the survival quantiles that conceptually reflect shifts in the outcome distribution due to lifelong exposures. However, this modeling task is complicated by the enrollment of adults at differing ages and by intermittent follow-up visits leading to interval-censored outcome information. Moreover, genetic and clinical risk factors are not only high-dimensional, but also characterized by underlying grouping structure, such as by function or gene location. Such grouped high-dimensional covariates require shrinkage methods that directly acknowledge this structure to facilitate variable selection and estimation. In this paper, we address these considerations directly by proposing a Bayesian accelerated failure time model with a group-structured lasso penalty, designed for left-truncated and interval-censored time-to-event data. We develop a custom Markov chain Monte Carlo sampler for efficient estimation, and investigate the impact of various methods of penalty tuning and thresholding for variable selection. We present a simulation study examining the performance of this method relative to models with an ordinary lasso penalty, and apply the proposed method to identify groups of predictive genetic and clinical risk factors for Alzheimer's disease in the Religious Orders Study and Memory and Aging Project (ROSMAP) prospective cohort studies of AD and dementia.
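
    A minimal sketch of the model class described, using a standard scale-mixture representation of the group lasso prior (assumed here for concreteness rather than stated in the abstract): with covariate groups g = 1, ..., G of sizes d_g,

        \log T_i = x_i^T \beta + \epsilon_i, \qquad \beta_g | \tau_g^2 \sim N_{d_g}(0, \tau_g^2 I_{d_g}), \qquad \tau_g^2 \sim \mathrm{Gamma}\left( \frac{d_g + 1}{2}, \frac{\lambda^2}{2} \right),

    which marginally induces the group lasso penalty \lambda \sum_g \|\beta_g\|_2 on the coefficients and admits conjugate Gibbs updates. Left truncation at enrollment age a_i and interval censoring into (l_i, r_i] enter through likelihood contributions of the form \{F(r_i) - F(l_i)\} / \{1 - F(a_i)\}, where F is the model-implied distribution function of the event time T.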