Hierarchical models for semi-competing risks data with application to quality of end-of-life care for pancreatic cancer
Readmission following discharge from an initial hospitalization is a key
marker of quality of health care in the United States. For the most part,
readmission has been used to study quality of care for patients with acute
health conditions, such as pneumonia and heart failure, with analyses typically
based on a logistic-Normal generalized linear mixed model. Applying this model
to the study of readmission among patients with increasingly prevalent advanced
health conditions such as pancreatic cancer is problematic, however, because it
ignores death as a competing risk. A more appropriate analysis is to embed such
studies within the semi-competing risks framework. To our knowledge, however,
no comprehensive statistical methods have been developed for cluster-correlated
semi-competing risks data. In this paper we propose a novel hierarchical
modeling framework for the analysis of cluster-correlated semi-competing risks
data. The framework permits parametric or non-parametric specifications for a
range of model components, including baseline hazard functions and
distributions for key random effects, giving analysts substantial flexibility
as they consider their own analyses. Estimation and inference are performed
within the Bayesian paradigm since it facilitates the straightforward
characterization of (posterior) uncertainty for all model parameters including
hospital-specific random effects. The proposed framework is used to study the
risk of readmission among 5,298 Medicare beneficiaries diagnosed with
pancreatic cancer at 112 hospitals in the six New England states between
2000 and 2009, specifically to investigate the role of patient-level risk factors
and to characterize variation in risk across hospitals that is not explained by
differences in patient case-mix.
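To make the modeling gap concrete, the logistic-Normal generalized linear mixed model the abstract describes as standard can be sketched in a few lines. All parameter values below are purely hypothetical, and the sketch deliberately omits death, which is precisely the competing risk the proposed semi-competing risks framework addresses.

```python
import numpy as np

# Hypothetical sketch of cluster-correlated readmission data under a
# logistic-Normal GLMM: hospital-specific random intercepts on the logit
# scale plus one patient-level covariate. Death is ignored here, which is
# the limitation the abstract highlights for advanced conditions.
rng = np.random.default_rng(0)
n_hosp, n_per = 112, 50                      # 112 hospitals, as in the application
b = rng.normal(0.0, 0.5, n_hosp)             # hospital random effects (hypothetical SD)
hosp = np.repeat(np.arange(n_hosp), n_per)
x = rng.normal(size=hosp.size)               # hypothetical patient-level risk factor
logit = -1.0 + 0.3 * x + b[hosp]             # hypothetical fixed effects
readmit = (rng.random(hosp.size) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
```

Fitting this model to data that include deaths treats dying patients as simply "not readmitted," which is the bias the hierarchical semi-competing risks framework is designed to avoid.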
The combination of ecological and case-control data
Ecological studies, in which data are available at the level of the group, rather than at the level of the individual, are susceptible to a range of biases due to their inability to characterize within-group variability in exposures and confounders. In order to overcome these biases, we propose a hybrid design in which ecological data are supplemented with a sample of individual-level case-control data. We develop the likelihood for this design and illustrate its benefits via simulation, both in bias reduction when compared to an ecological study, and in efficiency gains relative to a conventional case-control study. An interesting special case of the proposed design is the situation where ecological data are supplemented with case-only data. The design is illustrated using a dataset of county-specific lung cancer mortality rates in the state of Ohio from 1988.
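A quick numerical illustration of the ecological bias at issue: when exposure varies within a group, the group's aggregate risk (the average of individual logistic risks) differs from the risk evaluated at the group-mean exposure, which is all a purely ecological model sees. The coefficients below are arbitrary and purely illustrative.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

b0, b1 = -2.0, 1.5                     # hypothetical individual-level log-odds model
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100_000)      # within-group exposure variability (mean 0)

true_group_risk = expit(b0 + b1 * x).mean()   # aggregate of individual risks
naive_group_risk = expit(b0 + b1 * 0.0)       # risk at the group-mean exposure
# The two disagree because expit is nonlinear: ignoring within-group
# variability biases inference drawn from group-level data alone.
```

Supplementing the ecological counts with individual-level case-control records restores information about this within-group distribution, which is what the hybrid likelihood exploits.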
Hierarchical Models for Combining Ecological and Case-control Data
The ecological study design suffers from a broad range of biases that result from the loss of information regarding the joint distribution of individual-level outcomes, exposures and confounders. The consequent non-identifiability of individual-level models cannot be overcome without additional information; we combine ecological data with a sample of individual-level case-control data. The focus of this paper is hierarchical models to account for between-group heterogeneity. Estimation and inference pose serious computational challenges. We present a Bayesian implementation, based on a data augmentation scheme where the unobserved data are treated as auxiliary variables. The methods are illustrated with a dataset of county-specific infant mortality data from the state of North Carolina.
Simulation of Semicompeting Risk Survival Data and Estimation Based on Multistate Frailty Model
We develop a procedure to simulate semicompeting risks survival data. In addition, we introduce an EM algorithm and a B-spline based estimation procedure to evaluate and implement Xu et al. (2010)'s nonparametric likelihood estimation approach. The simulation procedure provides a route to simulate samples from the likelihood introduced in Xu et al. (2010). Further, the EM algorithm and the B-spline methods stabilize the estimation and give accurate estimation results. We illustrate the simulation and estimation procedures with simulation examples and a real data analysis.
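The illness-death structure behind semicompeting risks data can be simulated directly. The sketch below is a simplification of the multistate frailty setup: it assumes constant (exponential) baseline hazards and a shared gamma frailty, and all rate values are hypothetical rather than taken from the paper.

```python
import numpy as np

def simulate_semicompeting(n, lam1=0.5, lam2=0.3, lam3=0.7, theta=0.5,
                           cens=2.0, seed=0):
    """Simulate semicompeting risks data from an illness-death model with a
    shared gamma frailty. All hazard rates here are hypothetical."""
    rng = np.random.default_rng(seed)
    g = rng.gamma(1.0 / theta, theta, size=n)       # frailty: mean 1, variance theta
    t1 = rng.exponential(1.0 / (lam1 * g))          # latent nonterminal event time
    t2 = rng.exponential(1.0 / (lam2 * g))          # latent terminal (death) time
    y1 = np.minimum(np.minimum(t1, t2), cens)       # observed nonterminal time
    d1 = (t1 <= t2) & (t1 <= cens)                  # nonterminal event observed?
    t3 = t1 + rng.exponential(1.0 / (lam3 * g))     # death time after nonterminal event
    y2 = np.where(d1, np.minimum(t3, cens), np.minimum(t2, cens))
    d2 = np.where(d1, t3 <= cens, t2 <= cens)       # terminal event observed?
    return y1, d1.astype(int), y2, d2.astype(int)
```

The shared frailty `g` induces positive dependence between the nonterminal and terminal event times, and death truncates the nonterminal event but not vice versa, which is the defining asymmetry of semicompeting risks.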
osDesign: An R Package for the Analysis, Evaluation, and Design of Two-Phase and Case-Control Studies
The two-phase design has recently received attention in the statistical literature as an extension to the traditional case-control study for settings where a predictor of interest is rare or subject to misclassification. Despite a thorough methodological treatment and the potential for substantial efficiency gains, the two-phase design has not been widely adopted. This may be due, in part, to a lack of general-purpose, readily-available software. The osDesign package for R provides a suite of functions for analyzing data from a two-phase and/or case-control design, as well as evaluating operating characteristics, including bias, efficiency and power. The evaluation is simulation-based, permitting flexible application of the package to a broad range of scientific settings. Using lung cancer mortality data from Ohio, the package is illustrated with a detailed case-study in which two statistical goals are considered: (i) the evaluation of small-sample operating characteristics for two-phase and case-control designs and (ii) the planning and design of a future two-phase study.
Estimating weighted quantile treatment effects with missing outcome data by double sampling
Causal weighted quantile treatment effects (WQTE) are a useful complement to
standard causal contrasts that focus on the mean when interest lies at the
tails of the counterfactual distribution. To date, however, methods for
estimation and inference regarding causal WQTEs have assumed complete data on
all relevant factors. Missing or incomplete data, however, are a widespread
challenge in practical settings, particularly when the data are not collected
for research purposes, as with electronic health records and disease
registries. Furthermore, such settings may be particularly susceptible to the
outcome
data being missing-not-at-random (MNAR). In this paper, we consider the use of
double-sampling, through which the otherwise missing data is ascertained on a
sub-sample of study units, as a strategy to mitigate bias due to MNAR data in
the estimation of causal WQTEs. With the additional data in hand, we present
identifying conditions that do not require assumptions regarding missingness in
the original data. We then propose a novel inverse-probability weighted
estimator and derive its asymptotic properties, both pointwise at specific
quantiles and uniformly across a range of quantiles in (0,1), when the propensity
score and double-sampling probabilities are estimated. For practical inference,
we develop a bootstrap method that can be used for both pointwise and uniform
inference. A simulation study is conducted to examine the finite sample
performance of the proposed estimators.
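As a rough sketch of the weighting idea, the snippet below combines a treatment inverse-probability weight with a double-sampling weight before taking a weighted quantile. The helper names and the simple plug-in quantile are illustrative only, not the paper's estimator, and the uniform inference machinery is omitted entirely.

```python
import numpy as np

def weighted_quantile(y, w, q):
    """q-th quantile of y under (unnormalized) nonnegative weights w."""
    order = np.argsort(y)
    y, w = y[order], w[order]
    cdf = np.cumsum(w) / np.sum(w)
    return y[np.searchsorted(cdf, q)]

def ipw_wqte(y, a, r, s, e, pi, q=0.5):
    """Illustrative IPW estimate of a quantile treatment effect at level q.

    y: outcome (may be NaN where never observed); a: treatment indicator;
    r: outcome initially observed; s: double-sampling indicator among r == 0;
    e: estimated propensity scores; pi: double-sampling probabilities.
    """
    obs = (r == 1) | (s == 1)                          # outcome ultimately observed
    w_miss = np.where(r == 1, 1.0, s / np.clip(pi, 1e-8, None))
    w1 = (a / e) * w_miss                              # weights for treated units
    w0 = ((1 - a) / (1 - e)) * w_miss                  # weights for control units
    m1, m0 = obs & (a == 1), obs & (a == 0)
    return weighted_quantile(y[m1], w1[m1], q) - weighted_quantile(y[m0], w0[m0], q)
```

The double-sampling factor `s / pi` upweights the subsample on which the otherwise missing outcomes were ascertained, so that no missingness assumption on the original data is needed.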
Double sampling and semiparametric methods for informatively missing data
Missing data arise almost ubiquitously in applied settings, and can pose a
substantial threat to the validity of statistical analyses. In the context of
comparative effectiveness research, such as in large observational databases
(e.g., those derived from electronic health records), outcomes may be missing
not at random with respect to measured covariates. In this setting, we propose
a double sampling method, in which outcomes are obtained via intensive
follow-up on a subsample of subjects for whom data were initially missing. We
describe assumptions under which the joint distribution of confounders,
treatment, and outcome is identified under this design, and derive efficient
estimators of the average treatment effect under a nonparametric model, as well
as a model assuming outcomes were initially missing at random. We compare these
in simulations to an approach that adaptively selects an estimator based on
evidence of violation of the missing at random assumption. We also show that
the proposed double sampling design can be extended to handle arbitrary
coarsening mechanisms, and derive consistent, asymptotically normal, and
nonparametric efficient estimators of any smooth full data functional of
interest, and prove that these estimators often are multiply robust.
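A small simulation in the spirit of the abstract (all numbers hypothetical) shows why double sampling helps when outcomes are missing not at random: the complete-case contrast is biased, while reweighting the double-sampled units by their inverse sampling probability recovers the truth. Treatment is randomized here, so no propensity adjustment is needed.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40_000
a = rng.integers(0, 2, n)                 # randomized treatment
y = a + rng.normal(size=n)                # true average treatment effect = 1

# MNAR missingness: larger outcomes are far more likely to go missing.
p_obs = np.where(y < 0.5, 0.9, 0.2)
r = rng.random(n) < p_obs                 # initially observed

# Intensive follow-up: double-sample the missing units with known probability pi.
pi = 0.5
s = (~r) & (rng.random(n) < pi)

# Complete-case estimator: biased, since observation depends on the outcome.
cc = y[r & (a == 1)].mean() - y[r & (a == 0)].mean()

# Double-sampling estimator: weight double-sampled units by 1/pi.
w = np.where(r, 1.0, np.where(s, 1.0 / pi, 0.0))
obs = r | s

def wmean(mask):
    return np.sum(w[mask] * y[mask]) / np.sum(w[mask])

ds = wmean(obs & (a == 1)) - wmean(obs & (a == 0))
```

Because the double-sampling probability is controlled by the investigator, identification rests on the follow-up design rather than on untestable assumptions about the original missingness mechanism.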
Group lasso priors for Bayesian accelerated failure time models with left-truncated and interval-censored data
An important task in health research is to characterize time-to-event
outcomes such as disease onset or mortality in terms of a potentially
high-dimensional set of risk factors. For example, prospective cohort studies
of Alzheimer's disease typically enroll older adults for observation over
several decades to assess the long-term impact of genetic and other factors on
cognitive decline and mortality. The accelerated failure time model is
particularly well-suited to such studies, structuring covariate effects as
'horizontal' changes to the survival quantiles that conceptually reflect shifts
in the outcome distribution due to lifelong exposures. However, this modeling
task is complicated by the enrollment of adults at differing ages, and
intermittent follow-up visits leading to interval-censored outcome information.
Moreover, genetic and clinical risk factors are not only high-dimensional, but
characterized by underlying grouping structure, such as by function or gene
location. Such grouped high-dimensional covariates require shrinkage methods
that directly acknowledge this structure to facilitate variable selection and
estimation. In this paper, we address these considerations directly by
proposing a Bayesian accelerated failure time model with a group-structured
lasso penalty, designed for left-truncated and interval-censored time-to-event
data. We develop a custom Markov chain Monte Carlo sampler for efficient
estimation, and investigate the impact of various methods of penalty tuning and
thresholding for variable selection. We present a simulation study examining
the performance of this method relative to models with an ordinary lasso
penalty, and apply the proposed method to identify groups of predictive genetic
and clinical risk factors for Alzheimer's disease in the Religious Orders Study
and Memory and Aging Project (ROSMAP) prospective cohort studies of AD and
dementia.
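For intuition on the grouped shrinkage involved, the group lasso penalty's proximal operator zeroes out entire groups of coefficients at once; the paper's Bayesian group lasso prior plays an analogous role within MCMC. The helper below is an illustrative frequentist sketch, not the paper's sampler.

```python
import numpy as np

def group_soft_threshold(beta, groups, lam):
    """Proximal operator of the group lasso penalty lam * sum_g ||beta_g||_2.
    Shrinks each group's coefficients toward zero and sets whole groups
    exactly to zero when their norm falls below lam (illustrative helper)."""
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    out = np.zeros_like(beta)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(beta[idx])
        if norm > lam:
            out[idx] = (1.0 - lam / norm) * beta[idx]   # proportional shrinkage
    return out
```

For example, with groups `[0, 0, 1, 1]` and `lam = 1`, a group of coefficients with norm 5 is scaled by 0.8 while a group with norm 0.14 is set exactly to zero; this all-or-nothing behavior at the group level is what makes the penalty suited to gene- or function-based groupings of risk factors.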