
    The combination of ecological and case-control data

    Ecological studies, in which data are available at the level of the group, rather than at the level of the individual, are susceptible to a range of biases due to their inability to characterize within-group variability in exposures and confounders. In order to overcome these biases, we propose a hybrid design in which ecological data are supplemented with a sample of individual-level case-control data. We develop the likelihood for this design and illustrate its benefits via simulation, both in bias reduction when compared to an ecological study, and in efficiency gains relative to a conventional case-control study. An interesting special case of the proposed design is the situation where ecological data are supplemented with case-only data. The design is illustrated using a dataset of county-specific lung cancer mortality rates in the state of Ohio from 1988
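The ecological bias described above can be made concrete with a small simulation: individual-level binary data are generated within groups under a logistic risk model, then aggregated, and the naive group-level regression is fit. This is a minimal sketch with invented parameter values and function names, not the likelihood proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_group_data(n_groups=50, n_per_group=200, beta0=-2.0, beta1=1.0):
    """Simulate binary exposure/outcome data and aggregate to the group level.

    Hypothetical setup: exposure prevalence varies across groups, and the
    individual-level risk follows a logistic model in the exposure.
    """
    p_exposure = rng.uniform(0.1, 0.9, n_groups)
    x_bar, y_bar = [], []
    for p in p_exposure:
        x = rng.binomial(1, p, n_per_group)
        risk = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
        y = rng.binomial(1, risk)
        x_bar.append(x.mean())   # group exposure prevalence (ecological data)
        y_bar.append(y.mean())   # group disease rate (ecological data)
    return np.array(x_bar), np.array(y_bar)

# Naive ecological analysis: regress group rates on group prevalences.
# The fitted slope lives on the risk-difference scale and does not, in
# general, recover the individual-level log odds ratio beta1 -- the gap
# the hybrid design closes by adding individual-level case-control data.
x_bar, y_bar = simulate_group_data()
eco_slope, eco_intercept = np.polyfit(x_bar, y_bar, 1)
```

Only the group means `x_bar` and `y_bar` would be available to an ecological analyst; the individual-level `beta1` is not identifiable from them alone.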

    Hierarchical Models for Combining Ecological and Case-control Data

    The ecological study design suffers from a broad range of biases that result from the loss of information regarding the joint distribution of individual-level outcomes, exposures and confounders. The consequent non-identifiability of individual-level models cannot be overcome without additional information; we combine ecological data with a sample of individual-level case-control data. The focus of this paper is on hierarchical models that account for between-group heterogeneity. Estimation and inference pose serious computational challenges. We present a Bayesian implementation, based on a data augmentation scheme in which the unobserved data are treated as auxiliary variables. The methods are illustrated with a dataset of county-specific infant mortality data from the state of North Carolina.

    Simulation of Semicompeting Risk Survival Data and Estimation Based on Multistate Frailty Model

    We develop a procedure to simulate semicompeting risks survival data. In addition, we introduce an EM algorithm and a B-spline based estimation procedure to evaluate and implement the nonparametric likelihood estimation approach of Xu et al. (2010). The simulation procedure provides a route to generate samples from the likelihood introduced in Xu et al. (2010). Further, the EM algorithm and the B-spline methods stabilize the estimation and give accurate results. We illustrate the simulation and estimation procedures with simulation examples and a real data analysis.
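A semicompeting risks simulator of the general kind described above can be sketched as an illness-death model with a shared frailty. The hazards, frailty variance, and censoring below are hypothetical choices for illustration; this mirrors the multistate frailty structure, not the specific likelihood of Xu et al. (2010).

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_semicompeting(n=1000, h1=0.1, h2=0.05, h3=0.2, theta=0.5, cmax=20.0):
    """Simulate semicompeting risks data from an illness-death model.

    Hypothetical setup: constant transition hazards h1 (healthy -> ill),
    h2 (healthy -> dead) and h3 (ill -> dead), each multiplied by a shared
    gamma frailty with mean 1 and variance theta, plus uniform censoring.
    """
    g = rng.gamma(1.0 / theta, theta, n)          # shared gamma frailty
    t1 = rng.exponential(1.0 / (h1 * g))          # latent time to illness
    t2 = rng.exponential(1.0 / (h2 * g))          # latent time to death while healthy
    ill = t1 < t2                                 # does illness occur first?
    death = np.where(ill, t1 + rng.exponential(1.0 / (h3 * g)), t2)
    c = rng.uniform(0.0, cmax, n)                 # administrative censoring
    obs_ill = np.minimum.reduce([t1, death, c])   # observed nonterminal time
    delta_ill = ill & (t1 <= c)                   # illness observed indicator
    obs_death = np.minimum(death, c)              # observed terminal time
    delta_death = death <= c                      # death observed indicator
    return obs_ill, delta_ill, obs_death, delta_death

obs_ill, delta_ill, obs_death, delta_death = simulate_semicompeting()
```

The key semicompeting feature is that death censors illness but not vice versa, which the observed-time construction above respects.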

    Hierarchical models for semi-competing risks data with application to quality of end-of-life care for pancreatic cancer

    Readmission following discharge from an initial hospitalization is a key marker of quality of health care in the United States. For the most part, readmission has been used to study quality of care for patients with acute health conditions, such as pneumonia and heart failure, with analyses typically based on a logistic-Normal generalized linear mixed model. Applying this model to the study of readmission among patients with increasingly prevalent advanced health conditions such as pancreatic cancer is problematic, however, because it ignores death as a competing risk. A more appropriate analysis is to embed such studies within the semi-competing risks framework. To our knowledge, however, no comprehensive statistical methods have been developed for cluster-correlated semi-competing risks data. In this paper we propose a novel hierarchical modeling framework for the analysis of cluster-correlated semi-competing risks data. The framework permits parametric or non-parametric specifications for a range of model components, including baseline hazard functions and distributions for key random effects, giving analysts substantial flexibility as they consider their own analyses. Estimation and inference are performed within the Bayesian paradigm since it facilitates the straightforward characterization of (posterior) uncertainty for all model parameters, including hospital-specific random effects. The proposed framework is used to study the risk of readmission among 5,298 Medicare beneficiaries diagnosed with pancreatic cancer at 112 hospitals in the six New England states between 2000 and 2009, specifically to investigate the role of patient-level risk factors and to characterize variation in risk across hospitals that is not explained by differences in patient case-mix.

    osDesign: An R Package for the Analysis, Evaluation, and Design of Two-Phase and Case-Control Studies

    The two-phase design has recently received attention in the statistical literature as an extension to the traditional case-control study for settings where a predictor of interest is rare or subject to misclassification. Despite a thorough methodological treatment and the potential for substantial efficiency gains, the two-phase design has not been widely adopted. This may be due, in part, to a lack of general-purpose, readily available software. The osDesign package for R provides a suite of functions for analyzing data from a two-phase and/or case-control design, as well as evaluating operating characteristics, including bias, efficiency and power. The evaluation is simulation-based, permitting flexible application of the package to a broad range of scientific settings. Using lung cancer mortality data from Ohio, the package is illustrated with a detailed case-study in which two statistical goals are considered: (i) the evaluation of small-sample operating characteristics for two-phase and case-control designs and (ii) the planning and design of a future two-phase study.
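The simulation-based evaluation idea is easy to illustrate outside the package. osDesign itself is an R package; the sketch below is a Python analogue of the operating-characteristic style of calculation, estimating the small-sample bias of the case-control log odds ratio over repeated draws, with all sample sizes and exposure probabilities invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def case_control_log_or(n_cases=200, n_controls=200,
                        p_exp_case=0.3, p_exp_ctrl=0.15):
    """Draw one case-control sample and return the estimated log odds ratio.

    A 0.5 continuity correction guards against zero cells in small samples.
    """
    a = rng.binomial(n_cases, p_exp_case)       # exposed cases
    c = rng.binomial(n_controls, p_exp_ctrl)    # exposed controls
    return np.log(((a + 0.5) * (n_controls - c + 0.5)) /
                  ((n_cases - a + 0.5) * (c + 0.5)))

# Simulation-based evaluation of small-sample bias, in the spirit of the
# package's operating-characteristic calculations.
true_log_or = np.log((0.3 / 0.7) / (0.15 / 0.85))
estimates = np.array([case_control_log_or() for _ in range(500)])
bias = estimates.mean() - true_log_or
```

The same loop, with the estimator swapped out, yields efficiency and power summaries under any design one cares to posit.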

    Estimating weighted quantile treatment effects with missing outcome data by double sampling

    Causal weighted quantile treatment effects (WQTEs) are a useful complement to standard causal contrasts that focus on the mean when interest lies in the tails of the counterfactual distribution. To date, however, methods for estimation and inference regarding causal WQTEs have assumed complete data on all relevant factors. Missing or incomplete data, however, are a widespread challenge in practical settings, particularly when the data were not collected for research purposes, as with electronic health records and disease registries. Furthermore, such settings may be particularly susceptible to the outcome data being missing not at random (MNAR). In this paper, we consider the use of double sampling, through which the otherwise missing data are ascertained on a sub-sample of study units, as a strategy to mitigate bias due to MNAR data in the estimation of causal WQTEs. With the additional data in hand, we present identifying conditions that do not require assumptions regarding missingness in the original data. We then propose a novel inverse-probability weighted estimator and derive its asymptotic properties, both pointwise at specific quantiles and uniformly across a range of quantiles in (0,1), when the propensity score and double-sampling probabilities are estimated. For practical inference, we develop a bootstrap method that can be used for both pointwise and uniform inference. A simulation study is conducted to examine the finite sample performance of the proposed estimators.
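The basic IPW quantile building block underlying WQTEs can be sketched as a weighted empirical CDF inverted at the target quantile. This is a simplified complete-data version for intuition only, not the authors' double-sampling estimator; the data-generating values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

def ipw_quantile(y, a, ps, treat, tau):
    """Inverse-probability-weighted tau-quantile of the potential outcome Y(treat).

    Units in the other treatment arm get weight zero; the rest are weighted
    by the inverse of their (estimated) probability of the arm they are in.
    """
    p_arm = ps if treat == 1 else 1.0 - ps
    w = np.where(a == treat, 1.0, 0.0) / p_arm
    order = np.argsort(y)
    cdf = np.cumsum(w[order]) / w.sum()          # weighted empirical CDF
    return y[order][np.searchsorted(cdf, tau)]

# Hypothetical data with confounding by x; the true median WQTE is 1.0.
n = 5000
x = rng.normal(size=n)
ps = 1.0 / (1.0 + np.exp(-x))                    # true propensity score
a = rng.binomial(1, ps)
y = 1.0 * a + x + rng.normal(size=n)
wqte_median = ipw_quantile(y, a, ps, 1, 0.5) - ipw_quantile(y, a, ps, 0, 0.5)
```

A naive difference of arm-specific medians would be confounded by x here; the propensity weighting removes that bias.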

    Double sampling and semiparametric methods for informatively missing data

    Missing data arise almost ubiquitously in applied settings, and can pose a substantial threat to the validity of statistical analyses. In the context of comparative effectiveness research, such as in large observational databases (e.g., those derived from electronic health records), outcomes may be missing not at random with respect to measured covariates. In this setting, we propose a double sampling method, in which outcomes are obtained via intensive follow-up on a subsample of subjects for whom data were initially missing. We describe assumptions under which the joint distribution of confounders, treatment, and outcome is identified under this design, and derive efficient estimators of the average treatment effect under a nonparametric model, as well as a model assuming outcomes were initially missing at random. We compare these in simulations to an approach that adaptively selects an estimator based on evidence of violation of the missing at random assumption. We also show that the proposed double sampling design can be extended to handle arbitrary coarsening mechanisms, and derive consistent, asymptotically normal, and nonparametric efficient estimators of any smooth full data functional of interest, and prove that these estimators often are multiply robust.
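The core identification trick of the double-sampling design can be sketched with simple weighting: because the follow-up subsample is drawn at random from the initially missing units with known probability, no model for the original (possibly MNAR) missingness is needed. The estimator and data-generating values below are a hypothetical simplification for intuition, not the efficient or multiply robust estimators derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

def ate_with_double_sampling(y, a, ps, r, s, pi_s):
    """IPW estimate of the ATE when missing outcomes are double-sampled.

    r == 1 flags outcomes observed in the original data; among the r == 0
    units, a random subsample with s == 1 is intensively followed up and
    its outcomes recovered, with known sampling probability pi_s.
    """
    have = (r == 1) | (s == 1)                      # outcome in hand
    w_miss = np.where(r == 1, 1.0, 1.0 / pi_s)      # double-sampling weight
    w_trt = a / ps - (1 - a) / (1 - ps)             # treatment IPW contrast
    y_safe = np.where(have, y, 0.0)                 # never touch missing y
    return np.mean(have * w_miss * w_trt * y_safe)

# Hypothetical data: confounding by x, MNAR missingness depending on y itself,
# and a 50% double sample of the missing units. The true ATE is 2.0.
n = 20000
x = rng.normal(size=n)
ps = 1.0 / (1.0 + np.exp(-x))
a = rng.binomial(1, ps)
y = 2.0 * a + x + rng.normal(size=n)
r = rng.binomial(1, 1.0 / (1.0 + np.exp(y - 1.0)))   # larger y -> more often missing
pi_s = 0.5
s = (1 - r) * rng.binomial(1, pi_s, n)
ate_hat = ate_with_double_sampling(y, a, ps, r, s, pi_s)
```

Unbiasedness follows because the combined indicator R + (1 - R)S/pi_s has conditional mean 1 regardless of how R depends on the outcome.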