58 research outputs found

    ssROC: Semi-Supervised ROC Analysis for Reliable and Streamlined Evaluation of Phenotyping Algorithms

    Objective: High-throughput phenotyping will accelerate the use of electronic health records (EHRs) for translational research. A critical roadblock is the extensive medical supervision required for phenotyping algorithm (PA) estimation and evaluation. To address this challenge, numerous weakly-supervised learning methods have been proposed to estimate PAs. However, there is a paucity of methods for reliably evaluating the predictive performance of PAs when only a very small proportion of the data is labeled. To fill this gap, we introduce a semi-supervised approach (ssROC) for estimating the receiver operating characteristic (ROC) parameters of PAs (e.g., sensitivity, specificity). Materials and Methods: ssROC uses a small labeled dataset to nonparametrically impute missing labels. The imputations are then used for ROC parameter estimation, yielding more precise estimates of PA performance than classical supervised ROC analysis (supROC) using only the labeled data. We evaluated ssROC through in-depth simulation studies and an extensive evaluation of eight PAs from Mass General Brigham. Results: In both simulated and real data, ssROC produced ROC parameter estimates with significantly lower variance than supROC for a given amount of labeled data. For the eight PAs, ssROC achieved precision similar to supROC with, on average, approximately 60% of the labeled data. Discussion: ssROC enables precise evaluation of PA performance, increasing trust in observational health research without demanding large volumes of labeled data. ssROC is also easily implementable in open-source R software. Conclusion: When used in conjunction with weakly-supervised PAs, ssROC facilitates the reliable and streamlined phenotyping necessary for EHR-based research.
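
    The core idea, fit a nonparametric regression of the gold-standard label on the PA score in the small labeled subset, impute a label probability for every patient, and plug the imputations into standard ROC formulas, can be sketched in a few lines. The snippet below is a minimal Python illustration of that general recipe, not the authors' R implementation; the Gaussian-kernel smoother, bandwidth, and simulated data are assumptions made purely for the example.

        import numpy as np

        def impute_labels(scores_all, scores_lab, y_lab, bandwidth=0.05):
            # Nadaraya-Watson estimate of P(Y = 1 | score), fit on the labeled
            # subset and evaluated at every patient's score (smoother choice is
            # an assumption for this sketch, not the paper's estimator).
            diff = (scores_all[:, None] - scores_lab[None, :]) / bandwidth
            w = np.exp(-0.5 * diff ** 2)
            return (w @ y_lab) / w.sum(axis=1)

        def roc_parameters(y_imputed, scores, threshold):
            # Sensitivity and specificity at one threshold, treating the
            # imputed labels as fractional case weights for the whole cohort.
            pos = scores >= threshold
            sens = y_imputed[pos].sum() / y_imputed.sum()
            spec = (1 - y_imputed)[~pos].sum() / (1 - y_imputed).sum()
            return sens, spec

        # Illustrative data: 5,000 patients, only 200 with chart-review labels.
        rng = np.random.default_rng(0)
        y_true = rng.binomial(1, 0.3, size=5000)
        scores = np.clip(0.6 * y_true + rng.normal(0, 0.25, size=5000), 0, 1)
        labeled = rng.choice(5000, size=200, replace=False)

        y_imp = impute_labels(scores, scores[labeled], y_true[labeled])
        print(roc_parameters(y_imp, scores, threshold=0.5))

    Because the imputation borrows strength from all 5,000 scores rather than the 200 labels alone, the resulting sensitivity and specificity estimates are typically less variable than the purely supervised ones, which is the gain the abstract quantifies.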

    Statistical Approaches To Reducing Bias And Improving Variance Estimation In The Presence Of Covariate And Outcome Measurement Error

    Large epidemiologic studies with self-reported or routinely collected electronic health record (EHR) data are frequently used as cost-effective ways to conduct clinical research, but these types of data are often prone to measurement error. While large epidemiologic studies play a crucial role in understanding the relationship between risk factors and health outcomes, such as disease incidence, these relationships cannot be properly understood unless methods are developed that reduce the bias caused by errors in both exposure variables and time-to-event outcome variables. Furthermore, variance estimates for outcome model regression parameters can be quite large in the presence of complex error-prone exposures and outcomes, yet strategies to improve variance estimation have received little attention in the measurement error literature. Throughout this dissertation, we address these gaps in the literature by developing methodology that focuses on (1) reducing the bias that arises from both error-prone exposures and outcomes in large epidemiologic cohort studies with periodic follow-up, (2) improving statistical efficiency by leveraging error-prone, auxiliary data alongside validated outcome data, and (3) considering alternative, better-behaved variance estimation strategies that may be used when techniques for adjusting for measurement error are applied. In Chapter 2, we present a method that combines an approach for addressing errors in event classification variables with regression calibration, a popular technique for addressing exposure error. This method reduces the bias induced by measurement errors in a discrete time-to-event setting. We apply our method to data from the Women’s Health Initiative (WHI) study to evaluate the association between dietary energy and protein and incident diabetes. Chapter 3 develops an approach for incorporating error-prone, auxiliary data into the analysis of an interval-censored time-to-event outcome. Here, the key goal is to improve statistical efficiency in the estimation of exposure-disease associations. We extend our methodology to handle data from a complex survey design and to be used in conjunction with regression calibration. Using this approach, we assess the association between energy and protein and the risk of diabetes in our motivating study, the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). In Chapter 4, we propose a sandwich variance estimator as an approach for accounting for the uncertainty added by using an estimated exposure when regression calibration is applied to adjust for covariate error. This variance approach applies broadly to other two-stage regression settings. We outline a procedure for easily computing the sandwich estimator in standard software and assess its properties through a numerical study and through illustrative data examples from the WHI and HCHS/SOL studies. Our results show that this method may have advantages over commonly applied resampling-based variance estimation approaches.
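
    To make the two-stage structure underlying Chapters 2-4 concrete, the sketch below shows classical regression calibration in a deliberately simple linear setting: stage 1 regresses the true exposure on its error-prone surrogate in a validation subset, and stage 2 fits the outcome model on the calibrated exposure. This is a generic Python illustration under assumed data, not the dissertation's discrete time-to-event or survey-weighted methodology; the closing comment notes why a sandwich or resampling variance is needed on top of it.

        import numpy as np

        rng = np.random.default_rng(1)
        n, n_val = 2000, 300

        # True exposure x, error-prone measurement w, continuous outcome y
        # (a linear model stands in for the time-to-event settings above).
        x = rng.normal(0, 1, n)
        w = x + rng.normal(0, 0.8, n)            # classical measurement error
        y = 1.0 + 0.5 * x + rng.normal(0, 1, n)

        # Stage 1 (calibration): regress x on w in a validation subset where x
        # is observed, then predict the calibrated exposure E[x | w] for everyone.
        val = rng.choice(n, size=n_val, replace=False)
        A = np.column_stack([np.ones(n_val), w[val]])
        gamma, *_ = np.linalg.lstsq(A, x[val], rcond=None)
        x_hat = gamma[0] + gamma[1] * w

        # Stage 2 (outcome model): fit y on the calibrated exposure.
        B = np.column_stack([np.ones(n), x_hat])
        beta, *_ = np.linalg.lstsq(B, y, rcond=None)

        naive, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), w]), y, rcond=None)
        print("naive slope (attenuated):    ", round(naive[1], 3))
        print("calibrated slope (corrected):", round(beta[1], 3))
        # Standard errors from stage 2 alone treat x_hat as fixed; a sandwich
        # (stacked estimating-equation) variance or a resampling method is
        # needed to propagate the uncertainty carried over from stage 1.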

    NIOSH practices in occupational risk assessment

    "Exposure to on-the-job health hazards is a problem faced by workers worldwide. Unlike safety hazards that may lead to injury, health hazards can lead to various types of illness. For example, exposures to some chemicals used in work processes may cause immediate sensory irritation (e.g., stinging or burning eyes, dry throat, cough); in other cases, workplace chemicals may cause cancer in workers many years after exposure. There are millions of U.S. workers exposed to chemicals in their work each year. In order to make recommendations for working safely in the presence of chemical hazards, the National Institute for Occupational Safety and Health (NIOSH) conducts risk assessments. In simple terms, risk assessment is a way of relating a hazard, like a toxic chemical in the air, to potential health risks associated with exposure to that hazard. Risk assessment allows NIOSH to make recommendations for controlling exposures in the workplace to reduce health risks. This document describes the process and logic NIOSH uses to conduct risk assessments, including the following steps: 1) Determining what type of hazard is associated with a chemical or other agent; 2) Collating the scientific evidence indicating whether the chemical or other agent causes illness or injury; 3) Evaluating the scientific data and determining how much exposure to the chemical or other agent would be harmful to workers; and 4) Carefully considering all relevant evidence to make the best, scientifically supported decisions. NIOSH researchers publish risk assessments in peer-reviewed scientific journals and in NIOSH-numbered documents. NIOSH-numbered publications also provide recommendations aimed to improve worker safety and health that stem from risk assessment." NIOSHTIC-2NIOSHTIC no. 20058767Suggested citation: NIOSH [2019]. Current intelligence bulletin 69: NIOSH practices in occupational risk assessment. By Daniels RD, Gilbert SJ, Kuppusamy SP, Kuempel ED, Park RM, Pandalai SP, Smith RJ, Wheeler MW, Whittaker C, Schulte PA. Cincinnati, OH: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, National Institute for Occupational Safety and Health. DHHS (NIOSH) Publication No. 2020-106, https://doi.org/10.26616/NIOSHPUB20201062020-106.pdf?id=10.26616/NIOSHPUB2020106202010.26616/NIOSHPUB2020106728

    Timely and reliable evaluation of the effects of interventions: a framework for adaptive meta-analysis (FAME)

    Most systematic reviews are retrospective and use aggregate data (AD) from publications, meaning they can be unreliable, lag behind therapeutic developments and fail to influence ongoing or new trials. Commonly, the potential influence of unpublished or ongoing trials is overlooked when interpreting results, or when determining the value of updating the meta-analysis or the need to collect individual participant data (IPD). Therefore, we developed a Framework for Adaptive Meta-analysis (FAME) to determine prospectively the earliest opportunity for reliable AD meta-analysis. We illustrate FAME using two systematic reviews in men with metastatic (M1) and non-metastatic (M0) hormone-sensitive prostate cancer (HSPC).
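
    As a point of reference for the aggregate-data step that FAME builds on, the sketch below pools published log hazard ratios with fixed-effect inverse-variance weights and reports a crude fraction of the eligible evidence already in hand. The trial values, the fixed-effect choice, and the "share of planned patients" check are assumptions made for illustration only, not the FAME reliability criteria.

        import math

        # Log hazard ratios and standard errors from the trials reporting so
        # far; the values are purely illustrative, not taken from the reviews.
        trials = [
            {"name": "Trial A", "log_hr": -0.25, "se": 0.10, "patients": 1200},
            {"name": "Trial B", "log_hr": -0.18, "se": 0.14, "patients": 800},
            {"name": "Trial C", "log_hr": -0.30, "se": 0.20, "patients": 450},
        ]
        planned_total = 4500  # hypothetical patients across all eligible trials

        # Fixed-effect inverse-variance pooling of the aggregate data.
        weights = [1 / t["se"] ** 2 for t in trials]
        pooled = sum(w * t["log_hr"] for w, t in zip(weights, trials)) / sum(weights)
        pooled_se = math.sqrt(1 / sum(weights))
        print(f"pooled HR = {math.exp(pooled):.2f} "
              f"(95% CI {math.exp(pooled - 1.96 * pooled_se):.2f} to "
              f"{math.exp(pooled + 1.96 * pooled_se):.2f})")

        # One crude timeliness check: how much of the eligible evidence exists?
        available = sum(t["patients"] for t in trials)
        print(f"evidence in hand: {available / planned_total:.0%} of planned patients")

    A prospective framework such as FAME asks this kind of question before the analysis is run, so that the meta-analysis is only declared reliable once enough of the eligible trial evidence can be included.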