
    ssROC: Semi-Supervised ROC Analysis for Reliable and Streamlined Evaluation of Phenotyping Algorithms

    Objective: High-throughput phenotyping will accelerate the use of electronic health records (EHRs) for translational research. A critical roadblock is the extensive medical supervision required for phenotyping algorithm (PA) estimation and evaluation. To address this challenge, numerous weakly-supervised learning methods have been proposed to estimate PAs. However, there is a paucity of methods for reliably evaluating the predictive performance of PAs when a very small proportion of the data is labeled. To fill this gap, we introduce a semi-supervised approach (ssROC) for estimation of the receiver operating characteristic (ROC) parameters of PAs (e.g., sensitivity, specificity). Materials and Methods: ssROC uses a small labeled dataset to nonparametrically impute missing labels. The imputations are then used for ROC parameter estimation to yield more precise estimates of PA performance relative to classical supervised ROC analysis (supROC) using only labeled data. We evaluated ssROC through in-depth simulation studies and an extensive evaluation of eight PAs from Mass General Brigham. Results: In both simulated and real data, ssROC produced ROC parameter estimates with significantly lower variance than supROC for a given amount of labeled data. For the eight PAs, our results illustrate that ssROC achieves similar precision to supROC, but with approximately 60% of the amount of labeled data on average. Discussion: ssROC enables precise evaluation of PA performance to increase trust in observational health research without demanding large volumes of labeled data. ssROC is also easily implementable in open-source R software. Conclusion: When used in conjunction with weakly-supervised PAs, ssROC facilitates the reliable and streamlined phenotyping necessary for EHR-based research.
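
    The two steps named in the abstract (impute missing labels nonparametrically from a small labeled set, then estimate ROC parameters from the imputations) can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian-kernel (Nadaraya-Watson) smoother, the fixed bandwidth, the simulated cohort, and all function names below are assumptions chosen for illustration only.

```python
import math
import random


def kernel_impute(labeled_scores, labels, query_scores, bandwidth):
    """Nadaraya-Watson estimate of P(Y = 1 | score) with a Gaussian kernel.

    Assumption: this smoother stands in for the unspecified nonparametric
    imputation step described in the abstract.
    """
    prevalence = sum(labels) / len(labels)
    imputed = []
    for s in query_scores:
        weights = [math.exp(-0.5 * ((s - t) / bandwidth) ** 2)
                   for t in labeled_scores]
        total = sum(weights)
        if total == 0.0:
            # Every weight underflowed: fall back to the labeled prevalence.
            imputed.append(prevalence)
        else:
            imputed.append(sum(w * y for w, y in zip(weights, labels)) / total)
    return imputed


def roc_point(scores, soft_labels, threshold):
    """Return (TPR, FPR) at a threshold, weighting each record by its label.

    With hard 0/1 labels this is the usual supervised estimate; with imputed
    probabilities it becomes the semi-supervised analogue.
    """
    pos_mass = sum(soft_labels)
    neg_mass = len(soft_labels) - pos_mass
    tp = sum(m for s, m in zip(scores, soft_labels) if s >= threshold)
    fp = sum(1.0 - m for s, m in zip(scores, soft_labels) if s >= threshold)
    return tp / pos_mass, fp / neg_mass


# Hypothetical simulated cohort: 2000 PA scores, of which only 100 are labeled.
random.seed(0)
true_labels = [1 if random.random() < 0.3 else 0 for _ in range(2000)]
scores = [random.gauss(1.0 if y else 0.0, 0.7) for y in true_labels]
lab_scores, lab_labels = scores[:100], true_labels[:100]

# Step 1: impute label probabilities for the whole cohort from the labeled set.
imputed = kernel_impute(lab_scores, lab_labels, scores, bandwidth=0.3)

# Step 2: estimate sensitivity (TPR) and 1 - specificity (FPR) at a cutoff.
ss_tpr, ss_fpr = roc_point(scores, imputed, threshold=0.5)
sup_tpr, sup_fpr = roc_point(lab_scores, lab_labels, threshold=0.5)

print(f"semi-supervised: TPR={ss_tpr:.3f} FPR={ss_fpr:.3f}")
print(f"supervised only: TPR={sup_tpr:.3f} FPR={sup_fpr:.3f}")
```

    The supervised estimate uses only the 100 labeled records, while the semi-supervised one averages over all 2000 scores. The sketch does not reproduce the variance comparison or the 60% labeling figure reported in the abstract.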

    Desiderata for the development of next-generation electronic health record phenotype libraries

    Background: High-quality phenotype definitions are desirable to enable the extraction of patient cohorts from large electronic health record repositories and are characterized by properties such as portability, reproducibility, and validity. Phenotype libraries, where definitions are stored, have the potential to contribute significantly to the quality of the definitions they host. In this work, we present a set of desiderata for the design of a next-generation phenotype library that is able to ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling. Methods: A group of researchers examined work to date on phenotype models, implementation, and validation, as well as contemporary phenotype libraries developed as a part of their own phenomics communities. Existing phenotype frameworks were also examined. This work was translated and refined by all the authors into a set of best practices. Results: We present 14 library desiderata that promote high-quality phenotype definitions, in the areas of modelling, logging, validation, and sharing and warehousing. Conclusions: There are a number of choices to be made when constructing phenotype libraries. Our considerations distil the best practices in the field and include pointers towards their further development to support portable, reproducible, and clinically valid phenotype design. The provision of high-quality phenotype definitions enables electronic health record data to be more effectively used in medical domains.

    Biomedical Informatics Applications for Precision Management of Neurodegenerative Diseases

    Modern medicine is in the midst of a revolution driven by “big data,” rapidly advancing computing power, and broader integration of technology into healthcare. Highly detailed and individualized profiles of both health and disease states are now possible, including biomarkers, genomic profiles, cognitive and behavioral phenotypes, high-frequency assessments, and medical imaging. Although these data are incredibly complex, they can potentially be used to understand multi-determinant causal relationships, elucidate modifiable factors, and ultimately customize treatments based on individual parameters. Especially for neurodegenerative diseases, where an effective therapeutic agent has yet to be discovered, there remains a critical need for an interdisciplinary perspective on data and information management due to the number of unanswered questions. Biomedical informatics is a multidisciplinary field at the intersection of information technology, computer and data science, engineering, and healthcare. It will be instrumental for uncovering novel insights in neurodegenerative disease research, including both causal relationships and therapeutic targets, and for maximizing the utility of both clinical and research data. The present study aims to provide a brief overview of biomedical informatics and how clinical data applications such as clinical decision support tools can be developed to derive new knowledge from the wealth of available data to advance clinical care and scientific research of neurodegenerative diseases in the era of precision medicine.
