ssROC: Semi-Supervised ROC Analysis for Reliable and Streamlined Evaluation of Phenotyping Algorithms
High-throughput phenotyping will accelerate the use of
electronic health records (EHRs) for translational research. A critical
roadblock is the extensive medical supervision required for phenotyping
algorithm (PA) estimation and evaluation. To address this challenge, numerous
weakly-supervised learning methods have been proposed to estimate PAs. However,
there is a paucity of methods for reliably evaluating the predictive
performance of PAs when a very small proportion of the data is labeled. To fill
this gap, we introduce a semi-supervised approach (ssROC) for estimation of the
receiver operating characteristic (ROC) parameters of PAs (e.g., sensitivity,
specificity).
ssROC uses a small labeled dataset to
nonparametrically impute missing labels. The imputations are then used for ROC
parameter estimation to yield more precise estimates of PA performance relative
to classical supervised ROC analysis (supROC) using only labeled data. We
evaluated ssROC through in-depth simulation studies and an extensive evaluation
of eight PAs from Mass General Brigham.
In both simulated and real data, ssROC produced ROC
parameter estimates with significantly lower variance than supROC for a given
amount of labeled data. For the eight PAs, ssROC achieved precision similar to
supROC with, on average, approximately 60% of the labeled data.
ssROC enables precise evaluation of PA performance to
increase trust in observational health research without demanding large volumes
of labeled data. ssROC is also easily implementable in open-source
software.
When used in conjunction with weakly-supervised PAs,
ssROC facilitates the reliable and streamlined phenotyping necessary for
EHR-based research.
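The impute-then-estimate idea behind ssROC can be sketched in a few lines. The following is an illustrative reconstruction, not the authors' implementation: the Gaussian-kernel (Nadaraya-Watson) regression of the gold-standard label on the PA score, the bandwidth, and the classification threshold are all assumptions for the sake of the example.

```python
import numpy as np

def ssroc(scores_lab, y_lab, scores_unlab, bandwidth=0.05, threshold=0.5):
    """Illustrative semi-supervised ROC estimation: nonparametrically impute
    the missing labels from a small labeled set, then estimate sensitivity
    and specificity of the PA over all records (labeled + imputed)."""
    def impute(s):
        # Gaussian kernel weights centered at the unlabeled record's PA score
        w = np.exp(-0.5 * ((scores_lab - s) / bandwidth) ** 2)
        return np.sum(w * y_lab) / np.sum(w)

    y_imp = np.array([impute(s) for s in scores_unlab])
    scores = np.concatenate([scores_lab, scores_unlab])
    y_soft = np.concatenate([y_lab, y_imp])  # labels/imputations in [0, 1]
    pred = scores >= threshold               # PA classification at this cutoff
    sens = np.sum(y_soft[pred]) / np.sum(y_soft)
    spec = np.sum((1 - y_soft)[~pred]) / np.sum(1 - y_soft)
    return sens, spec
```

Because every record contributes to the estimates (via an imputed soft label rather than a discarded missing label), the variance for a fixed labeled-set size is reduced relative to supervised ROC analysis on the labeled subset alone, which is the paper's central claim.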
Desiderata for the development of next-generation electronic health record phenotype libraries
Background
High-quality phenotype definitions are desirable to enable the extraction of patient cohorts from large electronic health record repositories and are characterized by properties such as portability, reproducibility, and validity. Phenotype libraries, where definitions are stored, have the potential to contribute significantly to the quality of the definitions they host. In this work, we present a set of desiderata for the design of a next-generation phenotype library that is able to ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling.
Methods
A group of researchers examined work to date on phenotype models, implementation, and validation, as well as contemporary phenotype libraries developed as a part of their own phenomics communities. Existing phenotype frameworks were also examined. This work was translated and refined by all the authors into a set of best practices.
Results
We present 14 library desiderata that promote high-quality phenotype definitions, in the areas of modelling, logging, validation, and sharing and warehousing.
Conclusions
There are a number of choices to be made when constructing phenotype libraries. Our considerations distil the best practices in the field and include pointers towards their further development to support portable, reproducible, and clinically valid phenotype design. The provision of high-quality phenotype definitions enables electronic health record data to be more effectively used in medical domains.
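The desiderata span modelling, logging, validation, and sharing/warehousing. A minimal library entry that records such quality metadata might look like the following sketch; the class and field names are hypothetical illustrations, not the paper's schema.

```python
from dataclasses import dataclass, field

@dataclass
class PhenotypeDefinition:
    """Hypothetical phenotype-library entry carrying the quality metadata
    the desiderata call for: modelling, logging, validation, sharing."""
    name: str
    version: str                    # logging: immutable version history
    codelists: dict                 # modelling: vocabulary -> list of codes
    validation: dict = field(default_factory=dict)  # e.g. {"ppv": 0.92}
    license: str = "open"           # sharing and warehousing
    changelog: list = field(default_factory=list)

    def log_change(self, note: str) -> None:
        """Append an audit-trail entry (logging desideratum)."""
        self.changelog.append(note)

# Example entry with codelists in two vocabularies for portability
t2dm = PhenotypeDefinition(
    name="type-2-diabetes",
    version="1.0.0",
    codelists={"ICD-10": ["E11"], "SNOMED": ["44054006"]},
)
t2dm.log_change("initial release")
```

Storing codelists per vocabulary, validation results, and a changelog alongside the definition itself is one way a library could make portability, validity, and reproducibility first-class properties rather than afterthoughts.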
Biomedical Informatics Applications for Precision Management of Neurodegenerative Diseases
Modern medicine is in the midst of a revolution driven by "big data," rapidly advancing computing power, and broader integration of technology into healthcare. Highly detailed and individualized profiles of both health and disease states are now possible, including biomarkers, genomic profiles, cognitive and behavioral phenotypes, high-frequency assessments, and medical imaging. Although these data are incredibly complex, they can potentially be used to understand multi-determinant causal relationships, elucidate modifiable factors, and ultimately customize treatments based on individual parameters. Especially for neurodegenerative diseases, where an effective therapeutic agent has yet to be discovered, there remains a critical need for an interdisciplinary perspective on data and information management due to the number of unanswered questions. Biomedical informatics is a multidisciplinary field that falls at the intersection of information technology, computer and data science, engineering, and healthcare that will be instrumental for uncovering novel insights into neurodegenerative disease research, including both causal relationships and therapeutic targets, and maximizing the utility of both clinical and research data. The present study aims to provide a brief overview of biomedical informatics and how clinical data applications such as clinical decision support tools can be developed to derive new knowledge from the wealth of available data to advance clinical care and scientific research of neurodegenerative diseases in the era of precision medicine.
Comparative analysis, applications, and interpretation of electronic health record-based stroke phenotyping methods
Background
Accurate identification of acute ischemic stroke (AIS) patient cohorts is essential for a wide range of clinical investigations. Automated phenotyping methods that leverage electronic health records (EHRs) represent a fundamentally new approach to cohort identification, avoiding the current laborious and poorly generalizable manual construction of phenotyping algorithms. We systematically compared and evaluated the ability of machine learning algorithms and case-control combinations to phenotype acute ischemic stroke patients using data from an EHR.
Materials and methods
Using structured patient data from the EHR at a tertiary-care hospital system, we built and evaluated machine learning models to identify patients with AIS based on 75 different case-control and classifier combinations. We then estimated the prevalence of AIS patients across the EHR. Finally, we externally validated the ability of the models to detect AIS patients without AIS diagnosis codes using the UK Biobank.
Results
Across all models, we found that the mean AUROC for detecting AIS was 0.963 ± 0.0520 and the average precision score was 0.790 ± 0.196 with minimal feature processing. Classifiers trained with cases with AIS diagnosis codes and controls with no cerebrovascular disease codes had the best average F1 score (0.832 ± 0.0383). In the external validation, we found that the top probabilities from a model-predicted AIS cohort were significantly enriched for AIS patients without AIS diagnosis codes (60–150 fold over expected).
Conclusions
Our findings support machine learning algorithms as a generalizable way to accurately identify AIS patients without using process-intensive manual feature curation. When a set of AIS patients is unavailable, diagnosis codes may be used to train classifier models.
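The study's code-defined case-control design can be sketched as follows. This is a toy stand-in, not one of the paper's 75 case-control/classifier combinations: the features are synthetic Gaussians in place of structured EHR data (code counts, labs, demographics), and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for structured EHR features.
# Cases: patients carrying AIS diagnosis codes;
# controls: patients with no cerebrovascular disease codes.
n_per_class = 1000
X = np.vstack([rng.normal(1.0, 1.0, (n_per_class, 5)),   # cases
               rng.normal(0.0, 1.0, (n_per_class, 5))])  # controls
y = np.r_[np.ones(n_per_class), np.zeros(n_per_class)]

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# The same metrics the study reports: AUROC and F1 on held-out patients
auroc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
f1 = f1_score(y_te, clf.predict(X_te))
```

Once trained, `clf.predict_proba` can score the full EHR population; ranking patients by predicted probability is the mechanism behind the external-validation finding that top-ranked patients are enriched for AIS cases lacking diagnosis codes.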
The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities