53 research outputs found

Evaluation of ChatGPT Family of Models for Biomedical Reasoning and Classification

Author: Aerts Hugo JWL
Bitterman Danielle S.
Chen Shan
Li Yingya
Lu Sheng
Savova Guergana K.
Van Hoang
Publication date: 05/04/2023

Recent advances in large language models (LLMs) have shown impressive ability in biomedical question-answering, but have not been adequately investigated for more specific biomedical applications. This study investigates the performance of LLMs such as the ChatGPT family of models (GPT-3.5s, GPT-4) in biomedical tasks beyond question-answering. Because no patient data can be passed to the OpenAI API public interface, we evaluated model performance with over 10,000 samples as proxies for two fundamental tasks in the clinical domain - classification and reasoning. The first task is classifying whether statements of clinical and policy recommendations in scientific literature constitute health advice. The second task is causal relation detection from the biomedical literature. We compared LLMs with simpler models, such as bag-of-words (BoW) with logistic regression, and fine-tuned BioBERT models. Despite the excitement around viral ChatGPT, we found that fine-tuning for two fundamental NLP tasks remained the best strategy. The simple BoW model performed on par with the most complex LLM prompting. Prompt engineering required significant investment.

Comment: 28 pages, 2 tables and 4 figures. Submitting for review.

arXiv.org e-Print Archive

Natural language processing to automatically extract the presence and severity of esophagitis in notes of patients undergoing radiotherapy

Author: Aerts Hugo JWL
Bitterman Danielle S.
Chen Shan
Guevara Marco
Mak Raymond H.
Miller Timothy A.
Murray Arpi
Ramirez Nicolas
Savova Guergana K.
Warner Jeremy L.
Publication date: 23/03/2023

Radiotherapy (RT) toxicities can impair survival and quality-of-life, yet remain under-studied. Real-world evidence holds potential to improve our understanding of toxicities, but toxicity information is often only in clinical notes. We developed natural language processing (NLP) models to identify the presence and severity of esophagitis from notes of patients treated with thoracic RT. We fine-tuned statistical and pre-trained BERT-based models for three esophagitis classification tasks: Task 1) presence of esophagitis, Task 2) severe esophagitis or not, and Task 3) no esophagitis vs. grade 1 vs. grade 2-3. Transferability was tested on 345 notes from patients with esophageal cancer undergoing RT. Fine-tuning PubmedBERT yielded the best performance. The best macro-F1 was 0.92, 0.82, and 0.74 for Task 1, 2, and 3, respectively. Selecting the most informative note sections during fine-tuning improved macro-F1 by over 2% for all tasks. Silver-labeled data improved the macro-F1 by over 3% across all tasks. For the esophageal cancer notes, the best macro-F1 was 0.73, 0.74, and 0.65 for Task 1, 2, and 3, respectively, without additional fine-tuning. To our knowledge, this is the first effort to automatically extract esophagitis toxicity severity according to CTCAE guidelines from clinic notes. The promising performance provides proof-of-concept for NLP-based automated detailed toxicity monitoring in expanded domains.

Comment: 17 pages, 6 tables, 1 figure, submitting to JCO-CCI for review.

arXiv.org e-Print Archive

Maastricht University Research Portal
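
The esophagitis abstract above reports its results as macro-F1. As a reading aid, here is a minimal sketch of how that metric is usually computed: the unweighted mean of the per-class F1 scores. The function name `macro_f1` and the example labels are hypothetical and are not taken from the paper, which does not describe its evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels shaped like Task 3 (no esophagitis vs. grade 1 vs. grade 2-3)
y_true = ["none", "none", "grade1", "grade1", "grade2-3", "grade2-3"]
y_pred = ["none", "grade1", "grade1", "grade1", "grade2-3", "none"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the average, a rare class such as severe esophagitis affects macro-F1 as much as a common one, which is a plausible reason to prefer it on imbalanced clinical data.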

Towards comprehensive syntactic and semantic annotations of the clinical narrative

Author: Albright Daniel
Choi Jinho D
Dligach Dmitriy
Fredriksen Anwen
Hwang Jena D
Lanfranchi Arrick
Martin James
Nielsen Rodney D
Palmer Martha
Savova Guergana K
Styler William F
Ward Wayne
Warner Colin
Publication venue: 'BMJ'
Publication date: 25/01/2013

Objective: To create annotated clinical narratives with layers of syntactic and semantic labels to facilitate advances in clinical natural language processing (NLP). To develop NLP algorithms and open source components. Methods: Manual annotation of a clinical narrative corpus of 127 606 tokens following the Treebank schema for syntactic information, PropBank schema for predicate-argument structures, and the Unified Medical Language System (UMLS) schema for semantic information. NLP components were developed. Results: The final corpus consists of 13 091 sentences containing 1772 distinct predicate lemmas. Of the 766 newly created PropBank frames, 74 are verbs. There are 28 539 named entity (NE) annotations spread over 15 UMLS semantic groups, one UMLS semantic type, and the Person semantic category. The most frequent annotations belong to the UMLS semantic groups of Procedures (15.71%), Disorders (14.74%), Concepts and Ideas (15.10%), Anatomy (12.80%), Chemicals and Drugs (7.49%), and the UMLS semantic type of Sign or Symptom (12.46%). Inter-annotator agreement results: Treebank (0.926), PropBank (0.891–0.931), NE (0.697–0.750). The part-of-speech tagger, constituency parser, dependency parser, and semantic role labeler are built from the corpus and released open source. A significant limitation uncovered by this project is the need for the NLP community to develop a widely agreed-upon schema for the annotation of clinical concepts and their relations. Conclusions: This project takes a foundational step towards bringing the field of clinical NLP up to par with NLP in the general domain. The corpus creation and NLP components provide a resource for research and application development that would have been previously impossible

Harvard University - DASH

PubMed Central

eScholarship - University of California

Automatic Prediction of Rheumatoid Arthritis Disease Activity from the Electronic Medical Records

Author: Canhao Helena
Chen Pei Jun
Dligach Dmitriy
Karlson Elizabeth W.
Lin Chen
Miller Timothy A.
Perez Raul Natanael Guzman
Plenge Robert M.
Savova Guergana K.
Shadick Nancy A.
Shen Yuanyan
Weinblatt Michael E.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 16/08/2013

Objective: We aimed to mine the data in the Electronic Medical Record to automatically discover patients' Rheumatoid Arthritis disease activity at discrete rheumatology clinic visits. We cast the problem as a document classification task where the feature space includes concepts from the clinical narrative and lab values as stored in the Electronic Medical Record. Materials and Methods: The Training Set consisted of 2792 clinical notes and associated lab values. Test Set 1 included 1749 clinical notes and associated lab values. Test Set 2 included 344 clinical notes for which there were no associated lab values. The Apache clinical Text Analysis and Knowledge Extraction System was used to analyze the text and transform it into informative features to be combined with relevant lab values. Results: Experiments over a range of machine learning algorithms and features were conducted. The best performing combination was linear kernel Support Vector Machines with Unified Medical Language System Concept Unique Identifier features with feature selection and lab values. The Area Under the Receiver Operating Characteristic Curve (AUC) is 0.831 (σ = 0.0317), statistically significant as compared to two baselines (AUC = 0.758, σ = 0.0291). Algorithms demonstrated superior performance on cases clinically defined as extreme categories of disease activity (Remission and High) compared to those defined as intermediate categories (Moderate and Low) and included laboratory data on inflammatory markers. Conclusion: Automatic Rheumatoid Arthritis disease activity discovery from Electronic Medical Record data is a learnable task approximating human performance. As a result, this approach might have several research applications, such as the identification of patients for genome-wide pharmacogenetic studies that require large sample sizes with precise definitions of disease activity and response to therapies.

Harvard University - DASH

FigShare

The Mayo/MITRE system for discovery of obesity and its comorbidities

Author: Ben Wellner
Cheryl Clark
Christopher Chute
David Harris
Guergana Savova
Jiaping Zheng
John Aberdeen
K Bretonnel Cohen
Lynette Hirschman
Marcia Lazo
Qian Hu
Sean Murphy
Publication date: 01/01/2008

CiteSeerX

Methods to Develop an Electronic Medical Record Phenotype Algorithm to Compare the Risk of Coronary Artery Disease across 3 Chronic Disease Cohorts

Author: Agniel Denis
Ananthakrishnan Ashwin N.
Cagan Andrew
Cai Tianxi
Chen Pei
Churchill Susanne
Gainer Vivian S.
Goryachev Sergey
Karlson Elizabeth W.
Kohane Isaac
Kumar Vishesh
Lee Jaeyoung
Liao Katherine P.
Murphy Shawn N.
Plenge Robert M.
Savova Guergana K.
Shaw Stanley Y.
Szolovits Peter
Xia Zongqi
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/09/2014

Background: Typically, algorithms to classify phenotypes using electronic medical record (EMR) data were developed to perform well in a specific patient population. There is increasing interest in analyses which can allow study of a specific outcome across different diseases. Such a study in the EMR would require an algorithm that can be applied across different patient populations. Our objectives were: (1) to develop an algorithm that would enable the study of coronary artery disease (CAD) across diverse patient populations; (2) to study the impact of adding narrative data extracted using natural language processing (NLP) in the algorithm. Additionally, we demonstrate how to implement the CAD algorithm to compare risk across 3 chronic diseases in a preliminary study. Methods and Results: We studied 3 established EMR based patient cohorts: diabetes mellitus (DM, n = 65,099), inflammatory bowel disease (IBD, n = 10,974), and rheumatoid arthritis (RA, n = 4,453) from two large academic centers. We developed a CAD algorithm using NLP in addition to structured data (e.g. ICD9 codes) in the RA cohort and validated it in the DM and IBD cohorts. The CAD algorithm using NLP in addition to structured data achieved specificity >95% with a positive predictive value (PPV) of 90% in the training (RA) and validation sets (IBD and DM). The addition of NLP data improved the sensitivity for all cohorts, classifying an additional 17% of CAD subjects in IBD and 10% in DM while maintaining PPV of 90%. The algorithm classified 16,488 DM (26.1%), 457 IBD (4.2%), and 245 RA (5.0%) with CAD. In a cross-sectional analysis, CAD risk was 63% lower in RA and 68% lower in IBD compared to DM (p<0.0001) after adjusting for traditional cardiovascular risk factors. Conclusions: We developed and validated a CAD algorithm that performed well across diverse patient populations. The addition of NLP into the CAD algorithm improved the sensitivity of the algorithm, particularly in cohorts where the prevalence of CAD was low. Preliminary data suggest that CAD risk was significantly lower in RA and IBD compared to DM.

National Institutes of Health (U.S.). Informatics for Integrating Biology and the Bedside Project (U54LM008748)

DSpace@MIT

Crossref

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

FigShare

Discerning Tumor Status from Unstructured MRI Reports—Completeness of Information in Existing Reports and Utility of Automated Natural Language Processing

Author: AB Miller
BI Reiner
BJ Thomas
Bradley J. Erickson
C Cortes
CL Sistrom
CP Langlotz
E Galanis
G Hripcsak
GB Melton
GK Savova
Guergana K. Savova
I McCowan
IA McCowan
JC Denny
Jiaping Zheng
JL Hobby
JS Elkins
KJ Dreyer
L Berlin
L Zhou
Lionel T. E. Cheng
NR Dunnick
P Therasse
PM Hickey
R Khorasani
RK Taira
S Pakhomov
SS Naik
Y Lin
Publication venue: Springer-Verlag
Publication date: 01/01/2009

Information in electronic medical records is often in an unstructured free-text format. This format presents challenges for expedient data retrieval and may fail to convey important findings. Natural language processing (NLP) is an emerging technique for rapid and efficient clinical data retrieval. While proven in disease detection, the utility of NLP in discerning disease progression from free-text reports is untested. We aimed to (1) assess whether unstructured radiology reports contained sufficient information for tumor status classification; (2) develop an NLP-based data extraction tool to determine tumor status from unstructured reports; and (3) compare NLP and human tumor status classification outcomes. Consecutive follow-up brain tumor magnetic resonance imaging reports (2000–2007) from a tertiary center were manually annotated using consensus guidelines on tumor status. Reports were randomized to NLP training (70%) or testing (30%) groups. The NLP tool utilized a support vector machines model with statistical and rule-based outcomes. Most reports had sufficient information for tumor status classification, although 0.8% did not describe status despite reference to prior examinations. Tumor size was unreported in 68.7% of documents, while 50.3% lacked data on change magnitude when there was detectable progression or regression. Using retrospective human classification as the gold standard, NLP achieved 80.6% sensitivity and 91.6% specificity for tumor status determination (mean positive predictive value, 82.4%; negative predictive value, 92.0%). In conclusion, most reports contained sufficient information for tumor status determination, though variable features were used to describe status. NLP demonstrated good accuracy for tumor status classification and may have novel application for automated disease status classification from electronic databases

Crossref

Springer - Publisher Connector

PubMed Central
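
The tumor status abstract above reports sensitivity, specificity, positive predictive value, and negative predictive value. The sketch below shows how those four quantities follow from confusion-matrix counts in the binary case. The function name `binary_metrics` and the counts are hypothetical; the paper reports mean predictive values, which suggests averaging over several status classes, so this is an illustration of the definitions rather than a reproduction of its evaluation.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives that were found
        "specificity": tn / (tn + fp),  # share of true negatives that were found
        "ppv": tp / (tp + fp),          # share of positive calls that were correct
        "npv": tn / (tn + fn),          # share of negative calls that were correct
    }


# Hypothetical counts, not taken from the study
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
```

Sensitivity and specificity describe the classifier against the gold standard, while PPV and NPV also depend on how common the positive class is in the evaluated reports.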

Modeling Disease Severity in Multiple Sclerosis Using Electronic Health Records

Author: Ananthakrishnan Ashwin N.
Bove Riley M.
Cagan Andrew
Cai Tianxi
Chen Pei
Cheng Suchun
Chibnik Lori B.
Chitnis Tanuja
Churchill Susanne
De Jager Philip L.
Gainer Vivian
Karlson Elizabeth W.
Kohane Isaac
Liao Katherine P.
Murphy Shawn N.
Plenge Robert M.
Savova Guergana K.
Secor Elizabeth
Shaw Stanley Y.
Szolovits Peter
Weiner Howard L.
Xia Zongqi
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Objective: To optimally leverage the scalability and unique features of the electronic health records (EHR) for research that would ultimately improve patient care, we need to accurately identify patients and extract clinically meaningful measures. Using multiple sclerosis (MS) as a proof of principle, we showcased how to leverage routinely collected EHR data to identify patients with a complex neurological disorder and derive an important surrogate measure of disease severity heretofore only available in research settings. Methods: In a cross-sectional observational study, 5,495 MS patients were identified from the EHR systems of two major referral hospitals using an algorithm that includes codified and narrative information extracted using natural language processing. In the subset of patients who receive neurological care at a MS Center where disease measures have been collected, we used routinely collected EHR data to extract two aggregate indicators of MS severity of clinical relevance: multiple sclerosis severity score (MSSS) and brain parenchymal fraction (BPF, a measure of whole brain volume). Results: The EHR algorithm that identifies MS patients has an area under the curve of 0.958, 83% sensitivity, 92% positive predictive value, and 89% negative predictive value when a 95% specificity threshold is used. The correlation between EHR-derived and true MSSS has a mean R² = 0.38±0.05, and that between EHR-derived and true BPF has a mean R² = 0.22±0.08. To illustrate its clinical relevance, derived MSSS captures the expected difference in disease severity between relapsing-remitting and progressive MS patients after adjusting for sex, age of symptom onset and disease duration (p = 1.56×10⁻¹²). Conclusion: Incorporation of sophisticated codified and narrative EHR data accurately identifies MS patients and provides estimation of a well-accepted indicator of MS severity that is widely used in research settings but not part of the routine medical records. Similar approaches could be applied to other complex neurological disorders.

National Institute of General Medical Sciences (U.S.) (NIH U54-LM008748)

DSpace@MIT

Crossref

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

eScholarship - University of California