Leveraging GPT-4 for identifying cancer phenotypes in electronic health records: A performance comparison between GPT-4, GPT-3.5-turbo, Flan-T5, Llama-3-8B, and spaCy's rule-based and machine learning-based methods
OBJECTIVE: Accurately identifying clinical phenotypes from Electronic Health Records (EHRs) provides additional insights into patients' health, especially when such information is unavailable in structured data. This study evaluates the application of OpenAI's Generative Pre-trained Transformer (GPT)-4 model to identify clinical phenotypes from EHR text in non-small cell lung cancer (NSCLC) patients. The goal was to identify disease stages, treatments, and progression using GPT-4, and to compare its performance against GPT-3.5-turbo, Flan-T5-xl, Flan-T5-xxl, Llama-3-8B, and two rule-based and machine learning-based methods, namely scispaCy and medspaCy.
MATERIALS AND METHODS: Phenotypes such as initial cancer stage, initial treatment, evidence of cancer recurrence, and affected organs during recurrence were identified from 13 646 clinical notes for 63 NSCLC patients from Washington University in St. Louis, Missouri. The performance of the GPT-4 model was evaluated against GPT-3.5-turbo, Flan-T5-xxl, Flan-T5-xl, Llama-3-8B, medspaCy, and scispaCy by comparing precision, recall, and micro-F1 scores.
RESULTS: GPT-4 achieved higher F1 score, precision, and recall than Flan-T5-xl, Flan-T5-xxl, Llama-3-8B, medspaCy, and scispaCy. GPT-3.5-turbo performed similarly to GPT-4. The GPT, Flan-T5, and Llama models were not constrained by explicit rule requirements for contextual pattern recognition. The spaCy models relied on predefined patterns, leading to their suboptimal performance.
DISCUSSION AND CONCLUSION: GPT-4 improves clinical phenotype identification due to its robust pre-training and remarkable pattern recognition capability on the embedded tokens. It demonstrates data-driven effectiveness even with limited context in the input. While rule-based models remain useful for some tasks, GPT models offer improved contextual understanding of the text and robust clinical phenotype extraction.
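The precision, recall, and micro-F1 comparison named in the methods can be illustrated with a minimal sketch. It assumes each patient's phenotypes are represented as a set of (phenotype, value) pairs. The function name `micro_prf`, the data layout, and the example labels are hypothetical and are not taken from the study. Micro-averaging pools true positives, false positives, and false negatives across all patients before computing the scores.

```python
def micro_prf(gold, pred):
    """Micro-averaged precision, recall, and F1.

    gold, pred: lists of sets of (phenotype, value) pairs, one set per patient.
    This layout is an assumption made for illustration only.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # predicted and correct
        fp += len(p - g)   # predicted but not in the gold annotation
        fn += len(g - p)   # in the gold annotation but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical annotations for two patients (not data from the study).
gold = [
    {("stage", "IIIA"), ("treatment", "chemoradiation")},
    {("stage", "IB"), ("recurrence", "yes")},
]
pred = [
    {("stage", "IIIA"), ("treatment", "surgery")},
    {("stage", "IB"), ("recurrence", "yes")},
]

precision, recall, f1 = micro_prf(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} micro-F1={f1:.2f}")
```

In this toy example three of the four predictions match the gold annotation, with one wrong prediction and one missed label, so all three scores come out to 0.75.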
13C surface characterization of midplane and crown collector probes on DIII-D
A dual collector probe system has been implemented on DIII-D for scrape-off-layer (SOL) impurity transport studies. These experiments injected isotopically enriched methane (13CD4) and sampled the impurities from this extrinsic, primary source with graphite collector probes at the outboard midplane and crown of upper single null L-mode plasmas. Using a stable isotopic mixing model, results suggest that 13C from methane injections prior to these experiments has built up on the walls of DIII-D to act as a secondary, intrinsic source of enriched 13C to the collector probes. This secondary source accounts for nearly 60 % of the deposits on the midplane collector probes and nearly 90 % of the deposits on the collector probes in the crown. These results lay the foundation for future impurity transport models and suggest that further simulation of impurity transport during the methane injection experiments will require two sources of enriched impurities in order to accurately model the SOL impurity profiles of 13C.
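The abstract does not give the form of the stable isotopic mixing model, so the following is only a sketch of a generic two-end-member linear mixing calculation, which may differ from the authors' approach. It assumes the 13C atom fraction measured on a probe is a weighted average of the primary (injected methane) and secondary (wall) source fractions. The function name `secondary_fraction` and all numerical values are hypothetical and chosen purely for illustration.

```python
def secondary_fraction(sample, primary, secondary):
    """Fraction of a deposit attributed to the secondary source.

    Assumes linear two-end-member mixing of a 13C atom fraction:
        sample = f * secondary + (1 - f) * primary
    and solves for f.
    """
    if primary == secondary:
        raise ValueError("end members must differ to resolve the mixture")
    return (sample - primary) / (secondary - primary)


# Hypothetical 13C atom fractions (not measurements from the experiment).
x_primary = 0.99    # assumed enrichment of the injected methane
x_secondary = 0.50  # assumed enrichment of the wall reservoir
x_sample = 0.70     # assumed enrichment measured on a collector probe

f = secondary_fraction(x_sample, x_primary, x_secondary)
print(f"secondary-source share of the deposit: {f:.1%}")
```

With only one isotope ratio, this linear form can separate at most two sources. A third source would need an additional independent tracer.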