8 research outputs found
Beyond the Hype: Assessing the Performance, Trustworthiness, and Clinical Suitability of GPT3.5
The use of large language models (LLMs) in healthcare is gaining popularity,
but their practicality and safety in clinical settings have not been thoroughly
assessed. In high-stakes environments like medical settings, trust and safety
are critical issues for LLMs. To address these concerns, we present an approach
to evaluate the performance and trustworthiness of a GPT3.5 model for medical
image protocol assignment. We compare it with a fine-tuned BERT model and a
radiologist. In addition, we have a radiologist review the GPT3.5 output to
evaluate its decision-making process. Our evaluation dataset consists of 4,700
physician entries across 11 imaging protocol classes spanning the entire head.
Our findings suggest that GPT3.5 performance falls behind that of BERT and a
radiologist. However, GPT3.5 outperforms BERT in explaining its decisions,
detecting relevant word indicators, and model calibration. Furthermore,
by analyzing the explanations of GPT3.5 for misclassifications, we reveal
systematic errors that need to be resolved to enhance its safety and
suitability for clinical use.
Autonomous sweat extraction and analysis applied to cystic fibrosis and glucose monitoring using a fully integrated wearable platform
Perspiration-based wearable biosensors facilitate continuous monitoring of individuals’ health states with real-time and molecular-level insight. The inherent inaccessibility of sweat in sedentary individuals in large volumes (≥10 µL) for on-demand and in situ analysis has limited our ability to capitalize on this noninvasive and rich source of information. A wearable and miniaturized iontophoresis interface is an excellent solution to overcome this barrier. The iontophoresis process involves delivery of stimulating agonists to the sweat glands with the aid of an electrical current. The challenge remains in devising an iontophoresis interface that can extract a sufficient amount of sweat for robust sensing without electrode corrosion and without burning or causing discomfort in subjects. Here, we overcame this challenge by realizing an electrochemically enhanced iontophoresis interface, integrated in a wearable sweat analysis platform. This interface can be programmed to induce sweat with various secretion profiles for real-time analysis, a capability that can be exploited to advance our knowledge of sweat gland physiology and the secretion process. To demonstrate the clinical value of our platform, human subject studies were performed in the context of cystic fibrosis diagnosis and a preliminary investigation of the blood/sweat glucose correlation. With our platform, we detected the elevated sweat electrolyte content of cystic fibrosis patients compared with that of healthy control subjects. Furthermore, our results indicate that oral glucose consumption in the fasting state is followed by increased glucose levels in both sweat and blood. Our solution opens the possibility for a broad range of noninvasive diagnostic and general population health monitoring applications.
Exploring the performance and explainability of fine-tuned BERT models for neuroradiology protocol assignment
Background: Deep learning has demonstrated significant advancements across various domains. However, its implementation in specialized areas, such as medical settings, is still approached with caution. In these high-stakes environments, understanding the model's decision-making process is critical. This study assesses the performance of different pretrained Bidirectional Encoder Representations from Transformers (BERT) models and examines their decision-making within the context of medical image protocol assignment.
Methods: Four different pretrained BERT models (BERT, BioBERT, ClinicalBERT, RoBERTa) were fine-tuned for the medical image protocol classification task. Word importance was measured by attributing the classification output to every word using a gradient-based method. Subsequently, a trained radiologist reviewed the resulting word importance scores to assess the model’s decision-making process relative to human reasoning.
Results: The BERT model came close to human performance on our test set. The BERT model successfully identified relevant words indicative of the target protocol. Analysis of important words in misclassifications revealed potential systematic errors in the model.
Conclusions: The BERT model shows promise in medical image protocol assignment by reaching near-human-level performance and identifying key words effectively. The detection of systematic errors paves the way for further refinements to enhance its safety and utility in clinical settings.
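The abstract does not say which gradient-based attribution method was used. As a rough illustration of the general idea only, the sketch below applies gradient-times-input attribution to a toy bag-of-words logistic classifier. It is a stand-in for the fine-tuned BERT models, not the study's method. The vocabulary, weights, and bias are hypothetical values chosen for illustration.

```python
import math

# Hypothetical toy vocabulary and weights; NOT taken from the study.
WEIGHTS = {"seizure": 2.0, "headache": 0.5, "history": -0.1, "of": 0.0}
BIAS = -1.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def word_importance(words):
    """Gradient-times-input attribution for a bag-of-words logistic model.

    Returns the predicted probability and a score per distinct word.
    """
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    z = BIAS + sum(WEIGHTS.get(w, 0.0) * c for w, c in counts.items())
    p = sigmoid(z)
    # For p = sigmoid(w . x + b), dp/dx_i = p * (1 - p) * w_i.
    # Multiplying by the input count x_i gives the attribution score.
    scores = {
        w: p * (1.0 - p) * WEIGHTS.get(w, 0.0) * c for w, c in counts.items()
    }
    return p, scores


p, scores = word_importance(["history", "of", "seizure"])
# Rank words by absolute attribution, most influential first.
ranked = sorted(scores, key=lambda w: abs(scores[w]), reverse=True)
print(ranked)
```

In a transformer, the same idea would be applied to the gradient of the output with respect to token embeddings rather than word counts. A ranked list like this is the kind of output a reviewer could compare against human reasoning.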
Machine learning for endoleak detection after endovascular aortic repair.
Diagnosis of endoleak following endovascular aortic repair (EVAR) relies on manual review of multi-slice CT angiography (CTA) by physicians, which is a tedious and time-consuming process that is susceptible to error. We evaluate the use of a deep neural network for the detection of endoleak on CTA for post-EVAR patients using a novel data-efficient training approach. Fifty CTAs with endoleak and 20 CTAs without endoleak were identified based on gold standard interpretation by a cardiovascular subspecialty radiologist. The Endoleak Augmentor, a custom-designed augmentation method, provided robust training for the machine learning (ML) model. Predicted segmentation maps underwent post-processing to determine the presence of endoleak. The model was tested against 3 blinded general radiologists and 1 blinded subspecialist using a held-out subset (10 positive endoleak CTAs, 10 control CTAs). Model accuracy, precision and recall for endoleak diagnosis were 95%, 90% and 100% relative to reference subspecialist interpretation (AUC = 0.99). Accuracy, precision and recall were 70/70/70% for generalist1, 50/50/90% for generalist2, and 90/83/100% for generalist3. The blinded subspecialist had concordant interpretations for all test cases compared with the reference. In conclusion, our ML-based approach has similar performance for endoleak diagnosis relative to subspecialists and superior performance compared with generalists.
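The reported percentages can be checked against the 20-case held-out subset (10 positive, 10 control). The abstract does not give confusion-matrix counts, so the counts below are inferred as the integer counts that best reproduce the reported figures. They are an assumption, not data from the study.

```python
def metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


# (tp, fp, fn, tn) per reader. These counts are INFERRED from the reported
# percentages and the 10 positive / 10 control split; they are not reported
# in the abstract.
READERS = {
    "model": (10, 1, 0, 9),
    "generalist1": (7, 3, 3, 7),
    "generalist2": (9, 9, 1, 1),
    "generalist3": (10, 2, 0, 8),
}

results = {name: metrics(*counts) for name, counts in READERS.items()}
for name, (acc, prec, rec) in results.items():
    print(f"{name}: accuracy={acc:.1%} precision={prec:.1%} recall={rec:.1%}")
```

Under these inferred counts the three generalist rows match the reported values after rounding. The model's precision comes out as 10/11, about 90.9%, which agrees with the reported 90% only approximately.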