24 research outputs found
Enhancing Network Initialization for Medical AI Models Using Large-Scale, Unlabeled Natural Images
Pre-training on labeled datasets, such as ImageNet, has become the gold standard in
medical image analysis. However, the emergence of self-supervised learning
(SSL), which leverages unlabeled data to learn robust features, presents an
opportunity to bypass the intensive labeling process. In this study, we
explored if SSL for pre-training on non-medical images can be applied to chest
radiographs and how it compares to supervised pre-training on non-medical
images and on medical images. We utilized a vision transformer and initialized
its weights based on (i) SSL pre-training on natural images (DINOv2), (ii)
supervised learning (SL) pre-training on natural images (ImageNet dataset), and (iii) SL pre-training on
chest radiographs from the MIMIC-CXR database. We tested our approach on over
800,000 chest radiographs from six large global datasets, diagnosing more than
20 different imaging findings. Our SSL pre-training on curated images not only
outperformed ImageNet-based pre-training (P<0.001 for all datasets) but, in
certain cases, also exceeded SL on the MIMIC-CXR dataset. Our findings suggest
that selecting the right pre-training strategy, especially with SSL, can be
pivotal for improving artificial intelligence (AI)'s diagnostic accuracy in
medical imaging. By demonstrating the promise of SSL in chest radiograph
analysis, we underline a transformative shift towards more efficient and
accurate AI models in medical imaging.
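All three initialization strategies above reduce to the same mechanical step: transferring pretrained backbone weights into a downstream model while the task-specific head stays freshly initialized. A minimal, framework-free sketch of that transfer step; the parameter names, shapes, and values are invented for illustration, not taken from the paper's actual vision transformer:

```python
def init_from_pretrained(target, pretrained):
    """Copy every pretrained tensor whose name and shape match the target.

    Both arguments map parameter names to (shape, values) tuples; parameters
    absent from `pretrained` (e.g. a new classification head) keep their
    original initialization.
    """
    transferred = []
    for name, (shape, _values) in target.items():
        if name in pretrained and pretrained[name][0] == shape:
            target[name] = pretrained[name]
            transferred.append(name)
    return transferred

# Hypothetical backbone weights from SSL pretraining.
ssl_weights = {"backbone.block0": ((4, 4), [0.1] * 16)}
model = {
    "backbone.block0": ((4, 4), [0.0] * 16),  # overwritten by pretrained weights
    "head.fc": ((4, 20), [0.0] * 80),         # task head (20 findings) stays fresh
}
copied = init_from_pretrained(model, ssl_weights)
```

Swapping `ssl_weights` for a supervised checkpoint is what distinguishes strategies (i) through (iii); the transfer logic itself is identical.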
Empowering Clinicians and Democratizing Data Science: Large Language Models Automate Machine Learning for Clinical Studies
A knowledge gap persists between Machine Learning (ML) developers (e.g., data
scientists) and practitioners (e.g., clinicians), hampering the full
utilization of ML for clinical data analysis. We investigated the potential of
ChatGPT Advanced Data Analysis (ADA), an extension of GPT-4, to bridge this
gap and perform ML analyses efficiently. Real-world clinical datasets and study
details from large trials across various medical specialties were presented to
ChatGPT ADA without specific guidance. ChatGPT ADA autonomously developed
state-of-the-art ML models based on the original study's training data to
predict clinical outcomes such as cancer development, cancer progression,
disease complications, or biomarkers such as pathogenic gene sequences.
Strikingly, these ML models matched or outperformed their published
counterparts. We conclude that ChatGPT ADA offers a promising avenue to
democratize ML in medicine, making advanced analytics accessible to non-ML
experts and promoting broader applications in medical research and practice.
Preserving privacy in domain transfer of medical AI models comes at no performance costs: The integral role of differential privacy
Developing robust and effective artificial intelligence (AI) models in
medicine requires access to large amounts of patient data. The use of AI models
solely trained on large multi-institutional datasets can help with this, yet
the imperative to ensure data privacy remains, particularly as membership
inference risks breaching patient confidentiality. As a proposed remedy, we
advocate for the integration of differential privacy (DP). We specifically
investigate the performance of models trained with DP as compared to models
trained without DP on data from institutions that the model had not seen during
its training (i.e., external validation), a situation reflective of
the clinical use of AI models. By leveraging more than 590,000 chest
radiographs from five institutions, we evaluated the efficacy of DP-enhanced
domain transfer (DP-DT) in diagnosing cardiomegaly, pleural effusion,
pneumonia, atelectasis, and in identifying healthy subjects. We juxtaposed
DP-DT with non-DP-DT and examined diagnostic accuracy and demographic fairness
using the area under the receiver operating characteristic curve (AUC) as the
main metric, as well as accuracy, sensitivity, and specificity. Our results
show that DP-DT, even with exceptionally high privacy levels (epsilon around
1), performs comparably to non-DP-DT (P>0.119 across all domains). Furthermore,
DP-DT led to marginal AUC differences (less than 1%) for nearly all
subgroups, relative to non-DP-DT. Despite consistent evidence suggesting that
DP models induce significant performance degradation for on-domain
applications, we show that off-domain performance is largely unaffected.
Therefore, we ardently advocate for the adoption of DP in training diagnostic
medical AI models, given its minimal impact on performance. Comment: Published in Radiology: Artificial Intelligence. RSN
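The core mechanism behind DP model training (as in DP-SGD) is per-sample gradient clipping followed by calibrated Gaussian noise; the privacy level epsilon mentioned above is determined by how much noise is added relative to the clipping bound. A simplified, pure-Python sketch of that step, assuming plain list-based gradients rather than the paper's actual training pipeline:

```python
import math
import random

def privatize_gradients(per_sample_grads, clip_norm, noise_multiplier, rng):
    """DP-SGD core step: clip each per-sample gradient to `clip_norm`,
    average the clipped gradients, then add Gaussian noise whose scale
    is `noise_multiplier * clip_norm / batch_size`."""
    clipped = []
    for g in per_sample_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / (norm + 1e-12))
        clipped.append([x * scale for x in g])
    n = len(clipped)
    mean = [sum(col) / n for col in zip(*clipped)]
    sigma = noise_multiplier * clip_norm / n
    return [m + rng.gauss(0.0, sigma) for m in mean]

rng = random.Random(0)
# Two hypothetical per-sample gradients; the first exceeds the clip norm.
dp_grad = privatize_gradients([[3.0, 4.0], [0.3, 0.4]],
                              clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

With `noise_multiplier=0` the function reduces to ordinary clipped averaging, which makes the clipping behavior easy to verify in isolation.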
Federated learning for secure development of AI models for Parkinson's disease detection using speech from different languages
Parkinson's disease (PD) is a neurological disorder impacting a person's
speech. Among automatic PD assessment methods, deep learning models have gained
particular interest. Recently, the community has explored cross-pathology and
cross-language models which can improve diagnostic accuracy even further.
However, strict patient data privacy regulations largely prevent institutions
from sharing patient speech data with each other. In this paper, we employ
federated learning (FL) for PD detection using speech signals from 3 real-world
language corpora of German, Spanish, and Czech, each from a separate
institution. Our results indicate that the FL model outperforms all local
models in diagnostic accuracy and performs comparably to a model trained on
the centrally combined training sets, with the advantage of not requiring any
data sharing among collaborators. This will simplify inter-institutional
collaborations, resulting in enhanced patient outcomes. Comment: Accepted for INTERSPEECH 202
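FL setups like the one described typically aggregate locally trained model updates with federated averaging (FedAvg), so raw speech recordings never leave the contributing institution. A minimal sketch, assuming models are flat lists of parameters; the three sites and their dataset sizes are invented for illustration:

```python
def federated_average(client_weights, client_sizes):
    """FedAvg: combine client model parameters as an average weighted by
    each client's training-set size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Three hypothetical sites (German, Spanish, Czech corpora) with
# differently sized training sets.
german = [1.0, 2.0]
spanish = [3.0, 4.0]
czech = [5.0, 6.0]
global_model = federated_average([german, spanish, czech], [100, 100, 200])
```

In a full FL round this aggregation would run on a coordinating server after each site trains locally, and the averaged model would be sent back for the next round.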
Collaborative Training of Medical Artificial Intelligence Models with non-uniform Labels
Artificial intelligence (AI) methods are revolutionizing medical image
analysis. However, robust AI models require large multi-site datasets for
training. While multiple stakeholders have provided publicly available
datasets, the ways in which these data are labeled differ widely. For example,
one dataset of chest radiographs might contain labels denoting the presence of
metastases in the lung, while another dataset of chest radiographs might focus
on the presence of pneumonia. With conventional approaches, these data cannot
be used together to train a single AI model. We propose a new framework that we
call flexible federated learning (FFL) for collaborative training on such data.
Using publicly available data of 695,000 chest radiographs from five
institutions - each with differing labels - we demonstrate that large and
heterogeneously labeled datasets can be used to train one big AI model with
this framework. We find that models trained with FFL are superior to models
that are trained on matching annotations only. This may pave the way for
training of truly large-scale AI models that make efficient use of all existing
data. Comment: 2 figures, 3 tables, 5 supplementary tables
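One common way to realize such flexible training is to mask the loss for findings a given dataset never labeled, rather than treating missing labels as negatives. A pure-Python sketch of that idea, using a hypothetical two-finding output; this is an illustration of label masking in general, not the paper's actual FFL implementation:

```python
import math

def masked_bce(predictions, labels, mask):
    """Binary cross-entropy that skips findings a dataset never labeled,
    letting heterogeneously annotated datasets train one shared model."""
    total, count = 0.0, 0
    for p, y, m in zip(predictions, labels, mask):
        if not m:  # this finding is unlabeled in this dataset: no gradient
            continue
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        count += 1
    return total / count if count else 0.0

# A dataset that labels pneumonia but not metastases: the mask excludes the
# missing label from the loss instead of treating it as a negative.
preds = [0.9, 0.5]    # model outputs for [pneumonia, metastases]
labels = [1.0, 0.0]   # metastases entry is only a placeholder
loss = masked_bce(preds, labels, mask=[True, False])
```

Because the mask zeroes out the unlabeled finding entirely, each dataset only ever pushes gradients through the findings it actually annotated.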
Fibroglandular Tissue Segmentation in Breast MRI using Vision Transformers -- A multi-institutional evaluation
Accurate and automatic segmentation of fibroglandular tissue in breast MRI
screening is essential for the quantification of breast density and background
parenchymal enhancement. In this retrospective study, we developed and
evaluated a transformer-based neural network for breast segmentation (TraBS) in
multi-institutional MRI data, and compared its performance to the well
established convolutional neural network nnUNet. TraBS and nnUNet were trained
and tested on 200 internal and 40 external breast MRI examinations using manual
segmentations generated by experienced human readers. Segmentation performance
was assessed in terms of the Dice score and the average symmetric surface
distance. The Dice score for nnUNet was lower than for TraBS on the internal
testset (0.909 ± 0.069 versus 0.916 ± 0.067, P<0.001) and on the external
testset (0.824 ± 0.144 versus 0.864 ± 0.081, P=0.004). Moreover, the
average symmetric surface distance was higher (=worse) for nnUNet than for
TraBS on the internal (0.657 ± 2.856 versus 0.548 ± 2.195, P=0.001) and on
the external testset (0.727 ± 0.620 versus 0.584 ± 0.413, P=0.03). Our
study demonstrates that transformer-based networks improve the quality of
fibroglandular tissue segmentation in breast MRI compared to
convolutional-based models like nnUNet. These findings might help to enhance
the accuracy of breast density and parenchymal enhancement quantification in
breast MRI screening.
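The Dice score reported above measures voxel-wise overlap between a predicted mask and a reference mask. A minimal sketch on flattened binary masks; the example masks are invented:

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks (1 = fibroglandular
    tissue): twice the overlap divided by the total foreground size."""
    intersection = sum(p and t for p, t in zip(pred, truth))
    denom = sum(pred) + sum(truth)
    # Two empty masks are conventionally treated as a perfect match.
    return 2.0 * intersection / denom if denom else 1.0

prediction   = [1, 1, 0, 0, 1]
ground_truth = [1, 0, 0, 0, 1]
score = dice_score(prediction, ground_truth)  # 2*2 / (3+2) = 0.8
```

The average symmetric surface distance used alongside it is more involved (it requires the boundary voxels of both masks), but Dice alone already captures the overlap comparison the study reports.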
DP CXR - Code for Communications Medicine Publication
<p>This is the software version of https://github.com/tayebiarasteh/DP_CXR which was used for obtaining results in Communications Medicine publication.</p><h3>In case you use this code, please cite the original paper:</h3><p>S. Tayebi Arasteh, A. Ziller, C. Kuhl, M. Makowski, S. Nebelung, R. Braren, D. Rueckert, D. Truhn, G. Kaissis. "<i>Private, fair and accurate: Training large-scale, privacy-preserving AI models in medical imaging</i>". ArXiv, arxiv.2302.01622, <a href="https://doi.org/10.48550/arxiv.2302.01622">https://doi.org/10.48550/arxiv.2302.01622</a>, 2023.</p>
Enhancing diagnostic deep learning via self-supervised pretraining on large-scale, unlabeled non-medical images
Abstract
Background: Pretraining on labeled datasets, such as ImageNet, has become a technical standard in advanced medical image analysis. However, the emergence of self-supervised learning (SSL), which leverages unlabeled data to learn robust features, presents an opportunity to bypass the intensive labeling process. In this study, we explored if SSL for pretraining on non-medical images can be applied to chest radiographs and how it compares to supervised pretraining on non-medical images and on medical images.
Methods: We utilized a vision transformer and initialized its weights based on the following: (i) SSL pretraining on non-medical images (DINOv2), (ii) supervised learning (SL) pretraining on non-medical images (ImageNet dataset), and (iii) SL pretraining on chest radiographs from the MIMIC-CXR database, the largest labeled public dataset of chest radiographs to date. We tested our approach on over 800,000 chest radiographs from 6 large global datasets, diagnosing more than 20 different imaging findings. Performance was quantified using the area under the receiver operating characteristic curve and evaluated for statistical significance using bootstrapping.
Results: SSL pretraining on non-medical images not only outperformed ImageNet-based pretraining (p < 0.001 for all datasets) but, in certain cases, also exceeded SL on the MIMIC-CXR dataset. Our findings suggest that selecting the right pretraining strategy, especially with SSL, can be pivotal for improving the diagnostic accuracy of artificial intelligence in medical imaging.
Conclusions: By demonstrating the promise of SSL in chest radiograph analysis, we underline a transformative shift towards more efficient and accurate AI models in medical imaging.
Relevance statement: Self-supervised learning highlights a paradigm shift towards the enhancement of AI-driven accuracy and efficiency in medical imaging. Given its promise, the broader application of self-supervised learning in medical imaging calls for deeper exploration, particularly in contexts where comprehensive annotated datasets are limited.
Graphical Abstract
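The evaluation described in the Methods, AUC compared across pretraining strategies with significance assessed by bootstrapping, can be sketched in pure Python. `bootstrap_p_value` below is an illustrative one-sided case-resampling test, not the paper's exact statistical procedure:

```python
import random

def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic (ties counted as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_p_value(labels, scores_a, scores_b, n_boot=2000, seed=0):
    """One-sided p-value for AUC(a) > AUC(b) by resampling cases with
    replacement and counting how often model a fails to beat model b."""
    rng = random.Random(seed)
    n, worse = len(labels), 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # resample has only one class; AUC is undefined
        a = auc(y, [scores_a[i] for i in idx])
        b = auc(y, [scores_b[i] for i in idx])
        if a <= b:
            worse += 1
    return worse / n_boot

auc_a = auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])  # perfect ranking -> 1.0
```

In the study's setting, `scores_a` and `scores_b` would be the per-case outputs of two differently pretrained models on the same test radiographs, so the resampling is paired across models.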