CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison
Large, labeled datasets have driven deep learning methods to achieve
expert-level performance on a variety of medical imaging tasks. We present
CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240
patients. We design a labeler to automatically detect the presence of 14
observations in radiology reports, capturing uncertainties inherent in
radiograph interpretation. We investigate different approaches to using the
uncertainty labels for training convolutional neural networks that output the
probability of these observations given the available frontal and lateral
radiographs. On a validation set of 200 chest radiographic studies which were
manually annotated by 3 board-certified radiologists, we find that different
uncertainty approaches are useful for different pathologies. We then evaluate
our best model on a test set composed of 500 chest radiographic studies
annotated by a consensus of 5 board-certified radiologists, and compare the
performance of our model to that of 3 additional radiologists in the detection
of 5 selected pathologies. On Cardiomegaly, Edema, and Pleural Effusion, the
model ROC and PR curves lie above all 3 radiologist operating points. We
release the dataset to the public as a standard benchmark to evaluate
performance of chest radiograph interpretation models.
The dataset is freely available at
https://stanfordmlgroup.github.io/competitions/chexpert
Comment: Published in AAAI 2019
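The uncertainty handling described above can be made concrete with a short sketch. The following minimal PyTorch loss is an illustration, not the authors' released code: it shows three common policies for CheXpert-style labels, where each of the 14 observations is 1 (positive), 0 (negative), or -1 (uncertain), and uncertain entries are either ignored or mapped to 0 or 1. The function and policy names are assumptions for illustration.

```python
# Illustrative sketch, not the authors' code: three policies for training
# a multi-label classifier on CheXpert-style labels in {1, 0, -1}.
import torch
import torch.nn.functional as F

def uncertainty_bce(logits, labels, policy="ignore"):
    """Binary cross-entropy over the 14 observations, with a chosen
    treatment of uncertain (-1) labels."""
    labels = labels.float()
    if policy == "zeros":  # treat uncertain as negative
        targets = torch.where(labels == -1, torch.zeros_like(labels), labels)
        return F.binary_cross_entropy_with_logits(logits, targets)
    if policy == "ones":   # treat uncertain as positive
        targets = torch.where(labels == -1, torch.ones_like(labels), labels)
        return F.binary_cross_entropy_with_logits(logits, targets)
    # "ignore": mask uncertain entries out of the loss entirely
    mask = (labels != -1).float()
    per_entry = F.binary_cross_entropy_with_logits(
        logits, labels.clamp(min=0), reduction="none")
    return (per_entry * mask).sum() / mask.sum().clamp(min=1)
```

The abstract's finding that different uncertainty approaches help for different pathologies corresponds to choosing the policy per observation rather than globally.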
PadChest: A large chest x-ray image dataset with multi-label annotated reports
We present a labeled large-scale, high resolution chest x-ray dataset for the
automated exploration of medical images along with their associated reports.
This dataset includes more than 160,000 images obtained from 67,000 patients,
interpreted and reported by radiologists at San Juan Hospital (Spain) from
2009 to 2017, covering six different position views and additional
information on image acquisition and patient demographics. The reports
were labeled with 174 different radiographic findings, 19 differential
diagnoses and 104 anatomic locations organized as a hierarchical taxonomy and
mapped onto standard Unified Medical Language System (UMLS) terminology. Of
these reports, 27% were manually annotated by trained physicians and the
remaining set was labeled using a supervised method based on a recurrent neural
network with attention mechanisms. The generated labels were then validated
on an independent test set, achieving a 0.93 micro-F1 score. To the best of
our knowledge, this is one of the largest public chest x-ray databases
suitable for training supervised models on radiographs, and the first to contain
radiographic reports in Spanish. The PadChest dataset can be downloaded from
http://bimcv.cipf.es/bimcv-projects/padchest/
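The 0.93 micro-F1 figure pools true positives, false positives, and false negatives across all labels before computing precision and recall. A minimal sketch, equivalent to scikit-learn's f1_score(average="micro"):

```python
# Minimal sketch of multi-label micro-F1, the metric used to validate the
# automatic labels; equivalent to sklearn.metrics.f1_score(average="micro").
import numpy as np

def micro_f1(y_true, y_pred):
    """y_true, y_pred: binary arrays of shape (n_reports, n_labels)."""
    tp = np.logical_and(y_true == 1, y_pred == 1).sum()
    fp = np.logical_and(y_true == 0, y_pred == 1).sum()
    fn = np.logical_and(y_true == 1, y_pred == 0).sum()
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    return 2 * precision * recall / max(precision + recall, 1e-12)
```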
Collaborative Training of Medical Artificial Intelligence Models with non-uniform Labels
Artificial intelligence (AI) methods are revolutionizing medical image
analysis. However, robust AI models require large multi-site datasets for
training. While multiple stakeholders have provided publicly available
datasets, the ways in which these data are labeled differ widely. For example,
one dataset of chest radiographs might contain labels denoting the presence of
metastases in the lung, while another dataset of chest radiographs might focus
on the presence of pneumonia. With conventional approaches, these data cannot
be used together to train a single AI model. We propose a new framework that we
call flexible federated learning (FFL) for collaborative training on such data.
Using publicly available data of 695,000 chest radiographs from five
institutions - each with differing labels - we demonstrate that large and
heterogeneously labeled datasets can be used to train one big AI model with
this framework. We find that models trained with FFL are superior to models
that are trained on matching annotations only. This may pave the way for
training of truly large-scale AI models that make efficient use of all existing
data.
Comment: 2 figures, 3 tables, 5 supplementary tables
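The abstract does not spell out the FFL algorithm, so the following is a hedged sketch of one plausible ingredient rather than the paper's method: each site computes its loss only over the label columns it actually annotates, and a server combines the site models with FedAvg-style weight averaging. The function names and the size-based weighting are assumptions for illustration.

```python
# Hedged sketch, not necessarily the paper's FFL algorithm: per-site losses
# masked to locally available labels, combined via FedAvg-style averaging.
import copy
import torch
import torch.nn.functional as F

def masked_site_loss(logits, labels, label_mask):
    """label_mask is 1 where this site annotates the label, 0 elsewhere."""
    per_entry = F.binary_cross_entropy_with_logits(
        logits, labels, reduction="none")
    return (per_entry * label_mask).sum() / label_mask.sum().clamp(min=1)

def fed_avg(site_models, site_sizes):
    """Average state dicts, weighted by each site's dataset size.
    (Integer buffers such as BatchNorm counters are glossed over here.)"""
    total = float(sum(site_sizes))
    avg = copy.deepcopy(site_models[0].state_dict())
    for key in avg:
        avg[key] = sum(m.state_dict()[key] * (n / total)
                       for m, n in zip(site_models, site_sizes))
    return avg
```

Masking the loss this way lets a radiograph labeled only for pneumonia still contribute gradient signal, instead of being discarded as it would be under conventional single-label-schema training.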
Can Deep Learning Reliably Recognize Abnormality Patterns on Chest X-rays? A Multi-Reader Study Examining One Month of AI Implementation in Everyday Radiology Clinical Practice
In this study, we developed a deep-learning-based automatic detection
algorithm (DLAD, Carebot AI CXR) to detect and localize seven specific
radiological findings (atelectasis (ATE), consolidation (CON), pleural effusion
(EFF), pulmonary lesion (LES), subcutaneous emphysema (SCE), cardiomegaly
(CMG), pneumothorax (PNO)) on chest X-rays (CXR). We collected 956 CXRs and
compared the performance of the DLAD with that of six individual radiologists
who assessed the images in a hospital setting. The proposed DLAD achieved high
sensitivity (ATE 1.000 (0.624-1.000), CON 0.864 (0.671-0.956), EFF 0.953
(0.887-0.983), LES 0.905 (0.715-0.978), SCE 1.000 (0.366-1.000), CMG 0.837
(0.711-0.917), PNO 0.875 (0.538-0.986)), even when compared to the radiologists
(LOWEST: ATE 0.000 (0.000-0.376), CON 0.182 (0.070-0.382), EFF 0.400
(0.302-0.506), LES 0.238 (0.103-0.448), SCE 0.000 (0.000-0.634), CMG 0.347
(0.228-0.486), PNO 0.375 (0.134-0.691), HIGHEST: ATE 1.000 (0.624-1.000), CON
0.864 (0.671-0.956), EFF 0.953 (0.887-0.983), LES 0.667 (0.456-0.830), SCE
1.000 (0.366-1.000), CMG 0.980 (0.896-0.999), PNO 0.875 (0.538-0.986)). The
findings of the study demonstrate that the suggested DLAD holds potential for
integration into everyday clinical practice as a decision support system,
effectively mitigating the false negative rate associated with junior and
intermediate radiologists.
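The intervals quoted above have the general shape of exact binomial confidence intervals around a sensitivity point estimate; the abstract does not state the CI method, so Clopper-Pearson is an assumption in this sketch.

```python
# Hedged sketch: sensitivity with a 95% Clopper-Pearson (exact binomial)
# confidence interval; the study's actual CI method is an assumption here.
from scipy.stats import beta

def sensitivity_ci(tp, fn, alpha=0.05):
    """Sensitivity = TP / (TP + FN) with exact binomial bounds."""
    n = tp + fn
    sens = tp / n
    lower = beta.ppf(alpha / 2, tp, fn + 1) if tp > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, tp + 1, fn) if fn > 0 else 1.0
    return sens, lower, upper
```

When the model catches all of a handful of positives, the point estimate is 1.000 but the lower bound stays far below it, which is why wide intervals such as 1.000 (0.366-1.000) appear for rare findings like subcutaneous emphysema.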
Rethinking annotation granularity for overcoming deep shortcut learning: A retrospective study on chest radiographs
Deep learning has demonstrated radiograph screening performances that are
comparable or superior to radiologists. However, recent studies show that deep
models for thoracic disease classification usually show degraded performance
when applied to external data. Such phenomena can be categorized into shortcut
learning, where the deep models learn unintended decision rules that can fit
the identically distributed training and test set but fail to generalize to
other distributions. A natural way to alleviate this defect is explicitly
indicating the lesions and focusing the model on learning the intended
features. In this paper, we conduct extensive retrospective experiments to
compare a popular thoracic disease classification model, CheXNet, and a
thoracic lesion detection model, CheXDet. We first showed that the two models
achieved similar image-level classification performance on the internal test
set with no significant differences under many scenarios. Meanwhile, we found
incorporating external training data even led to performance degradation for
CheXNet. Then, we compared the models' internal performance on the lesion
localization task and showed that CheXDet achieved significantly better
performance than CheXNet even when given 80% less training data. By further
visualizing the models' decision-making regions, we revealed that CheXNet
learned patterns other than the target lesions, demonstrating its shortcut
learning defect. Moreover, CheXDet achieved significantly better external
performance than CheXNet on both the image-level classification task and the
lesion localization task. Our findings suggest that improving annotation
granularity when training deep learning systems is a promising way to advance
future deep-learning-based diagnosis systems toward clinical use.
Comment: 22 pages of main text, 18 pages of supplementary tables
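Comparing a lesion detector such as CheXDet against a classifier such as CheXNet on the image-level task requires reducing the detector's box outputs to per-image class scores. One common reduction, assumed here rather than taken from the paper's protocol, is the per-class maximum over box confidences:

```python
# Hedged sketch: reduce a detector's boxes to image-level class scores by
# taking the per-class maximum confidence; one common choice, assumed here.
import numpy as np

def detections_to_image_scores(detections, num_classes):
    """detections: iterable of (class_id, confidence, box) for one image."""
    scores = np.zeros(num_classes)
    for class_id, confidence, _box in detections:
        scores[class_id] = max(scores[class_id], confidence)
    return scores
```

Under a reduction like this, the two model families produce comparable per-image probability vectors, so the same ROC analysis can be applied to both for the classification comparison described above.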