Self-supervised contrastive learning of echocardiogram videos enables label-efficient cardiac disease diagnosis
Advances in self-supervised learning (SSL) have shown that self-supervised
pretraining on medical imaging data can provide a strong initialization for
downstream supervised classification and segmentation. Given the difficulty of
obtaining expert labels for medical image recognition tasks, such an
"in-domain" SSL initialization is often desirable due to its improved label
efficiency over standard transfer learning. However, most efforts toward SSL of
medical imaging data are not adapted to video-based medical imaging modalities.
To address this gap, we developed a self-supervised contrastive learning
approach, EchoCLR, tailored to echocardiogram videos, with the goal of learning
strong representations for efficient fine-tuning on downstream cardiac disease
diagnosis. EchoCLR leverages (i) distinct videos of the same patient as
positive pairs for contrastive learning and (ii) a frame re-ordering pretext
task to enforce temporal coherence. When fine-tuned on small portions of
labeled data (as few as 51 exams), EchoCLR pretraining significantly improved
classification performance for left ventricular hypertrophy (LVH) and aortic
stenosis (AS) over other transfer learning and SSL approaches across internal
and external test sets. For example, when fine-tuning on 10% of available
training data (519 studies), an EchoCLR-pretrained model achieved 0.72 AUROC
(95% CI: [0.69, 0.75]) on LVH classification, compared to 0.61 AUROC (95% CI:
[0.57, 0.64]) with a standard transfer learning approach. Similarly, using 1%
of available training data (53 studies), EchoCLR pretraining achieved 0.82
AUROC (95% CI: [0.79, 0.84]) on severe AS classification, compared to 0.61
AUROC (95% CI: [0.58, 0.65]) with transfer learning. EchoCLR is unique in its
ability to learn representations of medical videos and demonstrates that SSL
can enable label-efficient disease classification from small labeled datasets.
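The patient-aware positive pairing at the heart of this approach can be illustrated with a standard InfoNCE-style contrastive loss in which any two clips from the same patient count as a positive pair. This is a minimal sketch, not the paper's implementation; the function name and temperature value are illustrative.

```python
import numpy as np

def patient_infonce(embeddings, patient_ids, temperature=0.1):
    """InfoNCE-style contrastive loss where clips sharing a patient ID
    are treated as positives. `embeddings` is (N, D); rows are
    L2-normalized before computing cosine similarities."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature          # pairwise scaled similarities
    np.fill_diagonal(sim, -np.inf)       # exclude self-pairs
    # row-wise log-softmax over all other samples
    logp = sim - np.log(np.sum(np.exp(sim), axis=1, keepdims=True))
    pos = patient_ids[:, None] == patient_ids[None, :]
    np.fill_diagonal(pos, False)
    # average negative log-probability assigned to the positive pairs
    return -logp[pos].mean()
```

In practice the embeddings would come from a video encoder and the loss would be computed per mini-batch; the sketch shows only the pairing logic.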
Improving Fairness of Automated Chest X-ray Diagnosis by Contrastive Learning
Purpose: Few studies have explored concrete methods for improving model
fairness in the radiology domain. Our proposed AI model utilizes supervised
contrastive learning to minimize bias in CXR diagnosis.
Materials and Methods: In this retrospective study, we evaluated our proposed
method on two datasets: the Medical Imaging and Data Resource Center (MIDRC)
dataset with 77,887 CXR images from 27,796 patients collected as of April 20,
2023 for COVID-19 diagnosis, and the NIH Chest X-ray (NIH-CXR) dataset with
112,120 CXR images from 30,805 patients collected between 1992 and 2015. In the
NIH-CXR dataset, thoracic abnormalities include atelectasis, cardiomegaly,
effusion, infiltration, mass, nodule, pneumonia, pneumothorax, consolidation,
edema, emphysema, fibrosis, pleural thickening, or hernia. Our proposed method
utilizes supervised contrastive learning with carefully selected positive and
negative samples to generate fair image embeddings, which are fine-tuned for
subsequent tasks to reduce bias in chest X-ray (CXR) diagnosis. We evaluated
the methods using the marginal AUC difference (mAUC).
Results: The proposed model showed a significant decrease in bias across all
subgroups when compared to the baseline models, as evidenced by a paired t-test
(p<0.0001). The mAUC values obtained by our method were 0.0116 (95% CI,
0.0110-0.0123), 0.2102 (95% CI, 0.2087-0.2118), and 0.1000 (95% CI,
0.0988-0.1011) for sex, race, and age on MIDRC, and 0.0090 (95% CI,
0.0082-0.0097) for sex and 0.0512 (95% CI, 0.0512-0.0532) for age on NIH-CXR,
respectively.
Conclusion: Employing supervised contrastive learning can mitigate bias in
CXR diagnosis, addressing concerns of fairness and reliability in deep
learning-based diagnostic methods.
Comment: 23 pages, 5 figures
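As an illustration of the fairness metric, one common reading of a marginal AUC difference is the gap between a subgroup's AUC and the AUC computed over the remaining patients; the paper's exact definition may differ, so treat the sketch below as an assumption, not the authors' implementation.

```python
import numpy as np

def auc(scores, labels):
    """Mann-Whitney AUC: probability a random positive outscores a random
    negative (ties counted as half)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

def marginal_auc_diff(scores, labels, groups):
    """Largest absolute gap between a subgroup's AUC and the AUC over all
    remaining patients (illustrative reading of mAUC)."""
    gaps = []
    for g in np.unique(groups):
        in_g = groups == g
        gaps.append(abs(auc(scores[in_g], labels[in_g]) -
                        auc(scores[~in_g], labels[~in_g])))
    return max(gaps)
```

A perfectly fair model drives this gap toward 0; the reported values near 0.01 for sex correspond to nearly equal subgroup AUCs.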
Long-Tailed Classification of Thorax Diseases on Chest X-Ray: A New Benchmark Study
Imaging exams, such as chest radiography, will yield a small set of common
findings and a much larger set of uncommon findings. While a trained
radiologist can learn the visual presentation of rare conditions by studying a
few representative examples, teaching a machine to learn from such a
"long-tailed" distribution is much more difficult, as standard methods would be
easily biased toward the most frequent classes. In this paper, we present a
comprehensive benchmark study of the long-tailed learning problem in the
specific domain of thorax diseases on chest X-rays. We focus on learning from
naturally distributed chest X-ray data, optimizing classification accuracy over
not only the common "head" classes, but also the rare yet critical "tail"
classes. To accomplish this, we introduce a challenging new long-tailed chest
X-ray benchmark to facilitate research on developing long-tailed learning
methods for medical image classification. The benchmark consists of two chest
X-ray datasets for 19- and 20-way thorax disease classification, containing
classes with as many as 53,000 and as few as 7 labeled training images. We
evaluate both standard and state-of-the-art long-tailed learning methods on
this new benchmark, analyzing which aspects of these methods are most
beneficial for long-tailed medical image classification and summarizing
insights for future algorithm design. The datasets, trained models, and code
are available at https://github.com/VITA-Group/LongTailCXR.
Comment: DALI 2022 (MICCAI workshop)
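Among the standard long-tailed learning baselines such a benchmark typically compares is loss re-weighting by the "effective number of samples" (the class-balanced loss of Cui et al., 2019). A minimal sketch of the weighting, with an illustrative beta value; this is a generic baseline, not necessarily the benchmark's best method:

```python
import numpy as np

def class_balanced_weights(counts, beta=0.999):
    """Class-balanced loss weights (Cui et al., 2019):
    weight_c is proportional to (1 - beta) / (1 - beta ** n_c),
    so rare classes get larger weights; normalized to mean 1."""
    counts = np.asarray(counts, dtype=float)
    effective_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights * len(counts) / weights.sum()
```

With class counts as extreme as the benchmark's (as many as 53,000 and as few as 7 training images), the tail class receives a far larger per-sample weight than the head class.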
How Does Pruning Impact Long-Tailed Multi-Label Medical Image Classifiers?
Pruning has emerged as a powerful technique for compressing deep neural
networks, reducing memory usage and inference time without significantly
affecting overall performance. However, the nuanced ways in which pruning
impacts model behavior are not well understood, particularly for long-tailed,
multi-label datasets commonly found in clinical settings. This knowledge gap
could have dangerous implications when deploying a pruned model for diagnosis,
where unexpected model behavior could impact patient well-being. To fill this
gap, we perform the first analysis of pruning's effect on neural networks
trained to diagnose thorax diseases from chest X-rays (CXRs). On two large CXR
datasets, we examine which diseases are most affected by pruning and
characterize class "forgettability" based on disease frequency and
co-occurrence behavior. Further, we identify individual CXRs where uncompressed
and heavily pruned models disagree, known as pruning-identified exemplars
(PIEs), and conduct a human reader study to evaluate their unifying qualities.
We find that radiologists perceive PIEs as having more label noise, lower image
quality, and higher diagnosis difficulty. This work represents a first step
toward understanding the impact of pruning on model behavior in deep
long-tailed, multi-label medical image classification. All code, model weights,
and data access instructions can be found at
https://github.com/VITA-Group/PruneCXR.
Comment: Early accepted to MICCAI 2023
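The two ingredients of this analysis, pruning a network and flagging examples where the dense and pruned models disagree (PIEs), can be sketched as follows. This is an illustrative global magnitude-pruning stand-in, not the paper's exact procedure:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of weights
    (global magnitude pruning; ties at the threshold are also zeroed)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def find_pies(dense_preds, pruned_preds):
    """Indices where dense and pruned model predictions disagree
    (pruning-identified exemplars)."""
    return np.flatnonzero(dense_preds != pruned_preds)
```

The reader study described above then asks radiologists to characterize the flagged disagreement cases.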
Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge
Many real-world image recognition problems, such as diagnostic medical
imaging exams, are "long-tailed": there are a few common
findings followed by many more relatively rare conditions. In chest
radiography, diagnosis is both a long-tailed and multi-label problem, as
patients often present with multiple findings simultaneously. While researchers
have begun to study the problem of long-tailed learning in medical image
recognition, few have studied the interaction of label imbalance and label
co-occurrence posed by long-tailed, multi-label disease classification. To
engage with the research community on this emerging topic, we conducted an open
challenge, CXR-LT, on long-tailed, multi-label thorax disease classification
from chest X-rays (CXRs). We publicly release a large-scale benchmark dataset
of over 350,000 CXRs, each labeled with at least one of 26 clinical findings
following a long-tailed distribution. We synthesize common themes of
top-performing solutions, providing practical recommendations for long-tailed,
multi-label medical image classification. Finally, we use these insights to
propose a path forward involving vision-language foundation models for few- and
zero-shot disease classification.
The Germ Cell Nuclear Proteins hnRNP G-T and RBMY Activate a Testis-Specific Exon
The human testis has almost as high a frequency of alternative splicing events as the brain. While not as extensively studied as brain, a few candidate testis-specific splicing regulator proteins have been identified, including the nuclear RNA binding proteins RBMY and hnRNP G-T, which are germ cell-specific versions of the somatically expressed hnRNP G protein and are highly conserved in mammals. The splicing activator protein Tra2β is also highly expressed in the testis and physically interacts with these hnRNP G family proteins. In this study, we identified a novel testis-specific cassette exon, TLE4-T, within intron 6 of the human transducin-like enhancer of split 4 (TLE4) gene, which makes a more transcriptionally repressive TLE4 protein isoform. TLE4-T splicing is normally repressed in somatic cells because of a weak 5′ splice site and surrounding splicing-repressive intronic regions. TLE4-T RNA pulls down Tra2β and hnRNP G proteins, which activate its inclusion. The germ cell-specific RBMY and hnRNP G-T proteins were more efficient in stimulating TLE4-T incorporation than the somatically expressed hnRNP G protein. Tra2β bound moderately to TLE4-T RNA, but more strongly to upstream sites, to potently activate an alternative 3′ splice site normally weakly selected in the testis. Co-expression of Tra2β with either hnRNP G-T or RBMY re-established the normal testis physiological splicing pattern of this exon. Although they can directly bind pre-mRNA sequences around the TLE4-T exon, RBMY and hnRNP G-T function as efficient germ cell-specific splicing co-activators of TLE4-T. Our study indicates that a delicate balance between the activity of positive and negative splicing regulators combinatorially controls physiological splicing inclusion of exon TLE4-T and leads to modulation of signalling pathways in the testis. In addition, we identified a high-affinity binding site for the hnRNP G-T protein, showing it is also a sequence-specific RNA binding protein.
Hierarchical structure of cascade of primary and secondary periodicities in Fourier power spectrum of alphoid higher order repeats
Background: Identification of approximate tandem repeats is an important task of broad significance and still remains a challenging problem of computational genomics. Often there is no single best approach to periodicity detection, and a combination of different methods may improve the prediction accuracy. The discrete Fourier transform (DFT) has been extensively used to study primary periodicities in DNA sequences. Here we investigate the application of the DFT method to identify and study alphoid higher order repeats.
Results: We used a DFT-based method with mapping of symbolic into numerical sequences to identify and study alphoid higher order repeats (HORs). For HORs, the power spectrum shows an equidistant frequency pattern, with a characteristic two-level hierarchical organization as the signature of HOR. Our case study was the 16-mer HOR tandem in AC017075.8 from human chromosome 7. A very long array of equidistant peaks at multiple frequencies (more than a thousand higher harmonics) is based on the fundamental frequency of the 16-mer HOR. A pronounced subset of equidistant peaks is based on multiples of the fundamental HOR frequency (multiplication factor n for an n-mer) and higher harmonics. In general, the n-mer HOR pattern contains equidistant secondary periodicity peaks, with a pronounced subset of equidistant primary periodicity peaks. This hierarchical pattern as a signature for HOR detection is robust with respect to monomer insertions and deletions, random sequence insertions, etc. For a monomeric alphoid sequence only primary periodicity peaks are present. The 1/f^β noise and the periodicity-three pattern are missing from power spectra in alphoid regions, in accordance with expectations.
Conclusion: DFT provides a robust detection method for higher order periodicity. An easily recognizable HOR power spectrum is characterized by a hierarchical two-level equidistant pattern: higher harmonics of the fundamental HOR frequency (secondary periodicity) and a subset of pronounced peaks corresponding to constituent monomers (primary periodicity). The number of lower-frequency peaks (secondary periodicity) below the frequency of the first primary periodicity peak reveals the size of the n-mer HOR, i.e., the number n of monomers contained in the consensus HOR.
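The indicator-mapping plus DFT pipeline described above can be sketched directly: map the DNA string to four binary indicator sequences (Voss mapping), sum their power spectra, and look for equidistant peaks. For an exact tandem repeat of k monomer copies, all non-zero spectral power falls on multiples of k, which is the equidistant-peak signature. A minimal sketch, not the authors' code:

```python
import numpy as np

def dna_power_spectrum(seq):
    """Sum of DFT power spectra of the four binary indicator sequences
    (Voss mapping) of a DNA string."""
    n = len(seq)
    power = np.zeros(n)
    for base in "ACGT":
        indicator = np.array([c == base for c in seq], dtype=float)
        power += np.abs(np.fft.fft(indicator)) ** 2
    return power

# For a monomer repeated k times, all power at non-zero frequencies sits
# on multiples of k -- real alphoid HORs show the same pattern approximately,
# despite monomer insertions, deletions, and substitutions.
```

Real genomic HOR arrays are only approximate repeats, so in practice the peaks broaden but the two-level equidistant structure survives, as the abstract notes.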
Biometric contrastive learning for data-efficient deep learning from electrocardiographic images
Objective: Artificial intelligence (AI) detects heart disease from images of electrocardiograms (ECGs). However, traditional supervised learning is limited by the need for large amounts of labeled data. We report the development of Biometric Contrastive Learning (BCL), a self-supervised pretraining approach for label-efficient deep learning on ECG images.
Materials and Methods: Using pairs of ECGs from 78,288 individuals from Yale (2000-2015), we trained a convolutional neural network to identify temporally separated ECG pairs from the same patient that varied in layout. We fine-tuned BCL-pretrained models to detect atrial fibrillation (AF), gender, and LVEF < 40%, using ECGs from 2015 to 2021. We externally tested the models in cohorts from Germany and the United States. We compared BCL with ImageNet initialization and general-purpose self-supervised contrastive learning for images (SimCLR).
Results: With 100% of labeled training data, BCL performed similarly to other approaches for detecting AF/gender/LVEF < 40%, with AUROCs of 0.98/0.90/0.90 in the held-out test sets, but it consistently outperformed other methods with smaller proportions of labeled data, reaching equivalent performance with 50% of the data. With 0.1% of the data, BCL achieved AUROCs of 0.88/0.79/0.75, compared with 0.51/0.52/0.60 (ImageNet) and 0.61/0.53/0.49 (SimCLR). In external validation, BCL outperformed other methods even at 100% labeled training data, with AUROCs of 0.88/0.88 for gender and LVEF < 40%, compared with 0.83/0.83 (ImageNet) and 0.84/0.83 (SimCLR).
Discussion and Conclusion: A pretraining strategy that leverages biometric signatures of different ECGs from the same patient enhances the efficiency of developing AI models for ECG images. This represents a major advance in detecting disorders from ECG images with limited labeled data.
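The pretraining signal in BCL is the pairing itself: two ECGs recorded at different times from the same patient form a positive pair. A minimal sketch of that pair-sampling step, with illustrative record fields (`patient_id`, `date`) and assuming distinct recording dates per patient; the actual data pipeline is not described at this level of detail:

```python
import random
from collections import defaultdict

def sample_biometric_pairs(records, rng=random.Random(0)):
    """Group ECG records by patient and emit one temporally separated
    positive pair per patient with at least two recordings. Each record
    is assumed to be a dict with 'patient_id' and 'date' keys, and dates
    within a patient are assumed distinct."""
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)
    pairs = []
    for recs in by_patient.values():
        if len(recs) < 2:
            continue
        recs = sorted(recs, key=lambda r: r["date"])
        earlier = rng.choice(recs[:-1])              # any non-final recording
        later = rng.choice([r for r in recs if r["date"] > earlier["date"]])
        pairs.append((earlier, later))
    return pairs
```

These pairs would then feed a contrastive objective, with layout variation between the two renderings acting as a natural augmentation.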
High sensitivity methods for automated rib fracture detection in pediatric radiographs
Rib fractures are highly predictive of non-accidental trauma in children under 3 years old. Rib fracture detection in pediatric radiographs is challenging because fractures can be obliquely oriented to the imaging detector, obfuscated by other structures, incomplete, and non-displaced. Prior studies have shown up to two-thirds of rib fractures may be missed during initial interpretation. In this paper, we implemented methods for improving the sensitivity (i.e., recall) performance for detecting and localizing rib fractures in pediatric chest radiographs to help augment the performance of radiology interpretation. These methods adapted two convolutional neural network (CNN) architectures, RetinaNet and YOLOv5, and our previously proposed decision scheme, "avalanche decision", which dynamically reduces the acceptance threshold for proposed regions in each image. Additionally, we present the contributions of using multiple image pre-processing and model ensembling techniques. Using a custom dataset of 1109 pediatric chest radiographs manually labeled by seven pediatric radiologists, we performed 10-fold cross-validation and reported detection performance using several metrics, including the F2 score, which summarizes precision and recall for high-sensitivity tasks. Our best-performing model used three ensembled YOLOv5 models with varied input processing and an avalanche decision scheme, achieving an F2 score of 0.725 ± 0.012. Expert inter-reader performance yielded an F2 score of 0.732. Results demonstrate that our combination of sensitivity-driving methods provides object detector performance approaching the capabilities of expert human readers, suggesting that these methods may provide a viable approach to identify all rib fractures.
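The abstract describes the avalanche decision only as dynamically reducing the acceptance threshold for proposed regions in each image; the multiplicative decay below is an illustrative stand-in for the actual rule, which is specified in the cited work:

```python
def avalanche_decision(scores, base_threshold=0.5, decay=0.85):
    """Accept candidate detections in descending confidence order; each
    acceptance lowers the threshold for the next candidate. The decay
    factor here is illustrative, not the published schedule."""
    accepted = []
    threshold = base_threshold
    for s in sorted(scores, reverse=True):
        if s < threshold:
            break
        accepted.append(s)
        # one confident finding makes further findings more plausible,
        # so relax the bar for the remaining candidates
        threshold *= decay
    return accepted
```

The intuition matches the clinical setting: rib fractures co-occur, so once one fracture is confidently found, weaker candidates in the same image deserve a lower bar, raising sensitivity.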
Severe aortic stenosis detection by deep learning applied to echocardiography.
BACKGROUND AND AIMS: Early diagnosis of aortic stenosis (AS) is critical to prevent morbidity and mortality but requires skilled examination with Doppler imaging. This study reports the development and validation of a novel deep learning model that relies on two-dimensional (2D) parasternal long axis videos from transthoracic echocardiography without Doppler imaging to identify severe AS, suitable for point-of-care ultrasonography. METHODS AND RESULTS: In a training set of 5257 studies (17,570 videos) from 2016 to 2020 [Yale-New Haven Hospital (YNHH), Connecticut], an ensemble of three-dimensional convolutional neural networks was developed to detect severe AS, leveraging self-supervised contrastive pretraining for label-efficient model development. This deep learning model was validated in a temporally distinct set of 2040 consecutive studies from 2021 from YNHH as well as two geographically distinct cohorts of 4226 and 3072 studies, from California and other hospitals in New England, respectively. The deep learning model achieved an area under the receiver operating characteristic curve (AUROC) of 0.978 (95% CI: 0.966, 0.988) for detecting severe AS in the temporally distinct test set, maintaining its diagnostic performance in geographically distinct cohorts [0.952 AUROC (95% CI: 0.941, 0.963) in California and 0.942 AUROC (95% CI: 0.909, 0.966) in New England]. The model was interpretable, with saliency maps identifying the aortic valve, mitral annulus, and left atrium as the predictive regions. Among non-severe AS cases, predicted probabilities were associated with worse quantitative metrics of AS, suggesting an association with various stages of AS severity. CONCLUSION: This study developed and externally validated an automated approach for severe AS detection using single-view 2D echocardiography, with potential utility for point-of-care screening.