MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images
Multi-modal fusion approaches aim to integrate information from different
data sources. Unlike natural datasets, such as those in audio-visual applications,
where samples consist of "paired" modalities, healthcare data are often
collected asynchronously. Hence, requiring the presence of all modalities for a
given sample is not realistic for clinical tasks and significantly limits the
size of the dataset during training. In this paper, we propose MedFuse, a
conceptually simple yet promising LSTM-based fusion module that can accommodate
uni-modal as well as multi-modal input. We evaluate the fusion method and
introduce new benchmark results for in-hospital mortality prediction and
phenotype classification, using clinical time-series data in the MIMIC-IV
dataset and corresponding chest X-ray images in MIMIC-CXR. Compared to more
complex multi-modal fusion strategies, MedFuse provides a performance
improvement by a large margin on the fully paired test set. It also remains
robust across the partially paired test set containing samples with missing
chest X-ray images. We release our code for reproducibility and to enable the
evaluation of competing models in the future.
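A minimal sketch of how such an LSTM-based fusion module could be structured is shown below; the layer sizes, feature dimensions, and the way a missing chest X-ray is handled are illustrative assumptions, not the released MedFuse implementation.

```python
# Illustrative sketch (not the authors' code): fusing a clinical time-series
# feature vector with an optional chest X-ray feature vector via an LSTM.
import torch
import torch.nn as nn


class LSTMFusion(nn.Module):
    """Treats per-modality feature vectors as a short sequence and lets an
    LSTM aggregate them; the image token is simply omitted when no chest
    X-ray is paired with the sample (hypothetical design choice)."""

    def __init__(self, ehr_dim=76, img_dim=512, hidden_dim=256, num_classes=25):
        super().__init__()
        self.ehr_proj = nn.Linear(ehr_dim, hidden_dim)
        self.img_proj = nn.Linear(img_dim, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, ehr_feat, img_feat=None):
        tokens = [self.ehr_proj(ehr_feat)]
        if img_feat is not None:          # uni-modal samples skip this token
            tokens.append(self.img_proj(img_feat))
        seq = torch.stack(tokens, dim=1)  # (batch, 1 or 2, hidden_dim)
        _, (h_n, _) = self.lstm(seq)
        return self.classifier(h_n[-1])   # logits from the last hidden state


# Usage: one fully paired batch and one EHR-only (partially paired) batch.
model = LSTMFusion()
ehr = torch.randn(4, 76)      # pre-extracted clinical time-series features
cxr = torch.randn(4, 512)     # pre-extracted chest X-ray features
paired_logits = model(ehr, cxr)
unpaired_logits = model(ehr)  # missing X-ray: uni-modal input
```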
Breast density classification with deep convolutional neural networks
Breast density classification is an essential part of breast cancer
screening. Although much prior work has treated this problem as a task for
learning algorithms, to our knowledge, all of it used small and clinically
unrealistic datasets for both training and evaluating models. In
this work, we explore the limits of this task with a dataset drawn from over
200,000 breast cancer screening exams. We use these data to train and evaluate a
strong convolutional neural network classifier. In a reader study, we find that
our model can perform this task comparably to a human expert.
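As a rough illustration of the task setup rather than the paper's model, breast density classification is typically framed as a four-class problem over the BI-RADS density categories; the sketch below uses an arbitrary small convolutional classifier with illustrative layer sizes and input resolution.

```python
# Generic sketch of a four-class breast density classifier (BI-RADS A-D);
# the architecture and input size are illustrative, not the paper's model.
import torch
import torch.nn as nn


class DensityClassifier(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, num_classes)

    def forward(self, x):                 # x: (batch, 1, H, W) mammogram view
        feats = self.features(x).flatten(1)
        return self.head(feats)           # logits over the four density classes


logits = DensityClassifier()(torch.randn(2, 1, 256, 256))
probs = torch.softmax(logits, dim=1)      # per-class probabilities
```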
Weakly-supervised High-resolution Segmentation of Mammography Images for Breast Cancer Diagnosis
In the last few years, deep learning classifiers have shown promising results
in image-based medical diagnosis. However, interpreting the outputs of these
models remains a challenge. In cancer diagnosis, interpretability can be
achieved by localizing the region of the input image responsible for the
output, i.e. the location of a lesion. Alternatively, segmentation or detection
models can be trained with pixel-wise annotations indicating the locations of
malignant lesions. Unfortunately, acquiring such labels is labor-intensive and
requires medical expertise. To overcome this difficulty, weakly-supervised
localization can be utilized. These methods allow neural network classifiers to
output saliency maps highlighting the regions of the input most relevant to the
classification task (e.g. malignant lesions in mammograms) using only
image-level labels (e.g. whether the patient has cancer or not) during
training. When applied to high-resolution images, existing methods produce
low-resolution saliency maps. This is problematic in applications in which
suspicious lesions are small in relation to the image size. In this work, we
introduce a novel neural network architecture to perform weakly-supervised
segmentation of high-resolution images. The proposed model selects regions of
interest via coarse-level localization, and then performs fine-grained
segmentation of those regions. We apply this model to breast cancer diagnosis
with screening mammography, and validate it on a large clinically-realistic
dataset. Measured by Dice similarity score, our approach outperforms existing
methods by a large margin in localizing benign and malignant lesions, with
relative improvements of 39.6% and 20.0%, respectively. Code and the weights
of some of the models are available at https://github.com/nyukat/GLAM
Comment: The last two authors contributed equally. Accepted to Medical Imaging
with Deep Learning (MIDL) 2021.
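The localization metric reported above is the Dice similarity score; the snippet below shows how it is typically computed between a binarized saliency map and a ground-truth lesion mask (the threshold and toy masks are illustrative, not the paper's evaluation protocol).

```python
# Dice similarity coefficient between a predicted mask and a ground-truth mask;
# the thresholding step is an illustrative choice, not the paper's exact protocol.
import numpy as np


def dice_score(pred_mask: np.ndarray, gt_mask: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2 * |pred ∩ gt| / (|pred| + |gt|) for binary masks."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return (2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps)


saliency = np.random.rand(512, 512)          # high-resolution saliency map
pred = saliency > 0.5                        # binarize at an arbitrary threshold
gt = np.zeros((512, 512), dtype=bool)
gt[100:150, 200:260] = True                  # toy lesion annotation
print(f"Dice: {dice_score(pred, gt):.3f}")
```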
Leveraging Transformers to Improve Breast Cancer Classification and Risk Assessment with Multi-modal and Longitudinal Data
Breast cancer screening, primarily conducted through mammography, is often
supplemented with ultrasound for women with dense breast tissue. However,
existing deep learning models analyze each modality independently, missing
opportunities to integrate information across imaging modalities and time. In
this study, we present Multi-modal Transformer (MMT), a neural network that
utilizes mammography and ultrasound synergistically to identify patients who
currently have cancer and estimate the risk of future cancer for patients who
are currently cancer-free. MMT aggregates multi-modal data through
self-attention and tracks temporal tissue changes by comparing current exams to
prior imaging. Trained on 1.3 million exams, MMT achieves an AUROC of 0.943 in
detecting existing cancers, surpassing strong uni-modal baselines. For 5-year
risk prediction, MMT attains an AUROC of 0.826, outperforming prior
mammography-based risk models. Our research highlights the value of multi-modal
and longitudinal imaging in cancer diagnosis and risk stratification.
Comment: ML4H 2023 Findings Track.
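A minimal sketch of how multi-modal exam embeddings could be aggregated with self-attention in the spirit described above; the token layout, modality/time embeddings, and layer sizes are assumptions for illustration, not the authors' MMT implementation.

```python
# Illustrative sketch (not the authors' implementation) of aggregating
# mammography, ultrasound, and prior-exam embeddings with self-attention.
import torch
import torch.nn as nn


class MultiModalTransformer(nn.Module):
    def __init__(self, dim=256, num_heads=8, depth=4, num_classes=2):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        # Learned embeddings marking which modality / time point a token comes from.
        self.type_embed = nn.Embedding(3, dim)  # 0: mammo, 1: ultrasound, 2: prior exam
        layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, tokens, token_types):
        # tokens: (batch, n_tokens, dim) embeddings from per-modality image encoders
        x = tokens + self.type_embed(token_types)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = self.encoder(torch.cat([cls, x], dim=1))
        return self.head(x[:, 0])  # predict from the aggregated [CLS] token


model = MultiModalTransformer()
tokens = torch.randn(2, 5, 256)                        # current + prior exam embeddings
types = torch.tensor([[0, 0, 1, 2, 2], [0, 1, 1, 2, 2]])
logits = model(tokens, types)
```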