LViT: Language meets Vision Transformer in Medical Image Segmentation
Deep learning has been widely used in medical image segmentation and many other areas. However, the performance of existing medical image segmentation models has been limited by the difficulty of obtaining sufficient high-quality labeled data, owing to the prohibitive cost of data annotation. To alleviate this limitation, we propose LViT (Language meets Vision Transformer), a new text-augmented medical image segmentation model. In LViT, medical text annotations are incorporated to compensate for quality deficiencies in the image data. In addition, the text information guides the generation of higher-quality pseudo labels in semi-supervised learning. We also propose an Exponential Pseudo-label Iteration mechanism (EPI) to help the Pixel-Level Attention Module (PLAM) preserve local image features in the semi-supervised LViT setting. In our model, an LV (Language-Vision) loss is designed to supervise the training on unlabeled images using text information directly. For evaluation, we construct three multimodal medical segmentation datasets (image + text) containing X-ray and CT images. Experimental results show that the proposed LViT achieves superior segmentation performance in both fully supervised and semi-supervised settings. The code and datasets are available at https://github.com/HUANGLIZI/LViT.
Comment: Accepted by IEEE Transactions on Medical Imaging (TMI)
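As a rough illustration of the pseudo-label refinement idea behind EPI: the abstract does not give the update rule, so an exponential-moving-average form is assumed here, with `epi_update`, `to_hard_mask`, and the smoothing factor `beta` being hypothetical names for this sketch.

```python
import numpy as np

# Hedged sketch: EPI is described only as iteratively refining pseudo labels.
# An exponential-moving-average update is assumed:
#   P_t = beta * P_{t-1} + (1 - beta) * P_hat_t,
# followed by thresholding to obtain a hard pseudo mask.

def epi_update(prev_pseudo, new_pred, beta=0.9):
    """Exponentially smooth soft pseudo labels across training iterations."""
    return beta * prev_pseudo + (1.0 - beta) * new_pred

def to_hard_mask(soft_pseudo, threshold=0.5):
    """Binarize the smoothed soft pseudo label into a segmentation mask."""
    return (soft_pseudo >= threshold).astype(np.uint8)

# toy 2x2 "probability maps" from two consecutive training iterations
p_prev = np.array([[0.2, 0.8], [0.9, 0.1]])
p_new = np.array([[0.6, 0.7], [0.8, 0.3]])

p_smoothed = epi_update(p_prev, p_new, beta=0.9)
mask = to_hard_mask(p_smoothed)
```

The smoothing damps oscillation between iterations, so a single noisy prediction cannot flip an established pseudo label on its own.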
Generation of annotated multimodal ground truth datasets for abdominal medical image registration
Sparsity of annotated data is a major limitation in medical image processing
tasks such as registration. Registered multimodal image data are essential for
the diagnosis of medical conditions and the success of interventional medical
procedures. To overcome the shortage of data, we present a method that allows
the generation of annotated multimodal 4D datasets. We use a CycleGAN network
architecture to generate multimodal synthetic data from the 4D extended
cardiac-torso (XCAT) phantom and real patient data. Organ masks are provided
by the XCAT phantom; therefore, the generated dataset can serve as ground
truth for image segmentation and registration. Realistic simulation of
respiration and
heartbeat is possible within the XCAT framework. To underline the usability as
a registration ground truth, a proof of principle registration is performed.
Compared to real patient data, the synthetic data showed good agreement
regarding the image voxel intensity distribution and the noise characteristics.
The generated T1-weighted magnetic resonance imaging (MRI), computed tomography
(CT), and cone beam CT (CBCT) images are inherently co-registered. Thus, the
synthetic dataset allowed us to optimize registration parameters of a
multimodal non-rigid registration, utilizing liver organ masks for evaluation.
Our proposed framework provides not only annotated but also multimodal
synthetic data that can serve as ground truth for various tasks in medical
image processing. We demonstrated the applicability of synthetic data to the
development of multimodal medical image registration algorithms.
Comment: 12 pages, 5 figures. This work has been published in the International Journal of Computer Assisted Radiology and Surgery.
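The CycleGAN translation described above rests on the cycle-consistency idea: an image translated to the other modality and back should match the original. A minimal sketch, with stand-in invertible "generators" in place of the real deep networks:

```python
import numpy as np

# Hedged sketch of the CycleGAN cycle-consistency loss
#   L_cyc = || G_BA(G_AB(x)) - x ||_1
# The "generators" below are toy affine maps, not trained networks.

def l1_cycle_loss(x, g_ab, g_ba):
    """Mean absolute error between x and its round-trip translation."""
    return np.mean(np.abs(g_ba(g_ab(x)) - x))

# toy generators that are exact inverses, so the round trip is perfect
g_ab = lambda x: 2.0 * x + 1.0
g_ba = lambda y: (y - 1.0) / 2.0

x = np.array([0.0, 0.5, 1.0])
loss = l1_cycle_loss(x, g_ab, g_ba)  # exact inverses -> loss 0.0
```

In the actual framework this term is minimized jointly with adversarial losses; it is what keeps the synthesized images anatomically aligned with the XCAT source, so the phantom's organ masks remain valid for the generated data.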
Automated CT and MRI Liver Segmentation and Biometry Using a Generalized Convolutional Neural Network.
Purpose: To assess the feasibility of training a convolutional neural network (CNN) to automate liver segmentation across the different imaging modalities and techniques used in clinical practice, and to apply this to automate liver biometry.
Methods: We trained a 2D U-Net CNN for liver segmentation in two stages using 330 abdominal MRI and CT exams acquired at our institution. First, we trained the network on non-contrast multi-echo spoiled gradient-echo (SPGR) images from 300 MRI exams to provide multiple signal weightings. Then, we used transfer learning to generalize the CNN with additional images from 30 contrast-enhanced MRI and CT exams. We assessed the performance of the CNN using a distinct multi-institutional dataset curated from multiple sources (n = 498 subjects). Segmentation accuracy was evaluated by computing Dice scores. Using these segmentations, we computed liver volume from CT and T1-weighted (T1w) MRI exams and estimated hepatic proton density fat fraction (PDFF) from multi-echo T2*w MRI exams. We compared quantitative volumetry and PDFF estimates between automated and manual segmentation using Pearson correlation and Bland-Altman statistics.
Results: Dice scores were 0.94 ± 0.06 for CT (n = 230), 0.95 ± 0.03 for T1w MR (n = 100), and 0.92 ± 0.05 for T2*w MR (n = 169). Liver volume measured by manual and automated segmentation agreed closely for CT (95% limits of agreement (LoA) = [-298 mL, 180 mL]) and T1w MR (LoA = [-358 mL, 180 mL]). Hepatic PDFF measured by the two segmentations also agreed closely (LoA = [-0.62%, 0.80%]).
Conclusions: Using a transfer-learning strategy, we have demonstrated the feasibility of generalizing a CNN to perform liver segmentation across different imaging techniques and modalities. With further refinement and validation, CNNs may have broad applicability for multimodal liver volumetry and hepatic tissue characterization.
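The two evaluation statistics named in the abstract have standard definitions that can be sketched directly: the Dice score 2|A∩B|/(|A|+|B|) for mask overlap, and Bland-Altman 95% limits of agreement (mean difference ± 1.96 × SD of the paired differences) for comparing automated against manual volumetry. The data below are toy values, not the paper's measurements.

```python
import numpy as np

def dice_score(a, b):
    """Dice = 2|A∩B| / (|A| + |B|) for two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def limits_of_agreement(manual, automated):
    """Bland-Altman 95% limits of agreement for paired measurements."""
    d = np.asarray(automated, float) - np.asarray(manual, float)
    bias, sd = d.mean(), d.std(ddof=1)
    return bias - 1.96 * sd, bias + 1.96 * sd

# toy masks: intersection 1 pixel, sizes 2 and 1 -> Dice = 2/3
mask_a = np.array([[1, 1], [0, 0]])
mask_b = np.array([[1, 0], [0, 0]])
dice = dice_score(mask_a, mask_b)

# toy liver volumes in mL for three subjects
manual = [1500.0, 1600.0, 1700.0]
automated = [1480.0, 1620.0, 1690.0]
lo, hi = limits_of_agreement(manual, automated)
```

Reported intervals such as LoA = [-298 mL, 180 mL] are exactly the `(lo, hi)` pair above computed over the study cohort.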
Brain Tumor Synthetic Segmentation in 3D Multimodal MRI Scans
The magnetic resonance (MR) analysis of brain tumors is widely used for
diagnosis and examination of tumor subregions. The overlapping area among the
intensity distribution of healthy, enhancing, non-enhancing, and edema regions
makes the automatic segmentation a challenging task. Here, we show that a
convolutional neural network trained on high-contrast images can transform the
intensity distribution of brain lesions in its internal subregions.
Specifically, a generative adversarial network (GAN) is extended to synthesize
high-contrast images. A comparison of these synthetic images and real images of
brain tumor tissue in MR scans showed significant segmentation improvement and
decreased the number of real channels for segmentation. The synthetic images
are used as a substitute for real channels and can bypass real modalities in
the multimodal brain tumor segmentation framework. Segmentation results on
BraTS 2019 dataset demonstrate that our proposed approach can efficiently
segment the tumor areas. Finally, we predict patient survival time from
volumetric features of the tumor subregions and the age of each case using
several regression models.
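The survival-prediction step above uses unspecified regression models on volumetric subregion features plus age; a minimal sketch with ordinary least squares on toy data (the feature layout and values are assumptions for illustration):

```python
import numpy as np

# Hedged sketch: regress survival time on tumor-subregion volumes and age.
# Toy feature rows: [enhancing volume, edema volume, age]; toy survival (days).
X = np.array([
    [10.0, 40.0, 55.0],
    [25.0, 30.0, 62.0],
    [5.0, 20.0, 48.0],
    [30.0, 50.0, 70.0],
])
y = np.array([400.0, 250.0, 520.0, 180.0])

# add an intercept column and solve the least-squares problem
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# predict survival for a new toy case (features + intercept term)
pred = np.array([12.0, 35.0, 60.0, 1.0]) @ coef
```

In practice the volumes would come from the segmentation masks (voxel counts times voxel volume), and richer models than plain least squares could be swapped in behind the same feature matrix.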
Scalable multimodal convolutional networks for brain tumour segmentation
Brain tumour segmentation plays a key role in computer-assisted surgery. Deep
neural networks have increased the accuracy of automatic segmentation
significantly; however, these models tend to generalise poorly to imaging
modalities other than those for which they were designed, thereby
limiting their applications. For example, a network architecture initially
designed for brain parcellation of monomodal T1 MRI cannot be easily
translated into an efficient tumour segmentation network that jointly utilises
T1, T1c, Flair and T2 MRI. To tackle this, we propose a novel scalable
multimodal deep learning architecture using new nested structures that
explicitly leverage deep features within or across modalities. This aims at
making the early layers of the architecture structured and sparse so that the
final architecture becomes scalable to the number of modalities. We evaluate
the scalable architecture for brain tumour segmentation and give evidence of
its regularisation effect compared to the conventional concatenation approach.
Comment: Paper accepted at MICCAI 201
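The contrast drawn above, between concatenating all modalities at the input and keeping per-modality branches that merge later, can be sketched in shapes alone. The linear "feature extractor" below is a stand-in, not the paper's nested convolutional structure:

```python
import numpy as np

# Hedged sketch: per-modality branches vs. early concatenation.
rng = np.random.default_rng(0)

def branch_features(modality, weight):
    """Per-modality feature extraction (one branch per input modality)."""
    return modality @ weight

# four MRI modalities (T1, T1c, FLAIR, T2), each a flattened 8-pixel patch
modalities = [rng.standard_normal((1, 8)) for _ in range(4)]

# one shared branch weight; adding a fifth modality just adds one more branch
w = rng.standard_normal((8, 4))
merged = np.mean([branch_features(m, w) for m in modalities], axis=0)

# concatenation baseline: the input width grows with the modality count,
# so the first layer's weights must be resized and retrained per modality set
concat_input = np.hstack(modalities)  # shape (1, 32) for 4 modalities
```

With branches, the front end is structured and sparse in the sense that each modality touches only its own weights, which is what makes the architecture scalable to the number of modalities.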