
    LViT: Language meets Vision Transformer in Medical Image Segmentation

    Deep learning is widely used in medical image segmentation and related tasks. However, the performance of existing medical image segmentation models is limited by the difficulty of obtaining sufficient high-quality labeled data, given the prohibitive cost of annotation. To alleviate this limitation, we propose a new text-augmented medical image segmentation model, LViT (Language meets Vision Transformer). In LViT, medical text annotations are incorporated to compensate for quality deficiencies in the image data. In addition, the text information guides the generation of higher-quality pseudo labels in semi-supervised learning. We also propose an Exponential Pseudo label Iteration mechanism (EPI) to help the Pixel-Level Attention Module (PLAM) preserve local image features in the semi-supervised LViT setting. An LV (Language-Vision) loss is designed to supervise the training of unlabeled images directly using text information. For evaluation, we construct three multimodal medical segmentation datasets (image + text) containing X-ray and CT images. Experimental results show that LViT achieves superior segmentation performance in both the fully supervised and semi-supervised settings. The code and datasets are available at https://github.com/HUANGLIZI/LViT

    Comment: Accepted by IEEE Transactions on Medical Imaging (TMI).
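The LViT abstract names an Exponential Pseudo label Iteration mechanism (EPI) but does not state its update rule. The sketch below assumes, purely from the name, that EPI blends the previous pseudo label with the current prediction as an exponential moving average. The function name `epi_update`, the parameter `beta`, and its default value are hypothetical and are not taken from the LViT code.

```python
def epi_update(previous, current, beta=0.99):
    """Blend a previous pseudo label with the current prediction.

    Assumption: an exponential moving average, applied element-wise.
    `previous` and `current` are flat sequences of per-pixel probabilities.
    `beta` (hypothetical default) weights the previous pseudo label.
    """
    if len(previous) != len(current):
        raise ValueError("pseudo label and prediction must have the same length")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    # Element-wise blend: a high beta changes the pseudo label slowly.
    return [beta * p + (1.0 - beta) * c for p, c in zip(previous, current)]


if __name__ == "__main__":
    pseudo = [0.0, 1.0, 0.5]
    prediction = [1.0, 1.0, 0.5]
    # Repeated updates pull the pseudo label gradually toward the prediction.
    for _ in range(3):
        pseudo = epi_update(pseudo, prediction, beta=0.5)
    print(pseudo)
```

Under this assumption, a value of `beta` close to 1 makes the pseudo labels change slowly between iterations, which damps the effect of a single noisy prediction.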

    Generation of annotated multimodal ground truth datasets for abdominal medical image registration

    Sparsity of annotated data is a major limitation in medical image processing tasks such as registration. Registered multimodal image data are essential for the diagnosis of medical conditions and the success of interventional medical procedures. To overcome the shortage of data, we present a method that allows the generation of annotated multimodal 4D datasets. We use a CycleGAN network architecture to generate multimodal synthetic data from the 4D extended cardiac-torso (XCAT) phantom and real patient data. Organ masks are provided by the XCAT phantom, so the generated dataset can serve as ground truth for image segmentation and registration. Realistic simulation of respiration and heartbeat is possible within the XCAT framework. To underline the usability as a registration ground truth, a proof-of-principle registration is performed. Compared to real patient data, the synthetic data showed good agreement regarding the image voxel intensity distribution and the noise characteristics. The generated T1-weighted magnetic resonance imaging (MRI), computed tomography (CT), and cone beam CT (CBCT) images are inherently co-registered. Thus, the synthetic dataset allowed us to optimize registration parameters of a multimodal non-rigid registration, utilizing liver organ masks for evaluation. Our proposed framework provides not only annotated but also multimodal synthetic data, which can serve as a ground truth for various tasks in medical image processing. We demonstrated the applicability of synthetic data for the development of multimodal medical image registration algorithms.
    Comment: 12 pages, 5 figures. This work has been published in the International Journal of Computer Assisted Radiology and Surgery volum
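The registration abstract says liver organ masks were used to evaluate the registration, but it does not name the overlap metric. As an illustration only, the sketch below computes the Dice coefficient, a common choice for comparing two binary masks. The function name `dice_coefficient` and the flattened-list mask representation are assumptions made for this example.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two binary masks given as flat sequences.

    Assumption: any truthy entry counts as foreground (e.g. liver voxels).
    Returns a value in [0, 1]; 1.0 means the masks are identical.
    """
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    if total == 0:
        # Both masks are empty; treat them as a perfect match by convention.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    fixed_liver = [1, 1, 1, 0, 0, 0]
    warped_liver = [0, 1, 1, 1, 0, 0]
    print(dice_coefficient(fixed_liver, warped_liver))
```

A score of this kind could be compared across candidate registration parameters, with higher overlap between the fixed and warped masks indicating better alignment.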

    Brain Tumor Synthetic Segmentation in 3D Multimodal MRI Scans

    The magnetic resonance (MR) analysis of brain tumors is widely used for diagnosis and examination of tumor subregions. The overlap among the intensity distributions of healthy, enhancing, non-enhancing, and edema regions makes automatic segmentation a challenging task. Here, we show that a convolutional neural network trained on high-contrast images can transform the intensity distribution of brain lesions in their internal subregions. Specifically, a generative adversarial network (GAN) is extended to synthesize high-contrast images. A comparison of these synthetic images and real images of brain tumor tissue in MR scans showed significant segmentation improvement and a reduction in the number of real channels needed for segmentation. The synthetic images are used as a substitute for real channels and can bypass real modalities in the multimodal brain tumor segmentation framework. Segmentation results on the BraTS 2019 dataset demonstrate that our proposed approach can efficiently segment the tumor areas. Finally, we predict patient survival time from volumetric features of the tumor subregions and the age of each case, using several regression models.

    Scalable multimodal convolutional networks for brain tumour segmentation

    Brain tumour segmentation plays a key role in computer-assisted surgery. Deep neural networks have increased the accuracy of automatic segmentation significantly; however, these models tend to generalise poorly to imaging modalities other than those for which they were designed, which limits their applications. For example, a network architecture initially designed for brain parcellation of monomodal T1 MRI cannot be easily translated into an efficient tumour segmentation network that jointly utilises T1, T1c, Flair and T2 MRI. To tackle this, we propose a novel scalable multimodal deep learning architecture using new nested structures that explicitly leverage deep features within or across modalities. This aims at making the early layers of the architecture structured and sparse so that the final architecture becomes scalable to the number of modalities. We evaluate the scalable architecture for brain tumour segmentation and give evidence of its regularisation effect compared to the conventional concatenation approach.
    Comment: Paper accepted at MICCAI 201