ResViT: Residual vision transformers for multi-modal medical image synthesis

Dalmaz, Onat; Yurt, Mahmut; Çukur, Tolga

Authors: Onat Dalmaz, Mahmut Yurt, Tolga Çukur
Publication date: 11 October 2021

Abstract

Multi-modal imaging is a key healthcare technology that is often underutilized due to costs associated with multiple separate scans. This limitation yields the need for synthesis of unacquired modalities from the subset of available modalities. In recent years, generative adversarial network (GAN) models with superior depiction of structural details have been established as state-of-the-art in numerous medical image synthesis tasks. GANs are characteristically based on convolutional neural network (CNN) backbones that perform local processing with compact filters. This inductive bias in turn compromises learning of contextual features. Here, we propose a novel generative adversarial approach for medical image synthesis, ResViT, to combine local precision of convolution operators with contextual sensitivity of vision transformers. ResViT employs a central bottleneck comprising novel aggregated residual transformer (ART) blocks that synergistically combine convolutional and transformer modules. Comprehensive demonstrations are performed for synthesizing missing sequences in multi-contrast MRI, and CT images from MRI. Our results indicate superiority of ResViT against competing methods in terms of qualitative observations and quantitative metrics.

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2106.16031

Last time updated on 15/07/2021