ResViT: Residual vision transformers for multi-modal medical image synthesis
Multi-modal imaging is a key healthcare technology that is often
underutilized due to costs associated with multiple separate scans. This
limitation yields the need for synthesis of unacquired modalities from the
subset of available modalities. In recent years, generative adversarial network
(GAN) models with superior depiction of structural details have been
established as state-of-the-art in numerous medical image synthesis tasks. GANs
are characteristically based on convolutional neural network (CNN) backbones
that perform local processing with compact filters. This inductive bias in turn
compromises learning of contextual features. Here, we propose a novel
generative adversarial approach for medical image synthesis, ResViT, to combine
local precision of convolution operators with contextual sensitivity of vision
transformers. ResViT employs a central bottleneck comprising novel aggregated
residual transformer (ART) blocks that synergistically combine convolutional
and transformer modules. Comprehensive demonstrations are performed for
synthesizing missing sequences in multi-contrast MRI, and CT images from MRI.
Our results indicate the superiority of ResViT over competing methods in terms
of qualitative observations and quantitative metrics.
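As a rough illustration of the idea behind the ART blocks, the sketch below (a hypothetical toy, not the authors' implementation) sums a compact convolutional branch, which captures local structure, with a global self-attention branch, which captures context, in a residual block on a 1-D signal; the names `local_conv1d`, `self_attention`, and `art_block` are assumptions for illustration.

```python
import numpy as np

def local_conv1d(x, kernel):
    """Local processing with a compact filter (the CNN inductive bias)."""
    k = len(kernel)
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    return np.array([np.dot(xp[i:i + k], kernel) for i in range(len(x))])

def self_attention(x):
    """Global contextual mixing: every position attends to every other."""
    scores = np.outer(x, x) / np.sqrt(len(x))       # pairwise similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ x

def art_block(x, kernel):
    """Toy 'aggregated residual transformer' block: a residual sum of the
    convolutional (local) and attention (contextual) branches."""
    return x + local_conv1d(x, kernel) + self_attention(x)

signal = np.linspace(0.0, 1.0, 8)
out = art_block(signal, np.array([0.25, 0.5, 0.25]))
```

The residual sum keeps the input pathway intact, so the block can fall back on either branch; the real ART block learns how to aggregate the two.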
Dual-Domain Multi-Contrast MRI Reconstruction with Synthesis-based Fusion Network
Purpose: To develop an efficient dual-domain reconstruction framework for
multi-contrast MRI, with the focus on minimising cross-contrast misalignment in
both the image and the frequency domains to enhance optimisation. Theory and
Methods: Our proposed framework, based on deep learning, facilitates the
optimisation for under-sampled target contrast using fully-sampled reference
contrast that is quicker to acquire. The method consists of three key steps: 1)
Learning to synthesise data resembling the target contrast from the reference
contrast; 2) Registering the multi-contrast data to reduce inter-scan motion;
and 3) Utilising the registered data for reconstructing the target contrast.
These steps involve learning in both domains with regularisation applied to
ensure their consistency. We also compare the reconstruction performance with
existing deep learning-based methods using a dataset of brain MRI scans.
Results: Extensive experiments demonstrate the superiority of our proposed
framework, for up to an 8-fold acceleration rate, compared to state-of-the-art
algorithms. Comprehensive analysis and ablation studies further present the
effectiveness of the proposed components. Conclusion: Our dual-domain framework
offers a promising approach to multi-contrast MRI reconstruction. It can also
be integrated with existing methods to further enhance the reconstruction.
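The three steps can be sketched end to end on a 1-D toy signal. Everything below is a hypothetical stand-in, not the paper's method: an affine map plays the synthesis network, an exhaustive integer-shift search plays registration, and k-space substitution with data consistency plays reconstruction.

```python
import numpy as np

def synthesize(reference):
    # Step 1 (stand-in): map the reference contrast toward the target
    # contrast; a learned synthesis network would replace this affine map.
    return 0.8 * reference + 0.1

def register(moving, fixed):
    # Step 2 (stand-in): correct inter-scan motion, modelled here as an
    # integer translation found by exhaustive search.
    best = min(range(-4, 5),
               key=lambda s: np.sum((np.roll(moving, s) - fixed) ** 2))
    return np.roll(moving, best)

def reconstruct(undersampled_k, mask, prior):
    # Step 3 (stand-in): fill unmeasured k-space lines from the registered
    # prior while keeping the measured lines (data consistency), coupling
    # the image and frequency domains.
    filled = np.where(mask, undersampled_k, np.fft.fft(prior))
    return np.fft.ifft(filled).real

n = 64
target = np.sin(2 * np.pi * np.arange(n) / n)      # target contrast
reference = 1.25 * np.roll(target, 3) - 0.125      # misaligned reference scan

mask = np.abs(np.fft.fftfreq(n)) < 0.2             # ~2.5-fold undersampling
undersampled_k = np.fft.fft(target) * mask

zero_filled = np.fft.ifft(undersampled_k).real
prior = register(synthesize(reference), zero_filled)
recon = reconstruct(undersampled_k, mask, prior)
```

On this toy signal the registered prior is exact, so the reconstruction recovers the target; in practice the learned components approximate each step and the cross-domain regularisation keeps them consistent.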
Learning to map between domains
Humans consume visual content avidly. We are in the midst of an imaging revolution enabled by inexpensive digital cameras and the internet: almost every cell phone has a camera, and the photos these cameras take are shared widely and rapidly online. However, there is an asymmetry: each individual can consume only limited visual content in a limited lifetime, and only a talented few can effectively express and understand something previously unseen in visual form. The rest of us try to understand and express the unseen by translating it into something seen before. Similarly, in medical imaging and radiological science, tens of thousands of medical images (MRI, CT, etc.) of patients are acquired, and these images need to be studied and interpreted. In this dissertation, we investigate a number of data-driven approaches for mapping from an 'unseen' or hard-to-understand domain to a 'seen' or easy-to-understand domain. Our work includes mapping between two image domains and mapping from an image domain to a language domain, which in computer vision are called, respectively, image-to-image translation and image captioning. The presented methods not only help users easily and accurately synthesize useful photos, but also enable new visual and linguistic effects not possible before this work. In clinical diagnosis, these approaches can improve the accuracy and efficiency of the diagnostic process for the experienced radiologist. Moreover, mapping from the image domain to the text domain can mimic the work of an experienced radiologist for automatic medical report generation.
Part I: This part describes image segmentation, which can be treated as a special case of image-to-image translation, and includes two works. The first solves the anisotropic-resolution problem for 3D medical image semantic segmentation (Appendix A). The second describes our US-patented cross-domain medical image segmentation: the first domain has labels while the second has none, and by designing a special domain mapping we enable semantic segmentation on the second domain. Both works can improve computer-aided medical image interpretation and help radiologists read medical images more efficiently and accurately.
Part II: In clinical diagnosis, combining the advantages of multiple medical imaging modalities requires medical image registration or cross-domain image translation. A crucial requirement for both is one-to-one correspondence, because the images from the different modalities (such as MRI and CT) come from the same patient. This part presents learning a self-inverse network to realize one-to-one mapping for both paired and unpaired image-to-image translation.
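A self-inverse map is an involution: applying it twice returns the input, f(f(x)) = x, so a single network can translate in both directions while guaranteeing one-to-one correspondence. A minimal numeric illustration (mine, not the dissertation's network):

```python
import numpy as np

def f(x):
    # A toy involution on intensities: f(f(x)) = -(-x + 1) + 1 = x.
    # A self-inverse network learns one mapping with this property, so the
    # same network maps domain A to domain B and domain B back to domain A.
    return -x + 1.0

def self_inverse_penalty(g, x):
    # Training-style regulariser pushing an arbitrary map g toward
    # g(g(x)) == x; it is zero exactly when g is an involution.
    return float(np.mean((g(g(x)) - x) ** 2))

x = np.linspace(-2.0, 2.0, 9)
```

Because an involution is a bijection, the one-to-one requirement above is satisfied by construction rather than enforced approximately, which is the appeal of a self-inverse design.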
Part III: In clinical diagnosis, the final output is in the text domain (such as medical reports and prescriptions). Since report writing based on medical images can be error-prone for inexperienced physicians, and time-consuming and tedious for experienced ones, automatic generation of medical image reports can make this difficult task efficient. This part extends the mapping from the image domain to the language domain; specifically, the mapping is done by learning a language representation to form the language domain.
Learning Disentangled Representations in the Imaging Domain
Disentangled representation learning has been proposed as an approach to
learning general representations even in the absence of, or with limited,
supervision. A good general representation can be fine-tuned for new target
tasks using modest amounts of data, or used directly in unseen domains
achieving remarkable performance in the corresponding task. This alleviation of
the data and annotation requirements offers tantalising prospects for
applications in computer vision and healthcare. In this tutorial paper, we
motivate the need for disentangled representations, present key theory, and
detail practical building blocks and criteria for learning such
representations. We discuss applications in medical imaging and computer vision
emphasising choices made in exemplar key works. We conclude by presenting
remaining challenges and opportunities.
Cross-Modality Feature Learning for Three-Dimensional Brain Image Synthesis
Multi-modality medical imaging is increasingly used for comprehensive assessment of complex diseases, in either diagnostic examinations or medical research trials. Different imaging modalities provide complementary information about living tissues. However, multi-modal examinations are not always possible due to adverse factors such as patient discomfort, increased cost, prolonged scanning time and scanner unavailability. In addition, in large imaging studies, incomplete records are not uncommon owing to image artifacts, data corruption or data loss, which compromise the potential of multi-modal acquisitions. Moreover, no matter how good an imaging system is, the performance of the imaging equipment is bounded by the physical limits of its devices. Additional interferences arise, particularly for medical imaging systems: limited acquisition times, sophisticated and costly equipment, and patients with severe medical conditions, all of which cause image degradation. The acquisitions can therefore be considered degraded versions of the original high-quality images.
In this dissertation, we explore the problems of image super-resolution and cross-modality synthesis for one Magnetic Resonance Imaging (MRI) modality from an image of another MRI modality of the same subject using an image synthesis framework for reconstructing the missing/complex modality data. We develop models and techniques that allow us to connect the domain of source modality data and the domain of target modality data, enabling transformation between elements of
the two domains. In particular, we first introduce models that project both source-modality and target-modality data into a common multi-modality feature space in a supervised setting. This common space allows us to connect cross-modality features that are related to each other, and we can impose the learned association function to synthesize any target-modality image. Moreover, we develop a weakly-supervised method that takes a few registered multi-modality image pairs as training data and generates the desired modality data without being constrained by a large collection of well-processed (e.g., skull-stripped and strictly registered) multi-modality brain images. Finally, we propose an approach that provides a generic way of learning a dual mapping between source and target domains while considering both visually high-fidelity synthesis and task practicability. We demonstrate that this model can take an arbitrary modality and efficiently synthesize the desired modality data in an unsupervised manner.
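The supervised association function can be illustrated with a purely linear toy model. The synthetic observation matrices `A1`, `A2` and the least-squares association `W` below are assumptions of mine; the dissertation's models use learned (nonlinear) features, but the structure is the same: paired data relate the two modalities, and the learned association maps source features to target features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Paired training data: the same k-dimensional latent "tissue" code seen
# through two hypothetical modality-specific linear observation models.
k, d1, d2, n = 4, 16, 16, 200
latent = rng.normal(size=(n, k))
A1 = rng.normal(size=(k, d1))          # source-modality observation model
A2 = rng.normal(size=(k, d2))          # target-modality observation model
src, tgt = latent @ A1, latent @ A2    # registered multi-modality pairs

# Learn the association from source features to target features by
# least squares on the paired training set.
W, *_ = np.linalg.lstsq(src, tgt, rcond=None)

# Synthesize the target modality for an unseen subject from its source scan.
z = rng.normal(size=(1, k))            # unseen subject's latent code
pred = (z @ A1) @ W                    # source scan mapped to target modality
```

In this linear setting the association recovers the target modality exactly; real cross-modality synthesis trades this exactness for learned features that tolerate noise and nonlinearity.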
We show that these proposed models advance the state of the art on image super-resolution and cross-modality synthesis tasks that require joint processing of multi-modality images, and that the algorithms can be designed to generate data that is practically beneficial to medical image analysis.
A Review on Low-Dose Emission Tomography Post-Reconstruction Denoising with Neural Network Approaches
Low-dose emission tomography (ET) plays a crucial role in medical imaging, enabling the acquisition of functional information for various biological processes while minimizing the patient dose. However, the inherent randomness of the photon counting process is a source of noise that is amplified in low-dose ET. This review article provides an overview of existing post-processing techniques, with an emphasis on deep neural network (NN) approaches. Furthermore, we explore future directions in the field of NN-based low-dose ET. This comprehensive examination sheds light on the potential of deep learning in enhancing the quality and resolution of low-dose ET images, ultimately advancing the field of medical imaging.