ResViT: Residual vision transformers for multi-modal medical image synthesis
Multi-modal imaging is a key healthcare technology that is often
underutilized due to costs associated with multiple separate scans. This
limitation yields the need for synthesis of unacquired modalities from the
subset of available modalities. In recent years, generative adversarial network
(GAN) models with superior depiction of structural details have been
established as state-of-the-art in numerous medical image synthesis tasks. GANs
are characteristically based on convolutional neural network (CNN) backbones
that perform local processing with compact filters. This inductive bias in turn
compromises learning of contextual features. Here, we propose a novel
generative adversarial approach for medical image synthesis, ResViT, to combine
local precision of convolution operators with contextual sensitivity of vision
transformers. ResViT employs a central bottleneck comprising novel aggregated
residual transformer (ART) blocks that synergistically combine convolutional
and transformer modules. Comprehensive demonstrations are performed for
synthesizing missing sequences in multi-contrast MRI, and CT images from MRI.
Our results indicate superiority of ResViT over competing methods in terms
of qualitative observations and quantitative metrics.
BolT: Fused Window Transformers for fMRI Time Series Analysis
Deep-learning models have enabled performance leaps in analysis of
high-dimensional functional MRI (fMRI) data. Yet, many previous methods are
suboptimally sensitive to contextual representations across diverse time
scales. Here, we present BolT, a blood-oxygen-level-dependent transformer
model, for analyzing multi-variate fMRI time series. BolT leverages a cascade
of transformer encoders equipped with a novel fused window attention mechanism.
Encoding is performed on temporally-overlapped windows within the time series
to capture local representations. To integrate information temporally,
cross-window attention is computed between base tokens in each window and
fringe tokens from neighboring windows. To gradually transition from local to
global representations, the extent of window overlap and thereby number of
fringe tokens are progressively increased across the cascade. Finally, a novel
cross-window regularization is employed to align high-level classification
features across the time series. Comprehensive experiments on large-scale
public datasets demonstrate the superior performance of BolT against
state-of-the-art methods. Furthermore, explanatory analyses to identify
landmark time points and regions that contribute most significantly to model
decisions corroborate prominent neuroscientific findings in the literature.
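The windowing idea in the BolT abstract (base tokens per window, plus fringe tokens borrowed from neighboring time points, with the fringe growing across the cascade) can be illustrated with a minimal index-level sketch. This is an assumption-laden illustration, not the authors' implementation: the function name `fused_windows` and the parameters `window_size`, `stride`, and `fringe` are hypothetical, and no attention is actually computed here.

```python
def fused_windows(n_timepoints, window_size, stride, fringe):
    """Split time indices into base windows plus fringe indices.

    All names and parameters are hypothetical; they are not taken from the paper.
    Returns a list of (base_indices, fringe_indices) pairs, one per window.
    """
    if window_size > n_timepoints:
        raise ValueError("window_size must not exceed n_timepoints")
    windows = []
    for start in range(0, n_timepoints - window_size + 1, stride):
        stop = start + window_size
        base = list(range(start, stop))
        # Fringe indices lie just outside the base window, clipped to the series.
        left = list(range(max(0, start - fringe), start))
        right = list(range(stop, min(n_timepoints, stop + fringe)))
        windows.append((base, left + right))
    return windows


# Growing the fringe across stages widens each window's temporal context.
for stage, fringe in enumerate((1, 2, 4), start=1):
    base, extra = fused_windows(n_timepoints=12, window_size=4, stride=4, fringe=fringe)[1]
    print(f"stage {stage}: base={base} fringe={extra}")
```

In a cross-window attention layer, queries would come from the base indices while keys and values would cover base and fringe indices together; that step is omitted from this sketch.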
Unsupervised Medical Image Translation with Adversarial Diffusion Models
Imputation of missing images via source-to-target modality translation can
improve diversity in medical imaging protocols. A pervasive approach for
synthesizing target images involves one-shot mapping through generative
adversarial networks (GAN). Yet, GAN models that implicitly characterize the
image distribution can suffer from limited sample fidelity. Here, we propose a
novel method based on adversarial diffusion modeling, SynDiff, for improved
performance in medical image translation. To capture a direct correlate of the
image distribution, SynDiff leverages a conditional diffusion process that
progressively maps noise and source images onto the target image. For fast and
accurate image sampling during inference, large diffusion steps are taken with
adversarial projections in the reverse diffusion direction. To enable training
on unpaired datasets, a cycle-consistent architecture is devised with coupled
diffusive and non-diffusive modules that bilaterally translate between two
modalities. Extensive assessments are reported on the utility of SynDiff
against competing GAN and diffusion models in multi-contrast MRI and MRI-CT
translation. Our demonstrations indicate that SynDiff offers quantitatively and
qualitatively superior performance against competing baselines.
Comment: M. Ozbey and O. Dalmaz contributed equally to this study.
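For readers unfamiliar with diffusion processes, the following minimal sketch shows the generic DDPM-style forward noising step on which diffusion models are commonly built. It is not SynDiff's formulation: the abstract does not give the noise schedule, the conditioning on the source image, or the adversarial reverse step, so the names `alpha_bar`, `forward_diffuse`, and the `betas` schedule are illustrative assumptions. Using a few large `betas` values loosely mirrors the idea of taking large diffusion steps.

```python
import math
import random


def alpha_bar(betas, t):
    """Cumulative product of (1 - beta) over the first t steps."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod


def forward_diffuse(x0, betas, t, rng):
    """Sample x_t from x_0 under a generic Gaussian forward process.

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    The schedule and names are hypothetical, not taken from the paper.
    """
    if not 0 <= t <= len(betas):
        raise ValueError("t must lie in [0, len(betas)]")
    a = alpha_bar(betas, t)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


# A short schedule with large steps: the signal is mostly noise after 4 steps.
rng = random.Random(0)
pixels = [0.2, -0.5, 1.0]
betas = [0.3, 0.3, 0.3, 0.3]
for t in range(len(betas) + 1):
    print(t, round(alpha_bar(betas, t), 4), forward_diffuse(pixels, betas, t, rng))
```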
COVID-19 Detection from Respiratory Sounds with Hierarchical Spectrogram Transformers
Monitoring of prevalent airborne diseases such as COVID-19 characteristically
involves respiratory assessments. While auscultation is a mainstream method for
preliminary screening of disease symptoms, its utility is hampered by the need
for dedicated hospital visits. Remote monitoring based on recordings of
respiratory sounds on portable devices is a promising alternative, which can
assist in early assessment of COVID-19 that primarily affects the lower
respiratory tract. In this study, we introduce a novel deep learning approach
to distinguish patients with COVID-19 from healthy controls given audio
recordings of cough or breathing sounds. The proposed approach leverages a
novel hierarchical spectrogram transformer (HST) on spectrogram representations
of respiratory sounds. HST embodies self-attention mechanisms over local
windows in spectrograms, and window size is progressively grown over model
stages to capture local to global context. HST is compared against
state-of-the-art conventional and deep-learning baselines. Demonstrations on
crowd-sourced multi-national datasets indicate that HST outperforms competing
methods, achieving over 83% area under the receiver operating characteristic
curve (AUC) in detecting COVID-19 cases.
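The AUC figure quoted above is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch of that computation follows; the function name `roc_auc` and the toy labels and scores are illustrative assumptions and have no connection to the datasets in the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half).

    labels: iterable of 0 (control) or 1 (positive); scores: model outputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ranked correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise form is quadratic in the number of samples; it is chosen here for clarity, and a rank-based computation would be preferable for large datasets.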
Learning Fourier-Constrained Diffusion Bridges for MRI Reconstruction
Recent years have witnessed a surge in deep generative models for accelerated
MRI reconstruction. Diffusion priors in particular have gained traction with
their superior representational fidelity and diversity. Instead of the target
transformation from undersampled to fully-sampled data, common diffusion priors
are trained to learn a multi-step transformation from Gaussian noise onto
fully-sampled data. During inference, data-fidelity projections are injected in
between reverse diffusion steps to reach a compromise solution within the span
of both the diffusion prior and the imaging operator. Unfortunately, suboptimal
solutions can arise as the normality assumption of the diffusion prior causes
divergence between learned and target transformations. To address this
limitation, here we introduce the first diffusion bridge for accelerated MRI
reconstruction. The proposed Fourier-constrained diffusion bridge (FDB)
leverages a generalized process to transform between undersampled and
fully-sampled data via random noise addition and random frequency removal as
degradation operators. Unlike common diffusion priors that use an asymptotic
endpoint based on Gaussian noise, FDB captures a transformation between finite
endpoints where the initial endpoint is based on moderate degradation of
fully-sampled data. Demonstrations on brain MRI indicate that FDB outperforms
state-of-the-art reconstruction methods including conventional diffusion
priors.
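The two degradation operators named in the FDB abstract, random frequency removal and random noise addition, can be illustrated on a 1D signal with a naive discrete Fourier transform. This is only a sketch under stated assumptions: the function `degrade`, the parameters `keep_fraction` and `noise_std`, the choice to add noise in the frequency domain, and the order of the two operations are all hypothetical and not taken from the paper.

```python
import cmath
import random


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * k / n) for f in range(n)) / n
            for k in range(n)]


def degrade(signal, keep_fraction, noise_std, rng):
    """Remove random frequencies, then add complex Gaussian noise to the rest.

    Hypothetical illustration; returns (degraded_signal, kept_frequency_indices).
    """
    spectrum = dft(signal)
    n = len(spectrum)
    n_keep = max(1, round(keep_fraction * n))
    kept = set(rng.sample(range(n), n_keep))
    degraded = []
    for f, coeff in enumerate(spectrum):
        if f in kept:
            noise = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
            degraded.append(coeff + noise)
        else:
            degraded.append(0j)  # this frequency is removed
    return idft(degraded), kept


signal = [1.0, 2.0, 0.0, -1.0, 0.5, 3.0, -2.0, 0.0]
moderate, kept = degrade(signal, keep_fraction=0.75, noise_std=0.05, rng=random.Random(0))
print(sorted(kept))
print([round(v.real, 3) for v in moderate])
```

A mild setting such as the one above loosely corresponds to the idea of a moderately degraded endpoint, whereas a lower `keep_fraction` would resemble a heavily undersampled acquisition.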