Quicksilver: Fast Predictive Image Registration - a Deep Learning Approach
This paper introduces Quicksilver, a fast deformable image registration
method. Quicksilver registration for image-pairs works by patch-wise prediction
of a deformation model based directly on image appearance. A deep
encoder-decoder network is used as the prediction model. While the prediction
strategy is general, we focus on predictions for the Large Deformation
Diffeomorphic Metric Mapping (LDDMM) model. Specifically, we predict the
momentum-parameterization of LDDMM, which facilitates a patch-wise prediction
strategy while maintaining the theoretical properties of LDDMM, such as
guaranteed diffeomorphic mappings for sufficiently strong regularization. We
also provide a probabilistic version of our prediction network which can be
sampled during the testing time to calculate uncertainties in the predicted
deformations. Finally, we introduce a new correction network which greatly
increases the prediction accuracy of an already existing prediction network. We
show experimental results for uni-modal atlas-to-image as well as uni- / multi-
modal image-to-image registrations. These experiments demonstrate that our
method accurately predicts registrations obtained by numerical optimization, is
very fast, achieves state-of-the-art registration results on four standard
validation datasets, and can jointly learn an image similarity measure.
Quicksilver is freely available as open-source software.
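The patch-wise prediction strategy described in this abstract can be illustrated with a minimal sketch. It assumes 2D images, a sliding window with a fixed stride, and simple averaging where patches overlap; none of these details come from the abstract. `predict_patch` is a hypothetical stand-in for the trained encoder-decoder network, not Quicksilver's actual API.

```python
import numpy as np

def predict_patch(moving_patch, target_patch):
    # Hypothetical stand-in for the trained network: it returns the
    # intensity difference so that the sketch runs without a model.
    return target_patch - moving_patch

def patchwise_predict(moving, target, patch_size=4, stride=2,
                      predictor=predict_patch):
    """Predict a field patch by patch and average overlapping predictions."""
    h, w = moving.shape
    total = np.zeros((h, w), dtype=float)
    count = np.zeros((h, w), dtype=float)
    for i in range(0, h - patch_size + 1, stride):
        for j in range(0, w - patch_size + 1, stride):
            rows = slice(i, i + patch_size)
            cols = slice(j, j + patch_size)
            total[rows, cols] += predictor(moving[rows, cols], target[rows, cols])
            count[rows, cols] += 1.0
    # Pixels never covered by a patch keep a zero prediction.
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)
```

In a real pipeline the averaged output would be the predicted momentum field rather than an intensity difference.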
Fast Predictive Multimodal Image Registration
We introduce a deep encoder-decoder architecture for image deformation
prediction from multimodal images. Specifically, we design an image-patch-based
deep network that jointly (i) learns an image similarity measure and (ii) the
relationship between image patches and deformation parameters. While our method
can be applied to general image registration formulations, we focus on the
Large Deformation Diffeomorphic Metric Mapping (LDDMM) registration model. By
predicting the initial momentum of the shooting formulation of LDDMM, we
preserve its mathematical properties and drastically reduce the computation
time, compared to optimization-based approaches. Furthermore, we create a
Bayesian probabilistic version of the network that allows evaluation of
registration uncertainty via sampling of the network at test time. We evaluate
our method on a 3D brain MRI dataset using both T1- and T2-weighted images. Our
experiments show that our method generates accurate predictions and that
learning the similarity measure leads to more consistent registrations than
relying on generic multimodal image similarity measures, such as mutual
information. Our approach is an order of magnitude faster than
optimization-based LDDMM.
Comment: Accepted as a conference paper for ISBI 201
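Estimating uncertainty by sampling a network at test time can be sketched as follows. The sketch assumes the randomness comes from dropout that stays active during inference, which the abstract does not state, and it uses a single linear layer as a hypothetical stand-in for the registration network.

```python
import numpy as np

def stochastic_forward(x, weights, drop_prob, rng):
    # Dropout stays active at test time: zero random inputs, rescale the rest.
    mask = rng.random(x.shape) >= drop_prob
    return (x * mask / (1.0 - drop_prob)) @ weights

def sample_prediction(x, weights, drop_prob=0.2, n_samples=50, seed=0):
    """Return the mean prediction and its per-output standard deviation."""
    rng = np.random.default_rng(seed)
    samples = np.stack([
        stochastic_forward(x, weights, drop_prob, rng)
        for _ in range(n_samples)
    ])
    return samples.mean(axis=0), samples.std(axis=0)
```

The standard deviation across samples serves as a simple per-output uncertainty estimate; larger values mark outputs on which the sampled networks disagree.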
TM-NET: Deep Generative Networks for Textured Meshes
We introduce TM-NET, a novel deep generative model for synthesizing textured
meshes in a part-aware manner. Once trained, the network can generate novel
textured meshes from scratch or predict textures for a given 3D mesh, without
image guidance. Plausible and diverse textures can be generated for the same
mesh part, while texture compatibility between parts in the same shape is
achieved via conditional generation. Specifically, our method produces texture
maps for individual shape parts, each as a deformable box, leading to a natural
UV map with minimal distortion. The network separately embeds part geometry
(via a PartVAE) and part texture (via a TextureVAE) into their respective
latent spaces, so as to facilitate learning texture probability distributions
conditioned on geometry. We introduce a conditional autoregressive model for
texture generation, which can be conditioned on both part geometry and textures
already generated for other parts to achieve texture compatibility. To produce
high-frequency texture details, our TextureVAE operates in a high-dimensional
latent space via dictionary-based vector quantization. We also exploit
transparencies in the texture as an effective means to model complex shape
structures including topological details. Extensive experiments demonstrate the
plausibility, quality, and diversity of the textures and geometries generated
by our network, while avoiding inconsistency issues that are common to novel
view synthesis methods.
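Dictionary-based vector quantization, as mentioned for the TextureVAE, replaces each latent vector with its nearest entry in a learned codebook. The sketch below assumes a squared Euclidean distance and a fixed codebook; the function name and array shapes are illustrative and are not taken from TM-NET.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector (n, d) to its nearest codebook entry (k, d)."""
    diffs = latents[:, None, :] - codebook[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=-1)   # shape (n, k)
    indices = sq_dist.argmin(axis=1)      # nearest entry for each latent
    return indices, codebook[indices]
```

An autoregressive texture model can then work with the discrete `indices` instead of continuous latent vectors.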
Video and Image Super-Resolution via Deep Learning with Attention Mechanism
Image demosaicing, image super-resolution, and video super-resolution are three important tasks in the color imaging pipeline. Demosaicing deals with the recovery of missing color information and the generation of full-resolution color images from color filter array (CFA) data, such as the Bayer pattern. Image super-resolution aims at increasing the spatial resolution and enhancing important structures (e.g., edges and textures) in super-resolved images. Both spatial and temporal dependencies are important to the task of video super-resolution, which has received increasing attention in recent years. Traditional solutions to these three low-level vision tasks lack generalization capability, especially for real-world data. Recently, deep learning methods have achieved great success in vision problems, including image demosaicing and image/video super-resolution. Conceptually similar to adaptation in model-based approaches, attention has seen increasing use in deep learning. As a tool to reallocate limited computational resources based on the importance of informative components, the attention mechanism, which includes channel attention, spatial attention, non-local attention, etc., has found successful applications in both high-level and low-level vision tasks. However, to the best of our knowledge, 1) most approaches have studied super-resolution and demosaicing independently, and little is known about the potential benefit of formulating a joint demosaicing and super-resolution (JDSR) problem; 2) the attention mechanism has not been studied for the spectral channels of color images in the open literature; 3) current approaches for video super-resolution rely on deformable-convolution-based frame alignment and naive spatial attention mechanisms. How to exploit the attention mechanism in the spectral and temporal domains sets the stage for the research in this dissertation.
In this dissertation, we conduct a systematic study of these two issues and make the following contributions: 1) we propose a spatial color attention network (SCAN) designed to jointly exploit the spatial and spectral dependencies within color images for the single image super-resolution (SISR) problem. We present a spatial color attention module that calibrates important color information for individual color components from the output feature maps of residual groups. Experimental results show that SCAN achieves superior performance in terms of both subjective and objective quality on the NTIRE2019 dataset; 2) we propose two competing end-to-end joint optimization solutions to the JDSR problem: Densely-Connected Squeeze-and-Excitation Residual Network (DSERN) vs. Residual-Dense Squeeze-and-Excitation Network (RDSEN). Experimental results show that the enhanced design, RDSEN, significantly improves both subjective and objective performance over DSERN; 3) we propose a novel deep-learning-based framework, Deformable Kernel Spatial Attention Network (DKSAN), to super-resolve videos with a scale factor as large as 16 (the extreme SR situation). Thanks to the newly designed Deformable Kernel Convolution Alignment (DKC Align) and Deformable Kernel Spatial Attention (DKSA) modules, DKSAN achieves better subjective and objective results than the existing state-of-the-art approach, the enhanced deformable convolutional network (EDVR).
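The squeeze-and-excitation form of channel attention referred to in the DSERN and RDSEN names can be sketched in a few lines. This is a generic illustration and not the dissertation's implementation: the feature layout `(channels, height, width)` and the weight matrices `w_reduce` and `w_expand` are assumptions.

```python
import numpy as np

def squeeze_excite(features, w_reduce, w_expand):
    """Rescale each channel of a (C, H, W) feature map by a learned gate."""
    squeezed = features.mean(axis=(1, 2))               # squeeze: (C,)
    hidden = np.maximum(w_reduce @ squeezed, 0.0)       # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(w_expand @ hidden)))  # sigmoid gates: (C,)
    return features * gates[:, None, None], gates
```

Each gate lies between 0 and 1, so channels judged informative keep most of their magnitude while the rest are attenuated.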
DRINet for medical image segmentation
Convolutional neural networks (CNNs) have revolutionized medical image analysis over the past few years. The U-Net architecture is one of the most well-known CNN architectures for semantic segmentation and has achieved remarkable successes in many different medical image segmentation applications. The U-Net architecture consists of standard convolution layers, pooling layers, and upsampling layers. These convolution layers learn representative features of input images and construct segmentations based on the features. However, the features learned by standard convolution layers are not distinctive when the differences among different categories are subtle in terms of intensity, location, shape, and size. In this paper, we propose a novel CNN architecture, called Dense-Res-Inception Net (DRINet), which addresses this challenging problem. The proposed DRINet consists of three blocks, namely a convolutional block with dense connections, a deconvolutional block with residual Inception modules, and an unpooling block. Our proposed architecture outperforms the U-Net in three different challenging applications, namely multi-class segmentation of cerebrospinal fluid (CSF) on brain CT images, multi-organ segmentation on abdominal CT images, and multi-class brain tumour segmentation on MR images.
- …