Deep Generative Filter for Motion Deblurring
Removing blur caused by camera shake has long been a challenging problem in
the computer vision literature due to its ill-posed nature. Motion blur
caused by the relative motion between the camera and the object in 3D space
induces a spatially varying blurring effect over the entire image. In this
paper, we propose a novel deep filter based on a Generative Adversarial
Network (GAN) architecture, integrated with a global skip connection and a
dense architecture, to tackle this problem. By bypassing the blur-kernel
estimation step, our model significantly reduces the test time, which is
essential for practical applications. Experiments on benchmark datasets
demonstrate the effectiveness of the proposed method, which outperforms
state-of-the-art blind deblurring algorithms both quantitatively and
qualitatively.
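The global skip connection mentioned above can be sketched as follows. This is a minimal toy illustration of the residual-learning idea, not the paper's actual generator; `toy_residual` is a hypothetical stand-in for a learned network.

```python
# Illustrative sketch: with a global (input-to-output) skip connection,
# the generator predicts only a residual correction, and the blurry
# input is added back to form the restored image.

def deblur_with_global_skip(blurry, predict_residual):
    """Restore an image as blurry + residual, element-wise."""
    residual = predict_residual(blurry)
    return [b + r for b, r in zip(blurry, residual)]

# Hypothetical stand-in for the learned residual branch: nudges pixel
# values toward sharper extremes.
def toy_residual(img):
    return [0.1 if v > 0.5 else -0.1 for v in img]

restored = deblur_with_global_skip([0.4, 0.6, 0.5], toy_residual)
```

Because the skip connection carries the input directly to the output, the network only has to learn the (typically small) correction, which eases optimization.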
Learning Spatial Pyramid Attentive Pooling in Image Synthesis and Image-to-Image Translation
Image synthesis and image-to-image translation are two important generative
learning tasks. Remarkable progress has been made by learning Generative
Adversarial Networks (GANs)~\cite{goodfellow2014generative} and
cycle-consistent GANs (CycleGANs)~\cite{zhu2017unpaired} respectively. This
paper presents a method of learning Spatial Pyramid Attentive Pooling (SPAP)
which is a novel architectural unit and can be easily integrated into both
generators and discriminators in GANs and CycleGANs. The proposed SPAP
integrates the Atrous Spatial Pyramid Pooling (ASPP)
module~\cite{chen2018deeplab}, a proposed cascade attention mechanism, and
residual connections~\cite{he2016deep}. It leverages the advantages of the
three components to facilitate effective end-to-end generative learning:
(i) the capability of fusing multi-scale information via ASPP; (ii) the
capability of capturing the relative importance of both spatial locations
(especially multi-scale context) and feature channels via attention;
(iii) the capability of preserving information and improving optimization
feasibility via residual connections. Coarse-to-fine and fine-to-coarse SPAP
are studied, and intriguing attention maps are observed in both tasks. In
experiments, the proposed SPAP is tested in GANs on the CelebA-HQ-128
dataset~\cite{karras2017progressive}, and in CycleGANs on image-to-image
translation datasets including the Cityscapes
dataset~\cite{cordts2016cityscapes} and the Facade and Aerial Maps
datasets~\cite{zhu2017unpaired}, obtaining better performance in both
settings. Comment: 12 pages
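The ASPP component fuses multi-scale information through atrous (dilated) convolutions. A minimal 1-D sketch of a dilated convolution, independent of any particular framework or of SPAP itself:

```python
def dilated_conv1d(signal, kernel, dilation):
    """1-D dilated (atrous) convolution with valid padding: each kernel
    tap skips `dilation - 1` samples, enlarging the receptive field
    without adding parameters."""
    span = (len(kernel) - 1) * dilation + 1  # receptive field per output
    out = []
    for i in range(len(signal) - span + 1):
        out.append(sum(kernel[k] * signal[i + k * dilation]
                       for k in range(len(kernel))))
    return out
```

An ASPP-style block runs several such convolutions with different dilation rates in parallel on the same features and fuses the results, capturing context at multiple scales at once.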
A Review on Deep Learning Techniques Applied to Semantic Segmentation
Image semantic segmentation is of growing interest to computer vision and
machine learning researchers. Many applications on the rise need accurate
and efficient segmentation mechanisms: autonomous driving, indoor
navigation, and even virtual or augmented reality systems, to name a few.
This demand coincides with the rise of deep learning approaches in almost
every field or application related to computer vision, including semantic
segmentation and scene understanding. This paper provides a review of deep
learning methods for semantic segmentation applied to various application
areas. First, we describe the terminology of this field as well as the
necessary background concepts. Next, the main datasets and challenges are
presented to help researchers decide which ones best suit their needs and
targets. Then, existing methods are reviewed, highlighting their
contributions and their significance in the field. Quantitative results are
then given for the described methods on the datasets on which they were
evaluated, followed by a discussion of the results. Finally, we point out a
set of promising future directions and draw our own conclusions about the
state of the art of semantic segmentation using deep learning
techniques. Comment: Submitted to TPAMI on Apr. 22, 2017
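Quantitative results in this field are usually reported as intersection-over-union (IoU). A minimal reference implementation of the per-class metric on flattened label maps, for illustration only:

```python
def per_class_iou(pred, target, num_classes):
    """Intersection-over-union per class, the standard semantic
    segmentation metric: |pred AND target| / |pred OR target| for each
    class label. Inputs are flat lists of integer class labels."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        ious.append(inter / union if union else float("nan"))
    return ious
```

Benchmarks such as Cityscapes then average these values over all classes (mean IoU), which is the headline number most surveyed methods compete on.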
Generative Image Inpainting with Contextual Attention
Recent deep learning based approaches have shown promising results for the
challenging task of inpainting large missing regions in an image. These methods
can generate visually plausible image structures and textures, but often create
distorted structures or blurry textures inconsistent with surrounding areas.
This is mainly due to the ineffectiveness of convolutional neural networks
in explicitly borrowing or copying information from distant spatial
locations. On the other hand, traditional texture and patch synthesis
approaches are particularly suitable when textures must be borrowed from
the surrounding regions. Motivated by these observations, we propose a new
deep generative model-based approach which can not only synthesize novel
image structures but also explicitly utilize surrounding image features as
references during network training to make better predictions. The model is
a feed-forward, fully convolutional neural network which can process images
with multiple holes at arbitrary locations and of variable sizes at test
time. Experiments on multiple datasets including faces (CelebA, CelebA-HQ),
textures (DTD) and natural images (ImageNet, Places2) demonstrate that our
proposed approach generates higher-quality inpainting results than existing
ones. Code, demo and models are available at:
https://github.com/JiahuiYu/generative_inpainting. Comment: Accepted in
CVPR 2018; added CelebA-HQ results; open sourced; interactive demo
available at: http://jhyu.me/dem
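The borrowing of features from distant locations can be illustrated with a drastically simplified sketch of the matching step behind contextual attention: score each background patch against the hole region by cosine similarity. The actual model computes dense softmax-weighted attention over learned feature maps; this sketch returns the argmax only.

```python
import math

def best_matching_patch(hole_feature, background_patches):
    """Simplified contextual-attention matching: pick the background
    patch whose feature vector is most cosine-similar to the (coarsely
    filled) hole feature, i.e. the patch worth borrowing from."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    scores = [cosine(hole_feature, p) for p in background_patches]
    return max(range(len(scores)), key=scores.__getitem__)
```

In the full model this matching is differentiable (implemented as a convolution with background patches as kernels), so the borrowing behavior is learned end to end.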
Image Fine-grained Inpainting
Image inpainting techniques have recently shown promising improvement with
the assistance of generative adversarial networks (GANs). However, most of
them often suffer from completed results with unreasonable structure or
blurriness. To mitigate this problem, we present a one-stage model that
utilizes dense combinations of dilated convolutions to obtain larger and
more effective receptive fields. Benefiting from this property of the
network, we can more easily recover large regions in an incomplete image.
To better train this efficient generator, in addition to the
frequently-used VGG feature-matching loss, we design a novel self-guided
regression loss that concentrates on uncertain areas and enhances semantic
details. Besides, we devise a geometrical alignment constraint term to
compensate for the pixel-based distance between predicted features and
ground-truth ones. We also employ a discriminator with local and global
branches to ensure local-global content consistency. To further improve the
quality of generated images, discriminator feature matching on the local
branch is introduced, which dynamically minimizes the discrepancy between
intermediate features of synthetic and ground-truth patches. Extensive
experiments on several public datasets demonstrate that our approach
outperforms current state-of-the-art methods. Code is available at
https://github.com/Zheng222/DMFN
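The benefit of combining dilated convolutions can be seen from simple receptive-field arithmetic, a general property of dilated convolutions rather than anything specific to this model:

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of dilated convolutions (stride 1):
    each layer widens the field by (kernel_size - 1) * dilation."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Four 3x3 layers with exponentially growing dilation cover a 31-pixel
# span, versus 9 pixels for four plain 3x3 layers - same parameter cost.
wide = receptive_field(3, [1, 2, 4, 8])    # 31
plain = receptive_field(3, [1, 1, 1, 1])   # 9
```

This is why dilated stacks help when filling large holes: distant valid pixels fall inside the receptive field of the layers reconstructing the missing region.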
A Deep Journey into Super-resolution: A survey
Super-resolution based on deep convolutional networks is a fast-growing
field with numerous practical applications. In this exposition, we extensively
compare 30+ state-of-the-art super-resolution Convolutional Neural Networks
(CNNs) over three classical and three recently introduced challenging datasets
to benchmark single image super-resolution. We introduce a taxonomy for
deep-learning based super-resolution networks that groups existing methods into
nine categories including linear, residual, multi-branch, recursive,
progressive, attention-based and adversarial designs. We also provide
comparisons between the models in terms of network complexity, memory
footprint, model input and output, learning details, the type of network losses
and important architectural differences (e.g., depth, skip-connections,
filters). The extensive evaluation shows consistent and rapid growth in
accuracy over the past few years, along with a corresponding increase in
model complexity, enabled by the availability of large-scale datasets. It
is also observed that the pioneering methods identified as benchmarks have
been significantly outperformed by current contenders. Despite the progress
of recent years, we identify several shortcomings of existing techniques
and provide future research directions towards the solution of these open
problems. Comment: Accepted in ACM Computing Surveys
PI-REC: Progressive Image Reconstruction Network With Edge and Color Domain
We propose a universal image reconstruction method that represents detailed
images purely from a binary sparse edge map and a flat color domain.
Inspired by the procedure of painting, our framework, based on a generative
adversarial network, consists of three phases: an Imitation Phase that
initializes the networks, followed by a Generating Phase that reconstructs
preliminary images. A Refinement Phase then fine-tunes the preliminary
images into final outputs with details. This framework allows our model to
generate abundant high-frequency details from sparse input information. We
also explore the defects of implicitly disentangling a style latent space
from images, and demonstrate that the explicit color domain in our model
performs better in controllability and interpretability. In our
experiments, we achieve outstanding results on reconstructing realistic
images and translating hand-drawn drafts into satisfactory paintings.
Moreover, within the domain of edge-to-image translation, our model PI-REC
outperforms existing state-of-the-art methods in evaluations of realism and
accuracy, both quantitatively and qualitatively. Comment: 15 pages, 13 figures
Reverse Attention for Salient Object Detection
Benefiting from the rapid development of deep learning techniques, salient
object detection has achieved remarkable progress recently. However, two
major challenges still hinder its application on embedded devices:
low-resolution output and heavy model weight. To this end, this paper
presents an accurate yet compact deep network for efficient salient object
detection. More specifically, given a coarse saliency prediction in the
deepest layer, we first employ residual learning to learn side-output
residual features for saliency refinement, which can be achieved with very
limited convolutional parameters while maintaining accuracy. Second, we
further propose reverse attention to guide such side-output residual
learning in a top-down manner. By erasing the currently predicted salient
regions from the side-output features, the network can eventually explore
the missing object parts and details, which results in high resolution and
accuracy. Experiments on six benchmark datasets demonstrate that the
proposed approach compares favorably against state-of-the-art methods, with
advantages in terms of simplicity, efficiency (45 FPS) and model size
(81 MB). Comment: ECCV 2018
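The erasing step can be sketched as a simple weighting: side-output features are scaled by one minus the sigmoid of the coarse saliency prediction, so already-detected regions are suppressed and the network attends to what is still missing. This is a simplified scalar version; the paper applies the weighting spatially to feature maps.

```python
import math

def reverse_attention(side_features, coarse_prediction):
    """Reverse-attention weighting (simplified, element-wise): multiply
    side-output features by 1 - sigmoid(coarse saliency logit), erasing
    regions already predicted as salient."""
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    return [f * (1.0 - sigmoid(p))
            for f, p in zip(side_features, coarse_prediction)]

# A confidently salient location (logit 10) is erased; a confidently
# non-salient one (logit -10) passes through almost unchanged.
out = reverse_attention([1.0, 1.0], [10.0, -10.0])
```
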
Lightweight Modules for Efficient Deep Learning based Image Restoration
Low-level image restoration is an integral component of modern artificial
intelligence (AI) driven camera pipelines. Most of these frameworks are
based on deep neural networks, which impose a massive computational
overhead on resource-constrained platforms like mobile phones. In this
paper, we propose several lightweight low-level modules which can be used
to create a computationally low-cost variant of a given baseline model.
Recent works on efficient neural network design have mainly focused on
classification. However, low-level image processing falls under the
image-to-image translation genre, which requires some additional
computational modules not present in classification. This paper seeks to
bridge this gap by designing generic efficient modules which can replace
essential components used in contemporary deep learning based image
restoration networks. We also present and analyse results highlighting the
drawbacks of applying depthwise separable convolutional kernels (a popular
method for efficient classification networks) to sub-pixel convolution
based upsampling (a popular upsampling strategy for low-level vision
applications). This shows that concepts from the domain of classification
cannot always be seamlessly integrated into image-to-image translation
tasks. We extensively validate our findings on three popular tasks: image
inpainting, denoising and super-resolution. Our results show that the
proposed networks consistently output visually similar reconstructions
compared to full-capacity baselines, with a significant reduction in
parameters, memory footprint and execution time on contemporary mobile
devices. Comment: Accepted at IEEE Transactions on Circuits and Systems for
Video Technology (Early Access Print) | Code available at:
https://github.com/avisekiit/TCSVT-LightWeight-CNNs | Supplementary
document at:
https://drive.google.com/file/d/1BQhkh33Sen-d0qOrjq5h8ahw2VCUIVLg/view?usp=sharin
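The appeal of depthwise separable convolutions, and why their drawbacks in restoration settings matter, comes from their parameter count. A back-of-the-envelope comparison, independent of the paper's specific modules:

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution, no bias: every output
    channel looks at every input channel."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise (one k x k filter per input channel) followed by a
    pointwise 1 x 1 convolution that mixes channels."""
    return c_in * k * k + c_in * c_out

# For a typical 64 -> 64 channel, 3x3 layer the separable variant uses
# roughly 8x fewer parameters (4,672 vs 36,864).
standard = conv_params(64, 64, 3)
separable = depthwise_separable_params(64, 64, 3)
```

The paper's finding is that this saving does not transfer cleanly to sub-pixel upsampling layers, where the channel mixing that separable kernels factor away is exactly what the pixel rearrangement depends on.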
Super-resolution reconstruction of brain magnetic resonance images via lightweight autoencoder
Magnetic Resonance Imaging (MRI) provides detailed anatomical information, such as images of tissues and organs within the body, that is vital for quantitative image analysis. However, the acquired MR images typically lack adequate resolution because of constraints such as patient comfort and long sampling duration. Processing low-resolution MRI may lead to an incorrect diagnosis. Therefore, super-resolution techniques are needed to obtain high-resolution MRI images. Single image super-resolution (SR) is one of the popular techniques to enhance image quality. Reconstruction-based SR is a category of single image SR that can reconstruct low-resolution MRI images into high-resolution images. Inspired by advanced deep learning based SR techniques, in this paper we propose an autoencoder based MRI super-resolution technique that reconstructs high-resolution MRI images from low-resolution ones. Experimental results on synthetic and real brain MRI images show that our autoencoder based SR technique surpasses other state-of-the-art techniques in terms of peak signal-to-noise ratio (PSNR), structural similarity (SSIM), Information Fidelity Criterion (IFC), and computational time.
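PSNR, the primary metric cited here and throughout the super-resolution literature, can be computed as below. This is a generic sketch that assumes images are given as flattened lists of intensities with a known peak value, not the authors' evaluation code.

```python
import math

def psnr(reference, reconstructed, peak=1.0):
    """Peak signal-to-noise ratio in decibels: 10 * log10(peak^2 / MSE).
    Higher is better; identical images give infinity."""
    n = len(reference)
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / n
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```

For example, a uniform error of 0.1 on a peak-1.0 image gives an MSE of 0.01 and therefore a PSNR of 20 dB.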