Generative Image Inpainting with Contextual Attention
Recent deep learning based approaches have shown promising results for the
challenging task of inpainting large missing regions in an image. These methods
can generate visually plausible image structures and textures, but often create
distorted structures or blurry textures inconsistent with surrounding areas.
This is mainly due to the ineffectiveness of convolutional neural networks in
explicitly borrowing or copying information from distant spatial locations. On
the other hand, traditional texture and patch synthesis approaches are
particularly suitable when textures must be borrowed from the surrounding
regions. Motivated by these observations, we propose a new deep generative
model-based approach which can not only synthesize novel image structures but
also explicitly utilize surrounding image features as references during network
training to make better predictions. The model is a feed-forward, fully
convolutional neural network which can process images with multiple holes at
arbitrary locations and with variable sizes during the test time. Experiments
on multiple datasets including faces (CelebA, CelebA-HQ), textures (DTD) and
natural images (ImageNet, Places2) demonstrate that our proposed approach
generates higher-quality inpainting results than existing ones. Code, demo and
models are available at: https://github.com/JiahuiYu/generative_inpainting.
Comment: Accepted in CVPR 2018; add CelebA-HQ results; open sourced;
interactive demo available: http://jhyu.me/dem
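The core idea of contextual attention, matching hole patches against known background patches by cosine similarity and rebuilding them as a softmax-weighted sum of background patches, can be sketched as follows (a toy illustration over flat patch vectors, not the paper's implementation; all names are hypothetical):

```python
import math

def cosine(a, b):
    # cosine similarity between two equal-length patch vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-8)

def contextual_attention(hole_patches, bg_patches, temperature=10.0):
    """For each hole patch, softmax the scaled similarities to all
    background patches and rebuild the patch as the weighted sum of
    background patches, i.e. explicitly 'borrowing' distant texture."""
    out = []
    for h in hole_patches:
        scores = [temperature * cosine(h, b) for b in bg_patches]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * b[i] for w, b in zip(weights, bg_patches))
                    for i in range(len(h))])
    return out
```

With a high temperature the weights concentrate on the best-matching background patch, which is what lets the hole copy coherent texture from far away rather than blur it in.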
PEPSI++: Fast and Lightweight Network for Image Inpainting
Among the various generative adversarial network (GAN)-based image inpainting
methods, a coarse-to-fine network with a contextual attention module (CAM) has
shown remarkable performance. However, owing to its two stacked generative
networks, the coarse-to-fine network requires substantial computational
resources in terms of convolution operations and network parameters, which
results in slow inference. To address this problem, we propose a novel network
architecture called
PEPSI: parallel extended-decoder path for semantic inpainting network, which
aims at reducing the hardware costs and improving the inpainting performance.
PEPSI consists of a single shared encoding network and parallel decoding
networks called coarse and inpainting paths. The coarse path produces a
preliminary inpainting result to train the encoding network for the prediction
of features for the CAM. Simultaneously, the inpainting path generates a
higher-quality result using the refined features reconstructed via the CAM. In
addition, we propose Diet-PEPSI that significantly reduces the network
parameters while maintaining the performance. In Diet-PEPSI, to capture the
global contextual information with low hardware costs, we propose novel
rate-adaptive dilated convolutional layers, which employ the common weights but
produce dynamic features depending on the given dilation rates. Extensive
experiments comparing the performance with state-of-the-art image inpainting
methods demonstrate that both PEPSI and Diet-PEPSI improve the quantitative
scores, i.e. the peak signal-to-noise ratio (PSNR) and structural similarity
(SSIM), as well as significantly reduce hardware costs such as computational
time and the number of network parameters.
Comment: Accepted to IEEE Transactions on Neural Networks and Learning
Systems. To be published
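The rate-adaptive dilated convolutions in Diet-PEPSI share one set of weights across several dilation rates. A minimal 1-D sketch (toy signal and hand-picked weights, nothing from the paper's code) shows how the same weights cover a wider receptive field at a higher rate:

```python
def dilated_conv1d(signal, weights, rate):
    """Valid 1-D dilated convolution: taps are spaced `rate` apart, so
    the same shared weights see a wider context at a higher rate."""
    k = len(weights)
    span = (k - 1) * rate
    return [sum(w * signal[i + j * rate] for j, w in enumerate(weights))
            for i in range(len(signal) - span)]

# the *same* shared weights, applied at two dilation rates
shared = [1.0, 1.0, 1.0]
print(dilated_conv1d([1, 2, 3, 4, 5], shared, rate=1))  # [6.0, 9.0, 12.0]
print(dilated_conv1d([1, 2, 3, 4, 5], shared, rate=2))  # [9.0]
```

Because only the tap spacing changes, the parameter count stays fixed while the captured context grows, which is the hardware saving the abstract describes.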
Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting
High-quality image inpainting requires filling missing regions in a damaged
image with plausible content. Existing works either fill the regions by copying
image patches or by generating semantically coherent patches from the region
context, while neglecting the fact that both visual and semantic plausibility
are required. In this paper, we propose a Pyramid-context ENcoder Network
(PEN-Net) for image inpainting by deep generative models. The PEN-Net is built
upon a U-Net structure, which can restore an image by encoding contextual
semantics from full resolution input, and decoding the learned semantic
features back into images. Specifically, we propose a pyramid-context encoder,
which progressively learns region affinity by attention from a high-level
semantic feature map and transfers the learned attention to the previous
low-level feature map. As the missing content can be filled by attention
transfer from deep to shallow in a pyramid fashion, both visual and semantic
coherence for image inpainting can be ensured. We further propose a multi-scale
decoder with deeply-supervised pyramid losses and an adversarial loss. Such a
design not only results in fast convergence in training, but more realistic
results in testing. Extensive experiments on various datasets show the superior
performance of the proposed network.
Comment: Accepted as a CVPR 2019 poster paper; update SUPP; update Eq5
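The attention-transfer idea, computing attention weights on a high-level semantic feature map and reusing them to combine the corresponding low-level features, can be illustrated with a toy sketch (flat feature vectors and hypothetical names, not PEN-Net's actual code):

```python
import math

def attention_weights(query, keys, temperature=10.0):
    # softmax over dot-product similarities at the semantic level
    scores = [temperature * sum(q * k for q, k in zip(query, key))
              for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def transfer(weights, low_level_values):
    # reuse the high-level attention weights to combine low-level features
    dim = len(low_level_values[0])
    return [sum(w * v[i] for w, v in zip(weights, low_level_values))
            for i in range(dim)]
```

Weights learned where semantics are reliable (deep layers) are then used to fill in fine detail (shallow layers), which is the "deep to shallow" pyramid transfer the abstract describes.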
Contextual Attention Mechanism, SRGAN Based Inpainting System for Eliminating Interruptions from Images
A new alternative is to inpaint images with deep learning, utilizing
image classification and computer vision techniques. In general, image
inpainting is a task of recreating or reconstructing any broken image which
could be a photograph or oil/acrylic painting. With the advancement in the
field of Artificial Intelligence, this topic has become popular among AI
enthusiasts. We propose an initial end-to-end pipeline for inpainting images
using a purely machine learning approach instead of a conventional
application-based approach. We first use the YOLO model to
automatically identify and localize the object we wish to remove from the
image. Using the detection result, we generate a mask for that region. We then
provide the masked image and the original image to a GAN model, which uses the
contextual attention method to fill in the region. It
consists of two generator networks and two discriminator networks and is also
called a coarse-to-fine network structure. The two generators use fully
convolutional networks; the global discriminator takes the entire image as
input, while the local discriminator takes only the filled region as input. The
contextual attention mechanism is used to effectively borrow information from
distant spatial locations for reconstructing the missing pixels. The third part
of our implementation uses SRGAN to upscale the inpainted image back to its
original size. Our work is inspired by the papers Free-Form Image Inpainting
with Gated Convolution and Generative Image Inpainting with Contextual
Attention.
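The first stage of the pipeline, turning a detector's bounding box into a binary hole mask, might look like this toy sketch (the box format and names are assumptions, not the authors' code):

```python
def box_to_mask(height, width, box):
    """Turn a detector box (x0, y0, x1, y1), half-open, into a binary
    hole mask: 1 inside the region to remove, 0 elsewhere."""
    x0, y0, x1, y1 = box
    return [[1 if (x0 <= x < x1 and y0 <= y < y1) else 0
             for x in range(width)]
            for y in range(height)]
```

The mask then marks which pixels the inpainting GAN must synthesize and which it must leave untouched.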
Coherent Semantic Attention for Image Inpainting
The latest deep learning-based approaches have shown promising results for
the challenging task of inpainting missing regions of an image. However, the
existing methods often generate contents with blurry textures and distorted
structures due to the discontinuity of the local pixels. From a semantic-level
perspective, the local pixel discontinuity is mainly because these methods
ignore the semantic relevance and feature continuity of hole regions. To handle
this problem, we investigate human behavior in repairing pictures and
propose a refined deep generative model-based approach with a novel coherent
semantic attention (CSA) layer, which can not only preserve contextual
structure but also make more effective predictions of missing parts by modeling
the semantic relevance between hole features. The task is divided into two
steps, rough inpainting and refinement, and each step is modeled with a neural
network under the U-Net architecture, where the CSA layer is embedded into the
encoder of the refinement step. To stabilize the network training process and
encourage the CSA layer to learn more effective parameters, we propose a
consistency loss that enforces both the CSA layer and the corresponding decoder
layer to simultaneously be close to the VGG feature layer of the ground truth
image. Experiments on the CelebA, Places2, and Paris StreetView datasets
validate the effectiveness of the proposed method, which obtains higher-quality
images than the existing state-of-the-art approaches
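The consistency loss described above can be sketched as a pair of squared distances pulling the encoder-side CSA feature and its decoder counterpart toward a shared target feature (a toy version on flat vectors; in the paper the target comes from a VGG layer of the ground truth image):

```python
def consistency_loss(csa_feat, decoder_feat, target_feat):
    """Sum of squared distances pulling both the CSA-layer feature and
    the corresponding decoder feature toward the target feature."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return sq_dist(csa_feat, target_feat) + sq_dist(decoder_feat, target_feat)
```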
High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling
Existing image inpainting methods often produce artifacts when dealing with
large holes in real applications. To address this challenge, we propose an
iterative inpainting method with a feedback mechanism. Specifically, we
introduce a deep generative model which not only outputs an inpainting result
but also a corresponding confidence map. Using this map as feedback, it
progressively fills the hole by trusting only high-confidence pixels inside the
hole at each iteration and focuses on the remaining pixels in the next
iteration. As it reuses partial predictions from the previous iterations as
known pixels, this process gradually improves the result. In addition, we
propose a guided upsampling network to enable generation of high-resolution
inpainting results. We achieve this by extending the Contextual Attention
module to borrow high-resolution feature patches in the input image.
Furthermore, to mimic real object removal scenarios, we collect a large object
mask dataset and synthesize more realistic training data that better simulates
user inputs. Experiments show that our method significantly outperforms
existing methods in both quantitative and qualitative evaluations. More results
and Web APP are available at https://zengxianyu.github.io/iic
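The iterative confidence-feedback loop can be illustrated with a 1-D toy: predict the unknown pixels, keep only the high-confidence predictions, and treat them as known in the next pass (the confidence rule here is invented for illustration and is not the paper's learned confidence map):

```python
def iterative_fill(pixels, known, max_iters=10):
    """1-D toy of confidence feedback: each pass predicts every unknown
    pixel as the mean of its known neighbours; the toy confidence is the
    fraction of neighbours that are known, and only predictions with
    confidence >= 0.5 are trusted. Trusted pixels become known in the
    next pass, so the hole shrinks inwards from its boundary."""
    pixels, known = list(pixels), list(known)
    for _ in range(max_iters):
        if all(known):
            break
        nxt_p, nxt_k = list(pixels), list(known)
        for i, k in enumerate(known):
            if k:
                continue
            nbrs = [pixels[j] for j in (i - 1, i + 1)
                    if 0 <= j < len(pixels) and known[j]]
            conf = len(nbrs) / 2.0  # toy stand-in for the confidence map
            if conf >= 0.5:
                nxt_p[i] = sum(nbrs) / len(nbrs)
                nxt_k[i] = True
        pixels, known = nxt_p, nxt_k
    return pixels
```

Each iteration reuses the trusted partial predictions as known context, which is the feedback mechanism the abstract describes.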
Semantic Image Inpainting Through Improved Wasserstein Generative Adversarial Networks
Image inpainting is the task of filling-in missing regions of a damaged or
incomplete image. In this work we tackle this problem not only by using the
available visual data but also by incorporating image semantics through the use
of generative models. Our contribution is twofold: First, we learn a data
latent space by training an improved version of the Wasserstein generative
adversarial network, for which we incorporate a new generator and discriminator
architecture. Second, the learned semantic information is combined with a new
optimization loss for inpainting whose minimization infers the missing content
conditioned by the available data. It takes into account powerful contextual
and perceptual content inherent in the image itself. The benefits include the
ability to recover large regions by accumulating semantic information even when
it is not fully present in the damaged image. Experiments show that the
presented
method obtains qualitative and quantitative top-tier results in different
experimental situations and also achieves accurate photo-realism comparable to
state-of-the-art works.
Comment: Accepted as Oral Presentation in VISAPP 201
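The second contribution, inferring missing content by minimizing a loss over the generator's latent space, can be sketched as a search for the latent code whose output best matches the known pixels (a toy discrete search with a stand-in generator; the actual method uses gradient-based optimization and an added perceptual term):

```python
def masked_l2(generated, damaged, mask):
    # contextual loss: compare images only where pixels are known (mask == 1)
    return sum(m * (g - d) ** 2 for g, d, m in zip(generated, damaged, mask))

def best_latent(candidates, generator, damaged, mask):
    """Pick the latent code whose generated image best matches the known
    pixels of the damaged image; the generator output then supplies the
    missing region."""
    return min(candidates, key=lambda z: masked_l2(generator(z), damaged, mask))
```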
Attentive Generative Adversarial Network for Raindrop Removal from a Single Image
Raindrops adhered to a glass window or camera lens can severely hamper the
visibility of a background scene and degrade an image considerably. In this
paper, we address the problem by visually removing raindrops, and thus
transforming a raindrop-degraded image into a clean one. The problem is
intractable: first, the regions occluded by raindrops are not given; second,
the information about the background scene in the occluded regions is mostly
lost. To resolve the problem, we apply an attentive
generative network using adversarial training. Our main idea is to inject
visual attention into both the generative and discriminative networks. During
the training, our visual attention learns about raindrop regions and their
surroundings. Hence, by injecting this information, the generative network will
pay more attention to the raindrop regions and the surrounding structures, and
the discriminative network will be able to assess the local consistency of the
restored regions. This injection of visual attention to both generative and
discriminative networks is the main contribution of this paper. Our experiments
show the effectiveness of our approach, which outperforms the state of the art
methods quantitatively and qualitatively.
Comment: CVPR2018 Spotlight
Void Filling of Digital Elevation Models with Deep Generative Models
In recent years, advances in machine learning algorithms, cheap computational
resources, and the availability of big data have spurred the deep learning
revolution in various application domains. In particular, supervised learning
techniques in image analysis have led to superhuman performance in various
tasks, such as classification, localization, and segmentation, while
unsupervised learning techniques based on increasingly advanced generative
models have been applied to generate high-resolution synthetic images
indistinguishable from real images.
In this paper we consider a state-of-the-art machine learning model for image
inpainting, namely a Wasserstein Generative Adversarial Network based on a
fully convolutional architecture with a contextual attention mechanism. We show
that this model can successfully be transferred to the setting of digital
elevation models (DEMs) for the purpose of generating semantically plausible
data for filling voids. Training, testing and experimentation is done on
GeoTIFF data from various regions in Norway, made openly available by the
Norwegian Mapping Authority.
Comment: 5 pages; 4 figures; corrected names in references; clarifications
regarding the two generators in the paper; added reference (Borji 2018) on
GAN evaluation measures; extended future work discussion; changed (Fig. 4.f)
to show a failure case
Learning Symmetry Consistent Deep CNNs for Face Completion
Deep convolutional networks (CNNs) have achieved great success in face
completion to generate plausible facial structures. These methods, however, are
limited in maintaining global consistency among face components and recovering
fine facial details. On the other hand, reflectional symmetry is a prominent
property of face images that benefits face recognition and consistency
modeling, yet it remains uninvestigated in deep face completion. In this work,
we leverage
two kinds of symmetry-enforcing subnets to form a symmetry-consistent CNN model
(i.e., SymmFCNet) for effective face completion. For missing pixels on only one
of the half-faces, an illumination-reweighted warping subnet is developed to
guide the warping and illumination reweighting of the other half-face. As for
missing pixels on both of half-faces, we present a generative reconstruction
subnet together with a perceptual symmetry loss to enforce symmetry consistency
of recovered structures. SymmFCNet is constructed by stacking the generative
reconstruction subnet upon the illumination-reweighted warping subnet, and can
be learned end-to-end from a training set of unaligned face images. Experiments
show that SymmFCNet generates high-quality results on images with both
synthetic and real occlusions, and performs favorably against state-of-the-art
methods.
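The perceptual symmetry loss can be sketched as a penalty on the difference between a feature map and its horizontal mirror (a toy version on raw 2-D grids, not the paper's feature-level formulation with half-face correspondence):

```python
def symmetry_loss(feature_map):
    """Penalise the squared difference between each row and its mirror;
    the loss is zero for a perfectly left-right symmetric map."""
    loss = 0.0
    for row in feature_map:
        mirrored = row[::-1]
        loss += sum((a - b) ** 2 for a, b in zip(row, mirrored))
    return loss
```

Minimising such a term pushes the completed half-face toward agreement with its visible counterpart, which is the symmetry consistency the abstract describes.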