Coherent Semantic Attention for Image Inpainting
The latest deep learning-based approaches have shown promising results for
the challenging task of inpainting missing regions of an image. However, the
existing methods often generate contents with blurry textures and distorted
structures due to the discontinuity of the local pixels. From a semantic-level
perspective, the local pixel discontinuity is mainly because these methods
ignore the semantic relevance and feature continuity of hole regions. To handle
this problem, we investigate how humans repair pictures and propose a refined
deep generative model-based approach with a novel coherent semantic attention
(CSA) layer, which not only preserves contextual structure but also makes more
effective predictions for missing parts by modeling the semantic relevance
between the hole features. The task is divided into two steps, rough and
refinement, and each step is modeled with a neural network under the U-Net
architecture, where the CSA layer is embedded into the encoder of the
refinement step. To stabilize network training and encourage the CSA layer to
learn more effective parameters, we propose a consistency loss that enforces
both the CSA layer and the corresponding CSA layer in the decoder to be close
to the VGG features of the ground-truth image simultaneously. Experiments on
the CelebA, Places2, and Paris StreetView datasets validate the effectiveness
of our proposed method for image inpainting and show that it obtains
higher-quality images than existing state-of-the-art approaches.
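As a rough illustration of the attention idea (not the authors' released layer), the following PyTorch sketch reconstructs hole features as a similarity-weighted sum of known-region features; the coherence term between adjacent hole patches that distinguishes CSA is omitted, and all shapes and names are illustrative.

```python
import torch

def semantic_attention(features, hole_mask, eps=1e-8):
    """Reconstruct hole features as a similarity-weighted sum of known
    (context) features.

    features:  (B, C, H, W) encoder feature map
    hole_mask: (B, 1, H, W), 1 inside the hole, 0 in the known region
    """
    B, C, H, W = features.shape
    feats = features.flatten(2).transpose(1, 2)        # (B, HW, C)
    mask = hole_mask.flatten(2).transpose(1, 2)        # (B, HW, 1)

    # Cosine similarity between every location and every known location.
    normed = feats / (feats.norm(dim=-1, keepdim=True) + eps)
    sim = normed @ normed.transpose(1, 2)              # (B, HW, HW)
    sim = sim.masked_fill(mask.transpose(1, 2).bool(), float("-inf"))
    attn = torch.softmax(sim, dim=-1)                  # attend only to known keys

    filled = attn @ feats                              # (B, HW, C)
    out = torch.where(mask.bool(), filled, feats)      # keep known features as-is
    return out.transpose(1, 2).reshape(B, C, H, W)
```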
Free-Form Image Inpainting with Gated Convolution
We present a generative image inpainting system to complete images with
free-form mask and guidance. The system is based on gated convolutions learned
from millions of images without additional labelling efforts. The proposed
gated convolution solves the issue of vanilla convolution that treats all input
pixels as valid ones, and generalizes partial convolution by providing a
learnable dynamic feature selection mechanism for each channel at each spatial location
across all layers. Moreover, as free-form masks may appear anywhere in images
with any shape, global and local GANs designed for a single rectangular mask
are not applicable. Thus, we also present a patch-based GAN loss, named
SN-PatchGAN, by applying a spectral-normalized discriminator on dense image
patches. SN-PatchGAN is simple in formulation, fast and stable in training.
Results on automatic image inpainting and user-guided extension demonstrate
that our system generates higher-quality and more flexible results than
previous methods. Our system helps users quickly remove distracting objects,
modify image layouts, clear watermarks and edit faces. Code, demo and models
are available at: https://github.com/JiahuiYu/generative_inpainting
Comment: Accepted in ICCV 2019 (Oral); open sourced; interactive demo available at http://jiahuiyu.com/deepfill
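A minimal PyTorch sketch of the gating formulation described above, where one convolution produces features and a second produces a soft sigmoid gate per channel and location; the layer sizes and the ELU activation are illustrative choices, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConv2d(nn.Module):
    """Gated convolution: a learned soft gate replaces the hard valid/invalid
    mask of vanilla or partial convolution."""

    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, dilation=1):
        super().__init__()
        padding = dilation * (kernel_size - 1) // 2
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding, dilation)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding, dilation)

    def forward(self, x):
        # Per-channel, per-location gating in [0, 1] applied to the features.
        return torch.sigmoid(self.gate(x)) * F.elu(self.feature(x))
```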
High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling
Existing image inpainting methods often produce artifacts when dealing with
large holes in real applications. To address this challenge, we propose an
iterative inpainting method with a feedback mechanism. Specifically, we
introduce a deep generative model which not only outputs an inpainting result
but also a corresponding confidence map. Using this map as feedback, it
progressively fills the hole by trusting only high-confidence pixels inside the
hole at each iteration and focusing on the remaining pixels in the next
iteration. As it reuses partial predictions from the previous iterations as
known pixels, this process gradually improves the result. In addition, we
propose a guided upsampling network to enable generation of high-resolution
inpainting results. We achieve this by extending the Contextual Attention
module to borrow high-resolution feature patches in the input image.
Furthermore, to mimic real object removal scenarios, we collect a large object
mask dataset and synthesize more realistic training data that better simulates
user inputs. Experiments show that our method significantly outperforms
existing methods in both quantitative and qualitative evaluations. More results
and Web APP are available at https://zengxianyu.github.io/iic
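The feedback loop can be sketched as follows, assuming a hypothetical `model` that returns an inpainting result together with a confidence map; this interface, the iteration count, and the threshold are assumptions for illustration, not the authors' code.

```python
import torch

def iterative_inpaint(model, image, hole_mask, num_iters=4, tau=0.5):
    """Iteratively fill a hole, trusting only high-confidence pixels each step.

    `model` is assumed to map (masked image, mask) -> (result, confidence);
    `hole_mask` is (B, 1, H, W) with 1 inside the hole.
    """
    known = 1.0 - hole_mask                     # 1 = pixel treated as known
    current = image * known
    for _ in range(num_iters):
        result, confidence = model(current, 1.0 - known)
        # Accept only confident predictions that are still inside the hole.
        trusted = (confidence > tau).float() * (1.0 - known)
        current = current * known + result * trusted
        known = torch.clamp(known + trusted, 0.0, 1.0)
    return current
```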
Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting
High-quality image inpainting requires filling missing regions in a damaged
image with plausible content. Existing works either fill the regions by copying
image patches or by generating semantically coherent patches from the region
context, while neglecting the fact that both visual and semantic plausibility
are highly demanded. In this paper, we propose a Pyramid-context ENcoder Network
(PEN-Net) for image inpainting by deep generative models. The PEN-Net is built
upon a U-Net structure, which can restore an image by encoding contextual
semantics from full resolution input, and decoding the learned semantic
features back into images. Specifically, we propose a pyramid-context encoder,
which progressively learns region affinity by attention from a high-level
semantic feature map and transfers the learned attention to the previous
low-level feature map. As the missing content can be filled by attention
transfer from deep to shallow in a pyramid fashion, both visual and semantic
coherence for image inpainting can be ensured. We further propose a multi-scale
decoder with deeply-supervised pyramid losses and an adversarial loss. Such a
design not only results in fast convergence in training, but more realistic
results in testing. Extensive experiments on various datasets show the superior
performance of the proposed network.
Comment: Accepted as a CVPR 2019 poster paper; updated supplementary material and Eq. (5).
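The attention-transfer idea can be sketched as below; for simplicity the high-level and low-level maps are assumed to share spatial size, whereas PEN-Net transfers attention across pyramid scales from deep to shallow.

```python
import torch

def attention_transfer(high_feat, low_feat, hole_mask, eps=1e-8):
    """Compute affinity on the high-level (semantic) map and reuse it to fill
    the low-level map.

    high_feat: (B, C_h, H, W), low_feat: (B, C_l, H, W),
    hole_mask: (B, 1, H, W) with 1 inside the hole.
    """
    q = high_feat.flatten(2).transpose(1, 2)               # (B, HW, C_h)
    q = q / (q.norm(dim=-1, keepdim=True) + eps)
    sim = q @ q.transpose(1, 2)                            # (B, HW, HW)

    key_mask = hole_mask.flatten(2)                        # (B, 1, HW)
    sim = sim.masked_fill(key_mask.bool(), float("-inf"))  # keys: known regions only
    attn = torch.softmax(sim, dim=-1)

    v = low_feat.flatten(2).transpose(1, 2)                # (B, HW, C_l)
    filled = (attn @ v).transpose(1, 2).reshape(low_feat.shape)
    return torch.where(hole_mask.expand_as(low_feat).bool(), filled, low_feat)
```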
EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning
Over the last few years, deep learning techniques have yielded significant
improvements in image inpainting. However, many of these techniques fail to
reconstruct reasonable structures as they are commonly over-smoothed and/or
blurry. This paper develops a new approach for image inpainting that does a
better job of reproducing filled regions exhibiting fine details. We propose a
two-stage adversarial model, EdgeConnect, that comprises an edge generator
followed by an image completion network. The edge generator hallucinates edges
of the missing regions (both regular and irregular) of the image, and the image
completion network fills in the missing regions using the hallucinated edges as
a prior. We evaluate our model end-to-end over the publicly available datasets
CelebA, Places2, and Paris StreetView, and show that it outperforms current
state-of-the-art techniques quantitatively and qualitatively. Code and models
available at: https://github.com/knazeri/edge-connect
Comment: Code and data available at https://github.com/knazeri/edge-connect
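The two-stage flow can be sketched as follows; `edge_generator` and `inpaint_generator` are placeholder modules standing in for the two networks, not the released API.

```python
import torch

def edgeconnect_forward(edge_generator, inpaint_generator,
                        image, gray, edges, mask):
    """Two-stage pipeline: hallucinate edges in the hole, then fill the image
    guided by the completed edge map. `mask` is 1 inside missing regions."""
    masked_gray = gray * (1 - mask)
    masked_edges = edges * (1 - mask)

    # Stage 1: edge generator predicts structure inside the hole.
    pred_edges = edge_generator(torch.cat([masked_gray, masked_edges, mask], dim=1))
    comp_edges = masked_edges + pred_edges * mask

    # Stage 2: image completion network fills colors, guided by edges.
    masked_image = image * (1 - mask)
    pred_image = inpaint_generator(torch.cat([masked_image, comp_edges], dim=1))
    return masked_image + pred_image * mask
```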
PEPSI++: Fast and Lightweight Network for Image Inpainting
Among the various generative adversarial network (GAN)-based image inpainting
methods, a coarse-to-fine network with a contextual attention module (CAM) has
shown remarkable performance. However, owing to its two stacked generative
networks, the coarse-to-fine network requires substantial computational
resources such as convolution operations and network parameters, which result in low
speed. To address this problem, we propose a novel network architecture called
PEPSI: parallel extended-decoder path for semantic inpainting network, which
aims at reducing the hardware costs and improving the inpainting performance.
PEPSI consists of a single shared encoding network and parallel decoding
networks called coarse and inpainting paths. The coarse path produces a
preliminary inpainting result to train the encoding network for the prediction
of features for the CAM. Simultaneously, the inpainting path generates a
higher-quality inpainting result using the refined features reconstructed via the CAM. In
addition, we propose Diet-PEPSI that significantly reduces the network
parameters while maintaining the performance. In Diet-PEPSI, to capture the
global contextual information with low hardware costs, we propose novel
rate-adaptive dilated convolutional layers, which employ the common weights but
produce dynamic features depending on the given dilation rates. Extensive
experiments comparing the performance with state-of-the-art image inpainting
methods demonstrate that both PEPSI and Diet-PEPSI improve the quantitative
scores, i.e., the peak signal-to-noise ratio (PSNR) and structural similarity
(SSIM), while significantly reducing hardware costs such as computational
time and the number of network parameters.
Comment: Accepted to IEEE Transactions on Neural Networks and Learning Systems; to be published.
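The parameter-sharing idea behind the rate-adaptive dilated layers can be sketched as below: one weight tensor reused at several dilation rates, so the receptive field grows without adding parameters; the per-rate feature adaptation of Diet-PEPSI is omitted, and sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedDilatedConv(nn.Module):
    """One shared weight tensor applied at several dilation rates."""

    def __init__(self, channels, kernel_size=3, rates=(1, 2, 4, 8)):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(channels, channels, kernel_size, kernel_size) * 0.02)
        self.bias = nn.Parameter(torch.zeros(channels))
        self.rates = rates
        self.kernel_size = kernel_size

    def forward(self, x, rate_index=0):
        rate = self.rates[rate_index]
        pad = rate * (self.kernel_size - 1) // 2
        return F.conv2d(x, self.weight, self.bias, padding=pad, dilation=rate)
```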
Context-Aware Semantic Inpainting
Recently image inpainting has witnessed rapid progress due to generative
adversarial networks (GAN) that are able to synthesize realistic contents.
However, most existing GAN-based methods for semantic inpainting apply an
auto-encoder architecture with a fully connected layer, which cannot accurately
maintain spatial information. In addition, the discriminators in existing GANs
struggle to understand high-level semantics within the image context and to
yield semantically consistent content. Existing evaluation criteria are biased
towards blurry results and cannot well characterize edge preservation and
visual authenticity in the inpainting results. In this paper, we propose an
improved generative adversarial network to overcome the aforementioned
limitations. Our proposed GAN-based framework consists of a fully convolutional
design for the generator which helps to better preserve spatial structures and
a joint loss function with a revised perceptual loss to capture high-level
semantics in the context. Furthermore, we also introduce two novel measures to
better assess the quality of image inpainting results. Experimental results
demonstrate that our method outperforms the state of the art under a wide range
of criteria.
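A plain VGG-based perceptual loss of the kind the joint objective builds on looks roughly as follows; the paper's revised formulation is not reproduced here, and the torchvision cut-off layer is an illustrative choice.

```python
import torch.nn as nn
from torchvision.models import vgg16

class PerceptualLoss(nn.Module):
    """L1 distance between frozen VGG-16 features of prediction and target."""

    def __init__(self, cut=16):  # features[:16] ends at relu3_3
        super().__init__()
        backbone = vgg16(weights="IMAGENET1K_V1").features[:cut].eval()
        for p in backbone.parameters():
            p.requires_grad = False
        self.backbone = backbone
        self.criterion = nn.L1Loss()

    def forward(self, prediction, target):
        return self.criterion(self.backbone(prediction), self.backbone(target))
```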
Deep Inception Generative Network for Cognitive Image Inpainting
Recent advances in deep learning have shown exciting promise in filling large
holes and have opened another direction for image inpainting. However, existing
learning-based methods often create artifacts and fallacious textures because
of insufficient cognitive understanding. Previous generative networks are
limited to a single receptive-field type and give up pooling to preserve detail
sharpness. Human cognition, however, is constant regardless of the target
attribute. Since multiple receptive fields improve abstract image
characterization and pooling keeps features invariant, we adopt deep inception
learning to promote high-level feature representation and enhance the model's
learning capacity for local patches. Moreover, approaches for
generating diverse mask images are introduced and a random mask dataset is
created. We benchmark our methods on ImageNet, Places2 dataset, and CelebA-HQ.
Experiments on regular, irregular, and custom region completion are all
performed, and free-style image inpainting is also presented. Quantitative
comparisons with previous state-of-the-art methods show that our method obtains
much more natural image completions.
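An inception-style block with multiple receptive fields and a pooling branch, in the spirit described above; branch widths and kernel sizes are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Parallel branches with different receptive fields plus a pooling branch,
    concatenated along the channel dimension."""

    def __init__(self, in_ch, branch_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, branch_ch, kernel_size=5, padding=2)
        self.pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, branch_ch, kernel_size=1),
        )

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], dim=1)
```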
TE141K: Artistic Text Benchmark for Text Effect Transfer
Text effects are combinations of visual elements such as outlines, colors and
textures of text, which can dramatically improve its artistry. Although text
effects are extensively utilized in the design industry, they are usually
created by human experts due to their extreme complexity; this is laborious and
not practical for normal users. In recent years, some efforts have been made
toward automatic text effect transfer; however, the lack of data limits the
capabilities of transfer models. To address this problem, we introduce a new
text effects dataset, TE141K, with 141,081 text effect/glyph pairs in total.
Our dataset consists of 152 professionally designed text effects rendered on
glyphs, including English letters, Chinese characters, and Arabic numerals. To
the best of our knowledge, this is the largest dataset for text effect transfer
to date. Based on this dataset, we propose a baseline approach called text
effect transfer GAN (TET-GAN), which supports the transfer of all 152 styles in
one model and can efficiently extend to new styles. Finally, we conduct a
comprehensive comparison in which 14 style transfer models are benchmarked.
Experimental results demonstrate the superiority of TET-GAN both qualitatively
and quantitatively and indicate that our dataset is effective and challenging.
Comment: Accepted by TPAMI 2020. Project page: https://daooshee.github.io/TE141K
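One plausible way to condition a single generator on a style index is sketched below with a learned style embedding; this is a hypothetical simplification, and TET-GAN's actual design with separate glyph and style encoders differs from it.

```python
import torch
import torch.nn as nn

class StyleConditionedGenerator(nn.Module):
    """Condition one generator on a style index via a learned embedding."""

    def __init__(self, num_styles=152, style_dim=64, img_ch=3, feat_ch=64):
        super().__init__()
        self.style_emb = nn.Embedding(num_styles, style_dim)
        self.encode = nn.Conv2d(img_ch, feat_ch, 3, padding=1)
        self.decode = nn.Conv2d(feat_ch + style_dim, img_ch, 3, padding=1)

    def forward(self, glyph, style_id):
        f = torch.relu(self.encode(glyph))
        # Broadcast the style vector over the spatial grid and fuse by concat.
        s = self.style_emb(style_id)[:, :, None, None].expand(-1, -1, *f.shape[2:])
        return torch.tanh(self.decode(torch.cat([f, s], dim=1)))
```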
Decompose to manipulate: Manipulable Object Synthesis in 3D Medical Images with Structured Image Decomposition
The performance of medical image analysis systems is constrained by the
quantity of high-quality image annotations. Such systems require data to be
annotated by experts with years of training, especially when diagnostic
decisions are involved. Such datasets are thus hard to scale up. In this
context, it is hard for supervised learning systems to generalize to the cases
that are rare in the training set but would be present in real-world clinical
practice. We believe that synthetic image samples generated by a system
trained on the real data can be useful for improving the supervised learning
tasks in the medical image analysis applications. Allowing the image synthesis
to be manipulable could help synthetic images provide complementary information
to the training data rather than simply duplicating the real-data manifold. In
this paper, we propose a framework for synthesizing 3D objects, such as
pulmonary nodules, in 3D medical images with manipulable properties. The
manipulation is enabled by decomposing the object of interest into its
segmentation mask and a 1D vector containing the residual information. The
synthetic object is refined and blended into the image context with two
adversarial discriminators. We evaluate the proposed framework on lung nodules
in 3D chest CT images and show that the proposed framework could generate
realistic nodules with manipulable shapes, textures and locations, etc. By
sampling from both the synthetic nodules and the real nodules from 2800 3D CT
volumes during classifier training, we show that the synthetic patches could
improve the overall nodule detection performance by 8.44% on average in terms
of the competition performance metric (CPM) score.
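The decomposition idea can be sketched as a small 3D network that factors an object patch into a soft segmentation mask and a 1D residual vector and reconstructs it from the two; the authors' full architecture and the adversarial blending discriminators are omitted, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class NoduleDecomposer(nn.Module):
    """Factor a 3D object patch into (mask, 1D residual vector) and rebuild it."""

    def __init__(self, ch=1, feat=32, z_dim=64):
        super().__init__()
        self.mask_head = nn.Sequential(
            nn.Conv3d(ch, feat, 3, padding=1), nn.ReLU(),
            nn.Conv3d(feat, 1, 1), nn.Sigmoid())
        self.residual_head = nn.Sequential(
            nn.Conv3d(ch, feat, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(feat, z_dim))
        self.decoder = nn.Sequential(
            nn.Conv3d(1 + z_dim, feat, 3, padding=1), nn.ReLU(),
            nn.Conv3d(feat, ch, 3, padding=1))

    def forward(self, patch):
        mask = self.mask_head(patch)                           # (B, 1, D, H, W)
        z = self.residual_head(patch)                          # (B, z_dim)
        z_map = z[:, :, None, None, None].expand(-1, -1, *patch.shape[2:])
        recon = self.decoder(torch.cat([mask, z_map], dim=1))  # reconstructed object
        return mask, z, recon
```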