Coherent Semantic Attention for Image Inpainting
The latest deep learning-based approaches have shown promising results for
the challenging task of inpainting missing regions of an image. However, the
existing methods often generate content with blurry textures and distorted
structures due to the discontinuity of local pixels. From a semantic-level
perspective, this local pixel discontinuity arises mainly because these methods
ignore the semantic relevance and feature continuity of the hole regions. To handle
this problem, we investigate human behavior in repairing pictures and
propose a refined deep generative model-based approach with a novel coherent
semantic attention (CSA) layer, which can not only preserve contextual
structure but also make more effective predictions of missing parts by modeling
the semantic relevance between the hole features. The task is divided into
two steps, rough inpainting and refinement, and each step is modeled with a
neural network under the U-Net architecture, where the CSA layer is embedded
into the encoder of the refinement step. To stabilize the network training
process and encourage the CSA layer to learn more effective parameters, we
propose a consistency loss that enforces both the CSA layer and the
corresponding CSA layer in the decoder to be close to the VGG feature layer
of a ground-truth image
simultaneously. Experiments on the CelebA, Places2, and Paris StreetView
datasets validate the effectiveness of the proposed method in image
inpainting tasks, showing that it obtains higher-quality images than
the existing state-of-the-art approaches.
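To make the attention idea concrete, here is a minimal sketch of filling hole features by attending from each hole location to the known-region features, assuming cosine similarity as the matching score; `feat` and `mask` are hypothetical tensors, and the chained patch-to-patch coherence that distinguishes CSA from plain contextual attention is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def semantic_attention_fill(feat: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Fill hole features by attending to known-region features.

    feat: (B, C, H, W) encoder features; mask: (B, 1, H, W), 1 = hole.
    Each hole location is replaced by a similarity-weighted sum of
    known-region features (cosine similarity -> softmax weights).
    """
    B, C, H, W = feat.shape
    f = feat.flatten(2)                                # (B, C, HW)
    f_norm = F.normalize(f, dim=1)                     # unit-length features
    sim = torch.bmm(f_norm.transpose(1, 2), f_norm)    # (B, HW, HW) cosine sims
    m = mask.flatten(2).squeeze(1)                     # (B, HW), 1 = hole
    # Forbid attending to hole positions by masking them out before softmax.
    sim = sim.masked_fill(m.unsqueeze(1).bool(), float("-inf"))
    attn = torch.softmax(sim, dim=-1)                  # weights over known positions
    filled = torch.bmm(f, attn.transpose(1, 2))        # (B, C, HW)
    # Keep known features as-is; replace only hole features.
    out = f * (1 - m).unsqueeze(1) + filled * m.unsqueeze(1)
    return out.view(B, C, H, W)
```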
Image Inpainting via Generative Multi-column Convolutional Neural Networks
In this paper, we propose a generative multi-column network for image
inpainting. This network synthesizes different image components in a parallel
manner within one stage. To better characterize global structures, we design a
confidence-driven reconstruction loss while an implicit diversified MRF
regularization is adopted to enhance local details. The multi-column network
combined with the reconstruction and MRF loss propagates local and global
information derived from context to the target inpainting regions. Extensive
experiments on challenging street view, face, natural objects and scenes
show that our method produces visually compelling results even without
the post-processing steps that were previously common.
Comment: Accepted in NIPS 2018
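A hedged sketch of a confidence-driven reconstruction loss in the spirit described above, assuming the confidence of an unknown pixel decays with its distance from the known region; the Gaussian propagation and loss weighting here are illustrative rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def confidence_weights(known_mask: torch.Tensor, iters: int = 5) -> torch.Tensor:
    """Propagate confidence from the known region into the hole.

    known_mask: (B, 1, H, W), 1 = known pixel. Repeated Gaussian blurring
    spreads confidence inward, so pixels near the hole boundary are
    trusted more than pixels deep inside the hole.
    """
    g = torch.tensor([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]]) / 16.0
    g = g.view(1, 1, 3, 3).to(known_mask)
    conf = known_mask.clone()
    for _ in range(iters):
        blurred = F.conv2d(conf, g, padding=1)
        # Known pixels stay fully confident; only holes accumulate confidence.
        conf = known_mask + (1 - known_mask) * blurred
    return conf * (1 - known_mask)   # weights apply to the hole region only

def confidence_driven_loss(pred, target, known_mask):
    w = confidence_weights(known_mask)
    return (w * (pred - target).abs()).mean()
```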
Free-Form Image Inpainting with Gated Convolution
We present a generative image inpainting system to complete images with
free-form mask and guidance. The system is based on gated convolutions learned
from millions of images without additional labelling efforts. The proposed
gated convolution solves the issue of vanilla convolution, which treats all
input pixels as valid, and generalizes partial convolution by providing a
learnable dynamic feature-selection mechanism for each channel at each spatial location
across all layers. Moreover, as free-form masks may appear anywhere in images
with any shape, global and local GANs designed for a single rectangular mask
are not applicable. Thus, we also present a patch-based GAN loss, named
SN-PatchGAN, by applying a spectral-normalized discriminator on dense image
patches. SN-PatchGAN is simple in formulation, fast and stable in training.
Results on automatic image inpainting and user-guided extension demonstrate
that our system generates higher-quality and more flexible results than
previous methods. Our system helps users quickly remove distracting objects,
modify image layouts, clear watermarks and edit faces. Code, demo and models
are available at: https://github.com/JiahuiYu/generative_inpainting
Comment: Accepted in ICCV 2019 Oral; open sourced; interactive demo available:
http://jiahuiyu.com/deepfill
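A minimal PyTorch sketch of a gated convolution layer as described above: one convolution produces features, a parallel convolution produces a per-channel, per-location soft gate. The layer sizes and the ELU activation are assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    """Gated convolution: output = activation(features) * sigmoid(gate).

    Unlike vanilla convolution, the learned gate can suppress responses
    at invalid (hole) locations per channel and per spatial position.
    """
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        self.act = nn.ELU()

    def forward(self, x):
        return self.act(self.feature(x)) * torch.sigmoid(self.gate(x))

# Usage: the input is typically the masked image concatenated with the mask.
layer = GatedConv2d(in_ch=4, out_ch=32)
x = torch.randn(1, 4, 256, 256)   # RGB + mask channel
y = layer(x)                      # (1, 32, 256, 256)
```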
VORNet: Spatio-temporally Consistent Video Inpainting for Object Removal
Video object removal is a challenging task in video processing that often
requires massive human effort. Given the mask of the foreground object in each
frame, the goal is to complete (inpaint) the object region and generate a video
without the target object. While deep learning-based methods have recently
achieved great success on the image inpainting task, they often lead to
inconsistent results between frames when applied to videos. In this work, we
propose a novel learning-based Video Object Removal Network (VORNet) to solve
the video object removal task in a spatio-temporally consistent manner, by
combining optical flow warping with an image-based inpainting model.
Experiments are done on our Synthesized Video Object Removal (SVOR) dataset
based on the YouTube-VOS video segmentation dataset, and both the objective and
subjective evaluation demonstrate that our VORNet generates more spatially and
temporally consistent videos compared with existing methods.
Comment: Accepted to CVPRW 2019
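A small sketch of the flow-warping step that a spatio-temporally consistent pipeline like this relies on, assuming a precomputed forward optical flow; the grid construction follows the standard `grid_sample` convention and is not VORNet's actual code.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(prev_frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp the previous frame toward the current one using optical flow.

    prev_frame: (B, C, H, W); flow: (B, 2, H, W) in pixels (dx, dy).
    The warped frame can then be blended with a per-frame inpainting
    result to encourage temporal consistency.
    """
    B, _, H, W = prev_frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0).to(prev_frame)  # (1, 2, H, W)
    src = grid + flow                          # where each pixel samples from
    # Normalize coordinates to [-1, 1] as grid_sample expects.
    src_x = 2.0 * src[:, 0] / (W - 1) - 1.0
    src_y = 2.0 * src[:, 1] / (H - 1) - 1.0
    sample_grid = torch.stack((src_x, src_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(prev_frame, sample_grid, align_corners=True)
```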
Align-and-Attend Network for Globally and Locally Coherent Video Inpainting
We propose a novel feed-forward network for video inpainting. We use a set of
sampled video frames as references, borrowing their visible content to fill the
holes of a target frame. Our video inpainting network consists of two stages. The
first stage is an alignment module that uses computed homographies between the
reference frames and the target frame. The visible patches are then aggregated
based on the frame similarity to fill in the target holes roughly. The second
stage is a non-local attention module that matches the generated patches with
known reference patches (in space and time) to refine the previous global
alignment stage. Both stages use a large spatio-temporal window over the
references, which enables modeling long-range correlations between distant
information and the hole regions. Therefore, even challenging scenes with
large or slowly moving holes, which existing flow-based approaches can hardly
model, can be handled. Our network is also designed with a recurrent
propagation stream to encourage temporal consistency in video results.
Experiments on video object removal demonstrate that our method inpaints the
holes with globally and locally coherent contents.
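A rough sketch of the first-stage idea, warping a reference frame onto the target with an estimated homography so its visible pixels can fill the target's holes; the ORB-plus-RANSAC recipe below is a standard OpenCV illustration, not the paper's alignment module.

```python
import cv2
import numpy as np

def align_reference(ref: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Warp a reference frame onto the target frame via a homography.

    Matches ORB keypoints between the two frames, estimates a homography
    with RANSAC, and warps the reference into the target's viewpoint.
    """
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(ref, None)
    k2, d2 = orb.detectAndCompute(target, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:200]
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = target.shape[:2]
    return cv2.warpPerspective(ref, H, (w, h))
```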
Nostalgin: Extracting 3D City Models from Historical Image Data
What did it feel like to walk through a city from the past? In this work, we
describe Nostalgin (Nostalgia Engine), a method that can faithfully reconstruct
cities from historical images. Unlike existing work in city reconstruction, we
focus on the task of reconstructing 3D cities from historical images. Working
with historical image data is substantially more difficult, as there are
significantly fewer images of each building available and the parameters of
the cameras that captured the images are unknown. Nostalgin can generate a city
model even if there is only a single image per facade, regardless of viewpoint
or occlusions. To achieve this, our novel architecture combines image
segmentation, rectification, and inpainting. We motivate our design decisions
with experimental analysis of individual components of our pipeline, and show
that we can improve on baselines in both speed and visual realism. We
demonstrate the efficacy of our pipeline by recreating two 1940s Manhattan city
blocks. We aim to deploy Nostalgin as an open source platform where users can
generate immersive historical experiences from their own photos.
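The pipeline combines segmentation, rectification, and inpainting; as one concrete illustration, here is a hedged sketch of the rectification step, assuming the four corners of a segmented facade are available (the corner ordering and output size are placeholder assumptions).

```python
import cv2
import numpy as np

def rectify_facade(image: np.ndarray, corners: np.ndarray,
                   out_w: int = 512, out_h: int = 512) -> np.ndarray:
    """Fronto-parallel rectification of a facade from its four corners.

    corners: (4, 2) float32 array ordered top-left, top-right,
    bottom-right, bottom-left, e.g. taken from a segmentation mask.
    """
    target = np.float32([[0, 0], [out_w - 1, 0],
                         [out_w - 1, out_h - 1], [0, out_h - 1]])
    H = cv2.getPerspectiveTransform(np.float32(corners), target)
    return cv2.warpPerspective(image, H, (out_w, out_h))
```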
Semantic Image Inpainting Through Improved Wasserstein Generative Adversarial Networks
Image inpainting is the task of filling in missing regions of a damaged or
incomplete image. In this work we tackle this problem not only by using the
available visual data but also by incorporating image semantics through the use
of generative models. Our contribution is twofold: First, we learn a data
latent space by training an improved version of the Wasserstein generative
adversarial network, for which we incorporate a new generator and discriminator
architecture. Second, the learned semantic information is combined with a new
optimization loss for inpainting whose minimization infers the missing content
conditioned by the available data. It takes into account powerful contextual
and perceptual content inherent in the image itself. The benefits include the
ability to recover large regions by accumulating semantic information even when
it is not fully present in the damaged image. Experiments show that the presented
method obtains qualitative and quantitative top-tier results in different
experimental situations and also achieves accurate photo-realism comparable to
state-of-the-art works.
Comment: Accepted as Oral Presentation in VISAPP 2019
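A hedged sketch of the general shape of the second contribution: with a trained generator, search the latent space for a code whose output matches the known pixels, then paste the generated content into the hole. The context-plus-prior objective below follows the common semantic-inpainting formulation and only approximates the paper's specific loss; `G` (with an assumed `latent_dim` attribute) and the critic `D` are hypothetical pretrained modules.

```python
import torch

def inpaint_by_latent_search(G, D, damaged, known_mask,
                             steps=1000, lam=0.1, lr=0.05):
    """Optimize a latent code z so G(z) matches the known pixels.

    damaged: (B, 3, H, W) image with holes; known_mask: (B, 1, H, W), 1 = known.
    The context loss pulls G(z) toward the observed pixels; the prior term
    (via the critic D) keeps G(z) on the learned image manifold.
    """
    z = torch.randn(damaged.size(0), G.latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        gen = G(z)
        context = (known_mask * (gen - damaged)).abs().mean()
        prior = -D(gen).mean()          # higher critic score = more realistic
        (context + lam * prior).backward()
        opt.step()
    with torch.no_grad():
        gen = G(z)
    # Keep known pixels, fill the holes with generated content.
    return known_mask * damaged + (1 - known_mask) * gen
```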
Shift-Net: Image Inpainting via Deep Feature Rearrangement
Deep convolutional networks (CNNs) have exhibited their potential in image
inpainting for producing plausible results. However, in most existing methods,
e.g., the context encoder, the missing parts are predicted by propagating the
surrounding convolutional features through a fully connected layer, which
tends to produce semantically plausible but blurry results. In this paper, we
introduce a special shift-connection layer to the U-Net architecture, namely
Shift-Net, for filling in missing regions of any shape with sharp structures
and fine-detailed textures. To this end, the encoder feature of the known
region is shifted to serve as an estimation of the missing parts. A guidance
loss is introduced on the decoder feature to minimize the distance between the
decoder feature after the fully connected layer and the ground-truth encoder
feature of the missing parts. With this constraint, the decoder feature in the
missing region can be used to guide the shift of the encoder feature in the
known region. An end-to-end learning algorithm is further developed to train the
Shift-Net. Experiments on the Paris StreetView and Places datasets demonstrate
the efficiency and effectiveness of our Shift-Net in producing sharper,
fine-detailed, and visually plausible results. The codes and pre-trained models
are available at https://github.com/Zhaoyi-Yan/Shift-Net.
Comment: 25 pages, 17 figures, 1 table, main paper + supplementary material
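A simplified sketch of the shift idea: for each missing-region location, use the decoder's estimate as a query to find the most similar known-region encoder feature (cosine similarity) and copy it over. This nearest-neighbor version ignores the guidance loss and U-Net plumbing; the tensor names are hypothetical.

```python
import torch
import torch.nn.functional as F

def shift_features(enc: torch.Tensor, dec: torch.Tensor,
                   mask: torch.Tensor) -> torch.Tensor:
    """Rearrange known encoder features into the hole.

    enc, dec: (B, C, H, W) encoder / decoder features at the same scale;
    mask: (B, 1, H, W), 1 = missing. For each missing location, the decoder
    feature serves as a query to pick the nearest known encoder feature.
    """
    B, C, H, W = enc.shape
    e = F.normalize(enc.flatten(2), dim=1)       # (B, C, HW) keys
    d = F.normalize(dec.flatten(2), dim=1)       # (B, C, HW) queries
    sim = torch.bmm(d.transpose(1, 2), e)        # (B, HW_query, HW_key)
    m = mask.flatten(2).squeeze(1)               # (B, HW), 1 = missing
    # Restrict candidate keys to the known region.
    sim = sim.masked_fill(m.unsqueeze(1).bool(), float("-inf"))
    idx = sim.argmax(dim=-1)                     # best known match per location
    gathered = torch.gather(
        enc.flatten(2), 2, idx.unsqueeze(1).expand(-1, C, -1))   # (B, C, HW)
    out = enc.flatten(2) * (1 - m).unsqueeze(1) + gathered * m.unsqueeze(1)
    return out.view(B, C, H, W)
```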
Computer Vision Accelerators for Mobile Systems based on OpenCL GPGPU Co-Processing
In this paper, we present an OpenCL-based heterogeneous implementation of a
computer vision algorithm -- image inpainting-based object removal algorithm --
on mobile devices. To take advantage of the computation power of the mobile
processor, the algorithm workflow is partitioned between the CPU and the GPU
based on the profiling results on mobile devices, so that the
computationally-intensive kernels are accelerated by the mobile GPGPU
(general-purpose computing using graphics processing units). By exploring the
implementation trade-offs and utilizing the proposed optimization strategies at
different levels including algorithm optimization, parallelism optimization,
and memory access optimization, we significantly speed up the algorithm with
the CPU-GPU heterogeneous implementation, while preserving the quality of the
output images. Experimental results show that heterogeneous computing based on
GPGPU co-processing can significantly speed up computer vision algorithms
and make them practical on real-world mobile devices.
Comment: 15 pages, 15 figures. Submitted and accepted for publication in the
Journal of Signal Processing Systems, 201
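A minimal pyopencl sketch of the offload pattern described above, with the host (CPU) side dispatching a data-parallel per-pixel kernel to the GPU; the trivial brightness-scale kernel is a stand-in for a compute-heavy inpainting kernel.

```python
import numpy as np
import pyopencl as cl

# Toy stand-in for a compute-heavy image kernel offloaded to the GPU.
KERNEL_SRC = """
__kernel void scale_pixels(__global const float *src,
                           __global float *dst, const float gain) {
    int i = get_global_id(0);
    dst[i] = src[i] * gain;   // per-pixel work runs in parallel on the GPU
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prog = cl.Program(ctx, KERNEL_SRC).build()

image = np.random.rand(512 * 512).astype(np.float32)   # flattened image
mf = cl.mem_flags
src_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=image)
dst_buf = cl.Buffer(ctx, mf.WRITE_ONLY, image.nbytes)

# The CPU partitions the workflow and enqueues the GPU kernel.
prog.scale_pixels(queue, image.shape, None, src_buf, dst_buf, np.float32(1.5))
result = np.empty_like(image)
cl.enqueue_copy(queue, result, dst_buf)
```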
Foreground-aware Image Inpainting
Existing image inpainting methods typically fill holes by borrowing
information from surrounding pixels. They often produce unsatisfactory results
when the holes overlap with or touch foreground objects, due to a lack of
information about the actual extent of foreground and background regions within
the holes. These scenarios, however, are very important in practice, especially
for applications such as the removal of distracting objects. To address the
problem, we propose a foreground-aware image inpainting system that explicitly
disentangles structure inference and content completion. Specifically, our
model learns to predict the foreground contour first, and then inpaints the
missing region using the predicted contour as guidance. We show that by such
disentanglement, the contour completion model predicts reasonable contours of
objects, and further substantially improves the performance of image
inpainting. Experiments show that our method significantly outperforms existing
methods and achieves superior inpainting results on challenging cases with
complex compositions.
Comment: Camera-ready version of CVPR 2019, with supplementary material
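A hedged sketch of the two-stage disentanglement described above: a contour network first predicts the foreground contour inside the hole, and the completion network then conditions on that predicted contour. Both sub-networks are placeholders, not the paper's architectures.

```python
import torch
import torch.nn as nn

class ForegroundAwareInpainter(nn.Module):
    """Two-stage pipeline: contour inference, then contour-guided completion."""

    def __init__(self, contour_net: nn.Module, inpaint_net: nn.Module):
        super().__init__()
        self.contour_net = contour_net   # masked image + mask -> contour map
        self.inpaint_net = inpaint_net   # masked image + mask + contour -> image

    def forward(self, image: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # mask: (B, 1, H, W), 1 = hole; zero out hole pixels first.
        masked = image * (1 - mask)
        contour = self.contour_net(torch.cat([masked, mask], dim=1))
        # The predicted contour guides where foreground ends inside the hole.
        completed = self.inpaint_net(torch.cat([masked, mask, contour], dim=1))
        # Composite: keep known pixels, use the prediction only in the hole.
        return masked + completed * mask

# Hypothetical usage with toy stand-in networks:
contour = nn.Conv2d(4, 1, 3, padding=1)
painter = nn.Conv2d(5, 3, 3, padding=1)
model = ForegroundAwareInpainter(contour, painter)
out = model(torch.randn(1, 3, 256, 256), torch.zeros(1, 1, 256, 256))
```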