VORNet: Spatio-temporally Consistent Video Inpainting for Object Removal
Video object removal is a challenging task in video processing that often
requires massive human effort. Given the mask of the foreground object in each
frame, the goal is to complete (inpaint) the object region and generate a video
without the target object. While recently deep learning based methods have
achieved great success on the image inpainting task, they often lead to
inconsistent results between frames when applied to videos. In this work, we
propose a novel learning-based Video Object Removal Network (VORNet) to solve
the video object removal task in a spatio-temporally consistent manner, by
combining the optical flow warping and image-based inpainting model.
Experiments are done on our Synthesized Video Object Removal (SVOR) dataset
based on the YouTube-VOS video segmentation dataset, and both the objective and
subjective evaluation demonstrate that our VORNet generates more spatially and
temporally consistent videos compared with existing methods.
Comment: Accepted to CVPRW 201
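
To make the flow-warping-plus-blending idea concrete, the sketch below warps the previously completed frame with an optical flow field and composites it with a single-image inpainting result under a validity mask. It is a minimal PyTorch illustration with assumed tensor shapes, not VORNet's actual architecture; warp, compose, and valid_mask are hypothetical names.

    import torch
    import torch.nn.functional as F

    def warp(frame, flow):
        # frame: (B, C, H, W); flow: (B, 2, H, W) holding per-pixel (dx, dy)
        b, _, h, w = frame.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        base = torch.stack((xs, ys), dim=0).float().to(frame.device)   # (2, H, W)
        coords = base.unsqueeze(0) + flow                              # follow the flow
        # normalize coordinates to [-1, 1] as required by grid_sample
        gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
        gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
        grid = torch.stack((gx, gy), dim=-1)                           # (B, H, W, 2)
        return F.grid_sample(frame, grid, align_corners=True)

    def compose(prev_completed, flow, image_inpaint, valid_mask):
        # valid_mask is 1 where the warped pixel can be trusted, 0 elsewhere
        warped = warp(prev_completed, flow)
        return valid_mask * warped + (1 - valid_mask) * image_inpaint
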
Deep Video Inpainting
Video inpainting aims to fill spatio-temporal holes with plausible content in
a video. Despite tremendous progress of deep neural networks for image
inpainting, it is challenging to extend these methods to the video domain due
to the additional time dimension. In this work, we propose a novel deep network
architecture for fast video inpainting. Built upon an image-based
encoder-decoder model, our framework is designed to collect and refine
information from neighbor frames and synthesize still-unknown regions. At the
same time, the output is enforced to be temporally consistent by a recurrent
feedback and a temporal memory module. Compared with the state-of-the-art image
inpainting algorithm, our method produces videos that are much more
semantically correct and temporally smooth. In contrast to the prior video
completion method which relies on time-consuming optimization, our method runs
in near real-time while generating competitive video results. Finally, we
apply our framework to the video retargeting task and obtain visually pleasing
results.
Comment: Accepted at CVPR 201
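
The toy module below shows the general shape of such a design: several neighbor frames are encoded, their features aggregated, and the previous output is fed back into the decoder. It is a deliberately tiny stand-in with assumed channel counts (TinyVideoInpainter is an invented name), not the network described in the paper.

    import torch
    import torch.nn as nn

    class TinyVideoInpainter(nn.Module):
        def __init__(self, ch=32):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(4, ch, 3, stride=2, padding=1), nn.ReLU(),   # RGB + mask
                nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(ch + 3, ch, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1),
            )

        def forward(self, frames, masks, prev_output):
            # frames: (B, T, 3, H, W); masks: (B, T, 1, H, W); prev_output: (B, 3, H, W)
            b, t, _, h, w = frames.shape
            x = torch.cat([frames, masks], dim=2).flatten(0, 1)        # (B*T, 4, H, W)
            feats = self.encoder(x).view(b, t, -1, h // 4, w // 4)
            pooled = feats.max(dim=1).values                           # aggregate neighbor frames
            prev_small = nn.functional.interpolate(prev_output, scale_factor=0.25)
            return self.decoder(torch.cat([pooled, prev_small], dim=1))
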
Frame-Recurrent Video Inpainting by Robust Optical Flow Inference
In this paper, we present a new inpainting framework for recovering missing
regions of video frames. Compared with image inpainting, performing this task
on video presents new challenges, such as how to preserve temporal consistency
and spatial details, and how to handle arbitrary input video sizes and lengths
quickly and efficiently. To this end, we propose a novel deep learning
architecture which incorporates ConvLSTM and optical flow for modeling the
spatio-temporal consistency in videos. It also saves considerable computational
resources, so our method can handle videos with larger frame sizes and
arbitrary length in a streaming fashion in real time. Furthermore, to generate an accurate
optical flow from corrupted frames, we propose a robust flow generation module,
where two sources of flows are fed and a flow blending network is trained to
fuse them. We conduct extensive experiments to evaluate our method in various
scenarios and different datasets, both qualitatively and quantitatively. The
experimental results demonstrate the superiority of our method compared with
state-of-the-art inpainting approaches.
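
A rough sketch of the flow-fusion idea follows: two candidate flows from different sources are concatenated and a small network predicts a per-pixel weight to blend them. FlowBlender and its layer sizes are assumptions for illustration, not the paper's blending network.

    import torch
    import torch.nn as nn

    class FlowBlender(nn.Module):
        def __init__(self):
            super().__init__()
            # takes both candidate flows (2 + 2 channels) and predicts a per-pixel weight
            self.net = nn.Sequential(
                nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
            )

        def forward(self, flow_a, flow_b):
            w = self.net(torch.cat([flow_a, flow_b], dim=1))  # (B, 1, H, W) in [0, 1]
            return w * flow_a + (1 - w) * flow_b              # blended flow
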
Deep Blind Video Decaptioning by Temporal Aggregation and Recurrence
Blind video decaptioning is a problem of automatically removing text overlays
and inpainting the occluded parts in videos without any input masks. While
recent deep learning based inpainting methods deal with a single image and
mostly assume that the positions of the corrupted pixels are known, we aim at
automatic text removal in video sequences without mask information. In this
paper, we propose a simple yet effective framework for fast blind video
decaptioning. We construct an encoder-decoder model, where the encoder takes
multiple source frames that can provide visible pixels revealed from the scene
dynamics. These hints are aggregated and fed into the decoder. We apply a
residual connection from the input frame to the decoder output to force our
network to focus on the corrupted regions only. Our proposed model ranked
first in the ECCV Chalearn 2018 LAP Inpainting Competition Track 2: Video
decaptioning. In addition, we further improve this strong model by applying
recurrent feedback. The recurrent feedback not only enforces
temporal coherence but also provides strong clues on where the corrupted pixels
are. Both qualitative and quantitative experiments demonstrate that our full
model produces accurate and temporally consistent video results in real time
(50+ fps).
Comment: Accepted at CVPR 201
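
The snippet below illustrates only the residual connection: the network predicts a correction that is added to the input frame, so unchanged regions pass through untouched. For brevity it takes a single frame rather than multiple aggregated source frames; ResidualDecaptioner and the layer sizes are hypothetical.

    import torch
    import torch.nn as nn

    class ResidualDecaptioner(nn.Module):
        def __init__(self, ch=32):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(ch, 3, 3, padding=1),
            )

        def forward(self, frame):
            # residual connection: the body only has to model the correction
            return frame + self.body(frame)
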
Multi-View Inpainting for RGB-D Sequence
In this work we propose a novel approach to remove undesired objects from
RGB-D sequences captured with freely moving cameras, which enables static 3D
reconstruction. Our method jointly uses existing information from multiple
frames and generates new information via inpainting techniques. We use balanced
rules to select source frames, a local homography-based image warping method
for alignment, and a Markov random field (MRF) based approach for combining
existing information. For the remaining holes, we employ an exemplar-based
multi-view inpainting method to complete the color image and coherently use it
as guidance to complete the corresponding depth. Experiments show that our
approach is effective at removing the undesired objects and inpainting the
holes.
Comment: 10 page
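
As a hedged illustration of the warping-and-copy step (here with one global homography rather than local homographies), matched keypoints give a homography, the source frame is warped into the target view, and hole pixels are copied over. fill_from_source and its arguments are invented names; the MRF combination and exemplar inpainting stages are not shown.

    import cv2
    import numpy as np

    def fill_from_source(target, target_hole_mask, source, src_pts, dst_pts):
        # src_pts / dst_pts: matched keypoints as float32 arrays of shape (N, 2), N >= 4
        H, _ = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC)
        warped = cv2.warpPerspective(source, H, (target.shape[1], target.shape[0]))
        filled = target.copy()
        filled[target_hole_mask > 0] = warped[target_hole_mask > 0]   # copy into the holes
        return filled
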
Unsupervised Deep Context Prediction for Background Foreground Separation
In many advanced video-based applications, background modeling is a
pre-processing step to eliminate redundant data, for instance in tracking or
video surveillance applications. Over the past years, background subtraction
has usually been based on low-level or hand-crafted features such as raw color
components, gradients, or local binary patterns. The performance of background
subtraction algorithms suffers in the presence of various challenges such as
dynamic backgrounds, photometric variations, camera jitter, and shadows. To
handle these challenges and achieve accurate background modeling, we propose a
unified framework based on image inpainting: an unsupervised visual feature
learning hybrid generative adversarial algorithm based on context prediction.
We also present a solution for random region inpainting that fuses center
region inpainting and random region inpainting with the help of the Poisson
blending technique. Furthermore, we evaluate foreground object detection by
combining our proposed method with morphological operations. A comparison of
our proposed method with 12 state-of-the-art methods shows its stability for
background estimation and foreground detection.
Comment: 17 page
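
The fragment below sketches how two inpainting outputs could be fused with Poisson blending using OpenCV's seamlessClone; fuse_inpaintings and its arguments are illustrative names, and this is not the authors' exact fusion procedure.

    import cv2
    import numpy as np

    def fuse_inpaintings(center_result, random_result, region_mask):
        # center_result, random_result: 8-bit BGR images of the same size
        # region_mask: 8-bit mask of the randomly inpainted region (255 inside)
        ys, xs = np.where(region_mask > 0)
        cx = int((xs.min() + xs.max()) / 2)        # center of the mask's bounding box
        cy = int((ys.min() + ys.max()) / 2)
        return cv2.seamlessClone(random_result, center_result, region_mask,
                                 (cx, cy), cv2.NORMAL_CLONE)
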
Improving Video Generation for Multi-functional Applications
In this paper, we aim to improve the state-of-the-art video generative
adversarial networks (GANs) with a view towards multi-functional applications.
Our improved video GAN model does not separate foreground from background nor
dynamic from static patterns, but learns to generate the entire video clip
conjointly. Our model can thus be trained to generate - and learn from - a
broad set of videos with no restriction. This is achieved by designing a robust
one-stream video generation architecture with an extension of the
state-of-the-art Wasserstein GAN framework that allows for better convergence.
The experimental results show that our improved video GAN model outperforms
state-of-the-art video generative models on multiple challenging datasets.
Furthermore, we demonstrate the superiority of our model by successfully
extending it to three challenging problems: video colorization, video
inpainting, and future prediction. To the best of our knowledge, this is the
first work using GANs to colorize and inpaint video clips.
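
As one common instantiation of a Wasserstein GAN objective for clips shaped (B, C, T, H, W), the sketch below computes a critic loss with a gradient penalty; the paper only states that it extends the Wasserstein GAN framework, so the penalty term and the function names here are assumptions.

    import torch

    def critic_loss(critic, real_clips, fake_clips, gp_weight=10.0):
        # Wasserstein critic loss: push real scores up and fake scores down
        loss = critic(fake_clips).mean() - critic(real_clips).mean()
        # gradient penalty on random interpolations between real and fake clips
        eps = torch.rand(real_clips.size(0), 1, 1, 1, 1, device=real_clips.device)
        interp = (eps * real_clips + (1 - eps) * fake_clips).requires_grad_(True)
        grads = torch.autograd.grad(critic(interp).sum(), interp, create_graph=True)[0]
        penalty = ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
        return loss + gp_weight * penalty
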
Coordinate-based Texture Inpainting for Pose-Guided Image Generation
We present a new deep learning approach to pose-guided resynthesis of human
photographs. At the heart of the new approach is the estimation of the complete
body surface texture based on a single photograph. Since the input photograph
always observes only a part of the surface, we suggest a new inpainting method
that completes the texture of the human body. Rather than working directly with
colors of texture elements, the inpainting network estimates an appropriate
source location in the input image for each element of the body surface. This
correspondence field between the input image and the texture is then further
warped into the target image coordinate frame based on the desired pose,
effectively establishing the correspondence between the source and the target
view even when the pose change is drastic. The final convolutional network then
uses the established correspondence and all other available information to
synthesize the output image. A fully-convolutional architecture with deformable
skip connections guided by the estimated correspondence field is used. We show
state-of-the-art results for pose-guided image synthesis. Additionally, we
demonstrate the performance of our system for garment transfer and pose-guided
face resynthesis.
Comment: Published in Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition (CVPR). 201
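
The core sampling step can be sketched as follows: instead of predicting colors, a network outputs a coordinate field pointing into the input photograph, and colors are then gathered by bilinear sampling. gather_texture is an illustrative name and only this step is shown, not the full resynthesis pipeline.

    import torch
    import torch.nn.functional as F

    def gather_texture(input_image, coord_field):
        # input_image: (B, 3, H, W); coord_field: (B, 2, Ht, Wt) with values in [-1, 1]
        # giving, for each texture element, a normalized (x, y) source location.
        grid = coord_field.permute(0, 2, 3, 1)                    # (B, Ht, Wt, 2)
        return F.grid_sample(input_image, grid, align_corners=True)
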
On Variational Methods for Motion Compensated Inpainting
We develop in this paper a generic Bayesian framework for the joint
estimation of motion and recovery of missing data in a damaged video sequence.
Using the standard maximum a posteriori to variational formulation rationale, we
derive generic minimum energy formulations for the estimation of a
reconstructed sequence as well as motion recovery. We instantiate these energy
formulations and, from their Euler-Lagrange equations, we propose full
multiresolution algorithms to compute good local minimizers for our
energies and discuss their numerical implementations, focusing on the missing
data recovery part, i.e. inpainting. Experimental results for synthetic as well
as real sequences are presented. Image sequences and extra material are
available at http://image.diku.dk/francois/seqinp.php.
Comment: DIKU Technical report 2009 with some small correction
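
As a hedged illustration of the kind of functional such a framework minimizes (not the paper's exact energy), one can write a joint energy over the reconstructed sequence u and the motion field w, with illustrative regularization weights alpha and beta:

    E(u, w) = \int \big( u(x + w(x,t),\, t+1) - u(x, t) \big)^2 \, dx\, dt
              + \alpha \int \lVert \nabla u \rVert^2 \, dx\, dt
              + \beta \int \lVert \nabla w \rVert^2 \, dx\, dt

Minimizers of an energy of this form are then computed from its Euler-Lagrange equations in a coarse-to-fine multiresolution scheme.
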
Deep Inception Generative Network for Cognitive Image Inpainting
Recent advances in deep learning have shown exciting promise in filling large
holes and have opened a new direction for image inpainting. However, existing
learning-based methods often create artifacts and fallacious textures because
of insufficient cognitive understanding. Previous generative networks are
limited to a single receptive field type and forgo pooling to preserve detail
sharpness, whereas human cognition is constant regardless of the target
attribute. Since multiple receptive fields improve abstract image
characterization and pooling keeps features invariant, we adopt deep inception
learning to promote high-level feature representation and enhance the model's
learning capacity for local patches. Moreover, approaches for generating
diverse mask images are introduced and a random mask dataset is created. We
benchmark our method on ImageNet, the Places2 dataset, and CelebA-HQ.
Experiments on regular, irregular, and custom region completion are all
performed, and free-style image inpainting is also presented. Quantitative
comparisons with previous state-of-the-art methods show that our approach
produces much more natural image completions.
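
To make the multiple-receptive-field-plus-pooling idea concrete, here is a minimal inception-style block with parallel 1x1, 3x3, and 5x5 branches and a pooling branch concatenated along channels; InceptionBlock and its channel sizes are illustrative, not the paper's exact module.

    import torch
    import torch.nn as nn

    class InceptionBlock(nn.Module):
        def __init__(self, in_ch, branch_ch=16):
            super().__init__()
            self.b1 = nn.Conv2d(in_ch, branch_ch, 1)                  # 1x1 receptive field
            self.b3 = nn.Conv2d(in_ch, branch_ch, 3, padding=1)       # 3x3 receptive field
            self.b5 = nn.Conv2d(in_ch, branch_ch, 5, padding=2)       # 5x5 receptive field
            self.pool = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                      nn.Conv2d(in_ch, branch_ch, 1)) # pooling branch
        def forward(self, x):
            return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], dim=1)
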