Align-and-Attend Network for Globally and Locally Coherent Video Inpainting
We propose a novel feed-forward network for video inpainting. We use a set of
sampled video frames as references, taking their visible contents to fill the
hole of a target frame. Our video inpainting network consists of two stages. The
first stage is an alignment module that uses computed homographies between the
reference frames and the target frame. The visible patches are then aggregated
based on the frame similarity to fill in the target holes roughly. The second
stage is a non-local attention module that matches the generated patches with
known reference patches (in space and time) to refine the previous global
alignment stage. Both stages use a large spatial-temporal window for the
reference and thus can model long-range correlations between distant
information and the hole regions. As a result, the network handles even
challenging scenes with large or slowly moving holes, which existing flow-based
approaches have hardly modeled. Our network also includes a recurrent
propagation stream to encourage temporal consistency in video results.
Experiments on video object removal demonstrate that our method inpaints the
holes with globally and locally coherent contents.
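
The second-stage idea of matching generated patches against known reference patches can be illustrated with a small attention sketch. This is not the authors' implementation: the function name attend_patches, the cosine-similarity scoring, and the temperature parameter are all assumptions chosen for illustration, and the patches are represented as plain feature vectors.

```python
import numpy as np

def attend_patches(query, keys, values, temperature=1.0):
    """Soft-match query patches to reference patches (illustrative sketch).

    query:  (Q, D) features of generated (hole) patches
    keys:   (K, D) features of known reference patches
    values: (K, C) contents of the reference patches
    Returns the attended contents (Q, C) and the attention weights (Q, K).
    """
    # Cosine similarity between every query patch and every reference patch.
    q = query / (np.linalg.norm(query, axis=1, keepdims=True) + 1e-8)
    k = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + 1e-8)
    scores = q @ k.T / temperature

    # Numerically stable softmax over the reference patches.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Each output patch is a weighted average of reference patch contents.
    return weights @ values, weights
```

Because the softmax runs over all reference patches at once, a patch far away in space or time can contribute as much as a nearby one, which is the property the abstract attributes to the large spatial-temporal window.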
Dynamic Object Removal and Spatio-Temporal RGB-D Inpainting via Geometry-Aware Adversarial Learning
Dynamic objects have a significant impact on the robot's perception of the
environment, which degrades the performance of essential tasks such as
localization and mapping. In this work, we address this problem by synthesizing
plausible color, texture and geometry in regions occluded by dynamic objects.
We propose the novel geometry-aware DynaFill architecture that follows a
coarse-to-fine topology and incorporates our gated recurrent feedback mechanism
to adaptively fuse information from previous timesteps. We optimize our
architecture using adversarial training to synthesize fine realistic textures,
which enables it to hallucinate color and depth structure in occluded regions
online in a spatially and temporally coherent manner, without relying on future
frame information. Casting our inpainting problem as an image-to-image
translation task, our model also corrects regions correlated with the presence
of dynamic objects in the scene, such as shadows or reflections. We introduce a
large-scale hyperrealistic dataset with RGB-D images, semantic segmentation
labels, camera poses as well as groundtruth RGB-D information of occluded
regions. Extensive quantitative and qualitative evaluations show that our
approach achieves state-of-the-art performance, even in challenging weather
conditions. Furthermore, we present results for retrieval-based visual
localization with the synthesized images that demonstrate the utility of our
approach.
Comment: Dataset, code and models are available at
http://rl.uni-freiburg.de/research/rgbd-inpaintin
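
The abstract describes a gated recurrent feedback mechanism that adaptively fuses information from previous timesteps. A minimal sketch of one common form of gated fusion follows. It is an assumption about the general technique, not the DynaFill architecture itself: the names gated_fuse, w_gate and b_gate are hypothetical, and the gate here is a simple per-pixel linear map followed by a sigmoid.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fuse(current, previous, w_gate, b_gate):
    """Blend current-frame features with features fed back from the last step.

    current, previous: (H, W, C) feature maps
    w_gate: (2C, C) hypothetical gate weights, b_gate: (C,) hypothetical bias
    Returns a fused (H, W, C) feature map.
    """
    # The gate sees both inputs so it can decide, per pixel and channel,
    # how much of the previous timestep to keep.
    stacked = np.concatenate([current, previous], axis=-1)  # (H, W, 2C)
    gate = sigmoid(stacked @ w_gate + b_gate)               # (H, W, C), in (0, 1)

    # Convex combination: a gate near 1 trusts the current frame,
    # a gate near 0 carries the previous timestep forward.
    return gate * current + (1.0 - gate) * previous
```

Only the current frame and the previous output enter the fusion, which matches the abstract's statement that the model works online without relying on future frame information.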