Reimagining Reality: A Comprehensive Survey of Video Inpainting Techniques
This paper offers a comprehensive analysis of recent advancements in video
inpainting techniques, a critical subset of computer vision and artificial
intelligence. As a process that restores or fills in missing or corrupted
portions of video sequences with plausible content, video inpainting has
evolved significantly with the advent of deep learning methodologies. Despite
the plethora of existing methods and their swift development, the landscape
remains complex, posing challenges to both novices and established researchers.
Our study deconstructs major techniques, their underpinning theories, and their
effective applications. Moreover, we conduct an exhaustive comparative study,
centering on two often-overlooked dimensions: visual quality and computational
efficiency. We adopt a human-centric approach to assess visual quality,
enlisting a panel of annotators to evaluate the output of different video
inpainting techniques. This provides a nuanced qualitative understanding that
complements traditional quantitative metrics. Concurrently, we delve into the
computational aspects, comparing inference times and memory demands across a
standardized hardware setup. This analysis underscores the balance between
quality and efficiency: a critical consideration for practical applications
where resources may be constrained. By integrating human validation and
computational resource comparison, this survey not only clarifies the present
landscape of video inpainting techniques but also charts a course for future
explorations in this vibrant and evolving field.
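The survey's efficiency comparison reduces to two measurements per method on fixed hardware: mean inference time and peak memory. A minimal sketch of such a measurement harness in PyTorch is shown below; the `model(frames, masks)` call signature and CUDA availability are assumptions for illustration, since real inpainting methods expose different interfaces.

```python
import time
import torch

@torch.no_grad()
def benchmark(model, frames, masks, warmup=3, runs=10):
    """Measure mean inference time (s) and peak GPU memory (MiB) for one
    video-inpainting forward pass. The model(frames, masks) interface is
    a placeholder assumption, not any specific method's API."""
    for _ in range(warmup):                      # warm up kernels/caches
        model(frames, masks)
    torch.cuda.synchronize()
    torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    for _ in range(runs):
        model(frames, masks)
    torch.cuda.synchronize()                     # wait for async GPU work
    mean_time = (time.perf_counter() - start) / runs
    peak_mb = torch.cuda.max_memory_allocated() / 2**20
    return mean_time, peak_mb
```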
Large-Scale Light Field Capture and Reconstruction
This thesis discusses approaches and techniques for converting Sparsely-Sampled Light Fields (SSLFs) into Densely-Sampled Light Fields (DSLFs), which can be used for visualization on 3DTV and Virtual Reality (VR) devices. As an example, a movable 1D large-scale light field acquisition system for capturing SSLFs in real-world environments is evaluated. This system consists of 24 sparsely placed RGB cameras and two Kinect V2 sensors. The real-world SSLF data captured with this setup can be leveraged to reconstruct real-world DSLFs. To this end, three challenging problems must be solved for this system: (i) how to estimate the rigid transformation from the coordinate system of a Kinect V2 to the coordinate system of an RGB camera; (ii) how to register the two Kinect V2 sensors with a large displacement; (iii) how to reconstruct a DSLF from an SSLF with moderate and large disparity ranges. To overcome these three challenges, we propose: (i) a novel self-calibration method, which takes advantage of the geometric constraints from the scene and the cameras, for estimating the rigid transformations from the camera coordinate frame of one Kinect V2 to the camera coordinate frames of the 12 nearest RGB cameras; (ii) a novel coarse-to-fine approach for recovering the rigid transformation from the coordinate system of one Kinect to the coordinate system of the other by means of local color and geometry information; (iii) several novel algorithms, falling into two groups, for reconstructing a DSLF from an input SSLF: novel view synthesis methods, inspired by state-of-the-art video frame interpolation algorithms, and Epipolar-Plane Image (EPI) inpainting methods, inspired by Shearlet Transform (ST)-based DSLF reconstruction approaches.
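Challenges (i) and (ii) both reduce to estimating a rigid transformation between two camera coordinate frames. The thesis contributes novel calibration methods for this; purely as background, the classical closed-form least-squares solution given matched 3D point pairs (the Kabsch/Umeyama algorithm) is sketched below.

```python
import numpy as np

def rigid_transform(P, Q):
    """Closed-form least-squares rigid transform (Kabsch/Umeyama):
    find R, t such that Q ~= R @ P + t, given corresponding 3D points.
    P, Q: (N, 3) arrays of matched points in the two coordinate frames."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t
```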
A Variational Model for Joint Motion Estimation and Image Reconstruction
The aim of this paper is to derive and analyze a variational model for the joint estimation of motion and reconstruction of image sequences, which is based on a time-continuous Eulerian motion model. The model can be set up in terms of the continuity equation or the brightness constancy equation. The analysis in this paper focuses on the latter for robust motion estimation on sequences of two-dimensional images. We rigorously prove the existence of a minimizer in a suitable function space setting. Moreover, we discuss the numerical solution of the model based on primal-dual algorithms and investigate several examples. Finally, the benefits of our model compared to existing techniques, such as sequential image reconstruction and motion estimation, are shown. The work of the first author was also supported by the German Research Foundation (DFG) via the EXC 1003 Cells in Motion Cluster of Excellence, Münster, Germany.
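For reference, the brightness constancy constraint the paper builds on, together with a generic form of the resulting joint energy, can be written as follows. This is an illustrative formulation only: the precise data term and the regularizers $\mathcal{R}$, $\mathcal{S}$ are design choices, not necessarily those of the paper.

```latex
% Brightness constancy: intensities are transported along the motion field v
\partial_t u + v \cdot \nabla u = 0
% Generic joint model: reconstruct the sequence u from measurements f
% (forward operator K) while estimating v, with weights \alpha, \beta
\min_{u,\,v}\ \tfrac{1}{2}\,\|Ku - f\|_2^2
  \;+\; \alpha\,\mathcal{R}(u) \;+\; \beta\,\mathcal{S}(v)
\quad \text{s.t.}\quad \partial_t u + v \cdot \nabla u = 0
```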
S-NeRF++: Autonomous Driving Simulation via Neural Reconstruction and Generation
Autonomous driving simulation systems play a crucial role in augmenting self-driving data and simulating complex and rare traffic scenarios, ensuring navigation safety. However, traditional simulation systems, which often rely heavily on manual modeling and 2D image editing, struggle to scale to extensive scenes and to generate realistic simulation data. In this study, we
present S-NeRF++, an innovative autonomous driving simulation system based on
neural reconstruction. Trained on widely-used self-driving datasets such as
nuScenes and Waymo, S-NeRF++ can generate a large number of realistic street
scenes and foreground objects with high rendering quality as well as offering
considerable flexibility in manipulation and simulation. Specifically, S-NeRF++
is an enhanced neural radiance field for synthesizing large-scale scenes and
moving vehicles, with improved scene parameterization and camera pose learning.
The system effectively utilizes noisy and sparse LiDAR data to refine training
and address depth outliers, ensuring high quality reconstruction and novel-view
rendering. It also provides a diverse foreground asset bank through
reconstructing and generating different foreground vehicles to support
comprehensive scenario creation. Moreover, we have developed an advanced
foreground-background fusion pipeline that skillfully integrates illumination
and shadow effects, further enhancing the realism of our simulations. With the
high-quality simulated data provided by our S-NeRF++, we find that perception methods enjoy a performance boost on several autonomous driving downstream tasks, which further demonstrates the effectiveness of our proposed simulator.
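At the core of any NeRF-style system such as S-NeRF++ is volumetric rendering: densities and colors sampled along each camera ray are alpha-composited into a pixel color. Below is a minimal sketch of the standard discrete compositing step in PyTorch; it shows the generic NeRF quadrature, not S-NeRF++'s enhanced scene parameterization.

```python
import torch

def composite_rays(rgb, sigma, deltas):
    """Standard NeRF quadrature: alpha-composite per-sample colors.
    rgb:    (n_rays, n_samples, 3) colors along each ray
    sigma:  (n_rays, n_samples)    volume densities
    deltas: (n_rays, n_samples)    distances between consecutive samples"""
    alpha = 1.0 - torch.exp(-sigma * deltas)            # per-sample opacity
    # Transmittance: probability the ray reaches each sample unoccluded
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]),
                   1.0 - alpha + 1e-10], dim=1), dim=1)[:, :-1]
    weights = alpha * trans                             # (n_rays, n_samples)
    color = (weights.unsqueeze(-1) * rgb).sum(dim=1)    # (n_rays, 3)
    return color, weights
```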
Learning Joint Spatial-Temporal Transformations for Video Inpainting
High-quality video inpainting that completes missing regions in video frames
is a promising yet challenging task. State-of-the-art approaches adopt
attention models to complete a frame by searching missing contents from
reference frames, and further complete whole videos frame by frame. However,
these approaches can suffer from inconsistent attention results along spatial
and temporal dimensions, which often leads to blurriness and temporal artifacts
in videos. In this paper, we propose to learn a joint Spatial-Temporal
Transformer Network (STTN) for video inpainting. Specifically, we
simultaneously fill missing regions in all input frames by self-attention, and
propose to optimize STTN by a spatial-temporal adversarial loss. To show the
superiority of the proposed model, we conduct both quantitative and qualitative
evaluations by using standard stationary masks and more realistic moving object
masks. Demo videos are available at https://github.com/researchmm/STTN. (Accepted by ECCV 2020.)
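The key idea of joint spatial-temporal attention is that tokens from all frames attend to one another in a single sequence, so missing content in one frame can be borrowed from any location in any other frame. The PyTorch sketch below is a conceptual simplification under that assumption; STTN itself attends over multi-scale patches and handles masked regions explicitly.

```python
import torch
import torch.nn as nn

class JointSpaceTimeAttention(nn.Module):
    """Conceptual sketch: one attention pass over all frames' tokens,
    so every spatial location can copy content from any frame."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats):
        # feats: (batch, frames, height, width, channels) feature maps
        b, t, h, w, c = feats.shape
        tokens = feats.reshape(b, t * h * w, c)   # flatten space-time
        out, _ = self.attn(tokens, tokens, tokens)
        return out.reshape(b, t, h, w, c)
```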