Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction
This paper addresses the problem of rolling shutter correction in complex
nonlinear and dynamic scenes with extreme occlusion. Existing methods suffer
from two main drawbacks. Firstly, they face challenges in estimating the
accurate correction field due to the uniform velocity assumption, leading to
significant image correction errors under complex motion. Secondly, drastic
occlusion in dynamic scenes prevents current solutions from achieving high
image quality, owing to the inherent difficulty of aligning and aggregating
multiple frames. To tackle these challenges, we model the curvilinear
trajectory of pixels analytically and propose a geometry-based Quadratic
Rolling Shutter (QRS) motion solver, which precisely estimates the high-order
correction field of individual pixels. In addition, to reconstruct
high-quality frames under severe occlusion in dynamic scenes, we present a 3D
video architecture that effectively Aligns and Aggregates multi-frame context,
named RSA2-Net. We
evaluate our method across a broad range of cameras and video sequences,
demonstrating its clear superiority. Specifically, our method surpasses the
state of the art by +4.98 dB, +0.77 dB, and +4.33 dB PSNR on the Carla-RS,
Fastec-RS, and BS-RSC datasets, respectively. Code is available at
https://github.com/DelinQu/qrsc.
Comment: accepted at ICCV 202
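The difference between the uniform-velocity assumption the paper criticizes and a quadratic (constant-acceleration) motion model can be illustrated with a minimal 1-D sketch. This is an illustrative toy, not the paper's actual QRS solver; the function names and the constant-acceleration motion are assumptions for exposition:

```python
import numpy as np

def rs_row_time(rows, num_rows, readout_time):
    """Each rolling-shutter row is exposed a little later than the one above."""
    return (rows / (num_rows - 1)) * readout_time

def quadratic_displacement(v0, a, t):
    """Constant-acceleration (quadratic) pixel motion: d(t) = v0*t + a*t^2/2."""
    return v0 * t + 0.5 * a * t * t

def correction_field(num_rows, readout_time, v0, a, t_ref=0.0):
    """Per-row displacement (in pixels) that warps each RS row back to a
    common global-shutter reference time t_ref."""
    t = rs_row_time(np.arange(num_rows), num_rows, readout_time)
    return quadratic_displacement(v0, a, t) - quadratic_displacement(v0, a, t_ref)
```

Setting the acceleration `a` to zero recovers the uniform-velocity correction field; a nonzero `a` bends the per-pixel trajectory into the curvilinear form the abstract describes.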
Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events
Scene Dynamic Recovery (SDR) by inverting distorted Rolling Shutter (RS)
images to an undistorted high frame-rate Global Shutter (GS) video is a
severely ill-posed problem, particularly when prior knowledge about
camera/object motions is unavailable. Commonly used artificial assumptions of
motion linearity, and data-specific characteristics of the temporal dynamics
embedded in the RS scanlines, are prone to producing sub-optimal solutions in
real-world scenarios. To address this challenge, we propose an event-based
RS2GS framework within a self-supervised learning paradigm that leverages the
extremely high temporal resolution of event cameras to provide accurate
inter/intra-frame information, so that real-world events and RS images can be
exploited to alleviate the performance degradation caused by the domain gap
between synthesized and real data.
Specifically, an Event-based Inter/intra-frame Compensator (E-IC) is proposed
to predict the per-pixel dynamic between arbitrary time intervals, including
the temporal transition and spatial translation. Exploring connections in terms
of RS-RS, RS-GS, and GS-RS, we explicitly formulate mutual constraints with the
proposed E-IC, resulting in supervisions without ground-truth GS images.
Extensive evaluations on synthetic and real datasets demonstrate that the
proposed method achieves state-of-the-art results and performs remarkably well
for event-based RS2GS inversion in real-world scenarios. The dataset and code are
available at https://w3un.github.io/selfunroll/
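The RS-RS / RS-GS / GS-RS mutual constraints can be sketched on a 1-D toy signal: a predicted displacement must both explain the observed RS pair and survive a round trip through a virtual GS signal, with no ground-truth GS supervision. The integer shift and the `predict_disp` callable below are toy stand-ins for the paper's dense displacement and E-IC network, not its actual implementation:

```python
import numpy as np

def shift(signal, d):
    """Toy 1-D 'warp': roll a signal by an integer displacement."""
    return np.roll(signal, d)

def mutual_constraint_losses(rs_a, rs_b, predict_disp):
    """Two self-supervised losses in the spirit of the RS-RS / RS-GS / GS-RS
    constraints: photometric consistency between warped RS frames, and
    cycle consistency through a virtual (unobserved) GS signal."""
    d_ab = predict_disp(rs_a, rs_b)            # displacement from frame a to b
    loss_rs_rs = float(np.abs(shift(rs_a, d_ab) - rs_b).mean())
    gs = shift(rs_a, d_ab // 2)                # warp halfway to a virtual GS
    rs_a_rec = shift(gs, -(d_ab // 2))         # invert the warp (GS -> RS)
    loss_cycle = float(np.abs(rs_a_rec - rs_a).mean())
    return loss_rs_rs, loss_cycle
```

With a predictor that returns the true shift, both losses vanish; any error in the prediction raises them, which is what lets the constraints supervise the network without GS ground truth.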
Towards High-Frequency Tracking and Fast Edge-Aware Optimization
This dissertation advances the state of the art for AR/VR tracking systems by
increasing the tracking frequency by orders of magnitude and proposes an
efficient algorithm for the problem of edge-aware optimization.
AR/VR is a natural way of interacting with computers, where the physical and
digital worlds coexist. We are on the cusp of a radical change in how humans
perform computing and interact with machines. Humans are sensitive to small
misalignments between the real and the virtual world, so tracking at
kilohertz frequencies becomes essential. Current vision-based systems fall
short, as their tracking frequency is implicitly limited by the frame-rate of
the camera. This thesis presents a prototype system that tracks at frequencies
orders of magnitude higher than state-of-the-art methods using multiple
commodity cameras. The proposed system exploits characteristics of the camera
traditionally considered as flaws, namely rolling shutter and radial
distortion. The experimental evaluation shows the effectiveness of the method
for various degrees of motion.
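The intuition behind treating rolling shutter as a feature rather than a flaw can be sketched in a few lines: every sensor row is exposed at a slightly different time, so each row is an independent measurement. The function below is an illustrative assumption about the timing model, not the dissertation's actual system:

```python
def row_timestamps(frame_index, fps, num_rows, readout_fraction=1.0):
    """Per-row capture times of one rolling-shutter frame. Treating every
    row as an independent measurement turns a camera running at `fps`
    frames per second into roughly fps * num_rows observations per second."""
    frame_start = frame_index / fps
    readout = readout_fraction / fps   # time to scan all rows
    return [frame_start + (r / num_rows) * readout for r in range(num_rows)]
```

For example, a 30 fps sensor with 1080 rows yields row-level observations at roughly 30 * 1080 = 32,400 Hz, which is the intuition behind kilohertz tracking from commodity cameras.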
Furthermore, edge-aware optimization is an indispensable tool in the computer
vision arsenal for accurate filtering of depth-data and image-based rendering,
which is increasingly being used for content creation and geometry processing
for AR/VR. As applications increasingly demand higher resolution and speed,
there exists a need to develop methods that scale accordingly. This
dissertation proposes such an edge-aware optimization framework that is
efficient, accurate, and scales well algorithmically, desirable traits not
found jointly in the state of the art. The experiments
show the effectiveness of the framework in a multitude of computer vision tasks
such as computational photography and stereo.
Comment: PhD thesis
Self-supervised Learning of Event-guided Video Frame Interpolation for Rolling Shutter Frames
This paper makes the first attempt to tackle the challenging task of
recovering arbitrary-frame-rate latent global shutter (GS) frames from two
consecutive rolling shutter (RS) frames, guided by novel event camera data.
Although events possess high temporal resolution, beneficial for video frame
interpolation (VFI), a hurdle in tackling this task is the lack of paired GS
frames. Another challenge is that RS frames are susceptible to distortion when
capturing moving objects. To this end, we propose a novel self-supervised
framework that leverages events to guide RS frame correction and VFI in a
unified framework. Our key idea is to estimate the displacement field (DF),
i.e., the non-linear, dense 3D spatiotemporal motion of all pixels during the
exposure time, allowing for reciprocal reconstruction between RS and GS frames
as well as arbitrary-frame-rate VFI. Specifically, the displacement
field estimation (DFE) module is proposed to estimate the spatiotemporal motion
from events to correct the RS distortion and interpolate the GS frames in one
step. We then combine the input RS frames and DF to learn a mapping for
RS-to-GS frame interpolation. However, as the mapping is highly
under-constrained, we couple it with an inverse mapping (i.e., GS-to-RS) and RS
frame warping (i.e., RS-to-RS) for self-supervision. As there is a lack of
labeled datasets for evaluation, we generate two synthetic datasets and collect
a real-world dataset to train and test our method. Experimental results show
that our method yields performance comparable to or better than prior
supervised methods.
Comment: This paper has been submitted for review in March 202
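How a displacement field lets RS and GS frames be reconstructed from each other can be shown with a tiny 1-D-motion example: each row's content is shifted back from its own exposure time to a single target instant. The constant horizontal rate below is a toy stand-in for the paper's dense, non-linear displacement field:

```python
import numpy as np

def gs_from_rs(rs, disp_rate, t_target, readout_time):
    """Resample a rolling-shutter image at one global-shutter instant.
    Row r was exposed at t_r = r/H * readout_time; shifting its content by
    disp_rate * (t_target - t_r) pixels moves it to t_target."""
    H, _ = rs.shape
    out = np.empty_like(rs)
    for r in range(H):
        t_r = (r / H) * readout_time
        out[r] = np.roll(rs[r], int(round(disp_rate * (t_target - t_r))))
    return out
```

Varying `t_target` over the exposure interval yields frames at arbitrary intermediate times, which is the mechanism behind using one estimated field for both RS correction and frame interpolation; running the same warp with a negated displacement gives the inverse GS-to-RS mapping used for self-supervision.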
- …