Street View Motion-from-Structure-from-Motion
We describe a structure-from-motion framework that handles “generalized” cameras, such as moving rolling-shutter cameras, and works at an unprecedented scale (billions of images covering millions of linear kilometers of roads) by exploiting a good relative pose prior along vehicle paths. We exhibit a planet-scale, appearance-augmented point cloud constructed with our framework and demonstrate its practical use in correcting the pose of a street-level image collection.
Towards High-Frequency Tracking and Fast Edge-Aware Optimization
This dissertation advances the state of the art for AR/VR tracking systems by
increasing the tracking frequency by orders of magnitude, and it proposes an
efficient algorithm for the problem of edge-aware optimization.
AR/VR is a natural way of interacting with computers, where the physical and
digital worlds coexist. We are on the cusp of a radical change in how humans
perform and interact with computing. Humans are sensitive to small
misalignments between the real and the virtual world, and tracking at
kilohertz frequencies becomes essential. Current vision-based systems fall
short, as their tracking frequency is implicitly limited by the frame rate of
the camera. This thesis presents a prototype system that tracks at frequencies
orders of magnitude higher than state-of-the-art methods using multiple
commodity cameras. The proposed system exploits characteristics of the camera
traditionally considered as flaws, namely rolling shutter and radial
distortion. The experimental evaluation shows the effectiveness of the method
for various degrees of motion.
Furthermore, edge-aware optimization is an indispensable tool in the computer
vision arsenal for accurate filtering of depth-data and image-based rendering,
which is increasingly being used for content creation and geometry processing
for AR/VR. As applications increasingly demand higher resolution and speed,
there exists a need to develop methods that scale accordingly. This
dissertation proposes such an edge-aware optimization framework which is
efficient, accurate, and algorithmically scalable, traits that are highly
desirable but not found jointly in the state of the art. The experiments
show the effectiveness of the framework in a multitude of computer vision tasks
such as computational photography and stereo.
Comment: PhD thesis
Structure and motion estimation from rolling shutter video
The majority of consumer quality cameras sold today have CMOS sensors with rolling shutters. In a rolling shutter camera, images are read out row by row, and thus each row is exposed during a different time interval. A rolling-shutter exposure causes geometric image distortions when either the camera or the scene is moving, and this causes state-of-the-art structure and motion algorithms to fail. We demonstrate a novel method for solving the structure and motion problem for rolling-shutter video. The method relies on exploiting the continuity of the camera motion, both between frames, and across a frame. We demonstrate the effectiveness of our method by controlled experiments on real video sequences. We show, both visually and quantitatively, that our method outperforms standard structure and motion, and is more accurate and efficient than a two-step approach, doing image rectification and structure and motion
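The row-by-row readout described in this abstract can be illustrated with a minimal sketch. It assumes a constant readout time per frame and a camera position that varies linearly between two frame-start poses. The function names, the parameters, and the linear motion model are illustrative assumptions, not the parameterization used in the paper, which the abstract does not specify.

```python
def row_capture_times(frame_start, readout_time, num_rows):
    """Capture time of each row, assuming rows are read out at a constant rate.

    frame_start, readout_time and num_rows are hypothetical parameters chosen
    for illustration; real sensors may also have a blanking interval.
    """
    row_delay = readout_time / num_rows
    return [frame_start + r * row_delay for r in range(num_rows)]


def interpolate_position(p0, p1, t0, t1, t):
    """Linearly interpolate a camera position between two frame-start poses.

    This uses the continuity of camera motion across a frame in its simplest
    form (position only, no rotation), which is an assumption of this sketch.
    """
    alpha = (t - t0) / (t1 - t0)
    return tuple(a + alpha * (b - a) for a, b in zip(p0, p1))


# A 4-row frame read out over 0.5 s: each row sees the camera at a different place.
times = row_capture_times(frame_start=0.0, readout_time=0.5, num_rows=4)
positions = [
    interpolate_position((0.0, 0.0, 0.0), (1.0, 2.0, 4.0), 0.0, 1.0, t)
    for t in times
]
```

Under these assumptions, every row gets its own camera position, which is why a single global pose per frame does not match the observed geometry when the camera moves during readout.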
Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events
Scene Dynamic Recovery (SDR) by inverting distorted Rolling Shutter (RS)
images to an undistorted high frame-rate Global Shutter (GS) video is a
severely ill-posed problem, particularly when prior knowledge about
camera/object motions is unavailable. Commonly used artificial assumptions on
motion linearity and data-specific characteristics, regarding the temporal
dynamics information embedded in the RS scanlines, are prone to producing
sub-optimal solutions in real-world scenarios. To address this challenge, we
propose an event-based RS2GS framework within a self-supervised learning
paradigm that leverages the extremely high temporal resolution of event cameras
to provide accurate inter/intra-frame information. Because the network is
learned within a self-supervised framework, real-world events and RS images can
be exploited to alleviate the performance degradation caused by the domain gap
between the synthesized and real data.
Specifically, an Event-based Inter/intra-frame Compensator (E-IC) is proposed
to predict the per-pixel dynamic between arbitrary time intervals, including
the temporal transition and spatial translation. Exploring connections in terms
of RS-RS, RS-GS, and GS-RS, we explicitly formulate mutual constraints with the
proposed E-IC, resulting in supervisions without ground-truth GS images.
Extensive evaluations over synthetic and real datasets demonstrate that the
proposed method achieves state-of-the-art results and shows remarkable performance for
event-based RS2GS inversion in real-world scenarios. The dataset and code are
available at https://w3un.github.io/selfunroll/