360° Optical Flow using Tangent Images
Omnidirectional 360° images have found many promising and exciting applications in computer vision, robotics and other fields, thanks to their increasing affordability, portability and their 360° field of view. The most common format for storing, processing and visualising 360° images is equirectangular projection (ERP). However, the distortion introduced by the nonlinear mapping from 360° images to ERP images is still a barrier that holds back ERP images from being used as easily as conventional perspective images. This is especially relevant when estimating 360° optical flow, as the distortions need to be mitigated appropriately. In this paper, we propose a 360° optical flow method based on tangent images. Our method leverages gnomonic projection to locally convert ERP images to perspective images, and uniformly samples the ERP image by projection to a cubemap and regular icosahedron faces, to incrementally refine the estimated 360° flow fields even in the presence of large rotations. Our experiments demonstrate the benefits of our proposed method both quantitatively and qualitatively.
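The gnomonic projection step that underlies this tangent-image pipeline is compact enough to sketch. The snippet below is a minimal, illustrative NumPy implementation of sampling a single perspective tangent image from an ERP image via the inverse gnomonic projection; the function name, field-of-view parameter and nearest-neighbour lookup are simplifying assumptions for illustration, not the authors' code.

```python
import numpy as np

def tangent_image_from_erp(erp, lon0, lat0, fov_deg=60.0, size=256):
    """Sample a perspective (gnomonic) tangent image from an equirectangular image.

    erp        : H x W x C equirectangular image
    lon0, lat0 : tangent point in radians (longitude in [-pi, pi], latitude in [-pi/2, pi/2])
    """
    H, W = erp.shape[:2]
    # Tangent-plane coordinates spanning the requested field of view.
    half_extent = np.tan(np.radians(fov_deg) / 2.0)
    xs = np.linspace(-half_extent, half_extent, size)
    x, y = np.meshgrid(xs, xs)

    # Inverse gnomonic projection: plane (x, y) -> sphere (lon, lat).
    rho = np.sqrt(x**2 + y**2)
    c = np.arctan(rho)
    sin_c, cos_c = np.sin(c), np.cos(c)
    rho = np.where(rho == 0, 1e-12, rho)            # avoid division by zero at the centre
    lat = np.arcsin(cos_c * np.sin(lat0) + y * sin_c * np.cos(lat0) / rho)
    lon = lon0 + np.arctan2(x * sin_c,
                            rho * np.cos(lat0) * cos_c - y * np.sin(lat0) * sin_c)

    # Sphere (lon, lat) -> ERP pixel coordinates (nearest-neighbour lookup for brevity).
    u = ((lon / np.pi + 1.0) * 0.5 * (W - 1)) % W
    v = (0.5 - lat / np.pi) * (H - 1)
    return erp[np.clip(v, 0, H - 1).astype(int), np.clip(u, 0, W - 1).astype(int)]
```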
Video Depth-From-Defocus
Many compelling video post-processing effects, in particular aesthetic focus editing and refocusing effects, are feasible if per-frame depth information is available. Existing computational methods to capture RGB and depth either purposefully modify the optics (coded aperture, light-field imaging), or employ active RGB-D cameras. Since these methods are less practical for users with normal cameras, we present an algorithm to capture all-in-focus RGB-D video of dynamic scenes with an unmodified commodity video camera. Our algorithm turns the often unwanted defocus blur into a valuable signal. The input to our method is a video in which the focus plane is continuously moving back and forth during capture, and thus defocus blur is provoked and strongly visible. This can be achieved by manually turning the focus ring of the lens during recording. The core algorithmic ingredient is a new video-based depth-from-defocus algorithm that computes space-time-coherent depth maps, deblurred all-in-focus video, and the focus distance for each frame. We extensively evaluate our approach, and show that it enables compelling video post-processing effects, such as different types of refocusing.
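The signal this method exploits, defocus blur that grows with the distance of a point from the focus plane, follows directly from the thin-lens model. The sketch below computes the circle-of-confusion diameter for a point at a fixed depth as the focus plane is swept; the focal length and f-number are illustrative values, not parameters from the paper.

```python
def circle_of_confusion(depth_m, focus_m, focal_mm=50.0, f_number=1.8):
    """Thin-lens circle-of-confusion diameter (in mm on the sensor).

    depth_m : scene depth of the point (metres)
    focus_m : distance the lens is focused at (metres)
    """
    f = focal_mm / 1000.0                 # focal length in metres
    aperture = f / f_number               # aperture diameter in metres
    coc = aperture * abs(depth_m - focus_m) / depth_m * f / (focus_m - f)
    return coc * 1000.0                   # back to millimetres

# Sweeping the focus plane (as in the focus-ramped input video) changes the blur
# of a point at 2 m: sharp when focus_m == 2, increasingly blurred otherwise.
for focus in (1.0, 2.0, 4.0):
    print(focus, circle_of_confusion(depth_m=2.0, focus_m=focus))
```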
Real-time Virtual Object Insertion for Moving 360° Videos
We propose an approach for real-time insertion of virtual objects into pre-recorded moving-camera 360° video. First, we reconstruct camera motion and sparse scene content via structure from motion on stitched equirectangular video. Then, to plausibly reproduce real-world lighting conditions for virtual objects, we use inverse tone mapping to recover high dynamic range environment maps which vary spatially along the camera path. We implement our approach into the Unity rendering engine for real-time virtual object insertion via differential rendering, with dynamic lighting, image-based shadowing, and user interaction. This expands the use and flexibility of 360° video for interactive computer graphics and visual effects applications.
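Differential rendering, which the system uses for compositing, combines two renders of a local scene proxy with the original frame. Below is a minimal sketch of that compositing step; the array names and the simple binary object mask are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def differential_composite(background, render_with, render_without, object_mask):
    """Debevec-style differential rendering composite.

    background     : the original 360° video frame (H x W x 3, linear radiance)
    render_with    : local scene proxy rendered WITH the virtual object
    render_without : the same proxy rendered WITHOUT the object
    object_mask    : 1 where the virtual object itself is visible, else 0
    """
    # Shadows, reflections and other light interactions appear as the difference
    # between the two proxy renders; the object itself is taken directly.
    delta = render_with - render_without
    composite = background + delta
    return np.where(object_mask[..., None] > 0, render_with, composite)
```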
MegaParallax: Casual 360° Panoramas with Motion Parallax
The ubiquity of smart mobile devices, such as phones and tablets, enables users to casually capture 360° panoramas with a single camera sweep to share and relive experiences. However, panoramas lack motion parallax as they do not provide different views for different viewpoints. The motion parallax induced by translational head motion is a crucial depth cue in daily life. Alternatives, such as omnidirectional stereo panoramas, provide different views for each eye (binocular disparity), but they also lack motion parallax as the left and right eye panoramas are stitched statically. Methods based on explicit scene geometry reconstruct textured 3D geometry, which provides motion parallax, but suffers from visible reconstruction artefacts. The core of our method is a novel multi-perspective panorama representation, which can be casually captured and rendered with motion parallax for each eye on the fly. This provides a more realistic perception of panoramic environments which is particularly useful for virtual reality applications. Our approach uses a single consumer video camera to acquire 200–400 views of a real 360° environment with a single sweep. By using novel-view synthesis with flow-based blending, we show how to turn these input views into an enriched 360° panoramic experience that can be explored in real time, without relying on potentially unreliable reconstruction of scene geometry. We compare our results with existing omnidirectional stereo and image-based rendering methods to demonstrate the benefit of our approach, which is the first to enable casual consumers to capture and view high-quality 360° panoramas with motion parallax. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 66599.
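Flow-based blending between the two captured views nearest to the desired viewpoint can be sketched as follows. This is a deliberately simplified, nearest-neighbour version that assumes precomputed optical flow fields in both directions (flow_ab from view A to view B and flow_ba the other way); the actual system performs more careful view selection and blending than this toy.

```python
import numpy as np

def backward_warp(image, flow, t):
    """Sample `image` a fraction t back along its own flow field (nearest-neighbour).

    flow : (H, W, 2) optical flow, flow[..., 0] = x displacement, flow[..., 1] = y,
           mapping pixels of `image` towards the other view. Evaluating the flow
           at the output pixel rather than the true source pixel is an approximation.
    """
    H, W = image.shape[:2]
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_x = np.clip(xs - t * flow[..., 0], 0, W - 1).astype(int)
    src_y = np.clip(ys - t * flow[..., 1], 0, H - 1).astype(int)
    return image[src_y, src_x]

def flow_blend(view_a, view_b, flow_ab, flow_ba, alpha):
    """Synthesise a view at fractional position alpha between view_a (0) and view_b (1)."""
    warped_a = backward_warp(view_a, flow_ab, alpha)        # bring A towards the new viewpoint
    warped_b = backward_warp(view_b, flow_ba, 1.0 - alpha)  # bring B towards the new viewpoint
    return (1.0 - alpha) * warped_a + alpha * warped_b
```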
Colour videos with depth: acquisition, processing and evaluation
The human visual system lets us perceive the world around us in three dimensions by integrating evidence from depth cues into a coherent visual model of the world. The equivalent in computer vision and computer graphics are geometric models, which provide a wealth of information about represented objects, such as depth and surface normals. Videos do not contain this information, but only provide per-pixel colour information. In this dissertation, I hence investigate a combination of videos and geometric models: videos with per-pixel depth (also known as RGBZ videos). I consider the full life cycle of these videos: from their acquisition, via filtering and processing, to stereoscopic display.
I propose two approaches to capture videos with depth. The first is a spatiotemporal stereo matching approach based on the dual-cross-bilateral grid – a novel real-time technique derived by accelerating a reformulation of an existing stereo matching approach. This is the basis for an extension which incorporates temporal evidence in real time, resulting in increased temporal coherence of disparity maps – particularly in the presence of image noise.
The second acquisition approach is a sensor fusion system which combines data from a noisy, low-resolution time-of-flight camera and a high-resolution colour video camera into a coherent, noise-free video with depth. The system consists of a three-step pipeline that aligns the video streams, efficiently removes and fills invalid and noisy geometry, and finally uses a spatiotemporal filter to increase the spatial resolution of the depth data and strongly reduce depth measurement noise.
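A classic way to realise the last pipeline step, upsampling low-resolution depth under colour-image guidance, is joint bilateral upsampling. The brute-force sketch below illustrates that general principle only; it is purely spatial and unaccelerated, so it is a stand-in for, not a reproduction of, the dissertation's spatiotemporal filter, and all parameter values are placeholders.

```python
import numpy as np

def joint_bilateral_upsample(depth_lo, color_hi, scale, sigma_s=2.0, sigma_r=0.1):
    """Upsample a low-resolution depth map guided by a high-resolution colour image.

    depth_lo : (h, w) low-resolution depth
    color_hi : (H, W, 3) colour guide image with H = h*scale, W = w*scale, values in [0, 1]
    """
    H, W = color_hi.shape[:2]
    out = np.zeros((H, W), dtype=np.float64)
    radius = int(2 * sigma_s)
    for y in range(H):
        for x in range(W):
            acc, weight_sum = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Neighbour in the low-resolution depth map.
                    ly = np.clip((y + dy) // scale, 0, depth_lo.shape[0] - 1)
                    lx = np.clip((x + dx) // scale, 0, depth_lo.shape[1] - 1)
                    # Spatial weight and colour-range weight from the guide image.
                    ws = np.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                    dc = color_hi[y, x] - color_hi[np.clip(y + dy, 0, H - 1),
                                                   np.clip(x + dx, 0, W - 1)]
                    wr = np.exp(-np.dot(dc, dc) / (2 * sigma_r ** 2))
                    acc += ws * wr * depth_lo[ly, lx]
                    weight_sum += ws * wr
            out[y, x] = acc / weight_sum
    return out
```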
I show that these videos with depth empower a range of video processing effects that are not achievable using colour video alone. These effects critically rely on the geometric information, like a proposed video relighting technique which requires high-quality surface normals to produce plausible results. In addition, I demonstrate enhanced non-photorealistic rendering techniques and the ability to synthesise stereoscopic videos, which allows these effects to be applied stereoscopically.
These stereoscopic renderings inspired me to study stereoscopic viewing discomfort. The result of this is a surprisingly simple computational model that predicts the visual comfort of stereoscopic images. I validated this model using a perceptual study, which showed that it correlates strongly with human comfort ratings. This makes it ideal for automatic comfort assessment, without the need for costly and lengthy perceptual studies.
360MonoDepth: High-Resolution 360° Monocular Depth Estimation
360° cameras can capture complete environments in a single shot, which makes 360° imagery alluring in many computer vision tasks. However, monocular depth estimation remains a challenge for 360° data, particularly for high resolutions like 2K (2048×1024) and beyond that are important for novel-view synthesis and virtual reality applications. Current CNN-based methods do not support such high resolutions due to limited GPU memory. In this work, we propose a flexible framework for monocular depth estimation from high-resolution 360° images using tangent images. We project the 360° input image onto a set of tangent planes that produce perspective views, which are suitable for the latest, most accurate state-of-the-art perspective monocular depth estimators. To achieve globally consistent disparity estimates, we recombine the individual depth estimates using deformable multi-scale alignment followed by gradient-domain blending. The result is a dense, high-resolution 360° depth map with a high level of detail, also for outdoor scenes which are not supported by existing methods. Our source code and data are available at https://manurare.github.io/360monodepth/. (CVPR 2022)
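Before the per-tangent-image disparity maps can be blended, they must be brought to a consistent global scale. The sketch below fits only a single scale and offset per tangent image by least squares, which is a much cruder stand-in for the paper's deformable multi-scale alignment; the function and argument names are hypothetical.

```python
import numpy as np

def align_scale_offset(disparity, reference, valid):
    """Fit disparity ≈ s * x + o to a reference on overlapping valid pixels.

    disparity : per-tangent-image disparity estimate (H, W)
    reference : disparity values in the same region from a reference estimate (H, W)
    valid     : boolean mask of pixels where both estimates are defined
    """
    x = disparity[valid]
    y = reference[valid]
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, o), *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares scale and offset
    return s * disparity + o
```

The aligned tangent maps would then be merged back into a single ERP disparity map, for example with the gradient-domain blending mentioned in the abstract.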
Real-time Global Illumination Decomposition of Videos
We propose the first approach for the decomposition of a monocular color video into direct and indirect illumination components in real time. We retrieve, in separate layers, the contribution made to the scene appearance by the scene reflectance, the light sources and the reflections from various coherent scene regions to one another. Existing techniques that invert global light transport require image capture under multiplexed controlled lighting, or only enable the decomposition of a single image at slow off-line frame rates. In contrast, our approach works for regular videos and produces temporally coherent decomposition layers at real-time frame rates. At the core of our approach are several sparsity priors that enable the estimation of the per-pixel direct and indirect illumination layers based on a small set of jointly estimated base reflectance colors. The resulting variational decomposition problem uses a new formulation based on sparse and dense sets of non-linear equations that we solve efficiently using a novel alternating data-parallel optimization strategy. We evaluate our approach qualitatively and quantitatively, and show improvements over the state of the art in this field, in both quality and runtime. In addition, we demonstrate various real-time appearance editing applications for videos with consistent illumination.
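The idea of explaining every pixel with a small set of jointly estimated base reflectance colours can be illustrated with a toy alternating scheme: essentially k-means on chromaticity followed by a shading estimate. This sketch covers only that single ingredient and none of the direct/indirect separation, sparsity priors or data-parallel solver described above; all names and parameters are illustrative.

```python
import numpy as np

def toy_reflectance_shading(image, n_base=6, n_iters=10, eps=1e-6, seed=0):
    """Toy alternating estimation of a small reflectance palette and a shading layer.

    image : (H, W, 3) linear RGB frame with values in [0, 1]
    Returns (reflectance, shading) such that image ≈ reflectance * shading[..., None].
    """
    H, W, _ = image.shape
    pixels = image.reshape(-1, 3)
    chroma = pixels / (pixels.sum(axis=1, keepdims=True) + eps)  # intensity-invariant colour
    rng = np.random.default_rng(seed)
    base = chroma[rng.choice(len(chroma), n_base, replace=False)].copy()
    for _ in range(n_iters):
        # (a) assign every pixel to its closest base reflectance colour
        dists = np.stack([np.linalg.norm(chroma - b, axis=1) for b in base], axis=1)
        labels = dists.argmin(axis=1)
        # (b) re-estimate each base colour from its assigned pixels
        for k in range(n_base):
            if np.any(labels == k):
                base[k] = chroma[labels == k].mean(axis=0)
    reflectance = base[labels].reshape(H, W, 3)
    shading = image.mean(axis=2) / (reflectance.mean(axis=2) + eps)
    return reflectance, shading
```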
Real-time Halfway Domain Reconstruction of Motion and Geometry
We present a novel approach for real-time joint reconstruction of 3D scene motion and geometry from binocular stereo videos. Our approach is based on a novel variational halfway-domain scene flow formulation, which allows us to obtain highly accurate spatiotemporal reconstructions of shape and motion. We solve the underlying optimization problem at real-time frame rates using a novel data-parallel robust non-linear optimization strategy. Fast convergence and large displacement flows are achieved by employing a novel hierarchy that stores delta flows between hierarchy levels. High performance is obtained by the introduction of a coarser warp grid that decouples the number of unknowns from the input resolution of the images. We demonstrate our approach in a live setup that is based on two commodity webcams, as well as on publicly available video data. Our extensive experiments and evaluations show that our approach produces high-quality dense reconstructions of 3D geometry and scene flow at real-time frame rates, and compares favorably to the state of the art.
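The halfway-domain idea can be stated in a few lines: correspondences are parameterised on a grid halfway between the two views, so a single offset field yields symmetric positions in the left and right images. The snippet below encodes that reading; it is a heavily simplified assumption about the formulation, not the paper's solver.

```python
import numpy as np

def halfway_correspondences(offset):
    """Given a halfway-domain offset field, return symmetric left/right correspondences.

    offset : (H, W, 2) per-pixel offset d(p) defined on the halfway grid.
    A point p of the halfway domain maps to p + d(p) in the left image and
    p - d(p) in the right image, so one field parameterises both matches.
    """
    H, W, _ = offset.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    p = np.stack([xs, ys], axis=-1).astype(np.float64)
    return p + offset, p - offset   # positions in the left and right images
```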
ExMaps: Long-Term Localization in Dynamic Scenes using Exponential Decay
Visual camera localization using offline maps is widespread in robotics and mobile applications. Most state-of-the-art localization approaches assume static scenes, so maps are often reconstructed once and then kept constant. However, many scenes are dynamic and as changes in the scene happen, future localization attempts may struggle or fail entirely. Therefore, it is important for successful long-term localization to update and maintain maps as new observations of the scene, and changes in it, arrive. We propose a novel method for automatically discovering which points in a map remain stable over time, and which are due to transient changes. To this end, we calculate a stability score for each point based on its visibility over time, weighted by an exponential decay over time. This allows us to consider the impact of time when scoring points, and to distinguish which points are useful for long-term localization. We evaluate our method on the CMU Extended Seasons dataset (outdoors) and a new indoor dataset of a retail shop, and show the benefit of maintaining a ‘live map’ that integrates updates over time using our exponential-decay-based method over a static ‘base map’.
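A minimal reading of such an exponentially decayed stability score might look as follows; the normalisation, the decay rate and the treatment of missed observations are assumptions made for illustration and may differ from the paper's exact weighting.

```python
import math

def stability_score(observation_times, observed, now, decay_rate=0.1):
    """Exponentially decayed stability score for one map point.

    observation_times : times at which images covering the point were captured
    observed          : for each time, True if the point was actually visible/matched
    now               : current time; recent evidence is weighted more heavily
    decay_rate        : lambda of the exponential decay (an illustrative value)
    """
    score, norm = 0.0, 0.0
    for t, seen in zip(observation_times, observed):
        w = math.exp(-decay_rate * (now - t))   # older observations count less
        score += w * (1.0 if seen else 0.0)
        norm += w
    return score / norm if norm > 0 else 0.0
```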