    360° Optical Flow using Tangent Images

    Omnidirectional 360° images have found many promising and exciting applications in computer vision, robotics and other fields, thanks to their increasing affordability, portability and their 360° field of view. The most common format for storing, processing and visualising 360° images is equirectangular projection (ERP). However, the distortion introduced by the nonlinear mapping from 360° images to ERP images is still a barrier that holds back ERP images from being used as easily as conventional perspective images. This is especially relevant when estimating 360° optical flow, as the distortions need to be mitigated appropriately. In this paper, we propose a 360° optical flow method based on tangent images. Our method leverages gnomonic projection to locally convert ERP images to perspective images, and uniformly samples the ERP image by projection to a cubemap and regular icosahedron faces, to incrementally refine the estimated 360° flow fields even in the presence of large rotations. Our experiments demonstrate the benefits of our proposed method both quantitatively and qualitatively.
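
    As a concrete illustration of the tangent-image idea, the sketch below resamples an ERP image onto a perspective tangent image via the inverse gnomonic projection. It is a minimal standalone example, not the authors' implementation; the function name, field of view and output size are arbitrary choices.

    import numpy as np
    from scipy.ndimage import map_coordinates

    def erp_to_tangent(erp, lat0, lon0, fov_deg=60.0, size=256):
        """Resample a size x size tangent (perspective) image centred at (lat0, lon0), given in radians."""
        h, w = erp.shape[:2]
        half = np.tan(np.radians(fov_deg) / 2.0)
        x, y = np.meshgrid(np.linspace(-half, half, size), np.linspace(-half, half, size))
        rho = np.sqrt(x**2 + y**2)
        c = np.arctan(rho)
        cos_c, sin_c = np.cos(c), np.sin(c)
        # Inverse gnomonic projection: tangent-plane coordinates -> latitude/longitude.
        lat = np.arcsin(np.clip(cos_c * np.sin(lat0)
                                + np.where(rho > 0, y * sin_c * np.cos(lat0) / np.maximum(rho, 1e-12), 0.0),
                                -1.0, 1.0))
        lon = lon0 + np.arctan2(x * sin_c, rho * np.cos(lat0) * cos_c - y * np.sin(lat0) * sin_c)
        # Latitude/longitude -> ERP pixel coordinates, sampled bilinearly (longitude wraps around).
        u = (lon / (2 * np.pi) + 0.5) % 1.0 * (w - 1)
        v = (0.5 - lat / np.pi) * (h - 1)
        return np.stack([map_coordinates(erp[..., ch], [v, u], order=1, mode='nearest')
                         for ch in range(erp.shape[2])], axis=-1)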

    Video Depth-From-Defocus

    Many compelling video post-processing effects, in particular aesthetic focus editing and refocusing effects, are feasible if per-frame depth information is available. Existing computational methods to capture RGB and depth either purposefully modify the optics (coded aperture, light-field imaging), or employ active RGB-D cameras. Since these methods are less practical for users with normal cameras, we present an algorithm to capture all-in-focus RGB-D video of dynamic scenes with an unmodified commodity video camera. Our algorithm turns the often unwanted defocus blur into a valuable signal. The input to our method is a video in which the focus plane is continuously moving back and forth during capture, and thus defocus blur is provoked and strongly visible. This can be achieved by manually turning the focus ring of the lens during recording. The core algorithmic ingredient is a new video-based depth-from-defocus algorithm that computes space-time-coherent depth maps, deblurred all-in-focus video, and the focus distance for each frame. We extensively evaluate our approach, and show that it enables compelling video post-processing effects, such as different types of refocusing.
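
    The depth cue that such methods exploit can be made concrete with the thin-lens model: the diameter of the defocus blur circle grows with the distance between an object and the focus plane, so sweeping the focus plane turns blur into a depth signal. The sketch below is a generic illustration of that relationship, not code from the paper; the function name and example values are assumptions.

    def blur_circle_diameter(obj_dist, focus_dist, focal_len, f_number):
        """Thin-lens blur circle diameter on the sensor, in metres; all distances in metres."""
        aperture = focal_len / f_number  # physical aperture diameter
        return aperture * focal_len * abs(obj_dist - focus_dist) / (obj_dist * (focus_dist - focal_len))

    # Example: a 50 mm f/1.8 lens focused at 2 m blurs an object at 3 m by roughly 0.24 mm on the sensor.
    print(blur_circle_diameter(obj_dist=3.0, focus_dist=2.0, focal_len=0.05, f_number=1.8))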

    Real-time Virtual Object Insertion for Moving 360° Videos

    We propose an approach for real-time insertion of virtual objects into pre-recorded moving-camera 360° video. First, we reconstruct camera motion and sparse scene content via structure from motion on stitched equirectangular video. Then, to plausibly reproduce real-world lighting conditions for virtual objects, we use inverse tone mapping to recover high dynamic range environment maps which vary spatially along the camera path. We implement our approach into the Unity rendering engine for real-time virtual object insertion via differential rendering, with dynamic lighting, image-based shadowing, and user interaction. This expands the use and flexibility of 360° video for interactive computer graphics and visual effects applications.
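
    The compositing step at the heart of such insertion pipelines is differential rendering: a proxy of the local scene is rendered once with and once without the virtual object, and their difference (shadows, bounce light) is added to the real frame. The sketch below is a generic CPU-side illustration under that assumption, not the paper's Unity implementation; all names are placeholders.

    import numpy as np

    def differential_composite(background, render_with, render_without, object_mask):
        """All inputs are float images; object_mask is 1 where the virtual object is visible."""
        delta = render_with - render_without  # effect the object has on the rendered scene proxy
        composite = background + delta        # add shadows / interreflections to the real frame
        composite = np.where(object_mask[..., None] > 0, render_with, composite)  # paste the object itself
        return np.clip(composite, 0.0, None)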

    MegaParallax: Casual 360° Panoramas with Motion Parallax

    The ubiquity of smart mobile devices, such as phones and tablets, enables users to casually capture 360° panoramas with a single camera sweep to share and relive experiences. However, panoramas lack motion parallax as they do not provide different views for different viewpoints. The motion parallax induced by translational head motion is a crucial depth cue in daily life. Alternatives, such as omnidirectional stereo panoramas, provide different views for each eye (binocular disparity), but they also lack motion parallax as the left and right eye panoramas are stitched statically. Methods based on explicit scene geometry reconstruct textured 3D geometry, which provides motion parallax, but suffers from visible reconstruction artefacts. The core of our method is a novel multi-perspective panorama representation, which can be casually captured and rendered with motion parallax for each eye on the fly. This provides a more realistic perception of panoramic environments which is particularly useful for virtual reality applications. Our approach uses a single consumer video camera to acquire 200–400 views of a real 360° environment with a single sweep. By using novel-view synthesis with flow-based blending, we show how to turn these input views into an enriched 360° panoramic experience that can be explored in real time, without relying on potentially unreliable reconstruction of scene geometry. We compare our results with existing omnidirectional stereo and image-based rendering methods to demonstrate the benefit of our approach, which is the first to enable casual consumers to capture and view high-quality 360° panoramas with motion parallax. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 66599
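
    To make the flow-based blending concrete, the sketch below cross-fades two neighbouring captured views after pushing each part-way toward the desired in-between viewpoint using precomputed optical flow. It is a rough, generic approximation of flow-based view interpolation (the flow is applied at the target pixel rather than re-evaluated along the motion path), not the MegaParallax renderer; all names are placeholders.

    import numpy as np
    from scipy.ndimage import map_coordinates

    def backward_warp(image, flow):
        """Sample image at (y + dy, x + dx) for a flow field of shape (H, W, 2) storing (dx, dy)."""
        h, w = image.shape[:2]
        ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
        coords = [ys + flow[..., 1], xs + flow[..., 0]]
        return np.stack([map_coordinates(image[..., c], coords, order=1, mode='nearest')
                         for c in range(image.shape[2])], axis=-1)

    def blend_views(view_a, view_b, flow_ab, flow_ba, alpha):
        """alpha in [0, 1]: 0 reproduces view A, 1 reproduces view B."""
        a_to_mid = backward_warp(view_a, -alpha * flow_ab)          # pull A toward the in-between viewpoint
        b_to_mid = backward_warp(view_b, -(1.0 - alpha) * flow_ba)  # pull B toward the in-between viewpoint
        return (1.0 - alpha) * a_to_mid + alpha * b_to_mid          # cross-fade to hide residual misalignment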

    Colour videos with depth: acquisition, processing and evaluation

    The human visual system lets us perceive the world around us in three dimensions by integrating evidence from depth cues into a coherent visual model of the world. The equivalent in computer vision and computer graphics are geometric models, which provide a wealth of information about represented objects, such as depth and surface normals. Videos do not contain this information, but only provide per-pixel colour information. In this dissertation, I hence investigate a combination of videos and geometric models: videos with per-pixel depth (also known as RGBZ videos). I consider the full life cycle of these videos: from their acquisition, via filtering and processing, to stereoscopic display. I propose two approaches to capture videos with depth. The first is a spatiotemporal stereo matching approach based on the dual-cross-bilateral grid – a novel real-time technique derived by accelerating a reformulation of an existing stereo matching approach. This is the basis for an extension which incorporates temporal evidence in real time, resulting in increased temporal coherence of disparity maps – particularly in the presence of image noise. The second acquisition approach is a sensor fusion system which combines data from a noisy, low-resolution time-of-flight camera and a high-resolution colour video camera into a coherent, noise-free video with depth. The system consists of a three-step pipeline that aligns the video streams, efficiently removes and fills invalid and noisy geometry, and finally uses a spatiotemporal filter to increase the spatial resolution of the depth data and strongly reduce depth measurement noise. I show that these videos with depth empower a range of video processing effects that are not achievable using colour video alone. These effects critically rely on the geometric information, like a proposed video relighting technique which requires high-quality surface normals to produce plausible results. In addition, I demonstrate enhanced non-photorealistic rendering techniques and the ability to synthesise stereoscopic videos, which allows these effects to be applied stereoscopically. These stereoscopic renderings inspired me to study stereoscopic viewing discomfort. The result of this is a surprisingly simple computational model that predicts the visual comfort of stereoscopic images. I validated this model using a perceptual study, which showed that it correlates strongly with human comfort ratings. This makes it ideal for automatic comfort assessment, without the need for costly and lengthy perceptual studies.
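
    A classic building block behind the kind of colour-guided depth upsampling described above is the joint (cross) bilateral filter, sketched below in its simplest brute-force, spatial-only form. This is an illustration, not the thesis pipeline, and it omits the temporal term and invalid-geometry handling; the parameter values are arbitrary.

    import numpy as np

    def joint_bilateral_upsample(depth_lr, colour_hr, sigma_s=4.0, sigma_r=0.1, radius=8):
        """depth_lr: (h, w) low-resolution depth; colour_hr: (H, W, 3) guide image in [0, 1]."""
        H, W = colour_hr.shape[:2]
        h, w = depth_lr.shape
        out = np.zeros((H, W))
        for y in range(H):
            for x in range(W):
                acc, norm = 0.0, 0.0
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        yy = min(max(y + dy, 0), H - 1)
                        xx = min(max(x + dx, 0), W - 1)
                        d = depth_lr[yy * h // H, xx * w // W]  # nearest low-res depth sample
                        ws = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))  # spatial weight
                        wr = np.exp(-np.sum((colour_hr[y, x] - colour_hr[yy, xx]) ** 2) / (2 * sigma_r ** 2))  # colour weight
                        acc += ws * wr * d
                        norm += ws * wr
                out[y, x] = acc / norm
        return out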

    360MonoDepth: High-Resolution 360° Monocular Depth Estimation

    360° cameras can capture complete environments in a single shot, which makes 360° imagery alluring in many computer vision tasks. However, monocular depth estimation remains a challenge for 360° data, particularly for high resolutions like 2K (2048x1024) and beyond that are important for novel-view synthesis and virtual reality applications. Current CNN-based methods do not support such high resolutions due to limited GPU memory. In this work, we propose a flexible framework for monocular depth estimation from high-resolution 360° images using tangent images. We project the 360° input image onto a set of tangent planes that produce perspective views, which are suitable for the latest, most accurate state-of-the-art perspective monocular depth estimators. To achieve globally consistent disparity estimates, we recombine the individual depth estimates using deformable multi-scale alignment followed by gradient-domain blending. The result is a dense, high-resolution 360° depth map with a high level of detail, also for outdoor scenes which are not supported by existing methods. Our source code and data are available at https://manurare.github.io/360monodepth/ (CVPR 2022).
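
    Monocular disparity predicted independently per tangent image is only defined up to an unknown scale and offset, which is why an alignment step is needed before blending. The sketch below shows the simplest version of that idea, a least-squares scale-and-offset fit against a reference on the overlap region; it is a simplification for illustration, not the paper's deformable multi-scale alignment, and the names are placeholders.

    import numpy as np

    def fit_scale_offset(disparity, reference, overlap_mask):
        """Return s, o minimising ||s * disparity + o - reference||^2 over the overlap."""
        d = disparity[overlap_mask]
        r = reference[overlap_mask]
        A = np.stack([d, np.ones_like(d)], axis=1)
        (s, o), *_ = np.linalg.lstsq(A, r, rcond=None)
        return s, o

    # Usage: align each tangent-image disparity map to the already-aligned portion of the
    # 360° disparity map on their overlap, then blend the aligned maps together.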

    Real-time Global Illumination Decomposition of Videos

    We propose the first approach for the decomposition of a monocular color video into direct and indirect illumination components in real time. We retrieve, in separate layers, the contribution made to the scene appearance by the scene reflectance, the light sources and the reflections from various coherent scene regions to one another. Existing techniques that invert global light transport require image capture under multiplexed controlled lighting, or only enable the decomposition of a single image at slow off-line frame rates. In contrast, our approach works for regular videos and produces temporally coherent decomposition layers at real-time frame rates. At the core of our approach are several sparsity priors that enable the estimation of the per-pixel direct and indirect illumination layers based on a small set of jointly estimated base reflectance colors. The resulting variational decomposition problem uses a new formulation based on sparse and dense sets of non-linear equations that we solve efficiently using a novel alternating data-parallel optimization strategy. We evaluate our approach qualitatively and quantitatively, and show improvements over the state of the art in this field, in both quality and runtime. In addition, we demonstrate various real-time appearance editing applications for videos with consistent illumination.
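
    To give a flavour of what estimating a small set of base reflectance colours by alternating optimization can look like, here is a deliberately tiny toy: it alternates between assigning pixels to a few base colours (by chromaticity) and refitting those colours, then treats the residual per-pixel intensity as grey shading. It only illustrates the alternating pattern; it does not separate direct from indirect illumination and is unrelated to the paper's variational solver.

    import numpy as np

    def palette_shading_decomposition(image, n_base=4, iters=10, seed=0):
        """Toy decomposition of an (H, W, 3) image in [0, 1] into base-colour labels, a palette and grey shading."""
        rng = np.random.default_rng(seed)
        pixels = image.reshape(-1, 3).astype(np.float64)
        base = pixels[rng.choice(len(pixels), n_base, replace=False)].copy()  # initial palette
        for _ in range(iters):
            # Assignment step: choose the base colour whose chromaticity best explains each pixel.
            chroma = pixels / (pixels.sum(1, keepdims=True) + 1e-6)
            base_chroma = base / (base.sum(1, keepdims=True) + 1e-6)
            labels = np.argmin(((chroma[:, None] - base_chroma[None]) ** 2).sum(-1), axis=1)
            # Update step: refit each base colour as the mean of its assigned pixels.
            for k in range(n_base):
                if np.any(labels == k):
                    base[k] = pixels[labels == k].mean(0)
        shading = pixels.sum(1) / (base[labels].sum(1) + 1e-6)  # grey shading explaining the residual intensity
        return labels.reshape(image.shape[:2]), base, shading.reshape(image.shape[:2])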

    Real-time Halfway Domain Reconstruction of Motion and Geometry

    We present a novel approach for real-time joint reconstruction of 3D scene motion and geometry from binocular stereo videos. Our approach is based on a novel variational halfway-domain scene flow formulation, which allows us to obtain highly accurate spatiotemporal reconstructions of shape and motion. We solve the underlying optimization problem at real-time frame rates using a novel data-parallel robust non-linear optimization strategy. Fast convergence and large displacement flows are achieved by employing a novel hierarchy that stores delta flows between hierarchy levels. High performance is obtained by the introduction of a coarser warp grid that decouples the number of unknowns from the input resolution of the images. We demonstrate our approach in a live setup that is based on two commodity webcams, as well as on publicly available video data. Our extensive experiments and evaluations show that our approach produces high-quality dense reconstructions of 3D geometry and scene flow at real-time frame rates, and compares favorably to the state of the art.
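
    The defining property of a halfway-domain parameterisation is that a single field v, defined on a halfway grid, maps each point p to p + v(p) in one view and p − v(p) in the other, so one set of unknowns warps both images symmetrically. The sketch below evaluates the kind of photo-consistency residual such a formulation uses; it is illustrative only and assumes greyscale images and a dense per-pixel field, which the paper's coarser warp grid avoids.

    import numpy as np
    from scipy.ndimage import map_coordinates

    def photo_residual(left, right, halfway_flow):
        """halfway_flow: (H, W, 2) field v on the halfway grid, in (dy, dx) pixels."""
        H, W = halfway_flow.shape[:2]
        grid = np.stack(np.mgrid[0:H, 0:W], axis=-1).astype(np.float64)
        left_pts = grid + halfway_flow    # p + v(p): where the halfway point lands in the left view
        right_pts = grid - halfway_flow   # p - v(p): where the halfway point lands in the right view

        def sample(img, pts):
            return map_coordinates(img, [pts[..., 0], pts[..., 1]], order=1, mode='nearest')

        return sample(left, left_pts) - sample(right, right_pts)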

    ExMaps: Long-Term Localization in Dynamic Scenes using Exponential Decay

    Visual camera localization using offline maps is widespread in robotics and mobile applications. Most state-of-the-art localization approaches assume static scenes, so maps are often reconstructed once and then kept constant. However, many scenes are dynamic and as changes in the scene happen, future localization attempts may struggle or fail entirely. Therefore, it is important for successful long-term localization to update and maintain maps as new observations of the scene, and changes in it, arrive. We propose a novel method for automatically discovering which points in a map remain stable over time, and which are due to transient changes. To this end, we calculate a stability score for each point based on its visibility over time, weighted by an exponential decay. This allows us to consider the impact of time when scoring points, and to distinguish which points are useful for long-term localization. We evaluate our method on the CMU Extended Seasons dataset (outdoors) and a new indoor dataset of a retail shop, and show the benefit of maintaining a ‘live map’ that integrates updates over time using our exponential-decay-based method over a static ‘base map’.
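
    The exponential-decay weighting can be illustrated in a few lines: each past observation of a map point contributes a weight that halves every fixed interval, so recently observed points score high while points that have not been seen for a while fade out. The helper below is a generic sketch under assumed names and weighting; the paper's exact scoring may differ.

    import math

    def stability_score(observation_times, current_time, half_life):
        """Sum of exponentially decayed weights, one per observation of the point."""
        decay_rate = math.log(2.0) / half_life
        return sum(math.exp(-decay_rate * (current_time - t)) for t in observation_times)

    # Usage: keep points whose score exceeds a threshold when updating the 'live map'.
    print(stability_score([0.0, 10.0, 20.0], current_time=30.0, half_life=15.0))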