Colour videos with depth : acquisition, processing and evaluation
The human visual system lets us perceive the world around us in three dimensions
by integrating evidence from depth cues into a coherent visual model of the world. The equivalent in computer vision and computer graphics are geometric models,
which provide a wealth of information about represented objects, such as depth and
surface normals. Videos do not contain this information, but only provide per-pixel
colour information. In this dissertation, I hence investigate a combination of videos
and geometric models: videos with per-pixel depth (also known as
RGBZ videos).
I consider the full life cycle of these videos: from their acquisition, via filtering and
processing, to stereoscopic display.
I propose two approaches to capture videos with depth. The first is a spatiotemporal
stereo matching approach based on the dual-cross-bilateral grid – a novel real-time
technique derived by accelerating a reformulation of an existing stereo matching
approach. This is the basis for an extension which incorporates temporal evidence in
real time, resulting in increased temporal coherence of disparity maps – particularly
in the presence of image noise.
The second acquisition approach is a sensor fusion system which combines data
from a noisy, low-resolution time-of-flight camera and a high-resolution colour
video camera into a coherent, noise-free video with depth. The system consists
of a three-step pipeline that aligns the video streams, efficiently removes and fills
invalid and noisy geometry, and finally uses a spatiotemporal filter to increase the
spatial resolution of the depth data and strongly reduce depth measurement noise.
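The idea of using a high-resolution colour image to guide the upsampling of low-resolution depth can be sketched with joint bilateral upsampling. This is a generic, well-known technique shown purely for illustration, not the spatiotemporal filter from the dissertation. The function name, the parameters (`scale`, `sigma_s`, `sigma_r`, `radius`), the greyscale guide, and the use of `None` for invalid depth samples are all assumptions.

```python
import math


def joint_bilateral_upsample(depth_lo, guide_hi, scale,
                             sigma_s=1.0, sigma_r=0.1, radius=1):
    """Upsample a low-res depth map guided by a high-res greyscale image.

    depth_lo: rows of floats; None marks an invalid sample (assumption).
    guide_hi: rows of floats in [0, 1], `scale` times larger than depth_lo.
    """
    h_hi, w_hi = len(guide_hi), len(guide_hi[0])
    h_lo, w_lo = len(depth_lo), len(depth_lo[0])
    out = [[None] * w_hi for _ in range(h_hi)]
    for y in range(h_hi):
        for x in range(w_hi):
            cy, cx = y // scale, x // scale  # low-res cell containing (y, x)
            num = den = 0.0
            for j in range(max(0, cy - radius), min(h_lo, cy + radius + 1)):
                for i in range(max(0, cx - radius), min(w_lo, cx + radius + 1)):
                    d = depth_lo[j][i]
                    if d is None:
                        continue  # skip invalid depth; neighbours fill the hole
                    # Spatial distance, measured in low-res pixel units.
                    ds2 = (j - y / scale) ** 2 + (i - x / scale) ** 2
                    # Guide value at the top-left high-res pixel of the sample.
                    g = guide_hi[min(h_hi - 1, j * scale)][min(w_hi - 1, i * scale)]
                    dr2 = (g - guide_hi[y][x]) ** 2
                    w = math.exp(-ds2 / (2 * sigma_s ** 2)
                                 - dr2 / (2 * sigma_r ** 2))
                    num += w * d
                    den += w
            if den > 0.0:
                out[y][x] = num / den
    return out


# Two low-res depth samples; the guide has an intensity edge between them.
guide = [[0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0]]
upsampled = joint_bilateral_upsample([[1.0, 5.0]], guide, scale=2)
```

In this toy input, the depth discontinuity in the output follows the intensity edge in the guide instead of being blurred across it, because samples on the other side of the edge receive a negligible range weight.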
I show that these videos with depth empower a range of video processing effects
that are not achievable using colour video alone. These effects critically rely on the
geometric information; for example, a proposed video relighting technique requires
high-quality surface normals to produce plausible results. In addition, I demonstrate
enhanced non-photorealistic rendering techniques and the ability to synthesise
stereoscopic videos, which allows these effects to be applied stereoscopically.
These stereoscopic renderings inspired me to study stereoscopic viewing discomfort.
The result of this is a surprisingly simple computational model that predicts the
visual comfort of stereoscopic images. I validated this model using a perceptual
study, which showed that it correlates strongly with human comfort ratings. This
makes it ideal for automatic comfort assessment, without the need for costly and
lengthy perceptual studies.
Robust temporal depth enhancement method for dynamic virtual view synthesis
Depth-image-based rendering (DIBR) is a view synthesis technique that generates virtual views by warping from
the reference images based on depth maps. The quality of synthesized views highly depends on the accuracy of
depth maps. However, for dynamic scenarios, depth sequences obtained through stereo matching methods frame
by frame can be temporally inconsistent, especially in static regions, which leads to uncomfortable flickering
artifacts in synthesized videos. This problem can be eliminated by depth enhancement methods that perform
temporal filtering to suppress depth inconsistency, yet those methods may also spread depth errors. Although these
depth enhancement algorithms increase the temporal consistency of synthesized videos, they risk
reducing the quality of rendered videos. Since conventional methods may not achieve both properties, in this paper
we present a robust temporal depth enhancement (RTDE) method for static regions, which propagates only the
reliable depth values into succeeding frames to improve both the accuracy and the temporal consistency
of depth estimates. This technique benefits the quality of synthesized videos. In addition, we propose a novel
evaluation metric to quantitatively compare the temporal consistency of our method with that of the state of the art.
Experimental results demonstrate the robustness of our method for dynamic virtual view synthesis: both the
temporal consistency and the quality of synthesized videos in static regions are improved.
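To illustrate why depth errors show up directly in the synthesized view, the warping step of DIBR can be sketched for a single scanline. This is a minimal sketch that assumes a rectified setup with the virtual camera shifted horizontally to the right, so that each pixel moves left by the disparity `focal_px * baseline / z`. The function name and parameters are hypothetical, and the RTDE method itself is not reproduced here.

```python
def warp_scanline(colors, depths, focal_px, baseline):
    """Forward-warp one scanline into a horizontally shifted virtual view.

    colors, depths: per-pixel colour values and positive depths.
    focal_px: focal length in pixels; baseline: camera shift (depth units).
    Returns the warped scanline; None marks disocclusion holes.
    """
    width = len(colors)
    out = [None] * width
    zbuf = [float("inf")] * width  # nearest depth seen at each target pixel
    for x, (c, z) in enumerate(zip(colors, depths)):
        disparity = focal_px * baseline / z  # closer points shift further
        xt = round(x - disparity)            # target column in virtual view
        if 0 <= xt < width and z < zbuf[xt]:
            out[xt] = c                      # nearer surface wins
            zbuf[xt] = z
    return out


# The near pixel "c" (depth 1) shifts two columns and occludes "a".
warped = warp_scanline(["a", "b", "c", "d"], [10.0, 10.0, 1.0, 10.0],
                       focal_px=10, baseline=0.2)
# warped == ["c", "b", None, "d"]
```

Because the target column depends on depth, a depth value that fluctuates from frame to frame moves the same scene point to different columns in consecutive frames, which is what produces the flickering described above.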
A Brief Survey of Image-Based Depth Upsampling
Recently, there has been remarkable growth of interest in the development and applications of Time-of-Flight (ToF) depth cameras. However, despite the continual improvement of their characteristics, the practical applicability of ToF cameras is still limited by the low resolution and quality of their depth measurements. This has motivated many researchers to combine ToF cameras with other sensors in order to enhance and upsample depth images. In this paper, we compare ToF cameras to three image-based techniques for depth recovery, discuss the upsampling problem, and survey the approaches that couple ToF depth images with high-resolution optical images. Other classes of upsampling methods are also mentioned.
Joint view expansion and filtering for automultiscopic 3D displays
Multi-view autostereoscopic displays provide an immersive, glasses-free 3D viewing experience, but they require correctly filtered content from multiple viewpoints. This, however, cannot be easily obtained with current stereoscopic production pipelines. We provide a practical solution that takes a stereoscopic video as input and converts it to multi-view, filtered video streams that can be used to drive multi-view autostereoscopic displays. The method combines phase-based video magnification and interperspective antialiasing into a single filtering process. The whole algorithm is simple and can be efficiently implemented on current GPUs to yield near real-time performance. Furthermore, the ability to retarget disparity is naturally supported. Our method is robust and works well for challenging video scenes with defocus blur, motion blur, transparent materials, and specularities. We show that our results are superior to those of state-of-the-art depth-based rendering methods. Finally, we showcase the method in the context of a real-time 3D videoconferencing system that requires only two cameras.
Quanta Computer (Firm); National Science Foundation (U.S.) (NSF IIS-1111415); National Science Foundation (U.S.) (NSF IIS-1116296)