99 research outputs found

    Temporally Coherent Video De-Anaglyph

    Talk and Poster at SIGGRAPH 2014. For a long time, stereoscopic 3D videos were usually encoded and shown in the anaglyph format. This format combines the two stereo views into a single color image by splitting its color spectrum and assigning each view to one half of the spectrum, for example red for the left and cyan (blue+green) for the right view. Glasses with matching color filters then separate the color channels again to provide the appropriate view to each eye. This simplicity made anaglyph stereo a popular choice for showing stereoscopic content, as it works with existing screens, projectors and print media. However, modern stereo displays and projectors natively support two full-color views, and avoid the viewing discomfort associated with anaglyph videos. Our work investigates how to convert existing anaglyph videos to the full-color stereo format used by modern displays. Anaglyph videos only contain half the color information compared to the full-color videos, and the missing color channels need to be reconstructed from the existing ones in a plausible and temporally coherent fashion. Joulin and Kang [2013] propose an approach that works well for images, but their extension to video is limited by the heavy computational complexity of their approach. Other techniques only support single images and, when applied to each frame of a video, generally produce flickering results. In our approach, we put the temporal coherence of the stereo results front and center by expressing Joulin and Kang's approach within the practical temporal consistency framework of Lang et al. [2012]. As a result, our approach is both efficient and temporally coherent. In addition, it computes temporally coherent optical flow and disparity maps that can be used for various post-processing tasks.
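
    The red/cyan channel split described in this abstract can be illustrated with a minimal sketch. It is not the authors' de-anaglyph method; it only shows how an anaglyph is composed and which channels are left to reconstruct. The function names `make_anaglyph` and `split_anaglyph`, and the representation of images as nested lists of (R, G, B) tuples, are assumptions made for illustration.

```python
# Illustrative sketch only: images are lists of rows of (R, G, B) tuples (0..255).
# The helper names below are hypothetical and do not come from the paper.

def make_anaglyph(left, right):
    """Take the red channel from the left view and green+blue (cyan) from the right."""
    same_shape = len(left) == len(right) and all(
        len(lrow) == len(rrow) for lrow, rrow in zip(left, right)
    )
    if not same_shape:
        raise ValueError("both views must have the same dimensions")
    return [
        [(lp[0], rp[1], rp[2]) for lp, rp in zip(lrow, rrow)]
        for lrow, rrow in zip(left, right)
    ]


def split_anaglyph(anaglyph):
    """Return the channels that survive for each view; None marks a missing channel."""
    left_known = [[(p[0], None, None) for p in row] for row in anaglyph]
    right_known = [[(None, p[1], p[2]) for p in row] for row in anaglyph]
    return left_known, right_known


left_view = [[(10, 20, 30)]]
right_view = [[(40, 50, 60)]]
anaglyph = make_anaglyph(left_view, right_view)
left_known, right_known = split_anaglyph(anaglyph)
```

    The `None` entries mark the channels that a de-anaglyph method has to reconstruct: green and blue for the left view, red for the right view.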

    Colour videos with depth : acquisition, processing and evaluation

    The human visual system lets us perceive the world around us in three dimensions by integrating evidence from depth cues into a coherent visual model of the world. The equivalent in computer vision and computer graphics are geometric models, which provide a wealth of information about represented objects, such as depth and surface normals. Videos do not contain this information, but only provide per-pixel colour information. In this dissertation, I hence investigate a combination of videos and geometric models: videos with per-pixel depth (also known as RGBZ videos). I consider the full life cycle of these videos: from their acquisition, via filtering and processing, to stereoscopic display. I propose two approaches to capture videos with depth. The first is a spatiotemporal stereo matching approach based on the dual-cross-bilateral grid – a novel real-time technique derived by accelerating a reformulation of an existing stereo matching approach. This is the basis for an extension which incorporates temporal evidence in real time, resulting in increased temporal coherence of disparity maps – particularly in the presence of image noise. The second acquisition approach is a sensor fusion system which combines data from a noisy, low-resolution time-of-flight camera and a high-resolution colour video camera into a coherent, noise-free video with depth. The system consists of a three-step pipeline that aligns the video streams, efficiently removes and fills invalid and noisy geometry, and finally uses a spatiotemporal filter to increase the spatial resolution of the depth data and strongly reduce depth measurement noise. I show that these videos with depth empower a range of video processing effects that are not achievable using colour video alone. These effects critically rely on the geometric information, like a proposed video relighting technique which requires high-quality surface normals to produce plausible results. 
    In addition, I demonstrate enhanced non-photorealistic rendering techniques and the ability to synthesise stereoscopic videos, which allows these effects to be applied stereoscopically. These stereoscopic renderings inspired me to study stereoscopic viewing discomfort. The result of this is a surprisingly simple computational model that predicts the visual comfort of stereoscopic images. I validated this model using a perceptual study, which showed that it correlates strongly with human comfort ratings. This makes it ideal for automatic comfort assessment, without the need for costly and lengthy perceptual studies.

    Stereoscopic Seam Carving With Temporal Consistency

    In this paper, we present a novel technique for seam carving of stereoscopic video. It removes seams of pixels in areas that are most likely not noticed by the viewer. When applying seam carving to stereoscopic video rather than monoscopic still images, new challenges arise. The detected seams must be consistent between the left and the right view, so that no depth information is destroyed. When removing seams in two consecutive frames, temporal consistency between the removed seams must be established to avoid flicker in the resulting video. By making certain assumptions, the available depth information can be harnessed to improve the quality achieved by seam carving. Assuming that closer pixels are more important, the algorithm can focus on removing distant pixels first. Furthermore, we assume that coherent pixels belonging to the same object have similar depth. By avoiding cuts through edges in the depth map, we can thus avoid cutting through object boundaries.
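
    The two depth assumptions in this abstract (closer pixels matter more, and depth edges mark object boundaries) can be sketched as extra terms in a seam-carving energy. This is a single-view, single-frame illustration and not the paper's algorithm: it does not enforce left/right or temporal consistency. The function names, the weights `w_near` and `w_edge`, and the convention that larger depth values mean farther pixels are all assumptions.

```python
# Illustrative sketch only: gray and depth are 2D lists of floats of equal size.
# Assumption: larger depth value = farther from the camera.

def seam_energy(gray, depth, w_near=1.0, w_edge=1.0):
    """Image gradient plus a penalty for near pixels and for depth edges."""
    h, w = len(gray), len(gray[0])
    max_depth = max(max(row) for row in depth) or 1.0

    def horizontal_gradient(img, y, x):
        # Central difference, clamped at the image borders.
        return abs(img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)])

    energy = []
    for y in range(h):
        row = []
        for x in range(w):
            nearness = 1.0 - depth[y][x] / max_depth  # 0 for the farthest pixels
            row.append(
                horizontal_gradient(gray, y, x)
                + w_near * nearness
                + w_edge * horizontal_gradient(depth, y, x)
            )
        energy.append(row)
    return energy


def find_vertical_seam(energy):
    """Standard dynamic-programming seam search; returns one column index per row."""
    h, w = len(energy), len(energy[0])
    cost = [energy[0][:]]
    for y in range(1, h):
        prev = cost[-1]
        cost.append(
            [energy[y][x] + min(prev[max(x - 1, 0):min(x + 2, w)]) for x in range(w)]
        )
    # Backtrack from the cheapest cell in the last row.
    x = min(range(w), key=cost[-1].__getitem__)
    seam = [x]
    for y in range(h - 1, 0, -1):
        x = min(range(max(x - 1, 0), min(x + 2, w)), key=cost[y - 1].__getitem__)
        seam.append(x)
    seam.reverse()
    return seam


# Uniform image; columns 0-1 are near (depth 1), columns 2-4 are far (depth 10).
gray = [[0.0] * 5 for _ in range(3)]
depth = [[1.0, 1.0, 10.0, 10.0, 10.0] for _ in range(3)]
energy = seam_energy(gray, depth)
seam = find_vertical_seam(energy)
```

    In this toy input the seam stays inside the distant region and away from the depth edge between columns 1 and 2, which is the behaviour the two assumptions are meant to encourage.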

    FaceVR: Real-Time Facial Reenactment and Eye Gaze Control in Virtual Reality

    We introduce FaceVR, a novel method for gaze-aware facial reenactment in the Virtual Reality (VR) context. The key component of FaceVR is a robust algorithm to perform real-time facial motion capture of an actor who is wearing a head-mounted display (HMD), as well as a new data-driven approach for eye tracking from monocular videos. In addition to these face reconstruction components, FaceVR incorporates photo-realistic re-rendering in real time, thus allowing artificial modifications of face and eye appearances. For instance, we can alter facial expressions, change gaze directions, or remove the VR goggles in realistic re-renderings. In a live setup with a source and a target actor, we apply these newly introduced algorithmic components. We assume that the source actor is wearing a VR device, and we capture his facial expressions and eye movement in real time. For the target video, we mimic a similar tracking process; however, we use the source input to drive the animations of the target video, thus enabling gaze-aware facial reenactment. To render the modified target video on a stereo display, we augment our capture and reconstruction process with stereo data. In the end, FaceVR produces compelling results for a variety of applications, such as gaze-aware facial reenactment, reenactment in virtual reality, removal of VR goggles, and re-targeting of somebody's gaze direction in a video conferencing call.

    Metrics for Stereoscopic Image Compression

    Metrics for automatically predicting the compression settings for stereoscopic images, which minimize file size while still maintaining an acceptable level of image quality, are investigated. This research evaluates whether symmetric or asymmetric compression produces a better quality of stereoscopic image. Initially, how Peak Signal to Noise Ratio (PSNR) measures the quality of varyingly compressed stereoscopic image pairs was investigated. Two trials with human subjects, following the ITU-R BT.500-11 Double Stimulus Continuous Quality Scale (DSCQS) method, were undertaken to measure the quality of symmetric and asymmetric stereoscopic image compression. Computational models of the Human Visual System (HVS) were then investigated and a new stereoscopic image quality metric designed and implemented. The metric point-matches regions of high spatial frequency between the left and right views of the stereo pair and accounts for HVS sensitivity to contrast and luminance changes in these regions. The PSNR results show that symmetric, as opposed to asymmetric, stereo image compression produces significantly better results. The human factors trial suggested that, in general, symmetric compression of stereoscopic images should be used. The new metric, Stereo Band Limited Contrast, has been demonstrated to be a better predictor of human image quality preference than PSNR and can be used to predict a perceptual threshold level for stereoscopic image compression. The threshold is the maximum compression that can be applied without the perceived image quality being altered. Overall, it is concluded that symmetric, as opposed to asymmetric, stereo image encoding should be used for stereoscopic image compression. As PSNR measures of image quality are correctly criticized for correlating poorly with perceived visual quality, the new HVS-based metric was developed. This metric produces a useful threshold to provide a practical starting point to decide the level of compression to use.
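
    For reference, the PSNR baseline mentioned in this abstract follows the standard definition, PSNR = 10 log10(MAX^2 / MSE). The sketch below computes it per view for a stereo pair. The abstract does not say how the two views' scores were combined, so the per-view reporting here is an assumption, as are the function names and the flat-list image representation. It does not implement the Stereo Band Limited Contrast metric.

```python
import math

# Illustrative sketch only: images are flat lists of pixel intensities (0..max_value).

def psnr(reference, distorted, max_value=255):
    """Standard PSNR in decibels; returns infinity for identical images."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


def stereo_psnr(left_ref, left_dist, right_ref, right_dist):
    """Assumption: report PSNR per view rather than combining the two scores."""
    return psnr(left_ref, left_dist), psnr(right_ref, right_dist)


# Asymmetric case: only the right view is degraded.
left_db, right_db = stereo_psnr(
    [100, 100, 100, 100], [100, 100, 100, 100],
    [100, 100, 100, 100], [101, 99, 101, 99],
)
```

    Reporting the two views separately keeps asymmetric compression visible: in the example only the right view has a finite score.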

    Improving visual quality of view transitions in automultiscopic displays

    Automultiscopic screens present different images depending on the viewing direction. This enables glasses-free 3D and provides a motion parallax effect. However, due to the limited angular resolution of such displays, they suffer from hot-spotting, i.e., image quality is highly affected by the viewing position. In this paper, we analyze light fields produced by lenticular and parallax-barrier displays, and show that, unlike in the real world, the light fields produced by such screens have a repetitive structure. This induces visual artifacts in the form of view discontinuities, depth reversals, and excessive disparities when the viewing position is not optimal. Although the problem has always been considered inherent to the technology, we demonstrate that light fields reproduced on automultiscopic displays have enough degrees of freedom to improve the visual quality. We propose a new technique that modifies light fields using global and local shears followed by stitching to improve their continuity when displayed on a screen. We show that this enhances visual quality significantly, which is demonstrated in a series of user experiments with an automultiscopic display as well as lenticular prints. Funding: National Science Foundation (U.S.) (IIS-1111415); National Science Foundation (U.S.) (IIS-1116296); Quanta Computer (Firm); National Basic Research Program of China (973 Program) (Project 2011CB302205); National Natural Science Foundation (China) (Project 61272226/61120106007); National High-Tech R&D (863) Plan of China (Project 2013AA013903); Beijing Higher Institution Engineering Research Center (Research Grant

    Towards Better Methods of Stereoscopic 3D Media Adjustment and Stylization

    Stereoscopic 3D (S3D) media is pervasive in film, photography and art. However, working with S3D media poses a number of interesting challenges arising from capture and editing. In this thesis we address several of these challenges. In particular, we address disparity adjustment and present a layer-based method that can reduce disparity without distorting the scene. Our method was successfully used to repair several images for the 2014 documentary “Soldiers’ Stories” directed by Jonathan Kitzen. We then explore consistent and comfortable methods for stylizing stereo images. Our approach uses a modified version of the layer-based technique used for disparity adjustment and can be used with a variety of stylization filters, including those in Adobe Photoshop. We also present a disparity-aware painterly rendering algorithm. A user study concluded that our layer-based stylization method produced S3D images that were more comfortable than previous methods. Finally, we address S3D line drawing from S3D photographs. Line drawing is a common art style that our layer-based method is not able to reproduce. To improve the depth perception of our line drawings, we optionally add stylized shading. An expert survey concluded that our results were comfortable and reproduced a sense of depth.