
    An investigation of visual cues used to create and support frames of reference and visual search tasks in desktop virtual environments

    Visual depth cues are combined to produce the essential depth and dimensionality of Desktop Virtual Environments (DVEs). This study discusses DVEs in terms of the visual depth cues that create and support the perception of frames of reference and the accomplishment of visual search tasks. This paper presents the results of an investigation of the effects of experimental stimulus position and of the visual depth cues luminance, texture, relative height, and motion parallax on precise depth judgements made within a DVE. Results indicate that stimulus position significantly affects precise depth judgements, that texture is effective only under certain conditions, and that motion parallax, in line with previous results, remains inconclusive as a determinant of depth-judgement accuracy in egocentrically viewed DVEs. Results also show that exocentric views incorporating the relative height and motion parallax cues are effective for precise depth judgements made in DVEs. These results help us understand how certain visual depth cues support the perception of frames of reference and precise depth judgements, suggesting that the depth cues employed to create frames of reference in DVEs may influence how effectively precise depth judgements are made.

    Loss-resilient Coding of Texture and Depth for Free-viewpoint Video Conferencing

    Free-viewpoint video conferencing allows a participant to observe the remote 3D scene from any freely chosen viewpoint. An intermediate virtual viewpoint image is commonly synthesized using two pairs of transmitted texture and depth maps from two neighboring captured viewpoints via depth-image-based rendering (DIBR). To maintain high quality in the synthesized images, it is imperative to contain the adverse effects of network packet losses that may arise during texture and depth video transmission. Towards this end, we develop an integrated approach that exploits the representation redundancy inherent in the multiple streamed videos: a voxel in the 3D scene that is visible to two captured views is sampled and coded twice, once in each view. In particular, at the receiver we first develop an error concealment strategy that adaptively blends corresponding pixels in the two captured views during DIBR, so that pixels from the more reliably transmitted view are weighted more heavily. We then couple it with a sender-side optimization of reference picture selection (RPS) during real-time video coding, so that blocks containing samples of voxels visible in both views are coded more error-resiliently in one view only, given that adaptive blending will conceal errors in the other view. Further, the sensitivity of synthesized view distortion to texture versus depth errors is analyzed, so that the relative importance of texture and depth code blocks can be computed for system-wide RPS optimization. Experimental results show that the proposed scheme can outperform the use of a traditional feedback channel by up to 0.82 dB on average at an 8% packet loss rate, and by as much as 3 dB for particular frames.
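    The receiver-side adaptive blending can be illustrated with a minimal Python sketch; the function name, the linear reliability weighting, and the example values are illustrative assumptions rather than the authors' exact concealment rule.

```python
def blend_synthesized_pixel(px_left, px_right, rel_left, rel_right):
    """Blend two DIBR-warped pixel values by per-view transmission reliability.

    px_left, px_right: pixel intensities warped from the left/right captured views.
    rel_left, rel_right: reliability scores in [0, 1] (e.g., lower after packet loss).
    """
    w_left = rel_left / (rel_left + rel_right + 1e-8)  # weight the more reliable view more
    return w_left * px_left + (1.0 - w_left) * px_right

# Example: the right view suffered packet loss, so its contribution is down-weighted.
synth = blend_synthesized_pixel(px_left=120.0, px_right=96.0, rel_left=0.9, rel_right=0.3)
print(synth)  # closer to the left view's value
```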

    Rate-Distortion Analysis of Multiview Coding in a DIBR Framework

    Depth-image-based rendering techniques for multiview applications have recently been introduced for efficient view generation at arbitrary camera positions. Encoding rate control therefore has to consider both texture and depth data. Because depth and texture images have different structures and play different roles in the rendered views, distributing the available bit budget between them requires careful analysis. Information loss due to texture coding affects the values of pixels in synthesized views, while errors in depth information lead to shifts of objects or unexpected patterns at their boundaries. In this paper, we address the problem of efficient bit allocation between the texture and depth data of multiview video sequences. We adopt a rate-distortion framework based on a simplified model of depth and texture images that preserves their main features. Unlike most recent solutions, our method avoids rendering at encoding time for distortion estimation, so the encoding complexity is not increased. In addition, our model is independent of the inpainting method used at the decoder. Experiments confirm our theoretical results and the efficiency of our rate allocation strategy.
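    To make the texture-versus-depth bit-allocation question concrete, here is a minimal sketch assuming a toy inverse-rate distortion model D = a/R_t + b/R_d for the synthesized view; the model, the function, and the numbers are illustrative assumptions, not the paper's actual rate-distortion model.

```python
import math

def split_bit_budget(total_rate, a_texture, b_depth):
    """Split a bit budget between texture and depth streams.

    Minimizing D = a_texture / R_t + b_depth / R_d subject to
    R_t + R_d = total_rate gives the closed-form split below.
    """
    r_texture = total_rate * math.sqrt(a_texture) / (math.sqrt(a_texture) + math.sqrt(b_depth))
    return r_texture, total_rate - r_texture

# Texture errors are assumed here to hurt the rendered view more than depth errors.
r_t, r_d = split_bit_budget(total_rate=2000.0, a_texture=4.0, b_depth=1.0)
print(r_t, r_d)  # two thirds of the budget goes to texture
```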

    Face Spoofing Detection by Fusing Binocular Depth and Spatial Pyramid Coding Micro-Texture Features

    Robust features are of vital importance to face spoofing detection, because the variety of capture conditions makes the feature space extremely complicated to partition. In this paper, two novel and robust features for anti-spoofing are therefore proposed. The first is a binocular-camera-based depth feature called the Template Face Matched Binocular Depth (TFBD) feature. The second is a high-level micro-texture feature called the Spatial Pyramid Coding Micro-Texture (SPMT) feature. A novel template face registration algorithm and a spatial pyramid coding algorithm are also introduced along with the two features. Multi-modal face spoofing detection is implemented based on these two robust features. Experiments are conducted on a widely used dataset and on a comprehensive dataset constructed by ourselves. The results reveal that face spoofing detection with the fusion of the proposed features is robust and time-efficient, while outperforming other state-of-the-art traditional methods. Comment: 5 pages, 2 figures, accepted by the 2017 IEEE International Conference on Image Processing (ICIP).
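    The fusion of the two features can be sketched as feature-level concatenation followed by a classifier; the synthetic data, the concatenation scheme, and the SVM are assumptions for illustration, since the abstract does not specify the fusion and classification details.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def fuse_features(depth_feat, texture_feat):
    """Feature-level fusion: concatenate a depth descriptor with a micro-texture descriptor."""
    return np.concatenate([depth_feat, texture_feat])

# Synthetic stand-ins for TFBD (depth) and SPMT (micro-texture) descriptors.
depth_feats = rng.normal(size=(200, 32))
texture_feats = rng.normal(size=(200, 128))
labels = rng.integers(0, 2, size=200)  # 1 = live face, 0 = spoof attack

X = np.vstack([fuse_features(d, t) for d, t in zip(depth_feats, texture_feats)])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:5]))
```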

    Visual Object Networks: Image Generation with Disentangled 3D Representation

    Recent progress in deep generative models has led to tremendous breakthroughs in image generation. However, while existing models can synthesize photorealistic images, they lack an understanding of our underlying 3D world. We present a new generative model, Visual Object Networks (VON), which synthesizes natural images of objects with a disentangled 3D representation. Inspired by classic graphics rendering pipelines, we unravel the image formation process into three conditionally independent factors (shape, viewpoint, and texture) and present an end-to-end adversarial learning framework that jointly models 3D shapes and 2D images. Our model first learns to synthesize 3D shapes that are indistinguishable from real shapes. It then renders the object's 2.5D sketches (i.e., silhouette and depth map) from its shape under a sampled viewpoint. Finally, it learns to add realistic texture to these 2.5D sketches to generate natural images. VON not only generates images that are more realistic than state-of-the-art 2D image synthesis methods, but also enables many 3D operations, such as changing the viewpoint of a generated image, editing shape and texture, linear interpolation in texture and shape space, and transferring appearance across different objects and viewpoints. Comment: NeurIPS 2018. Code: https://github.com/junyanz/VON Website: http://von.csail.mit.edu
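    The shape-viewpoint-texture factorization can be summarized in a structural sketch; the class, its sub-network interfaces, and the channel-wise concatenation are assumptions for illustration, and the actual implementation lives in the linked repository.

```python
import torch
import torch.nn as nn

class VONSketch(nn.Module):
    """Structural sketch of a shape -> 2.5D sketch -> texture pipeline.

    The three sub-modules are placeholders standing in for a shape generator,
    a differentiable projection step, and a texture network.
    """

    def __init__(self, shape_gen: nn.Module, projector: nn.Module, texture_gen: nn.Module):
        super().__init__()
        self.shape_gen = shape_gen      # shape latent code -> 3D shape (e.g., voxel grid)
        self.projector = projector      # (shape, viewpoint) -> (silhouette, depth map)
        self.texture_gen = texture_gen  # (2.5D sketch, texture latent code) -> RGB image

    def forward(self, z_shape, z_texture, viewpoint):
        shape = self.shape_gen(z_shape)
        silhouette, depth = self.projector(shape, viewpoint)
        sketch_2_5d = torch.cat([silhouette, depth], dim=1)  # stack 2.5D channels
        return self.texture_gen(sketch_2_5d, z_texture)
```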

    The effect of pictorial depth information on projected size judgements

    When full depth cues are available, size judgements are dominated by physical size. However, with reduced depth cues, size judgements are less influenced by physical size and more influenced by projected size. This study reduces depth cues further than previous size judgement studies, by manipulating monocularly presented pictorial depth cues only. Participants were monocularly presented with two shapes against a background of zero (control), one, two or three pictorial depth cues. Each cue was added progressively in the following order: height in the visual field, linear perspective, and texture gradient. Participants made a 'same-different' judgement regarding the projected size of the two shapes, i.e. ignoring any depth cues. As expected, accuracy increased and response times decreased as the ratio between the projected size of the two shapes increased (range of projected size ratios, 1:1 to 1:5). In addition, with the exception of the larger size ratios (1:4 and 1:5), detection of projected size difference was poorer as depth cues were added. One-cue and two-cue conditions had the most weighting in this performance decrement, with little weighting from the three-cue condition. We conclude that even minimal depth information is difficult to inhibit. This indicates that depth perception requires little focussed attention.
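    The distinction between physical and projected size that participants had to track follows the standard visual-angle relation; below is a minimal Python sketch with illustrative values that are not taken from the study.

```python
import math

def visual_angle_deg(physical_size, distance):
    """Projected (retinal) size as a visual angle, in degrees, for an object
    of a given physical size viewed at a given distance (same units)."""
    return math.degrees(2 * math.atan(physical_size / (2 * distance)))

# Two shapes with a 1:2 physical-size ratio at a 1:2 distance ratio project to
# identical visual angles, which is the kind of depth-driven conflict the
# participants had to ignore when judging projected size only.
near = visual_angle_deg(physical_size=0.10, distance=1.0)  # ~5.7 degrees
far = visual_angle_deg(physical_size=0.20, distance=2.0)   # ~5.7 degrees
print(near, far)
```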