
    Loss-resilient Coding of Texture and Depth for Free-viewpoint Video Conferencing

    Free-viewpoint video conferencing allows a participant to observe the remote 3D scene from any freely chosen viewpoint. An intermediate virtual viewpoint image is commonly synthesized using two pairs of transmitted texture and depth maps from two neighboring captured viewpoints via depth-image-based rendering (DIBR). To maintain high quality of synthesized images, it is imperative to contain the adverse effects of network packet losses that may arise during texture and depth video transmission. Towards this end, we develop an integrated approach that exploits the representation redundancy inherent in the multiple streamed videos: a voxel in the 3D scene visible to two captured views is sampled and coded twice in the two views. In particular, at the receiver we first develop an error concealment strategy that adaptively blends corresponding pixels in the two captured views during DIBR, so that pixels from the more reliable transmitted view are weighted more heavily. We then couple it with a sender-side optimization of reference picture selection (RPS) during real-time video coding, so that blocks containing samples of voxels that are visible in both views are more error-resiliently coded in one view only, given that adaptive blending will erase errors in the other view. Further, synthesized view distortion sensitivities to texture versus depth errors are analyzed, so that the relative importance of texture and depth code blocks can be computed for system-wide RPS optimization. Experimental results show that the proposed scheme can outperform the use of a traditional feedback channel by up to 0.82 dB on average at 8% packet loss rate, and by as much as 3 dB for particular frames.
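
The receiver-side blending idea in this abstract can be illustrated with a minimal sketch. It assumes each view supplies a per-pixel reliability score; the score, the function names, and the normalized-weight rule are hypothetical illustrations, since the abstract does not specify how the weights are derived.

```python
def blend_pixel(left, right, left_reliability, right_reliability):
    """Blend two corresponding pixel values from neighboring views.

    The reliability arguments are hypothetical non-negative scores
    (for example, lower after a packet loss). The more reliable view
    receives the larger weight.
    """
    total = left_reliability + right_reliability
    if total == 0:
        # No reliability information at all: fall back to a plain average.
        return 0.5 * (left + right)
    w_left = left_reliability / total
    return w_left * left + (1.0 - w_left) * right


def blend_rows(left_row, right_row, left_rel, right_rel):
    """Apply blend_pixel element-wise to two rows of warped pixels."""
    return [
        blend_pixel(l, r, a, b)
        for l, r, a, b in zip(left_row, right_row, left_rel, right_rel)
    ]
```

With equal reliabilities this reduces to a simple average; when one view's score drops to zero, the synthesized pixel comes entirely from the other view.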

    Beyond standard benchmarks: Parameterizing performance evaluation in visual object tracking

    Object-to-camera motion produces a variety of apparent motion patterns that significantly affect performance of short-term visual trackers. Despite being crucial for designing robust trackers, their influence is poorly explored in standard benchmarks due to weakly defined, biased and overlapping attribute annotations. In this paper we propose to go beyond pre-recorded benchmarks with post-hoc annotations by presenting an approach that utilizes omnidirectional videos to generate realistic, consistently annotated, short-term tracking scenarios with exactly parameterized motion patterns. We have created an evaluation system, constructed a fully annotated dataset of omnidirectional videos and the generators for typical motion patterns. We provide an in-depth analysis of major tracking paradigms, which is complementary to the standard benchmarks and confirms the expressiveness of our evaluation approach.

    A framework for realistic 3D tele-immersion

    Meeting, socializing and conversing online with a group of people using teleconferencing systems is still quite different from the experience of meeting face to face. We are abruptly aware that we are online and that the people we are engaging with are not in close proximity. This is analogous to how talking on the telephone does not replicate the experience of talking in person. Several causes for these differences have been identified, and we propose inspiring and innovative solutions to these hurdles in an attempt to provide a more realistic, believable and engaging online conversational experience. We present the distributed and scalable framework REVERIE, which provides a balanced mix of these solutions. Applications built on top of the REVERIE framework will be able to provide interactive, immersive, photo-realistic experiences to a multitude of users that will feel much more similar to face-to-face meetings than the experience offered by conventional teleconferencing systems.

    Selecting surface features for accurate multi-camera surface reconstruction

    This paper proposes a novel feature detector for selecting local textures that are suitable for accurate multi-camera surface reconstruction, and in particular planar patch fitting techniques. This approach is in contrast to conventional feature detectors, which focus on repeatability under scale and affine transformations rather than suitability for multi-camera reconstruction techniques. The proposed detector selects local textures that are sensitive to affine transformations, which is a fundamental requirement for accurate patch fitting. The proposed detector is evaluated against the SIFT detector on a synthetic dataset and the fitted patches are compared against ground truth. The experiments show that patches originating from the proposed detector are fitted more accurately to the visible surfaces than those originating from SIFT keypoints. In addition, the detector is evaluated on a performance capture studio dataset to show the real-world application of the proposed detector.

    Learning 3D Navigation Protocols on Touch Interfaces with Cooperative Multi-Agent Reinforcement Learning

    Using touch devices to navigate in virtual 3D environments such as computer assisted design (CAD) models or geographical information systems (GIS) is inherently difficult for humans, as the 3D operations have to be performed by the user on a 2D touch surface. This ill-posed problem is classically solved with a fixed and handcrafted interaction protocol, which must be learned by the user. We propose to automatically learn a new interaction protocol that maps a 2D user input to 3D actions in virtual environments using reinforcement learning (RL). A fundamental problem of RL methods is the vast amount of interactions often required, which are difficult to come by when humans are involved. To overcome this limitation, we make use of two collaborative agents. The first agent models the human by learning to perform the 2D finger trajectories. The second agent acts as the interaction protocol, interpreting the 2D finger trajectories from the first agent and translating them to 3D operations. We restrict the learned 2D trajectories to be similar to a training set of collected human gestures by first performing state representation learning, prior to reinforcement learning. This state representation learning is addressed by projecting the gestures into a latent space learned by a variational autoencoder (VAE). Comment: 17 pages, 8 figures. Accepted at The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2019 (ECMLPKDD 2019).