    Efficient 3D Reconstruction, Streaming and Visualization of Static and Dynamic Scene Parts for Multi-client Live-telepresence in Large-scale Environments

    Despite the impressive progress of telepresence systems for room-scale scenes with static and dynamic scene entities, expanding their capabilities to scenarios with larger dynamic environments beyond a fixed size of a few square meters remains challenging. In this paper, we aim at sharing 3D live-telepresence experiences in large-scale environments beyond room scale with both static and dynamic scene entities at practical bandwidth requirements, based only on lightweight scene capture with a single moving consumer-grade RGB-D camera. To this end, we present a system built upon a novel hybrid volumetric scene representation that combines a voxel-based representation for the static contents, which not only stores the reconstructed surface geometry but also contains information about the object semantics as well as their accumulated dynamic movement over time, with a point-cloud-based representation for dynamic scene parts, where the respective separation from static parts is achieved based on semantic and instance information extracted for the input frames. With an independent yet simultaneous streaming of both static and dynamic content, where we seamlessly integrate potentially moving but currently static scene entities in the static model until they become dynamic again, as well as the fusion of static and dynamic data at the remote client, our system is able to achieve VR-based live-telepresence at close to real-time rates. Our evaluation demonstrates the potential of our novel approach in terms of visual quality, performance, and ablation studies regarding involved design choices.

    Perception-driven approaches to real-time remote immersive visualization

    In remote immersive visualization systems, real-time 3D perception through RGB-D cameras, combined with modern Virtual Reality (VR) interfaces, enhances the user’s sense of presence in a remote scene through a rendered 3D reconstruction. This is particularly valuable when there is a need to visualize, explore, and perform tasks in environments that are inaccessible, too hazardous, or distant. However, a remote visualization system requires that the entire pipeline, from 3D data acquisition to VR rendering, satisfy demands on speed, throughput, and visual realism. Particularly when using point clouds, there is a fundamental quality difference between the acquired data of the physical world and the displayed data because of network latency and throughput limitations, which negatively impact the sense of presence and provoke cybersickness. This thesis presents state-of-the-art research to address these problems by taking the human visual system as inspiration, from sensor data acquisition to VR rendering. The human visual system does not have uniform vision across the field of view; it has the sharpest visual acuity at the center of the field of view, and the acuity falls off towards the periphery. Peripheral vision provides lower resolution to guide eye movements so that central vision visits all the crucial parts of interest. As a first contribution, the thesis developed remote visualization strategies that use the acuity fall-off to facilitate the processing, transmission, buffering, and rendering in VR of 3D reconstructed scenes while simultaneously reducing throughput requirements and latency. As a second contribution, the thesis looked into attentional mechanisms to select and draw user engagement to specific information from the dynamic spatio-temporal environment. It proposed a strategy to analyze the remote scene with respect to its 3D structure, its layout, and the spatial, functional, and semantic relationships between objects in the scene. The strategy primarily focuses on analyzing the scene with models used by human visual perception. It allocates a larger proportion of computational resources to objects of interest and creates a more realistic visualization. As a supplementary contribution, a new volumetric point-cloud density-based Peak Signal-to-Noise Ratio (PSNR) metric is proposed to evaluate the introduced techniques. An in-depth evaluation of the presented systems, a comparative examination of the proposed point cloud metric, user studies, and experiments demonstrated that the methods introduced in this thesis are visually superior while significantly reducing latency and throughput.

    Enhanced life-size holographic telepresence framework with real-time three-dimensional reconstruction for dynamic scene

    Three-dimensional (3D) reconstruction can capture and reproduce a 3D representation of a real object or scene. 3D telepresence allows the user to feel the presence of a remote user who is transferred in a digital representation. A holographic display is one alternative that removes the restriction of wearable hardware; it uses light diffraction to display 3D images to viewers. However, capturing a life-size or full-body human in real time is still challenging since it involves a dynamic scene. The remaining issue arises when the dynamic object to be reconstructed is constantly moving and changing shape, and requires multiple capturing views. The captured life-size data multiply exponentially when working with more depth cameras, which can cause high computation time, especially for dynamic scenes. Transferring high-volume 3D images over a network in real time can also cause lag and latency issues. Hence, the aim of this research is to enhance the life-size holographic telepresence framework with real-time 3D reconstruction for dynamic scenes. Three stages were carried out. In the first stage, real-time 3D reconstruction with the Marching Square algorithm is combined with data acquisition of dynamic scenes captured by a life-size setup of multiple Red Green Blue-Depth (RGB-D) cameras. The second stage transmits the data acquired from the multiple RGB-D cameras in real time and performs double compression for the life-size holographic telepresence. The third stage evaluates the life-size holographic telepresence framework integrated with the real-time 3D reconstruction of dynamic scenes. The findings show that enhancing the life-size holographic telepresence framework with real-time 3D reconstruction reduced the computation time and improved the 3D representation of the remote user in a dynamic scene. With double compression running for the life-size holographic telepresence, the life-size 3D representation is smooth, and the approach was shown to minimize the delay or latency during synchronization of acquired frames in remote communications.

    Augmented Reality and Its Application

    Augmented Reality (AR) is a discipline that includes the interactive experience of a real-world environment, in which real-world objects and elements are enhanced using computer-generated perceptual information. It has many potential applications in education, medicine, and engineering, among other fields. This book explores these potential uses, presenting case studies and investigations of AR for vocational training, emergency response, interior design, architecture, and much more.

    Video based reconstruction system for mixed reality environments supporting contextualised non-verbal communication and its study

    This thesis presents a system to capture, reconstruct and render the three-dimensional form of people and objects of interest in such detail that the spatial and visual aspects of non-verbal behaviour can be communicated. The system supports live distribution and simultaneous rendering in multiple locations, enabling the apparent teleportation of people and objects. Additionally, the system allows for the recording of live sessions and their playback in natural time with free viewpoint. It utilises components of a video-based reconstruction and a distributed video implementation to create an end-to-end system that can operate in real time and on commodity hardware. The research addresses the specific challenges of spatial and colour calibration, segmentation and overall system architecture to overcome technical barriers and the requirement of domain-specific knowledge to set up and generate avatars at a consistently high quality. Applications of the system include, but are not limited to, telepresence, where the computer-generated avatars used in Immersive Collaborative Virtual Environments can be replaced with ones that are faithful to the people they represent, and supporting researchers in their study of human communication such as gaze, inter-personal distance and facial expression. The system has been adopted in other research projects and is integrated with a mixed reality application where, during a live linkup, a three-dimensional avatar is streamed to multiple end-points across different countries.

    From Capture to Display: A Survey on Volumetric Video

    Volumetric video, which offers immersive viewing experiences, is gaining increasing prominence. With its six degrees of freedom, it provides viewers with greater immersion and interactivity compared to traditional videos. Despite their potential, volumetric video services pose significant challenges. This survey conducts a comprehensive review of the existing literature on volumetric video. We first provide a general framework of volumetric video services, followed by a discussion on prerequisites for volumetric video, encompassing representations, open datasets, and quality assessment metrics. Then we delve into the current methodologies for each stage of the volumetric video service pipeline, detailing capturing, compression, transmission, rendering, and display techniques. Lastly, we explore various applications enabled by this pioneering technology and we present an array of research challenges and opportunities in the domain of volumetric video services. This survey aspires to provide a holistic understanding of this burgeoning field and shed light on potential future research trajectories, aiming to bring the vision of volumetric video to fruition.

    REAL-TIME CAPTURE AND RENDERING OF PHYSICAL SCENE WITH AN EFFICIENTLY CALIBRATED RGB-D CAMERA NETWORK

    From object tracking to 3D reconstruction, RGB-Depth (RGB-D) camera networks play an increasingly important role in many vision and graphics applications. With the recent explosive growth of Augmented Reality (AR) and Virtual Reality (VR) platforms, utilizing RGB-D camera networks to capture and render dynamic physical space can enhance immersive experiences for users. To maximize coverage and minimize costs, practical applications often use a small number of RGB-D cameras and sparsely place them around the environment for data capturing. While sparse color camera networks have been studied for decades, the problems of extrinsic calibration of and rendering with sparse RGB-D camera networks are less well understood. Extrinsic calibration is difficult because of inappropriate RGB-D camera models and lack of shared scene features. Due to the significant camera noise and sparse coverage of the scene, the quality of rendering 3D point clouds is much lower compared with synthetic models. Adding virtual objects whose rendering depends on the physical environment, such as those with reflective surfaces, further complicates the rendering pipeline. In this dissertation, I propose novel solutions to tackle these challenges faced by RGB-D camera systems. First, I propose a novel extrinsic calibration algorithm that can accurately and rapidly calibrate the geometric relationships across an arbitrary number of RGB-D cameras on a network. Second, I propose a novel rendering pipeline that can capture and render, in real-time, dynamic scenes in the presence of arbitrary-shaped reflective virtual objects. Third, I have demonstrated a teleportation application that uses the proposed system to merge two geographically separated 3D captured scenes into the same reconstructed environment. To provide a fast and robust calibration for a sparse RGB-D camera network, first, the correspondences between different camera views are established by using a spherical calibration object. We show that this approach outperforms other techniques based on planar calibration objects. Second, instead of modeling camera extrinsics using a rigid transformation that is optimal only for pinhole cameras, different view transformation functions including rigid transformation, polynomial transformation, and manifold regression are systematically tested to determine the most robust mapping that generalizes well to unseen data. Third, the celebrated bundle adjustment procedure is reformulated to minimize the global 3D projection error so as to fine-tune the initial estimates. To achieve a realistic mirror rendering, a robust eye detector is used to identify the viewer's 3D location and render the reflective scene accordingly. The limited field of view obtained from a single camera is overcome by our calibrated RGB-D camera network system that is scalable to capture an arbitrarily large environment. The rendering is accomplished by raytracing light rays from the viewpoint to the scene reflected by the virtual curved surface. To the best of our knowledge, the proposed system is the first to render reflective dynamic scenes from real 3D data in large environments. Our scalable client-server architecture is computationally efficient: the calibration of a camera network system, including data capture, can be done in minutes using only commodity PCs.

    Archiving and Delivery of 3DTI Rehabilitation Sessions

    In this paper we present CyPhy: a cyber-physiotherapy system that brings daily rehabilitation to the patient’s home with supervision from a trained therapist. With its archiving and delivery features, CyPhy is able to 1) capture and record RGB-D and physiotherapy-related medical sensing data streams in the home environment; 2) provide efficient storage for rehabilitation session recordings; 3) provide fast metadata analysis over stored sessions for review recommendation; 4) adaptively deliver rehabilitation sessions under different networking capabilities; 5) support smooth viewpoint changing during 3D video streaming with scene rendering schemes tailored for devices with different bandwidth and power limitations; and 6) provide a platform-independent streaming client for various mobile and PC environments.