241 research outputs found

    A mixed reality telepresence system for collaborative space operation

    This paper presents a Mixed Reality system that results from the integration of a telepresence system and an application to improve collaborative space exploration. The system combines free viewpoint video with immersive projection technology to support non-verbal communication, including eye gaze, inter-personal distance and facial expression. Importantly, these can be interpreted together as people move around the simulation, maintaining natural social distance. The application is a simulation of Mars, within which the collaborators must come to agreement over, for example, where the Rover should land and go. The first contribution is the creation of a Mixed Reality system supporting contextualization of non-verbal communication. Two technological contributions are prototyping a technique to subtract a person from a background that may contain physical objects and/or moving images, and a lightweight texturing method for multi-view rendering which provides balance in terms of visual and temporal quality. A practical contribution is the demonstration of pragmatic approaches to sharing space between display systems of distinct levels of immersion. A research tool contribution is a system that allows comparison of conventionally authored and video-based reconstructed avatars, within an environment that encourages exploration and social interaction. Aspects of system quality, including the communication of facial expression and end-to-end latency, are reported.
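
    The abstract does not describe the person-segmentation prototype in detail, so the sketch below is only a generic per-pixel running-average background subtraction, a common baseline for separating a person from a background that includes slowly changing screen content; the learning rate and threshold are illustrative assumptions, not the paper's method.

import numpy as np

class RunningBackgroundModel:
    """Per-pixel exponential moving average of the background colour."""

    def __init__(self, first_frame: np.ndarray, alpha: float = 0.02, thresh: float = 30.0):
        self.background = first_frame.astype(np.float32)
        self.alpha = alpha      # how quickly moving background imagery is absorbed
        self.thresh = thresh    # per-pixel colour distance that counts as "foreground"

    def apply(self, frame: np.ndarray) -> np.ndarray:
        frame = frame.astype(np.float32)
        # Distance of each pixel from the modelled background colour.
        diff = np.linalg.norm(frame - self.background, axis=-1)
        foreground = diff > self.thresh
        # Update the model only where the scene looks like background, so the
        # person is not absorbed while slowly changing screen content still adapts.
        update = ~foreground[..., None]
        self.background = np.where(
            update, (1 - self.alpha) * self.background + self.alpha * frame, self.background
        )
        return foreground  # boolean person/foreground mask

# Usage: feed consecutive RGB frames (H, W, 3) and keep the returned masks.
frames = np.random.randint(0, 256, size=(5, 120, 160, 3), dtype=np.uint8)
model = RunningBackgroundModel(frames[0])
masks = [model.apply(f) for f in frames[1:]]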

    Video based reconstruction system for mixed reality environments supporting contextualised non-verbal communication and its study

    This Thesis presents a system to capture, reconstruct and render the three-dimensional form of people and objects of interest in such detail that the spatial and visual aspects of non-verbal behaviour can be communicated. The system supports live distribution and simultaneous rendering in multiple locations, enabling the apparent teleportation of people and objects. Additionally, the system allows for the recording of live sessions and their playback in natural time with free viewpoint. It utilises components of a video-based reconstruction and a distributed video implementation to create an end-to-end system that can operate in real time and on commodity hardware. The research addresses the specific challenges of spatial and colour calibration, segmentation and overall system architecture to overcome technical barriers and the requirement for domain-specific knowledge to set up and generate avatars of a consistently high quality. Applications of the system include, but are not limited to, telepresence, where the computer-generated avatars used in Immersive Collaborative Virtual Environments can be replaced with ones that are faithful to the people they represent, and supporting researchers in their study of human communication, such as gaze, inter-personal distance and facial expression. The system has been adopted in other research projects and is integrated with a mixed reality application where, during a live link-up, a three-dimensional avatar is streamed to multiple end-points across different countries.

    Enhanced life-size holographic telepresence framework with real-time three-dimensional reconstruction for dynamic scene

    Three-dimensional (3D) reconstruction has the ability to capture and reproduce a 3D representation of a real object or scene. 3D telepresence allows the user to feel the presence of a remote user who is transferred as a digital representation. Holographic displays are one alternative that removes the restriction of wearable hardware; they utilize light diffraction to display 3D images to viewers. However, capturing a life-size or full-body human in real time is still challenging since it involves a dynamic scene. A further issue arises because the dynamic object to be reconstructed is constantly moving, changes shape and requires multiple capture views. The life-size data captured grow exponentially when working with more depth cameras, which can cause high computation time, especially for dynamic scenes. Transferring high-volume 3D images over a network in real time can also cause lag and latency issues. Hence, the aim of this research is to enhance a life-size holographic telepresence framework with real-time 3D reconstruction for dynamic scenes. Three stages were carried out: in the first stage, real-time 3D reconstruction with the Marching Squares algorithm is combined with the data acquisition of dynamic scenes captured by a life-size setup of multiple Red Green Blue-Depth (RGB-D) cameras. The second stage transmits the data acquired from the multiple RGB-D cameras in real time and performs double compression for the life-size holographic telepresence. The third stage evaluates the life-size holographic telepresence framework integrated with the real-time 3D reconstruction of dynamic scenes. The findings show that enhancing the life-size holographic telepresence framework with real-time 3D reconstruction reduces the computation time and improves the 3D representation of the remote user in a dynamic scene. With double compression, the life-size 3D representation remains smooth, and it is proven to minimize the delay and latency during frame synchronization in remote communication.
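
    As a sketch of the reconstruction step named above, the following is a minimal Marching Squares implementation that extracts iso-contour segments from a single 2D scalar slice (for example, one slice of a depth volume); the field values, iso threshold and the resolution of ambiguous saddle cases are illustrative assumptions, and the work's multi-camera acquisition and compression stages are not reproduced.

import numpy as np

def _interp(p1, p2, v1, v2, iso):
    # Linear interpolation of the crossing point between two cell corners.
    t = (iso - v1) / (v2 - v1) if v2 != v1 else 0.5
    return (p1[0] + t * (p2[0] - p1[0]), p1[1] + t * (p2[1] - p1[1]))

def marching_squares(field, iso):
    """Return a list of ((x0, y0), (x1, y1)) contour segments for field == iso."""
    segments = []
    rows, cols = field.shape
    for y in range(rows - 1):
        for x in range(cols - 1):
            # Corner values: top-left, top-right, bottom-right, bottom-left.
            tl, tr = field[y, x], field[y, x + 1]
            br, bl = field[y + 1, x + 1], field[y + 1, x]
            # 4-bit case index: each corner contributes its bit when above iso.
            case = (tl > iso) * 8 | (tr > iso) * 4 | (br > iso) * 2 | (bl > iso) * 1
            if case in (0, 15):
                continue  # cell entirely inside or outside, no contour here
            # Crossing points on the four cell edges (top, right, bottom, left).
            top    = _interp((x, y),     (x + 1, y),     tl, tr, iso)
            right  = _interp((x + 1, y), (x + 1, y + 1), tr, br, iso)
            bottom = _interp((x, y + 1), (x + 1, y + 1), bl, br, iso)
            left   = _interp((x, y),     (x, y + 1),     tl, bl, iso)
            # Segment table for the 14 non-trivial cases; the ambiguous saddle
            # cases 5 and 10 simply emit two segments without disambiguation.
            table = {
                1: [(left, bottom)],   2: [(bottom, right)], 3: [(left, right)],
                4: [(top, right)],     5: [(left, top), (bottom, right)],
                6: [(top, bottom)],    7: [(left, top)],
                8: [(left, top)],      9: [(top, bottom)],
                10: [(left, bottom), (top, right)], 11: [(top, right)],
                12: [(left, right)],   13: [(bottom, right)], 14: [(left, bottom)],
            }
            segments.extend(table[case])
    return segments

# Example: contour a small synthetic depth slice at 0.5 m.
depth = np.random.rand(64, 64).astype(np.float32)
print(len(marching_squares(depth, 0.5)), "segments")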

    With you – an experimental end-to-end telepresence system using video-based reconstruction

    We introduce withyou, our telepresence research platform. A systematic explanation of the theory brings together the linked nature of non-verbal communication and how it is influenced by technology. This leads to functional requirements for telepresence, in terms of the balance of visual, spatial and temporal qualities. The first end-to-end description of withyou describes all major processes and the display and capture environment. This includes two approaches to reconstructing the human form in 3D from live video. An unprecedented characterisation of our approach is given in terms of the above qualities and the influence of each approach on them. This leads to non-functional requirements in terms of the number and placement of cameras and the avoidance of a resulting bottleneck. Proposals are given for improved distribution of processes across networks, computers, and multi-core CPUs and GPUs. Simple conservative estimation shows that both approaches should meet our requirements. One is implemented and shown to meet the minimum requirements and come close to the desirable ones.

    Efficient 3D Reconstruction, Streaming and Visualization of Static and Dynamic Scene Parts for Multi-client Live-telepresence in Large-scale Environments

    Despite the impressive progress of telepresence systems for room-scale scenes with static and dynamic scene entities, expanding their capabilities to scenarios with larger dynamic environments beyond a fixed size of a few square meters remains challenging. In this paper, we aim at sharing 3D live-telepresence experiences in large-scale environments beyond room scale, with both static and dynamic scene entities, at practical bandwidth requirements, based only on light-weight scene capture with a single moving consumer-grade RGB-D camera. To this end, we present a system built upon a novel hybrid volumetric scene representation: a voxel-based representation for the static content, which stores not only the reconstructed surface geometry but also information about object semantics and their accumulated dynamic movement over time, and a point-cloud-based representation for the dynamic scene parts, where the separation from static parts is achieved using semantic and instance information extracted from the input frames. With independent yet simultaneous streaming of both static and dynamic content, where potentially moving but currently static scene entities are seamlessly integrated into the static model until they become dynamic again, and with the fusion of static and dynamic data at the remote client, our system achieves VR-based live-telepresence at close to real-time rates. Our evaluation demonstrates the potential of our novel approach in terms of visual quality, performance, and ablation studies regarding the design choices involved.
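
    A minimal sketch of the static/dynamic split behind such a hybrid representation is given below, assuming per-point semantic labels are already available from an external segmentation step; the class names, voxel size and label set are illustrative assumptions rather than details of this paper.

from dataclasses import dataclass, field
from typing import Dict, Tuple
import numpy as np

DYNAMIC_LABELS = {"person", "vehicle", "animal"}  # assumed label set

@dataclass
class StaticVoxelModel:
    """Sparse voxel map accumulating static surface samples over time."""
    voxel_size: float = 0.05
    voxels: Dict[Tuple[int, int, int], np.ndarray] = field(default_factory=dict)

    def integrate(self, points: np.ndarray, colors: np.ndarray) -> None:
        # Quantise points to voxel keys and keep the latest colour per voxel.
        keys = np.floor(points / self.voxel_size).astype(np.int64)
        for key, color in zip(map(tuple, keys), colors):
            self.voxels[key] = color

def split_frame(points, colors, labels):
    """Route labelled points either into the static model or the dynamic cloud."""
    dynamic_mask = np.isin(labels, list(DYNAMIC_LABELS))
    static = (points[~dynamic_mask], colors[~dynamic_mask])
    dynamic = (points[dynamic_mask], colors[dynamic_mask])
    return static, dynamic

# Per frame: static points are fused into the persistent voxel model (streamed
# incrementally), while dynamic points are sent as a fresh point cloud.
model = StaticVoxelModel()
pts = np.random.rand(1000, 3); cols = np.random.rand(1000, 3)
lbl = np.random.choice(["person", "wall", "floor"], size=1000)
static_pc, dynamic_pc = split_frame(pts, cols, lbl)
model.integrate(*static_pc)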

    LiveVV: Human-Centered Live Volumetric Video Streaming System

    Volumetric video has emerged as a prominent medium within the realm of eXtended Reality (XR) with the advancements in computer graphics and depth capture hardware. Users can fully immerse themselves in volumetric video with the ability to switch their viewport in six degrees of freedom (DoF), including three rotational dimensions (yaw, pitch, roll) and three translational dimensions (X, Y, Z). Different from traditional 2D videos that are composed of pixel matrices, volumetric videos employ point clouds, meshes, or voxels to represent a volumetric scene, resulting in significantly larger data sizes. While previous works have successfully achieved volumetric video streaming in video-on-demand scenarios, the live streaming of volumetric video remains an unresolved challenge due to limited network bandwidth and stringent latency constraints. In this paper, we for the first time propose a holistic live volumetric video streaming system, LiveVV, which achieves multi-view capture, scene segmentation & reuse, adaptive transmission, and rendering. LiveVV contains multiple lightweight volumetric video capture modules that can be deployed without prior preparation. To reduce bandwidth consumption, LiveVV processes static and dynamic volumetric content separately by reusing static data with low disparity and decimating data with low visual saliency. Besides, to deal with network fluctuation, LiveVV integrates a volumetric video adaptive bitrate streaming algorithm (VABR) to enable fluent playback with the maximum quality of experience. Extensive real-world experiments show that LiveVV can achieve live volumetric video streaming at a frame rate of 24 fps with a latency of less than 350 ms.
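
    The VABR algorithm itself is not detailed in the abstract, so the following is only a generic, hedged sketch of bandwidth-driven quality selection for a point-cloud stream; the quality ladder, safety margin and latency budget are illustrative assumptions, not figures from LiveVV.

from dataclasses import dataclass

# Candidate quality levels, best first: points retained after decimation vs.
# approximate bitrate cost (illustrative values).
LADDER = [
    (400_000, 90.0),
    (200_000, 45.0),
    (100_000, 22.0),
    (50_000, 11.0),
]

@dataclass
class StreamState:
    est_bandwidth_mbps: float   # e.g. from a throughput moving average
    buffer_ms: float            # client-side playback buffer occupancy
    latency_budget_ms: float = 350.0

def pick_quality(state: StreamState, safety: float = 0.8):
    """Pick the highest ladder rung that fits the safe bandwidth estimate.

    When the buffer runs ahead of the latency budget, step down one rung to
    drain it and keep end-to-end latency bounded.
    """
    budget = state.est_bandwidth_mbps * safety
    chosen = LADDER[-1]
    for rung in LADDER:
        if rung[1] <= budget:
            chosen = rung
            break
    if state.buffer_ms > state.latency_budget_ms and chosen != LADDER[-1]:
        chosen = LADDER[LADDER.index(chosen) + 1]
    return chosen

print(pick_quality(StreamState(est_bandwidth_mbps=60.0, buffer_ms=120.0)))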

    Videos in Context for Telecommunication and Spatial Browsing

    The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium of presence and remote collaboration. However, capturing visual representations of locations to be used in VEs is usually a tedious process that requires either manual modelling of environments or the employment of specific hardware. Capturing environment dynamics is not straightforward either, and it is usually performed through specific tracking hardware. Similarly, browsing large unstructured video-collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. At the same time, on a spectrum between 3D VEs and 2D images, panoramas lie in between, as they offer the accessibility of 2D images while preserving the surrounding representation of 3D virtual environments. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools, as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render and stream data coming from heterogeneous cameras, with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video-mediated communication, and whether this improves the quality of communication. Second, the research asks whether videos in panoramic context can be used to convey spatial and temporal information of a remote place and the dynamics within, and whether this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether there is an impact of display type on reasoning about events within videos in panoramic context. These research questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first telecommunication experiment compared our videos-in-context interface with fully-panoramic video and conventional webcam video conferencing in an object placement scenario. The second experiment investigated the impact of videos in panoramic context on the quality of spatio-temporal thinking during localization tasks. To support the experiment, a novel interface to video-collections in panoramic context was developed and compared with common video-browsing tools. The final experimental study investigated the impact of display type on reasoning about events. The study explored three adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution to spatio-temporal exploration of remote locations. Our approach presents a richer visual representation in terms of space and time than standard tools, showing that providing panoramic contexts to video collections makes spatio-temporal tasks easier. To this end, videos in context are a suitable alternative to more difficult, and often expensive, solutions. These findings are beneficial to many applications, including teleconferencing, virtual tourism and remote assistance.
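
    As a sketch of how a video viewpoint can be related to its panoramic context, the following shows the standard equirectangular projection of a viewing direction into panorama pixel coordinates; the yaw/pitch placement and panorama size are illustrative assumptions, and the thesis' specific acquisition, fusion and rendering pipeline is not reproduced.

import numpy as np

def direction_to_equirect(direction: np.ndarray, pano_w: int, pano_h: int):
    """Map a unit 3D viewing direction to (u, v) pixel coords in an equirectangular panorama."""
    x, y, z = direction
    lon = np.arctan2(x, z)              # yaw, in [-pi, pi]
    lat = np.arcsin(np.clip(y, -1, 1))  # pitch, in [-pi/2, pi/2]
    u = (lon / (2 * np.pi) + 0.5) * pano_w
    v = (0.5 - lat / np.pi) * pano_h    # top of the panorama is straight up
    return u, v

# Example: a camera looking 30 degrees to the right and slightly upward.
yaw, pitch = np.radians(30), np.radians(10)
d = np.array([np.sin(yaw) * np.cos(pitch), np.sin(pitch), np.cos(yaw) * np.cos(pitch)])
print(direction_to_equirect(d, pano_w=4096, pano_h=2048))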

    What aspects of realism and faithfulness are relevant to supporting non-verbal communication through 3D mediums

    This thesis investigates what aspects of realism and faithfulness are relevant to supporting non-verbal communication through visual mediums. The mediums examined are 2D video, 3D computer graphics and video-based 3D reconstruction. The latter is 3D CGI derived from multiple streams of 2D video. People’s ability to identify the behaviour of primates through gross non-verbal communication is compared across 2D video and 3D CGI. Findings suggest 3D CGI performs as well as 2D video for the identification of gross non-verbal behaviour; however, user feedback points to a lack of understanding of intent. Secondly, the ability to detect truthfulness in humans across 2D video and video-based 3D reconstruction mediums is examined. The effort of doing this is measured by studying changes in the level of oxygenation of the prefrontal cortex. The discussion links to the literature to propose that the tendency to over-trust is inversely proportional to the range of non-verbal resources communicated through a medium. It is suggested that perhaps this is because “tells” are hidden. The third study identifies that video-based 3D reconstruction can successfully illustrate subtle facial muscle movements on a par with 2D video, but does identify issues with the display of lower facial detail, due to a reconstruction error called droop. It is hoped that the combination of these strands of research will help users and application developers make more informed decisions when selecting which type of virtual character to implement for a particular application, thereby contributing to the fields of virtual characters and virtual environments/serious gaming by giving readers a greater understanding of virtual characters’ ability to convey non-verbal behaviour.