816 research outputs found

    Towards Intelligent Telerobotics: Visualization and Control of Remote Robot

    Human-machine cooperation, or co-robotics, has been recognized as the next generation of robotics. In contrast to current systems that use limited-reasoning strategies or address problems in narrow contexts, new co-robot systems will be characterized by their flexibility, resourcefulness, varied modeling and reasoning approaches, and use of real-world data in real time, demonstrating a level of intelligence and adaptability seen in humans and animals. My research focuses on two sub-fields of co-robotics: teleoperation and telepresence. We first explore approaches to teleoperation using mixed-reality techniques. I propose a new type of display, the hybrid-reality display (HRD), which uses a commodity projection device to project captured video frames onto a 3D replica of the actual target surface. It provides a direct alignment between the frame of reference of the human subject and that of the displayed image. The advantage of this approach is that users need not wear any device, which minimizes intrusiveness and accommodates the eyes' natural focusing; the field of view is also significantly increased. From a user-centered design standpoint, the HRD is motivated by teleoperation accidents and incidents, and by user research in areas such as military reconnaissance. Teleoperation in these environments is compromised by the keyhole effect, which results from the limited field of view. The technical contribution of the proposed HRD system is a multi-system calibration involving a motion sensor, projector, cameras, and robotic arm; given the purpose of the system, calibration accuracy must be kept within the millimeter level. Follow-up research on the HRD focuses on high-accuracy 3D reconstruction of the replica with commodity devices for better alignment of the video frames, since conventional 3D scanners either lack depth resolution or are very expensive.
We propose a structured-light 3D sensing system with accuracy within 1 millimeter that is robust to global illumination and surface reflections; extensive user studies demonstrate the performance of the proposed algorithm. To compensate for the lack of synchronization between the local and remote stations caused by latency in data sensing and communication, a 1-step-ahead predictive control algorithm is presented. The latency between human control and robot movement can be formulated as a system of linear equations with a smoothing coefficient ranging from 0 to 1, and the predictive control algorithm can be further formulated as the optimization of a cost function. We then explore the aspect of telepresence. Many hardware designs have been developed to allow a camera to be placed optically directly behind the screen. The purpose of such setups is to enable two-way video teleconferencing that maintains eye contact. However, the image from the see-through camera usually exhibits a number of imaging artifacts, such as a low signal-to-noise ratio, incorrect color balance, and loss of detail. We therefore develop a novel image-enhancement framework that utilizes an auxiliary color+depth camera mounted on the side of the screen. By fusing the information from both cameras, we are able to significantly improve the quality of the see-through image. Experimental results demonstrate that our fusion method compares favorably against traditional image enhancement and warping methods that use only a single image.
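
The abstract describes 1-step-ahead prediction with a smoothing coefficient between 0 and 1 but does not give the update rule. The sketch below is an illustrative assumption, not the thesis's formulation: an exponential-smoothing-style predictor in which the next command estimate is a convex combination of the latest command and the running estimate. The function and parameter names are hypothetical.

```python
def one_step_ahead(command_history, alpha=0.6):
    """Predict the next operator command from past commands.

    A minimal sketch of 1-step-ahead prediction (an assumption, not
    the thesis's exact algorithm): the estimate is a convex
    combination of the latest command and the previous estimate,
    with smoothing coefficient alpha in [0, 1].
    """
    assert 0.0 <= alpha <= 1.0
    estimate = command_history[0]
    for cmd in command_history[1:]:
        # Linear update: blend the fresh command with the previous
        # estimate, damping measurement noise while tracking drift.
        estimate = alpha * cmd + (1.0 - alpha) * estimate
    return estimate

# Example: a joystick axis drifting upward; predict the next value.
history = [0.0, 0.1, 0.25, 0.4, 0.5]
print(round(one_step_ahead(history, alpha=0.6), 4))
```

With alpha near 1 the predictor trusts the newest command (fast but noisy); with alpha near 0 it trusts history (smooth but laggy), which is the trade-off a latency-compensation cost function would tune.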

    High-quality, real-time 3D video visualization in head mounted displays

    The main goal of this thesis research was to develop the ability to visualize high-quality, three-dimensional (3D) data within a virtual reality head mounted display (HMD). High-quality 3D data collection has become easier in recent years due to the development of 3D scanning technologies such as structured light methods. Structured light scanning and modern 3D data compression techniques have improved to the point at which 3D data can be captured, processed, compressed, streamed across a network, decompressed, reconstructed, and visualized, all in near real time. Now the question becomes: what can be done with this live 3D information? A web application allows for real-time visualization of and interaction with this 3D video on the web. Streaming this data to the web allows for greater ease of access by a larger population. In the past, only two-dimensional (2D) video streaming has been available to the public via the web or installed desktop software. Commonly, 2D video streaming technologies, such as Skype, FaceTime, or Google Hangouts, are used to connect people around the world for both business and recreational purposes. As society increasingly conducts itself in online environments, improvements to these telecommunication and telecollaboration technologies must be made, as current systems have reached their limitations. These improvements are needed to ensure that interactions are as natural and as user-friendly as possible. One resolution to the limitations imposed by 2D video streaming is to stream 3D video via the aforementioned technologies to a user in a virtual reality HMD. With 3D data, improvements such as eye-gaze correction and a natural angle of viewing can be accomplished. One common advantage of using 3D data in lieu of 2D data is what can be done with it during redisplay.
For example, when a viewer moves about their environment in physical space while on Skype, the 2D image on their computer monitor does not change; however, with an HMD, the user can naturally view and move about their partner in 3D space almost as if they were sitting directly across from them. With these improvements, increased user perception and an increased level of immersion in the digital world have been achieved. This allows users to perform at an increased level of efficiency in telecollaboration and telecommunication environments, due to the increased ability to visualize and communicate more naturally with another human being. This thesis presents preliminary results which support the notion that users better perceive their environments and also have a greater sense of interpersonal communication when immersed in a 3D video scenario as opposed to a 2D video scenario. This novel technology utilizes high-quality, real-time 3D scanning and 3D compression techniques, which in turn allow the user to experience a realistic reconstruction within a virtual reality HMD.
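
The motion-parallax contrast drawn above can be reduced to a very small sketch: with a 2D stream the image is fixed, but with 3D data each new head position yields a new view simply by re-expressing every captured point in the viewer's frame. Rotation is omitted for brevity, and all names are illustrative assumptions rather than the thesis's implementation.

```python
def to_viewer_frame(points, head_position):
    """Re-express 3D points relative to the viewer's tracked head.

    A minimal, translation-only sketch (an assumption, not the
    thesis's renderer): subtracting the head position from every
    point is what makes the remote partner appear world-fixed as
    the HMD wearer walks around them.
    """
    hx, hy, hz = head_position
    return [(x - hx, y - hy, z - hz) for (x, y, z) in points]

cloud = [(0.0, 1.6, 2.0), (0.1, 1.5, 2.1)]    # remote partner's points
print(to_viewer_frame(cloud, (0.0, 1.6, 0.0)))  # viewer seated
print(to_viewer_frame(cloud, (0.5, 1.6, 0.0)))  # viewer stepped right
```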

    Videos in Context for Telecommunication and Spatial Browsing

    The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium for presence and remote collaboration. However, capturing visual representations of locations to be used in VEs is usually a tedious process that requires either manual modelling of environments or the use of specific hardware. Capturing environment dynamics is not straightforward either, and is usually performed with dedicated tracking hardware. Similarly, browsing large unstructured video collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. On a spectrum between 3D VEs and 2D images, panoramas lie in between: they offer the same accessibility as 2D images while preserving the surrounding representation of 3D virtual environments. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools, as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render, and stream data coming from heterogeneous cameras with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video-mediated communication, and whether this improves the quality of communication. Second, the research asks whether videos in panoramic context can be used to convey the spatial and temporal information of a remote place and the dynamics within it, and whether this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether display type has an impact on reasoning about events within videos in panoramic context.
These research questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first experiment, on telecommunication, compared our videos-in-context interface with fully panoramic video and conventional webcam video conferencing in an object placement scenario. The second experiment investigated the impact of videos in panoramic context on the quality of spatio-temporal thinking during localization tasks. To support the experiment, a novel interface to video collections in panoramic context was developed and compared with common video-browsing tools. The final experimental study investigated the impact of display type on reasoning about events, exploring three adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution to the spatio-temporal exploration of remote locations. Our approach presents a richer visual representation in terms of space and time than standard tools, showing that providing panoramic context to video collections makes spatio-temporal tasks easier. As such, videos in context are a suitable alternative to more difficult, and often expensive, solutions. These findings are beneficial to many applications, including teleconferencing, virtual tourism, and remote assistance.

    Omnidirectional texturing of human actors from multiple view video sequences

    In recent years, more and more research has been devoted to three-dimensional video created from multiple video streams. The goal is to obtain free-viewpoint video, in which the user can observe a scene filmed by several cameras from an arbitrary, interactively chosen point of view. The possible applications are diverse. A free-viewpoint system can increase the visual realism of telepresence technology, so that users located in different physical places can collaborate in the same virtual environment. In addition, the special effects used by the film industry, such as the freeze-and-rotate effect introduced in the film The Matrix, would become accessible to all users. In most virtual reality applications, we want to represent actors as avatars, which is why research in this area is important. For free-viewpoint video, the scene is filmed simultaneously by several cameras from multiple viewpoints. The video streams from the cameras are used to build a 3D model of the scene. This three-dimensional reconstruction is essential for allowing the user to watch the scene from any viewpoint. In a virtual reality setting, it is possible to add new (virtual) objects into the scene and to handle lighting problems (shadows on the ground, etc.) as well as occlusion problems [7, 8]. The 3D model can be described using different methods, such as meshes, point samples, or voxels. To make the model more realistic, the video streams from the cameras are mapped onto the 3D model. Finally, by combining the reconstructed 3D model and the various video streams, we are able to reconstruct a realistic virtual world.
The goal of the internship was to perform real-time texturing of a 3D model of a performer. The work was carried out within the CYBER-II project, which aims to simulate, in real time (at least 25 frames per second), the presence of a person (for example, a television presenter or a teacher) in a virtual environment.

    Rendering and display for multi-viewer tele-immersion

    Video teleconferencing systems are widely deployed for business, education, and personal use to enable face-to-face communication between people at distant sites. Unfortunately, the two-dimensional video of conventional systems does not correctly convey several important non-verbal communication cues, such as eye contact and gaze awareness. Tele-immersion refers to technologies aimed at providing distant users with a more compelling sense of remote presence than conventional video teleconferencing. This dissertation is concerned with the particular challenges of interaction between groups of users at remote sites. The problems of video teleconferencing are exacerbated when groups of people communicate. Ideally, a group tele-immersion system would display views of the remote site at the right size and location, from the correct viewpoint for each local user. However, it is not practical to put a camera in every possible eye location, and it is not clear how to provide each viewer with correct and unique imagery. I introduce rendering techniques and multi-view display designs to support eye contact and gaze awareness between groups of viewers at two distant sites. With a shared 2D display, virtual camera views can improve local spatial cues while preserving scene continuity, by rendering the scene from novel viewpoints that may not correspond to a physical camera. I describe several techniques, including a compact light field, a plane-sweeping algorithm, a depth-dependent camera model, and video-quality proxies, suitable for producing useful views of a remote scene for a group of local viewers. The first novel display provides simultaneous, unique monoscopic views to several users, with fewer user-position restrictions than existing autostereoscopic displays.
The second is a random-hole barrier autostereoscopic display that eliminates the viewing zones and user-position requirements of conventional autostereoscopic displays, and provides unique 3D views for multiple users in arbitrary locations.
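
The random-hole idea can be sketched in a few lines: instead of the regular slits of a conventional parallax barrier, which create fixed viewing zones, transparent holes are scattered uniformly at random so every viewer position receives a statistically similar mixture of display pixels. The sketch below only generates such a barrier mask; the hole fraction, names, and seeding are illustrative assumptions, not the dissertation's parameters.

```python
import random

def random_hole_mask(width, height, hole_fraction=0.1, seed=42):
    """Generate a binary barrier mask with randomly placed holes.

    Each cell is 1 (transparent hole) with probability
    `hole_fraction`, else 0 (opaque barrier). A fixed seed makes the
    mask reproducible, since renderer and barrier must agree on the
    hole pattern.
    """
    rng = random.Random(seed)
    return [[1 if rng.random() < hole_fraction else 0
             for _ in range(width)]
            for _ in range(height)]

mask = random_hole_mask(8, 4, hole_fraction=0.25)
open_holes = sum(map(sum, mask))
print(open_holes, "of", 8 * 4, "barrier cells are transparent")
```

Because the holes have no periodic structure, inter-view crosstalk appears as unstructured noise rather than as the ghosted repeating zones of a regular barrier.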

    Omnidirectional texturing of human actors from multiple view video sequences

    In 3D video, recorded object behaviors can be observed from any viewpoint, because the 3D video registers the object's 3D shape and color. However, the real-world views are limited to those from a number of cameras, so only a coarse model of the object can be recovered in real time. It then becomes necessary to judiciously texture the object with images recovered from the cameras. One of the problems in multi-texturing is deciding what portion of the 3D model is visible from which camera. We propose a texture-mapping algorithm that bypasses the problem of deciding exactly whether a point is visible from a certain camera. Given more than two color values for each pixel, a statistical test makes it possible to exclude outlying color data before blending.
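
The closing idea (reject outlying per-pixel color samples statistically, then blend the rest) can be illustrated with a small sketch. The median/median-absolute-deviation test used here is an assumption for illustration, not necessarily the paper's exact statistical test, and all names are hypothetical.

```python
import statistics

def blend_colors(samples, k=2.0):
    """Blend per-pixel color samples from multiple cameras,
    excluding outliers first.

    Each sample is an (r, g, b) tuple from one camera. A sample is
    kept only if every channel lies within k median-absolute-
    deviations (MADs) of the per-channel median; survivors are
    averaged. Cameras that actually see an occluder instead of the
    surface produce colors far from the consensus and are dropped.
    """
    channels = list(zip(*samples))
    medians = [statistics.median(c) for c in channels]
    mads = [statistics.median(abs(v - m) for v in c) or 1.0
            for c, m in zip(channels, medians)]
    kept = [s for s in samples
            if all(abs(v - m) <= k * mad
                   for v, m, mad in zip(s, medians, mads))]
    # Average the surviving samples channel-wise.
    return tuple(sum(c) / len(kept) for c in zip(*kept))

# Three cameras agree; a fourth sees an occluder (wrong color).
print(blend_colors([(200, 40, 40), (198, 42, 38),
                    (202, 39, 41), (20, 180, 90)]))
```

Note how this sidesteps explicit visibility computation: the occluded camera's sample is removed by the consensus test alone, without ray-casting against the geometry.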

    Fusing Multimedia Data Into Dynamic Virtual Environments

    In spite of the dramatic growth of virtual and augmented reality (VR and AR) technology, content creation for immersive and dynamic virtual environments remains a significant challenge. In this dissertation, we present our research in fusing multimedia data, including text, photos, panoramas, and multi-view videos, to create rich and compelling virtual environments. First, we present Social Street View, which renders geo-tagged social media in its natural geo-spatial context provided by 360° panoramas. Our system takes into account visual saliency and uses maximal Poisson-disc placement with spatiotemporal filters to render social multimedia in an immersive setting. We also present a novel GPU-driven pipeline for saliency computation in 360° panoramas using spherical harmonics (SH). Our spherical residual model can be applied to virtual cinematography in 360° videos. We further present Geollery, a mixed-reality platform that renders an interactive mirrored world in real time with three-dimensional (3D) buildings, user-generated content, and geo-tagged social media. Our user study has identified several use cases for these systems, including immersive social storytelling, cultural experiences, and crowd-sourced tourism. We next present Video Fields, a web-based interactive system to create, calibrate, and render dynamic videos overlaid on 3D scenes. Our system renders dynamic entities from multiple videos using early and deferred texture sampling. Video Fields can be used for immersive surveillance in virtual environments. Furthermore, we present the VRSurus and ARCrypt projects, which explore applications of gesture recognition, haptic feedback, and visual cryptography for virtual and augmented reality. Finally, we present our work on Montage4D, a real-time system for seamlessly fusing multi-view video textures with dynamic meshes. We use geodesics on meshes with view-dependent rendering to mitigate spatial occlusion seams while maintaining temporal consistency.
Our experiments show significant enhancement in rendering quality, especially for salient regions such as faces. We believe that Social Street View, Geollery, Video Fields, and Montage4D will greatly facilitate applications such as virtual tourism, immersive telepresence, and remote education.
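
The Poisson-disc placement mentioned for Social Street View enforces a minimum spacing between rendered media so they do not overlap in the panorama. The sketch below is a naive dart-throwing version of that constraint on a flat domain (repeatedly propose a random point, keep it only if no existing point is closer than the radius); it is an illustrative assumption, not the system's actual placement algorithm, which operates on spherical panoramas with saliency and spatiotemporal filters.

```python
import math
import random

def poisson_disc_darts(width, height, radius, attempts=2000, seed=7):
    """Place points so that no two are closer than `radius`.

    Naive dart throwing: each candidate is accepted only if it keeps
    the minimum-distance invariant against all accepted points. With
    enough attempts the result approaches a maximal packing, where no
    further point can be added.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(attempts):
        p = (rng.uniform(0, width), rng.uniform(0, height))
        if all(math.dist(p, q) >= radius for q in points):
            points.append(p)
    return points

pts = poisson_disc_darts(100, 50, radius=10)
print(len(pts), "media placements, all at least 10 units apart")
```

Dart throwing is O(attempts x points); production implementations typically use Bridson-style grid acceleration instead, but the spacing guarantee is the same.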

    An interactive camera placement and visibility simulator for image-based VR applications
