45 research outputs found

    Generative RGB-D face completion for head-mounted display removal

    Head-mounted displays (HMDs) are an essential display device for the observation of virtual reality (VR) environments. However, HMDs obstruct external capturing methods from recording the user's upper face. This severely impacts social VR applications, such as teleconferencing, which commonly rely on external RGB-D sensors to capture a volumetric representation of the user. In this paper, we introduce an HMD removal framework based on generative adversarial networks (GANs), capable of jointly filling in missing color and depth data in RGB-D face images. Our framework includes an RGB-based identity loss function for identity preservation and several components aimed at surface reproduction. Our results demonstrate that our framework is able to remove HMDs from synthetic RGB-D face images while preserving the subject's identity.

    A user interface for terrain modelling in virtual reality using a head mounted display

    The increased commercial availability of virtual reality (VR) devices has resulted in more content being created for virtual environments (VEs). This content creation has mainly taken place on traditional desktop systems, but certain applications are now integrating VR into the creation pipeline. We therefore examine the effectiveness of creating content, specifically designing terrains, for use in immersive environments using VR technology. To do this, we develop a VR interface for terrain creation based on an existing desktop application. The interface incorporates a head-mounted display and 6-degree-of-freedom controllers. This allows user controls to be mapped to more natural movements than the abstract controls of mouse-and-keyboard systems. It also means that users can view the terrain in full 3D thanks to the inherent stereoscopy of the VR display. The interface goes through three iterations of user-centred design and testing, with paper and low-fidelity prototypes created before the final interface is developed. The performance of this final VR interface is then compared with that of the desktop interface on which it was based. We carry out user tests to assess each interface in terms of speed, accuracy and usability. We find no significant difference between the interfaces in accuracy, but the desktop interface is superior in speed, while the VR interface is rated as having higher usability. Possible reasons for these results are discussed, such as users preferring the natural interactions offered by the VR interface but not having sufficient training to take full advantage of it. Finally, we conclude that although neither interface was shown to be clearly superior, there is room for further exploration of this research area. Recommendations are also made for incorporating lessons learned during the creation of this dissertation into further research.

    Lightweight Modules for Efficient Deep Learning based Image Restoration

    Low-level image restoration is an integral component of modern artificial intelligence (AI) driven camera pipelines. Most of these frameworks are based on deep neural networks, which impose a massive computational overhead on resource-constrained platforms such as mobile phones. In this paper, we propose several lightweight low-level modules which can be used to create a computationally low-cost variant of a given baseline model. Recent work on efficient neural network design has mainly focused on classification. However, low-level image processing falls under the image-to-image translation genre, which requires some additional computational modules not present in classification. This paper seeks to bridge this gap by designing generic efficient modules which can replace essential components used in contemporary deep learning based image restoration networks. We also present and analyse results highlighting the drawbacks of applying depthwise separable convolutional kernels (a popular method for efficient classification networks) to sub-pixel convolution based upsampling (a popular upsampling strategy for low-level vision applications). This shows that concepts from the domain of classification cannot always be seamlessly integrated into image-to-image translation tasks. We extensively validate our findings on three popular tasks: image inpainting, denoising and super-resolution. Our results show that the proposed networks consistently output reconstructions visually similar to those of the full-capacity baselines, with a significant reduction in parameters, memory footprint and execution time on contemporary mobile devices.
    Comment: Accepted at: IEEE Transactions on Circuits and Systems for Video Technology (Early Access Print) | Codes available at: https://github.com/avisekiit/TCSVT-LightWeight-CNNs | Supplementary document at: https://drive.google.com/file/d/1BQhkh33Sen-d0qOrjq5h8ahw2VCUIVLg/view?usp=sharin
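    The two building blocks named in this abstract can be illustrated with a minimal sketch. It is not taken from the paper or its repository: the function names are hypothetical, the weight counts ignore biases, and the channel ordering in `pixel_shuffle` is an assumed (commonly used) convention. The sketch only shows why a depthwise separable layer is cheaper than a standard convolution and what the sub-pixel rearrangement does; it does not reproduce the drawbacks the authors report.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weight count of a depthwise k x k convolution followed by a
    1 x 1 pointwise convolution (bias ignored)."""
    return c_in * k * k + c_in * c_out


def pixel_shuffle(x, r):
    """Sub-pixel rearrangement: turn a nested list shaped [C*r*r][H][W]
    into one shaped [C][H*r][W*r] (assumed channel ordering)."""
    c_total, h, w = len(x), len(x[0]), len(x[0][0])
    assert c_total % (r * r) == 0, "channel count must be divisible by r*r"
    c = c_total // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c)]
    for ch in range(c):
        for i in range(h * r):
            for j in range(w * r):
                # Each output pixel reads from one of the r*r input channels.
                src = ch * r * r + (i % r) * r + (j % r)
                out[ch][i][j] = x[src][i // r][j // r]
    return out


if __name__ == "__main__":
    # A 2x upsampling layer on 64 feature channels needs 64 * 2 * 2 = 256
    # output channels before the rearrangement.
    print(conv_params(64, 256, 3))
    print(depthwise_separable_params(64, 256, 3))
    print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))
```

    In this hypothetical configuration the separable layer has far fewer weights than the standard one, which is the saving that motivates its use; whether that saving is worth the reconstruction quality in an upsampling layer is the question the paper examines.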

    Scalable and Extensible Augmented Reality with Applications in Civil Infrastructure Systems.

    In Civil Infrastructure System (CIS) applications, the requirement of blending synthetic and physical objects distinguishes Augmented Reality (AR) from other visualization technologies in three respects: 1) it reinforces the connections between people and objects, and promotes engineers' appreciation of their working context; 2) it allows engineers to perform field tasks with awareness of both the physical and the synthetic environment; 3) it offsets the significant cost of 3D model engineering by including the real-world background. The research has overcome several long-standing technical obstacles in AR and investigated technical approaches to fundamental challenges that prevent the technology from being usefully deployed in CIS applications, such as aligning virtual objects with the real environment continuously across time and space; blending virtual entities faithfully with their real background to create a sustained illusion of coexistence; and integrating these methods into a scalable and extensible AR computing framework that is openly accessible to the teaching and research community and can be readily reused and extended by other researchers and engineers. The research findings have been evaluated in several challenging CIS applications where the potential for significant economic and social impact is high. Examples of validation test beds implemented include an AR visual excavator-utility collision avoidance system that enables spotters to "see" buried utilities hidden under the ground surface, thus helping prevent accidental utility strikes; an AR post-disaster reconnaissance framework that enables building inspectors to rapidly evaluate and quantify structural damage sustained by buildings in seismic events such as earthquakes or blasts; and a tabletop collaborative AR visualization framework that allows multiple users to observe and interact with visual simulations of engineering processes.
    PhD, Civil Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies | http://deepblue.lib.umich.edu/bitstream/2027.42/96145/1/dsuyang_1.pd

    Augmented Reality and Its Application

    Augmented Reality (AR) is a discipline concerned with the interactive experience of a real-world environment in which real-world objects and elements are enhanced using computer-generated perceptual information. It has many potential applications in education, medicine, and engineering, among other fields. This book explores these potential uses, presenting case studies and investigations of AR for vocational training, emergency response, interior design, architecture, and much more.

    Videos in Context for Telecommunication and Spatial Browsing

    The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium of presence and remote collaboration. However, capturing a visual representation of locations to be used in VEs is usually a tedious process that requires either manual modelling of environments or the use of specific hardware. Capturing environment dynamics is not straightforward either, and it is usually performed with specific tracking hardware. Similarly, browsing large unstructured video collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. On a spectrum between 3D VEs and 2D images, panoramas lie in between: they offer the accessibility of 2D images while preserving the surrounding representation of 3D virtual environments. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools, as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render and stream data coming from heterogeneous cameras, with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video-mediated communication, and whether this improves the quality of communication. Second, the research asks whether videos in panoramic context can be used to convey spatial and temporal information about a remote place and the dynamics within it, and whether this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether display type has an impact on reasoning about events within videos in panoramic context.
    These research questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first, a telecommunication experiment, compared our videos-in-context interface with fully panoramic video and conventional webcam video conferencing in an object placement scenario. The second experiment investigated the impact of videos in panoramic context on the quality of spatio-temporal thinking during localization tasks. To support this experiment, a novel interface to video collections in panoramic context was developed and compared with common video-browsing tools. The final experimental study investigated the impact of display type on reasoning about events, exploring three adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution for spatio-temporal exploration of remote locations. Our approach presents a richer visual representation in terms of space and time than standard tools, showing that providing panoramic context to video collections makes spatio-temporal tasks easier. Videos in context are therefore a suitable alternative to more difficult, and often more expensive, solutions. These findings are beneficial to many applications, including teleconferencing, virtual tourism and remote assistance.

    From motion capture to interactive virtual worlds : towards unconstrained motion-capture algorithms for real-time performance-driven character animation

    This dissertation takes performance-driven character animation as a representative application and advances motion capture algorithms and animation methods to meet its high demands. Existing approaches either have coarse resolution and a restricted capture volume, require expensive and complex multi-camera systems, or use intrusive suits and controllers. For motion capture, set-up time is reduced by using fewer cameras, accuracy is increased despite occlusions and general environments, initialization is automated, and free roaming is enabled by egocentric cameras. For animation, increased robustness enables the use of low-cost sensor input, the definition of custom control gestures is guided to support novice users, and animation expressiveness is increased. The important contributions are: 1) an analytic and differentiable visibility model for pose optimization under strong occlusions, 2) a volumetric contour model for automatic actor initialization in general scenes, 3) a method to annotate and augment image-pose databases automatically, 4) the utilization of unlabeled examples for character control, and 5) the generalization and disambiguation of cyclical gestures for faithful character animation. In summary, the whole process of human motion capture, processing, and application to animation is advanced. These advances on the state of the art have the potential to improve many interactive applications, within and outside virtual reality.
