2,020 research outputs found

    MoveBox: Democratizing MoCap for the Microsoft Rocketbox Avatar Library

    Get PDF
    This paper presents MoveBox an open sourced toolbox for animating motion captured (MoCap) movements onto the Microsoft Rocketbox library of avatars. Motion capture is performed using a single depth sensor, such as Azure Kinect or Windows Kinect V2. Motion capture is performed in real-time using a single depth sensor, such as Azure Kinect or Windows Kinect V2, or extracted from existing RGB videos offline leveraging deep-learning computer vision techniques. Our toolbox enables real-time animation of the user’s avatar by converting the transformations between systems that have different joints and hierarchies. Additional features of the toolbox include recording, playback and looping animations, as well as basic audio lip sync, blinking and resizing of avatars as well as finger and hand animations. Our main contribution is both in the creation of this open source tool as well as the validation on different devices and discussion of MoveBox’s capabilities by end users

    Task-oriented and Semantics-aware Communication Framework for Augmented Reality

    Full text link
    Upon the advent of the emerging metaverse and its related applications in Augmented Reality (AR), the current bit-oriented network struggles to support real-time changes for the vast amount of associated information, hindering its development. Thus, a critical revolution in the Sixth Generation (6G) networks is envisioned through the joint exploitation of information context and its importance to the task, leading to a communication paradigm shift towards semantic and effectiveness levels. However, current research has not yet proposed any explicit and systematic communication framework for AR applications that incorporate these two levels. To fill this research gap, this paper presents a task-oriented and semantics-aware communication framework for augmented reality (TSAR) to enhance communication efficiency and effectiveness in 6G. Specifically, we first analyse the traditional wireless AR point cloud communication framework and then summarize our proposed semantic information along with the end-to-end wireless communication. We then detail the design blocks of the TSAR framework, covering both semantic and effectiveness levels. Finally, numerous experiments have been conducted to demonstrate that, compared to the traditional point cloud communication framework, our proposed TSAR significantly reduces wireless AR application transmission latency by 95.6%, while improving communication effectiveness in geometry and color aspects by up to 82.4% and 20.4%, respectively

    VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera

    Full text link
    We present the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera. Our method combines a new convolutional neural network (CNN) based pose regressor with kinematic skeleton fitting. Our novel fully-convolutional pose formulation regresses 2D and 3D joint positions jointly in real time and does not require tightly cropped input frames. A real-time kinematic skeleton fitting method uses the CNN output to yield temporally stable 3D global pose reconstructions on the basis of a coherent kinematic skeleton. This makes our approach the first monocular RGB method usable in real-time applications such as 3D character control---thus far, the only monocular methods for such applications employed specialized RGB-D cameras. Our method's accuracy is quantitatively on par with the best offline 3D monocular RGB pose estimation methods. Our results are qualitatively comparable to, and sometimes better than, results from monocular RGB-D approaches, such as the Kinect. However, we show that our approach is more broadly applicable than RGB-D solutions, i.e. it works for outdoor scenes, community videos, and low quality commodity RGB cameras.Comment: Accepted to SIGGRAPH 201

    Full Body Acting Rehearsal in a Networked Virtual Environment-A Case Study

    Get PDF
    In order to rehearse for a play or a scene from a movie, it is generally required that the actors are physically present at the same time in the same place. In this paper we present an example and experience of a full body motion shared virtual environment (SVE) for rehearsal. The system allows actors and directors to meet in an SVE in order to rehearse scenes for a play or a movie, that is, to perform some dialogue and blocking (positions, movements, and displacements of actors in the scene) rehearsal through a full body interactive virtual reality (VR) system. The system combines immersive VR rendering techniques as well as network capabilities together with full body tracking. Two actors and a director rehearsed from separate locations. One actor and the director were in London (located in separate rooms) while the second actor was in Barcelona. The Barcelona actor used a wide field-of-view head-tracked head-mounted display, and wore a body suit for real-time motion capture and display. The London actor was in a Cave system, with head and partial body tracking. Each actor was presented to the other as an avatar in the shared virtual environment, and the director could see the whole scenario on a desktop display, and intervene by voice commands. A video stream in a window displayed in the virtual environment also represented the director. The London participant was a professional actor, who afterward commented on the utility of the system for acting rehearsal. It was concluded that full body tracking and corresponding real-time display of all the actors' movements would be a critical requirement, and that blocking was possible down to the level of detail of gestures. Details of the implementation, actors, and director experiences are provided

    Enhanced Virtuality: Increasing the Usability and Productivity of Virtual Environments

    Get PDF
    Mit stetig steigender Bildschirmauflösung, genauerem Tracking und fallenden Preisen stehen Virtual Reality (VR) Systeme kurz davor sich erfolgreich am Markt zu etablieren. Verschiedene Werkzeuge helfen Entwicklern bei der Erstellung komplexer Interaktionen mit mehreren Benutzern innerhalb adaptiver virtueller Umgebungen. Allerdings entstehen mit der Verbreitung der VR-Systeme auch zusätzliche Herausforderungen: Diverse Eingabegeräte mit ungewohnten Formen und Tastenlayouts verhindern eine intuitive Interaktion. Darüber hinaus zwingt der eingeschränkte Funktionsumfang bestehender Software die Nutzer dazu, auf herkömmliche PC- oder Touch-basierte Systeme zurückzugreifen. Außerdem birgt die Zusammenarbeit mit anderen Anwendern am gleichen Standort Herausforderungen hinsichtlich der Kalibrierung unterschiedlicher Trackingsysteme und der Kollisionsvermeidung. Beim entfernten Zusammenarbeiten wird die Interaktion durch Latenzzeiten und Verbindungsverluste zusätzlich beeinflusst. Schließlich haben die Benutzer unterschiedliche Anforderungen an die Visualisierung von Inhalten, z.B. Größe, Ausrichtung, Farbe oder Kontrast, innerhalb der virtuellen Welten. Eine strikte Nachbildung von realen Umgebungen in VR verschenkt Potential und wird es nicht ermöglichen, die individuellen Bedürfnisse der Benutzer zu berücksichtigen. Um diese Probleme anzugehen, werden in der vorliegenden Arbeit Lösungen in den Bereichen Eingabe, Zusammenarbeit und Erweiterung von virtuellen Welten und Benutzern vorgestellt, die darauf abzielen, die Benutzerfreundlichkeit und Produktivität von VR zu erhöhen. Zunächst werden PC-basierte Hardware und Software in die virtuelle Welt übertragen, um die Vertrautheit und den Funktionsumfang bestehender Anwendungen in VR zu erhalten. Virtuelle Stellvertreter von physischen Geräten, z.B. Tastatur und Tablet, und ein VR-Modus für Anwendungen ermöglichen es dem Benutzer reale Fähigkeiten in die virtuelle Welt zu übertragen. Des Weiteren wird ein Algorithmus vorgestellt, der die Kalibrierung mehrerer ko-lokaler VR-Geräte mit hoher Genauigkeit und geringen Hardwareanforderungen und geringem Aufwand ermöglicht. Da VR-Headsets die reale Umgebung der Benutzer ausblenden, wird die Relevanz einer Ganzkörper-Avatar-Visualisierung für die Kollisionsvermeidung und das entfernte Zusammenarbeiten nachgewiesen. Darüber hinaus werden personalisierte räumliche oder zeitliche Modifikationen vorgestellt, die es erlauben, die Benutzerfreundlichkeit, Arbeitsleistung und soziale Präsenz von Benutzern zu erhöhen. Diskrepanzen zwischen den virtuellen Welten, die durch persönliche Anpassungen entstehen, werden durch Methoden der Avatar-Umlenkung (engl. redirection) kompensiert. Abschließend werden einige der Methoden und Erkenntnisse in eine beispielhafte Anwendung integriert, um deren praktische Anwendbarkeit zu verdeutlichen. Die vorliegende Arbeit zeigt, dass virtuelle Umgebungen auf realen Fähigkeiten und Erfahrungen aufbauen können, um eine vertraute und einfache Interaktion und Zusammenarbeit von Benutzern zu gewährleisten. Darüber hinaus ermöglichen individuelle Erweiterungen des virtuellen Inhalts und der Avatare Einschränkungen der realen Welt zu überwinden und das Erlebnis von VR-Umgebungen zu steigern

    A survey of real-time crowd rendering

    Get PDF
    In this survey we review, classify and compare existing approaches for real-time crowd rendering. We first overview character animation techniques, as they are highly tied to crowd rendering performance, and then we analyze the state of the art in crowd rendering. We discuss different representations for level-of-detail (LoD) rendering of animated characters, including polygon-based, point-based, and image-based techniques, and review different criteria for runtime LoD selection. Besides LoD approaches, we review classic acceleration schemes, such as frustum culling and occlusion culling, and describe how they can be adapted to handle crowds of animated characters. We also discuss specific acceleration techniques for crowd rendering, such as primitive pseudo-instancing, palette skinning, and dynamic key-pose caching, which benefit from current graphics hardware. We also address other factors affecting performance and realism of crowds such as lighting, shadowing, clothing and variability. Finally we provide an exhaustive comparison of the most relevant approaches in the field.Peer ReviewedPostprint (author's final draft

    Metaverse for Wireless Systems: Architecture, Advances, Standardization, and Open Challenges

    Full text link
    The growing landscape of emerging wireless applications is a key driver toward the development of novel wireless system designs. Such a design can be based on the metaverse that uses a virtual model of the physical world systems along with other schemes/technologies (e.g., optimization theory, machine learning, and blockchain). A metaverse using a virtual model performs proactive intelligent analytics prior to a user request for efficient management of the wireless system resources. Additionally, a metaverse will enable self-sustainability to operate wireless systems with the least possible intervention from network operators. Although the metaverse can offer many benefits, it faces some challenges as well. Therefore, in this tutorial, we discuss the role of a metaverse in enabling wireless applications. We present an overview, key enablers, design aspects (i.e., metaverse for wireless and wireless for metaverse), and a novel high-level architecture of metaverse-based wireless systems. We discuss metaverse management, reliability, and security of the metaverse-based system. Furthermore, we discuss recent advances and standardization of metaverse-enabled wireless system. Finally, we outline open challenges and present possible solutions

    Inferring Implicit 3D Representations from Human Figures on Pictorial Maps

    Full text link
    In this work, we present an automated workflow to bring human figures, one of the most frequently appearing entities on pictorial maps, to the third dimension. Our workflow is based on training data and neural networks for single-view 3D reconstruction of real humans from photos. We first let a network consisting of fully connected layers estimate the depth coordinate of 2D pose points. The gained 3D pose points are inputted together with 2D masks of body parts into a deep implicit surface network to infer 3D signed distance fields (SDFs). By assembling all body parts, we derive 2D depth images and body part masks of the whole figure for different views, which are fed into a fully convolutional network to predict UV images. These UV images and the texture for the given perspective are inserted into a generative network to inpaint the textures for the other views. The textures are enhanced by a cartoonization network and facial details are resynthesized by an autoencoder. Finally, the generated textures are assigned to the inferred body parts in a ray marcher. We test our workflow with 12 pictorial human figures after having validated several network configurations. The created 3D models look generally promising, especially when considering the challenges of silhouette-based 3D recovery and real-time rendering of the implicit SDFs. Further improvement is needed to reduce gaps between the body parts and to add pictorial details to the textures. Overall, the constructed figures may be used for animation and storytelling in digital 3D maps.Comment: to be published in 'Cartography and Geographic Information Science

    Instant Volumetric Head Avatars

    Full text link
    We present Instant Volumetric Head Avatars (INSTA), a novel approach for reconstructing photo-realistic digital avatars instantaneously. INSTA models a dynamic neural radiance field based on neural graphics primitives embedded around a parametric face model. Our pipeline is trained on a single monocular RGB portrait video that observes the subject under different expressions and views. While state-of-the-art methods take up to several days to train an avatar, our method can reconstruct a digital avatar in less than 10 minutes on modern GPU hardware, which is orders of magnitude faster than previous solutions. In addition, it allows for the interactive rendering of novel poses and expressions. By leveraging the geometry prior of the underlying parametric face model, we demonstrate that INSTA extrapolates to unseen poses. In quantitative and qualitative studies on various subjects, INSTA outperforms state-of-the-art methods regarding rendering quality and training time.Comment: Website: https://zielon.github.io/insta/ Video: https://youtu.be/HOgaeWTih7

    Theory of the Avatar

    Get PDF
    The internet has given birth to an expanding number of shared virtual reality spaces, with a collective population well into the millions. These virtual worlds exhibit most of the traits we associate with the Earth world: economic transactions, interpersonal relationships, organic political institutions, and so on. A human being experiences these worlds through an avatar, which is the representation of the self in a given physical medium. Most worlds allow an agent to choose what kind of avatar she or he will inhabit, allowing a person with any kind of Earth body to inhabit a completely different body in the virtual world. The emergence of avatar-mediated living raises both positive and normative questions. This paper explores several choice models involving avatars. Analysis of these models suggests that the emergence of avatar-mediated life may increase aggregate human well-being, while decreasing its cross-sectional variance. These efficiency and equity effects are contingent on the maintenance and protection of certain rights, however, including the right of agents to free movement, unbiased information, and political participation.information and internet services, computer software, equity, justice, inequality, synthetic worlds
    corecore