MoveBox: Democratizing MoCap for the Microsoft Rocketbox Avatar Library
This paper presents MoveBox, an open-source toolbox for animating motion-captured (MoCap) movements onto the Microsoft Rocketbox library of avatars. Motion capture is performed in real time using a single depth sensor, such as the Azure Kinect or Kinect V2, or extracted offline from existing RGB videos using deep-learning computer vision techniques. Our toolbox enables real-time animation of the user's avatar by converting transformations between systems that have different joints and hierarchies. Additional features include recording, playback, and looping of animations, as well as basic audio lip sync, blinking, avatar resizing, and finger and hand animations. Our main contributions are the creation of this open-source tool, its validation on different devices, and a discussion of MoveBox's capabilities by end users.
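The core remapping step described above, converting transformations between skeletons with different joints and hierarchies, can be sketched roughly as follows. The joint names and the mapping table are hypothetical and do not reflect MoveBox's actual schema.

```python
# Minimal sketch of retargeting per-joint rotations between skeletons with
# different joint names and hierarchies. Joint names and the mapping table
# are hypothetical, not MoveBox's actual schema.

# Source (sensor) joint name -> target (avatar) bone name.
JOINT_MAP = {
    "SpineBase": "Hips",
    "SpineMid": "Spine",
    "ShoulderLeft": "LeftArm",
    "ElbowLeft": "LeftForeArm",
}

def retarget(source_rotations):
    """Copy per-joint local rotations onto the target skeleton,
    skipping joints the target rig does not have."""
    target_rotations = {}
    for src_joint, rotation in source_rotations.items():
        dst_bone = JOINT_MAP.get(src_joint)
        if dst_bone is not None:
            target_rotations[dst_bone] = rotation
    return target_rotations

frame = {"SpineBase": (0.0, 0.0, 0.0, 1.0), "HandLeft": (0.1, 0.0, 0.0, 1.0)}
print(retarget(frame))  # HandLeft is dropped: the target rig has no match
```

A real retargeter would also convert between coordinate conventions and rest poses, but the name-level remapping is the part that bridges differing hierarchies.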
Task-oriented and Semantics-aware Communication Framework for Augmented Reality
With the advent of the emerging metaverse and its related applications in
Augmented Reality (AR), current bit-oriented networks struggle to support
real-time updates to the vast amount of associated information, hindering their
development. Thus, a critical revolution in Sixth Generation (6G) networks
is envisioned through the joint exploitation of information context and its
importance to the task, leading to a communication paradigm shift towards
semantic and effectiveness levels. However, current research has not yet
proposed an explicit and systematic communication framework for AR
applications that incorporates both levels. To fill this research gap, this
paper presents a task-oriented and semantics-aware communication framework for
augmented reality (TSAR) to enhance communication efficiency and effectiveness
in 6G. Specifically, we first analyse the traditional wireless AR point cloud
communication framework and then summarize our proposed semantic information
along with the end-to-end wireless communication. We then detail the design
blocks of the TSAR framework, covering both semantic and effectiveness levels.
Finally, extensive experiments demonstrate that, compared
to the traditional point cloud communication framework, our proposed TSAR
significantly reduces wireless AR application transmission latency by 95.6%,
while improving communication effectiveness in the geometry and color aspects by up
to 82.4% and 20.4%, respectively.
VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
We present the first real-time method to capture the full global 3D skeletal
pose of a human in a stable, temporally consistent manner using a single RGB
camera. Our method combines a new convolutional neural network (CNN) based pose
regressor with kinematic skeleton fitting. Our novel fully-convolutional pose
formulation regresses 2D and 3D joint positions jointly in real time and does
not require tightly cropped input frames. A real-time kinematic skeleton
fitting method uses the CNN output to yield temporally stable 3D global pose
reconstructions on the basis of a coherent kinematic skeleton. This makes our
approach the first monocular RGB method usable in real-time applications such
as 3D character control---thus far, the only monocular methods for such
applications employed specialized RGB-D cameras. Our method's accuracy is
quantitatively on par with the best offline 3D monocular RGB pose estimation
methods. Our results are qualitatively comparable to, and sometimes better
than, results from monocular RGB-D approaches, such as the Kinect. However, we
show that our approach is more broadly applicable than RGB-D solutions, i.e. it
works for outdoor scenes, community videos, and low quality commodity RGB
cameras.

Comment: Accepted to SIGGRAPH 201
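The temporal-stability idea can be illustrated with a heavily simplified stand-in: the paper fits a full kinematic skeleton to the CNN's combined 2D/3D predictions, whereas this sketch only shows per-joint exponential smoothing of 3D joint positions across frames.

```python
# Highly simplified stand-in for the temporal stabilization step: VNect fits
# a coherent kinematic skeleton to the CNN's 2D/3D predictions; here we only
# illustrate the idea of temporally smoothing per-joint 3D positions.

def smooth_pose(prev_pose, cnn_pose, alpha=0.5):
    """Exponentially smooth each joint's (x, y, z) position.
    alpha = weight of the current CNN prediction."""
    return [
        tuple(alpha * c + (1 - alpha) * p for p, c in zip(pj, cj))
        for pj, cj in zip(prev_pose, cnn_pose)
    ]

prev = [(0.0, 0.0, 0.0)]   # previous frame, one joint
cur = [(1.0, 0.0, 0.0)]    # noisy per-frame CNN estimate
print(smooth_pose(prev, cur))  # [(0.5, 0.0, 0.0)]
```

Unlike plain smoothing, a kinematic fit additionally enforces constant bone lengths and consistency with the 2D reprojection, which is what yields a globally stable skeleton.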
Full Body Acting Rehearsal in a Networked Virtual Environment-A Case Study
In order to rehearse for a play or a scene from a movie, it is generally required that the actors are physically present at the same time in the same place. In this paper we present an example and experience of a full body motion shared virtual environment (SVE) for rehearsal. The system allows actors and directors to meet in an SVE in order to rehearse scenes for a play or a movie, that is, to perform some dialogue and blocking (positions, movements, and displacements of actors in the scene) rehearsal through a full body interactive virtual reality (VR) system. The system combines immersive VR rendering techniques as well as network capabilities together with full body tracking. Two actors and a director rehearsed from separate locations. One actor and the director were in London (located in separate rooms) while the second actor was in Barcelona. The Barcelona actor used a wide field-of-view head-tracked head-mounted display, and wore a body suit for real-time motion capture and display. The London actor was in a Cave system, with head and partial body tracking. Each actor was presented to the other as an avatar in the shared virtual environment, and the director could see the whole scenario on a desktop display, and intervene by voice commands. A video stream in a window displayed in the virtual environment also represented the director. The London participant was a professional actor, who afterward commented on the utility of the system for acting rehearsal. It was concluded that full body tracking and corresponding real-time display of all the actors' movements would be a critical requirement, and that blocking was possible down to the level of detail of gestures. Details of the implementation, actors, and director experiences are provided
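A shared virtual environment of this kind has to exchange per-frame avatar state between the remote sites. The message format below is purely illustrative; the paper does not specify its network protocol.

```python
# Hypothetical sketch of the kind of per-frame state message a networked
# rehearsal system might exchange: avatar id, head position, joint rotations.
# The field names and format are assumptions, not the paper's actual protocol.
import json

def encode_pose_update(avatar_id, head_position, joint_rotations):
    """Serialize one avatar's tracked state for transmission to peers."""
    return json.dumps({
        "avatar": avatar_id,
        "head": head_position,          # (x, y, z) in meters
        "joints": joint_rotations,      # bone name -> quaternion
    })

msg = encode_pose_update("actor_barcelona", [0.0, 1.7, 0.0],
                         {"LeftArm": [0.0, 0.0, 0.0, 1.0]})
decoded = json.loads(msg)  # the receiving site applies this to its avatar
```

In practice such updates are sent at tracking rate over UDP or a similar low-latency transport, with the receiver interpolating between updates to mask jitter.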
Enhanced Virtuality: Increasing the Usability and Productivity of Virtual Environments
With steadily increasing display resolution, more accurate tracking, and falling prices, virtual reality (VR) systems are on the verge of establishing themselves successfully in the market. Various tools help developers create complex multi-user interactions within adaptive virtual environments. However, the spread of VR systems also brings additional challenges: diverse input devices with unfamiliar shapes and button layouts prevent intuitive interaction. Moreover, the limited feature set of existing software forces users to fall back on conventional PC- or touch-based systems. Furthermore, collaborating with other users at the same location poses challenges regarding the calibration of different tracking systems and collision avoidance. In remote collaboration, interaction is additionally affected by latency and connection losses. Finally, users have different requirements for the visualization of content within the virtual worlds, e.g. size, orientation, color, or contrast. A strict replication of real environments in VR wastes potential and will not make it possible to accommodate users' individual needs.
To address these problems, this thesis presents solutions in the areas of input, collaboration, and augmentation of virtual worlds and users, aimed at increasing the usability and productivity of VR. First, PC-based hardware and software are transferred into the virtual world to preserve the familiarity and feature set of existing applications in VR. Virtual proxies of physical devices, e.g. keyboard and tablet, and a VR mode for applications allow the user to carry real-world skills over into the virtual world. Furthermore, an algorithm is presented that enables the calibration of multiple co-located VR devices with high accuracy, low hardware requirements, and little effort. Since VR headsets block out the user's real surroundings, the relevance of a full-body avatar visualization for collision avoidance and remote collaboration is demonstrated. In addition, personalized spatial or temporal modifications are presented that increase users' usability, work performance, and social presence. Discrepancies between the virtual worlds arising from personal adaptations are compensated by avatar redirection methods. Finally, some of the methods and findings are integrated into an example application to illustrate their practical applicability.
This thesis shows that virtual environments can build on real-world skills and experiences to ensure familiar and easy interaction and collaboration among users. Moreover, individual augmentations of virtual content and avatars make it possible to overcome real-world limitations and enhance the experience of VR environments.
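The co-located calibration problem mentioned in the abstract, aligning several tracking systems that observe the same physical space, is commonly solved by rigidly aligning corresponding tracked points. The Kabsch algorithm below is a standard technique for this, not necessarily the thesis's exact method.

```python
# Generic rigid alignment of two point sets via the Kabsch algorithm: given
# the same physical markers observed in two tracking coordinate systems,
# recover the rotation R and translation t mapping one system into the other.
# This is a standard technique, not necessarily the thesis's exact method.
import numpy as np

def kabsch(P, Q):
    """Return R, t such that R @ P_i + t ≈ Q_i (least-squares rigid fit)."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)          # centroids
    H = (P - cp).T @ (Q - cq)                        # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))           # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

# Example: recover a 90-degree rotation about z plus a translation.
P = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
R_true = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)
Q = P @ R_true.T + np.array([1.0, 2.0, 3.0])
R, t = kabsch(P, Q)
```

Once R and t are known, every pose reported by one tracking system can be transformed into the other system's frame, so co-located users share a consistent space.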
A survey of real-time crowd rendering
In this survey we review, classify and compare existing approaches for real-time crowd rendering. We first overview character animation techniques, as they are highly tied to crowd rendering performance, and then we analyze the state of the art in crowd rendering. We discuss different representations for level-of-detail (LoD) rendering of animated characters, including polygon-based, point-based, and image-based techniques, and review different criteria for runtime LoD selection. Besides LoD approaches, we review classic acceleration schemes, such as frustum culling and occlusion culling, and describe how they can be adapted to handle crowds of animated characters. We also discuss specific acceleration techniques for crowd rendering, such as primitive pseudo-instancing, palette skinning, and dynamic key-pose caching, which benefit from current graphics hardware. We also address other factors affecting performance and realism of crowds such as lighting, shadowing, clothing and variability. Finally we provide an exhaustive comparison of the most relevant approaches in the field.
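Runtime LoD selection by viewer distance, one of the criteria the survey reviews, can be sketched as follows; the thresholds and level names are illustrative, not taken from any particular system.

```python
# Sketch of distance-based runtime LoD selection for crowd characters.
# Thresholds and level names are illustrative.
LOD_THRESHOLDS = [            # (max distance in meters, representation)
    (10.0, "full_mesh"),      # polygon-based, full detail
    (40.0, "reduced_mesh"),   # simplified polygon mesh
    (100.0, "impostor"),      # image-based representation
]

def select_lod(distance):
    """Pick the cheapest representation adequate for the given distance."""
    for max_dist, level in LOD_THRESHOLDS:
        if distance <= max_dist:
            return level
    return "culled"  # beyond all thresholds: skip rendering entirely

print(select_lod(5.0))    # full_mesh
print(select_lod(250.0))  # culled
```

Real systems refine this with screen-space projected size and hysteresis to avoid visible popping when a character crosses a threshold.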
Metaverse for Wireless Systems: Architecture, Advances, Standardization, and Open Challenges
The growing landscape of emerging wireless applications is a key driver
toward the development of novel wireless system designs. Such a design can be
based on the metaverse that uses a virtual model of the physical world systems
along with other schemes/technologies (e.g., optimization theory, machine
learning, and blockchain). A metaverse using a virtual model performs proactive
intelligent analytics prior to a user request for efficient management of the
wireless system resources. Additionally, a metaverse will enable
self-sustainability to operate wireless systems with the least possible
intervention from network operators. Although the metaverse can offer many
benefits, it faces some challenges as well. Therefore, in this tutorial, we
discuss the role of a metaverse in enabling wireless applications. We present
an overview, key enablers, design aspects (i.e., metaverse for wireless and
wireless for metaverse), and a novel high-level architecture of metaverse-based
wireless systems. We discuss metaverse management, reliability, and security of
the metaverse-based system. Furthermore, we discuss recent advances and
standardization of metaverse-enabled wireless system. Finally, we outline open
challenges and present possible solutions
Inferring Implicit 3D Representations from Human Figures on Pictorial Maps
In this work, we present an automated workflow to bring human figures, one of
the most frequently appearing entities on pictorial maps, to the third
dimension. Our workflow is based on training data and neural networks for
single-view 3D reconstruction of real humans from photos. We first let a
network consisting of fully connected layers estimate the depth coordinate of
2D pose points. The gained 3D pose points are inputted together with 2D masks
of body parts into a deep implicit surface network to infer 3D signed distance
fields (SDFs). By assembling all body parts, we derive 2D depth images and body
part masks of the whole figure for different views, which are fed into a fully
convolutional network to predict UV images. These UV images and the texture for
the given perspective are inserted into a generative network to inpaint the
textures for the other views. The textures are enhanced by a cartoonization
network and facial details are resynthesized by an autoencoder. Finally, the
generated textures are assigned to the inferred body parts in a ray marcher. We
test our workflow with 12 pictorial human figures after having validated
several network configurations. The created 3D models look generally promising,
especially when considering the challenges of silhouette-based 3D recovery and
real-time rendering of the implicit SDFs. Further improvement is needed to
reduce gaps between the body parts and to add pictorial details to the
textures. Overall, the constructed figures may be used for animation and
storytelling in digital 3D maps.

Comment: to be published in 'Cartography and Geographic Information Science'
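Rendering the inferred signed distance fields in a ray marcher typically uses sphere tracing: step along each ray by the distance value until the surface is reached. The sketch below traces an analytic sphere standing in for the network-inferred fields.

```python
# Minimal sphere-tracing loop for rendering an implicit signed distance
# field (SDF). The SDF here is an analytic sphere, standing in for the
# network-inferred fields of the paper's figures.
import math

def sdf_sphere(p, center=(0.0, 0.0, 3.0), radius=1.0):
    """Signed distance from point p to a sphere's surface."""
    return math.dist(p, center) - radius

def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-4, max_t=20.0):
    """March along the ray, stepping by the SDF value;
    return the hit distance, or None if the ray misses."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        d = sdf(p)
        if d < eps:
            return t          # close enough to the surface: hit
        t += d                # safe step: nothing is closer than d
        if t > max_t:
            break
    return None

hit = sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), sdf_sphere)
print(hit)  # 2.0 (the ray hits the sphere's front surface)
```

For real-time rendering of a learned SDF, each `sdf(p)` call is a network evaluation, which is why the paper highlights the performance challenge.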
Instant Volumetric Head Avatars
We present Instant Volumetric Head Avatars (INSTA), a novel approach for
reconstructing photo-realistic digital avatars instantaneously. INSTA models a
dynamic neural radiance field based on neural graphics primitives embedded
around a parametric face model. Our pipeline is trained on a single monocular
RGB portrait video that observes the subject under different expressions and
views. While state-of-the-art methods take up to several days to train an
avatar, our method can reconstruct a digital avatar in less than 10 minutes on
modern GPU hardware, which is orders of magnitude faster than previous
solutions. In addition, it allows for the interactive rendering of novel poses
and expressions. By leveraging the geometry prior of the underlying parametric
face model, we demonstrate that INSTA extrapolates to unseen poses. In
quantitative and qualitative studies on various subjects, INSTA outperforms
state-of-the-art methods regarding rendering quality and training time.

Comment: Website: https://zielon.github.io/insta/ Video: https://youtu.be/HOgaeWTih7
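A dynamic neural radiance field is rendered by alpha-compositing density and color samples along each camera ray. The sketch below shows that standard volume-rendering quadrature with made-up scalar samples; INSTA's actual field is a neural model driven by graphics primitives, not these constants.

```python
# Standard volume-rendering quadrature used by radiance-field methods:
# alpha-composite (density, color) samples taken along a ray. Sample values
# here are made up for illustration; a real NeRF evaluates a network.
import math

def composite(samples, delta):
    """samples: list of (density, color) along the ray; delta: step size."""
    color, transmittance = 0.0, 1.0
    for sigma, c in samples:
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        color += transmittance * alpha * c       # contribute what gets through
        transmittance *= 1.0 - alpha             # light remaining for later samples
    return color

# Two dense samples: the first occludes most of the second.
print(composite([(5.0, 1.0), (5.0, 0.5)], delta=0.5))
```

The same accumulation (with per-channel colors and learned densities) is what makes novel-view rendering of the avatar possible.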
Theory of the Avatar
The internet has given birth to an expanding number of shared virtual reality spaces, with a collective population well into the millions. These virtual worlds exhibit most of the traits we associate with the Earth world: economic transactions, interpersonal relationships, organic political institutions, and so on. A human being experiences these worlds through an avatar, which is the representation of the self in a given physical medium. Most worlds allow an agent to choose what kind of avatar she or he will inhabit, allowing a person with any kind of Earth body to inhabit a completely different body in the virtual world. The emergence of avatar-mediated living raises both positive and normative questions. This paper explores several choice models involving avatars. Analysis of these models suggests that the emergence of avatar-mediated life may increase aggregate human well-being, while decreasing its cross-sectional variance. These efficiency and equity effects are contingent on the maintenance and protection of certain rights, however, including the right of agents to free movement, unbiased information, and political participation.

Keywords: information and internet services, computer software, equity, justice, inequality, synthetic worlds