38 research outputs found
From motion capture to interactive virtual worlds: towards unconstrained motion-capture algorithms for real-time performance-driven character animation
This dissertation takes performance-driven character animation as a representative application and advances motion capture algorithms and animation methods to meet its high demands. Existing approaches either have coarse resolution and a restricted capture volume, require expensive and complex multi-camera systems, or use intrusive suits and controllers.
For motion capture, set-up time is reduced by using fewer cameras, accuracy is increased despite occlusions and in general environments, initialization is automated, and free roaming is enabled by egocentric cameras. For animation, increased robustness enables the use of low-cost sensor input, custom control-gesture definition is guided to support novice users, and animation expressiveness is increased. The main contributions are: 1) an analytic and differentiable visibility model for pose optimization under strong occlusions, 2) a volumetric contour model for automatic actor initialization in general scenes, 3) a method to annotate and augment image-pose databases automatically, 4) the utilization of unlabeled examples for character control, and 5) the generalization and disambiguation of cyclical gestures for faithful character animation.
In summary, the whole process of human motion capture, processing, and application to animation is advanced. These advances over the state of the art have the potential to improve many interactive applications, within and outside virtual reality.
This thesis addresses performance-driven character animation; in particular, motion-capture algorithms are developed to meet the high demands of this example application. Existing methods either have low accuracy and a restricted capture volume, require expensive multi-camera systems, or use intrusive controllers and special suits.
For motion capture, the setup time is shortened, accuracy under occlusions and in general environments is increased, initialization is automated, and restrictions on movement are reduced. For character animation, robustness to inaccurate sensors is increased, support for user-defined gesture definition is provided, and the expressiveness of the animation is improved. The most important contributions are: 1) an analytic and differentiable visibility model for reconstruction under strong occlusions, 2) a volumetric contour model for automatic body-model initialization in general environments, 3) a method for automatic pose annotation and image augmentation in large databases, 4) the use of example motions for character animation, and 5) the generalization and transfer of cyclical gestures for faithful character animation.
The entire process is advanced, from motion capture to character animation. The improvements are applicable to many interactive applications, within and outside of virtual reality.
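The abstract names a differentiable visibility model used inside pose optimization but gives no formulation. The sketch below is only a hypothetical illustration of the general idea, a soft, differentiable per-joint visibility weight that down-weights the reprojection error of occluded joints; the function names, shapes, and the sigmoid-based weighting are assumptions, not the dissertation's actual model.

```python
# Hypothetical sketch: visibility-weighted 2D reprojection loss for pose optimization.
# Not the dissertation's model; names, shapes, and the weighting scheme are illustrative.
import torch

def project(joints_3d, cam):
    """Pinhole projection of 3D joints (J, 3) to 2D pixel coordinates (J, 2)."""
    x = joints_3d @ cam["R"].T + cam["t"]              # world -> camera space
    return cam["f"] * x[:, :2] / x[:, 2:3] + cam["c"]

def soft_visibility(joints_3d, occluder_depth, cam, sharpness=50.0):
    """Differentiable visibility in [0, 1]: near 1 when a joint lies in front of
    the occluding surface along its camera ray, near 0 when it lies behind it."""
    depth = (joints_3d @ cam["R"].T + cam["t"])[:, 2]
    return torch.sigmoid(sharpness * (occluder_depth - depth))

def pose_objective(joints_3d, joints_2d_detected, occluder_depth, cam):
    """Reprojection error in which occluded joints contribute little."""
    vis = soft_visibility(joints_3d, occluder_depth, cam)               # (J,)
    err = (project(joints_3d, cam) - joints_2d_detected).norm(dim=-1)   # (J,)
    return (vis * err).mean()
```

Because the weight is a smooth function of depth, occlusion handling stays differentiable and can be optimized jointly with the pose parameters.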
Emotional Speech-Driven Animation with Content-Emotion Disentanglement
To be widely adopted, 3D facial avatars must be animated easily,
realistically, and directly from speech signals. While the best recent methods
generate 3D animations that are synchronized with the input audio, they largely
ignore the impact of emotions on facial expressions. Realistic facial animation
requires lip-sync together with the natural expression of emotion. To that end,
we propose EMOTE (Expressive Model Optimized for Talking with Emotion), which
generates 3D talking-head avatars that maintain lip-sync from speech while
enabling explicit control over the expression of emotion. To achieve this, we
supervise EMOTE with decoupled losses for speech (i.e., lip-sync) and emotion.
These losses are based on two key observations: (1) deformations of the face
due to speech are spatially localized around the mouth and have high temporal
frequency, whereas (2) facial expressions may deform the whole face and occur
over longer intervals. Thus, we train EMOTE with a per-frame lip-reading loss
to preserve the speech-dependent content, while supervising emotion at the
sequence level. Furthermore, we employ a content-emotion exchange mechanism in
order to supervise different emotions on the same audio, while maintaining the
lip motion synchronized with the speech. To employ deep perceptual losses
without getting undesirable artifacts, we devise a motion prior in the form of
a temporal VAE. Due to the absence of high-quality aligned emotional 3D face
datasets with speech, EMOTE is trained with 3D pseudo-ground-truth extracted
from an emotional video dataset (i.e., MEAD). Extensive qualitative and
perceptual evaluations demonstrate that EMOTE produces speech-driven facial
animations with better lip-sync than state-of-the-art methods trained on the
same data, while offering additional, high-quality emotional control. Comment: SIGGRAPH Asia 2023 Conference Paper
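The abstract describes EMOTE's key supervision split: a per-frame lip-reading loss for speech content and a sequence-level loss for emotion. As a rough illustration of that split (not the paper's actual networks or losses), a minimal sketch might look like the following; lip_reader, emotion_recognizer, and the loss weights are placeholder assumptions.

```python
# Hypothetical sketch of EMOTE-style decoupled supervision: per-frame lip-reading
# loss for speech content, sequence-level loss for emotion. Module names are
# placeholders, not the paper's components.
import torch
import torch.nn.functional as F

def decoupled_losses(pred_verts, gt_verts, lip_reader, emotion_recognizer,
                     target_emotion, w_lip=1.0, w_emo=0.5):
    """pred_verts, gt_verts: (T, V, 3) animated face vertices per frame."""
    # Speech content: compare lip-reading features frame by frame, keeping the
    # loss localized around the mouth and preserving high-frequency lip motion.
    lip_loss = F.mse_loss(lip_reader(pred_verts), lip_reader(gt_verts))

    # Emotion: supervise at the sequence level, since expressions deform the
    # whole face and unfold over longer time spans than individual phonemes.
    emo_logits = emotion_recognizer(pred_verts.unsqueeze(0))   # whole clip at once
    emo_loss = F.cross_entropy(emo_logits, target_emotion)

    return w_lip * lip_loss + w_emo * emo_loss
```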
Generating 3D product design models in real-time using hand motion and gesture
This thesis was submitted for the degree of Master of Philosophy and awarded by Brunel University. Three-dimensional product design models are widely used in conceptual design and in the early stages of prototyping. A product design specification often demands a substantial number of 3D models to be constructed within a short period of time. Current methods begin with designers sketching product concepts in 2D using pencil and paper; the sketches are then translated into 3D models by a designer with CAD expertise using a 3D modelling package such as Pro Engineer, SolidWorks, or AutoCAD. Several novel methods have been used to incorporate hand motion as a way of interacting with computers. Three main types of technology are available to capture motion data and translate human motion into numeric data that a computer system can read: hand-gesture glove-based systems such as the "Cyberglove", generally used to capture hand gesture and joint-angle information; full-body motion capture systems, both optical and non-optical; and vision-based gesture recognition systems that estimate full degree-of-freedom (DOF) hand motion. There has yet to be a method that uses any of these input devices to rapidly produce 3D product design models in real time from hand motion and gestures. In this research, a novel method is presented that uses a motion capture system to capture hand gestures and motion in real time and recreate 3D curves and surfaces, which can be translated into 3D product design models. The main aim of this research is to develop a hand-motion and gesture-based rapid 3D product modelling method, allowing designers to interactively sketch out 3D concepts in real time within a virtual workspace.
A database of hand signs was built for both architectural hand signs (a preliminary study) and product design hand signs. A marker set with a total of eight markers (five on the left hand and three on the right hand/marker pen) was designed and used to capture hand gestures with an optical motion capture system. A preliminary testing session was successfully completed to determine whether the motion capture system would be suitable for a real-time application, by modelling a train station offline using hand motion and gesture. An OpenGL software application was programmed using C++ and the Microsoft Foundation Classes to communicate and pass captured motion data from the EVaRT system to the user.
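The thesis recreates 3D curves and surfaces from captured pen and hand motion inside a C++/OpenGL application. Purely as an illustration of the curve-recreation step, not the thesis's implementation, one might fit a smooth spline through the streamed pen-tip marker positions; the sketch below uses Python and SciPy, and the function name and parameters are assumptions.

```python
# Hypothetical sketch: turning a stream of captured pen-tip marker positions into
# a smooth 3D curve, as a stand-in for the thesis's curve-recreation step.
import numpy as np
from scipy.interpolate import splprep, splev

def fit_stroke_curve(pen_positions, smoothing=0.001, samples=200):
    """pen_positions: (N, 3) array of pen-tip positions captured over time.
    Returns (samples, 3) points on a smooth B-spline through the stroke."""
    x, y, z = np.asarray(pen_positions, dtype=float).T
    tck, _ = splprep([x, y, z], s=smoothing)       # fit a parametric spline
    u = np.linspace(0.0, 1.0, samples)
    return np.stack(splev(u, tck), axis=-1)        # resample the curve evenly
```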
UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons
Automatic co-speech gesture generation has drawn much attention in computer
animation. Previous works designed network structures on individual datasets,
which resulted in a lack of data volume and generalizability across different
motion capture standards. In addition, it is a challenging task due to the weak
correlation between speech and gestures. To address these problems, we present
UnifiedGesture, a novel diffusion model-based speech-driven gesture synthesis
approach, trained on multiple gesture datasets with different skeletons.
Specifically, we first present a retargeting network to learn latent
homeomorphic graphs for different motion capture standards, unifying the
representations of various gestures while extending the dataset. We then
capture the correlation between speech and gestures based on a diffusion model
architecture using cross-local attention and self-attention to generate better
speech-matched and realistic gestures. To further align speech and gesture and
increase diversity, we incorporate reinforcement learning on the discrete
gesture units with a learned reward function. Extensive experiments show that
UnifiedGesture outperforms recent approaches on speech-driven gesture
generation in terms of CCA, FGD, and human-likeness. All code, pre-trained
models, databases, and demos are available to the public at
https://github.com/YoungSeng/UnifiedGesture. Comment: 16 pages, 11 figures, ACM MM 2023
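UnifiedGesture is described as a diffusion model conditioned on speech, trained on gesture data retargeted to a unified skeleton. The following is only a generic sketch of one diffusion training step under those assumptions; the denoiser interface, noise schedule, and feature shapes are placeholders rather than the released implementation (see the linked repository for the actual code).

```python
# Hypothetical sketch of a diffusion-style training step for speech-conditioned
# gesture generation. The denoiser, schedule, and shapes are placeholders.
import torch
import torch.nn.functional as F

def diffusion_step(denoiser, gestures, speech_feats, alphas_cumprod):
    """gestures: (B, T, D) unified-skeleton gesture features;
    speech_feats: (B, T, C) time-aligned audio features."""
    B = gestures.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (B,), device=gestures.device)
    a = alphas_cumprod[t].view(B, 1, 1)

    noise = torch.randn_like(gestures)
    noisy = a.sqrt() * gestures + (1 - a).sqrt() * noise   # forward diffusion

    # The network predicts the injected noise, conditioned on speech and timestep.
    pred = denoiser(noisy, speech_feats, t)
    return F.mse_loss(pred, noise)
```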