610 research outputs found

Data-driven techniques for animating virtual characters

Author: Mousas Christos
Publication date: 01/01/2015

One of the key goals of current research in data-driven computer animation is the synthesis of new motion sequences from existing motion data. This thesis presents three novel techniques for synthesising the motion of a virtual character from existing motion data and develops a framework of solutions to key character animation problems. The first motion synthesis technique presented is based on the character’s locomotion composition process. This technique examines the ability to synthesise a variety of the character’s locomotion behaviours while easily specified constraints (footprints) are placed in three-dimensional space. This is achieved by analysing existing motion data, and by assigning the locomotion behaviour transition process to transition graphs that are responsible for providing information about this process. However, virtual characters should also be able to animate according to different style variations. Therefore, a second technique is presented to synthesise real-time style variations of the character’s motion. A novel technique is developed that uses the correlation between two different motion styles and, by assigning the motion synthesis process to a parameterised maximum a posteriori (MAP) framework, retrieves the desired style content of the input motion in real-time, enhancing the realism of the newly synthesised motion sequence. The third technique presents the ability to synthesise the motion of the character’s fingers either off-line or in real-time during the performance capture process. The advantage of both techniques is their ability to assign the motion searching process to motion features. The presented technique is able to estimate and synthesise a valid motion of the character’s fingers, enhancing the realism of the input motion.
To conclude, this thesis demonstrates that these three novel techniques combine into a framework that enables the realistic synthesis of virtual character movements, eliminating post-processing, as well as enabling fast synthesis of the required motion

Sussex Research Online
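
The thesis abstract above mentions transition graphs that govern how a character moves between locomotion behaviours. As a rough illustration only, the sketch below finds the shortest chain of behaviour transitions with a breadth-first search. The behaviour names, the graph, and the function `transition_path` are hypothetical and are not taken from the thesis.

```python
from collections import deque

# Hypothetical behaviour graph: each key lists the behaviours reachable in one transition.
GRAPH = {
    "walk": ["jog", "stop"],
    "jog": ["walk", "run"],
    "run": ["jog"],
    "stop": ["walk"],
}

def transition_path(graph, start, goal):
    """Return the shortest list of behaviours from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

path = transition_path(GRAPH, "stop", "run")
print(path)
```

A real system would presumably attach motion clips and blending costs to the edges; this sketch only shows the graph search itself.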

AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis

Author: Kopp Stefan
Voß Hendric
Publication date: 02/05/2023

The generation of realistic and contextually relevant co-speech gestures is a challenging yet increasingly important task in the creation of multimodal artificial agents. Prior methods focused on learning a direct correspondence between co-speech gesture representations and produced motions, which created seemingly natural but often unconvincing gestures during human assessment. We present an approach to pre-train partial gesture sequences using a generative adversarial network with a quantization pipeline. The resulting codebook vectors serve as both input and output in our framework, forming the basis for the generation and reconstruction of gestures. By learning the mapping of a latent space representation as opposed to directly mapping it to a vector representation, this framework facilitates the generation of highly realistic and expressive gestures that closely replicate human movement and behavior, while simultaneously avoiding artifacts in the generation process. We evaluate our approach by comparing it with established methods for generating co-speech gestures as well as with existing datasets of human behavior. We also perform an ablation study to assess our findings. The results show that our approach outperforms the current state of the art by a clear margin and is partially indistinguishable from human gesturing. We make our data pipeline and the generation framework publicly available

arXiv.org e-Print Archive
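
The AQ-GT abstract above describes codebook vectors produced by a quantization pipeline. As a minimal sketch of the general idea of vector quantization, and not of the authors' model, the code below maps each input vector to its nearest codebook entry and reconstructs from the resulting indices. The codebook values, the frame values, and the function names are assumptions made for illustration.

```python
from math import dist

def quantize(vectors, codebook):
    """Return, for each vector, the index of the nearest codebook entry (Euclidean)."""
    return [
        min(range(len(codebook)), key=lambda k: dist(v, codebook[k]))
        for v in vectors
    ]

def reconstruct(indices, codebook):
    """Replace each index with its codebook vector."""
    return [codebook[k] for k in indices]

# Hypothetical 2-D codebook and "gesture frames".
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
frames = [(0.1, -0.1), (0.9, 1.2), (0.2, 1.8)]

indices = quantize(frames, codebook)
print(indices)                         # [0, 1, 2]
print(reconstruct(indices, codebook))
```

In a learned model the codebook is trained jointly with an encoder and decoder; here it is fixed by hand so that the lookup step can be read in isolation.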

Learning Speech-driven 3D Conversational Gestures from Video

Author: Elgharib Mohamed
Habibie Ikhsanul
Liu Lingjie
Mehta Dushyant
Pons-Moll Gerard
Seidel Hans-Peter
Theobalt Christian
Xu Weipeng
Publication date: 01/01/2021

We propose the first approach to automatically and jointly synthesize both the synchronous 3D conversational body and hand gestures, as well as 3D face and head animations, of a virtual character from speech input. Our algorithm uses a CNN architecture that leverages the inherent correlation between facial expression and hand gestures. Synthesis of conversational body gestures is a multi-modal problem since many similar gestures can plausibly accompany the same input speech. To synthesize plausible body gestures in this setting, we train a Generative Adversarial Network (GAN) based model that measures the plausibility of the generated sequences of 3D body motion when paired with the input audio features. We also contribute a new way to create a large corpus of more than 33 hours of annotated body, hand, and face data from in-the-wild videos of talking people. To this end, we apply state-of-the-art monocular approaches for 3D body and hand pose estimation as well as dense 3D face performance capture to the video corpus. In this way, we can train on orders of magnitude more data than previous algorithms that resort to complex in-studio motion capture solutions, and thereby train more expressive synthesis algorithms. Our experiments and user study show the state-of-the-art quality of our speech-synthesized full 3D character animations

arXiv.org e-Print Archive

MPG.PuRe

Learning Speech-driven 3D Conversational Gestures from Video

Author: Elgharib M.
Habibie I.
Liu L.
Mehta D.
Pons-Moll G.
Seidel H.
Theobalt C.
Xu W.
Publication date: 01/01/2021

MPG.PuRe

Real-Time Virtual Humans

Author: Badler Norman I
Publication venue: ScholarlyCommons
Publication date: 01/06/1997

The last few years have seen great maturation in the computation speed and control methods needed to portray 3D virtual humans suitable for real interactive applications. We first describe the state of the art, then focus on the particular approach taken at the University of Pennsylvania with the Jack system. Various aspects of real-time virtual humans are considered, such as appearance and motion, interactive control, autonomous action, gesture, attention, locomotion, and multiple individuals. The underlying architecture consists of a sense-control-act structure that permits reactive behaviors to be locally adaptive to the environment, and a PaT-Net parallel finite-state machine controller that can be used to drive virtual humans through complex tasks. We then argue for a deep connection between language and animation and describe current efforts in linking them through two systems: the Jack Presenter and the JackMOO extension to lambdaMOO. Finally, we outline a Parameterized Action Representation for mediating between language instructions and animated actions

ScholarlyCommons@Penn
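
The abstract above refers to a finite-state machine controller driving virtual humans through tasks. The sketch below shows a single, plain finite-state machine stepping through sensed events. It is not a PaT-Net, which runs such machines in parallel, and it does not use the Jack system's API. All state and event names are hypothetical.

```python
# Hypothetical transition table: (current state, sensed event) -> next state.
TRANSITIONS = {
    ("idle", "target_seen"): "walking",
    ("walking", "obstacle"): "avoiding",
    ("avoiding", "path_clear"): "walking",
    ("walking", "target_reached"): "idle",
}

def step(state, event):
    """Return the next state; stay in the current state if no transition matches."""
    return TRANSITIONS.get((state, event), state)

def run(state, events):
    """Feed a sequence of sensed events to the machine and record every state visited."""
    trace = [state]
    for event in events:
        state = step(state, event)
        trace.append(state)
    return trace

trace = run("idle", ["target_seen", "obstacle", "path_clear", "target_reached"])
print(trace)  # ['idle', 'walking', 'avoiding', 'walking', 'idle']
```

Events with no matching entry leave the state unchanged, which is one simple way to ignore irrelevant sensor input.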

How important are detailed hand motions for communication for a virtual character through the lens of charades?

Author: Adkins Alexandra
Di Luca Max
Joerg Sophie
Lin Lorraine
Normoyle Aline
Sun Yu
Ye Yuting
Publication date: 10/11/2022

University of Birmingham Research Portal

A Comprehensive Review of Data-Driven Co-Speech Gesture Generation

Author: Ahuja Chaitanya
Henter Gustav Eje
Kucherenko Taras
Neff Michael
Nyatsanga Simbarashe
Publication venue: Wiley
Publication date: 10/04/2023

Gestures that accompany speech are an essential part of natural and efficient embodied human communication. The automatic generation of such co-speech gestures is a long-standing problem in computer animation and is considered an enabling technology in film, games, virtual social spaces, and for interaction with social robots. The problem is made challenging by the idiosyncratic and non-periodic nature of human co-speech gesture motion, and by the great diversity of communicative functions that gestures encompass. Gesture generation has seen surging interest recently, owing to the emergence of more and larger datasets of human gesture motion, combined with strides in deep-learning-based generative models that benefit from the growing availability of data. This review article summarizes co-speech gesture generation research, with a particular focus on deep generative models. First, we articulate the theory describing human gesticulation and how it complements speech. Next, we briefly discuss rule-based and classical statistical gesture synthesis, before delving into deep learning approaches. We employ the choice of input modalities as an organizing principle, examining systems that generate gestures from audio, text, and non-linguistic input. We also chronicle the evolution of the related training data sets in terms of size, diversity, motion quality, and collection method. Finally, we identify key research challenges in gesture generation, including data availability and quality; producing human-like motion; grounding the gesture in the co-occurring speech, in interaction with other speakers, and in the environment; performing gesture evaluation; and integration of gesture synthesis into applications. We highlight recent approaches to tackling the various key challenges, as well as the limitations of these approaches, and point toward areas of future development.
Comment: Accepted for EUROGRAPHICS 202

arXiv.org e-Print Archive