Analyzing Input and Output Representations for Speech-Driven Gesture Generation
This paper presents a novel framework for automatic speech-driven gesture
generation, applicable to human-agent interaction including both virtual agents
and robots. Specifically, we extend recent deep-learning-based, data-driven
methods for speech-driven gesture generation by incorporating representation
learning. Our model takes speech as input and produces gestures as output, in
the form of a sequence of 3D coordinates. Our approach consists of two steps.
First, we learn a lower-dimensional representation of human motion using a
denoising autoencoder neural network, consisting of a motion encoder MotionE
and a motion decoder MotionD. The learned representation preserves the most
important aspects of the human pose variation while removing less relevant
variation. Second, we train a novel encoder network SpeechE to map from speech
to a corresponding motion representation with reduced dimensionality. At test
time, the speech encoder and the motion decoder networks are combined: SpeechE
predicts motion representations based on a given speech signal and MotionD then
decodes these representations to produce motion sequences. We evaluate
different representation sizes in order to find the most effective
dimensionality for the representation. We also evaluate the effects of using
different speech features as input to the model. We find that mel-frequency
cepstral coefficients (MFCCs), alone or combined with prosodic features,
perform the best. The results of a subsequent user study confirm the benefits
of the representation learning.
Comment: Accepted at IVA '19. Shorter version published at AAMAS '19. The code is available at https://github.com/GestureGeneration/Speech_driven_gesture_generation_with_autoencode
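As a concrete illustration, here is a minimal PyTorch sketch of the two-step pipeline the abstract describes: a pose autoencoder (MotionE/MotionD) learned first, then a speech encoder (SpeechE) trained to predict the learned representation, with SpeechE and MotionD chained at test time. The layer sizes, feature dimensionalities, and training details below are assumptions made for illustration; the authors' actual implementation is in the repository linked above.

    # Minimal sketch of the two-step pipeline (illustrative, not the
    # authors' code). Layer sizes and feature dimensions are assumed.
    import torch
    import torch.nn as nn

    POSE_DIM = 45      # e.g. 15 joints x 3D coordinates (assumed)
    SPEECH_DIM = 26    # e.g. MFCCs plus prosodic features per frame (assumed)
    LATENT_DIM = 32    # representation size; the paper evaluates several

    class MotionE(nn.Module):
        """Encodes a pose vector into a lower-dimensional representation."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(POSE_DIM, 128), nn.ReLU(),
                                     nn.Linear(128, LATENT_DIM))
        def forward(self, pose):
            return self.net(pose)

    class MotionD(nn.Module):
        """Decodes a representation back into a pose vector."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(LATENT_DIM, 128), nn.ReLU(),
                                     nn.Linear(128, POSE_DIM))
        def forward(self, z):
            return self.net(z)

    class SpeechE(nn.Module):
        """Maps a frame of speech features to a motion representation."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(SPEECH_DIM, 128), nn.ReLU(),
                                     nn.Linear(128, LATENT_DIM))
        def forward(self, speech):
            return self.net(speech)

    # Step 1: train MotionE/MotionD as a denoising autoencoder on poses.
    # Step 2: train SpeechE to predict MotionE's output from speech features.
    # At test time, chain the speech encoder and the motion decoder:
    speech_encoder, motion_decoder = SpeechE(), MotionD()
    speech_frames = torch.randn(100, SPEECH_DIM)               # dummy MFCC sequence
    gestures = motion_decoder(speech_encoder(speech_frames))   # (100, POSE_DIM)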
Robust Temporally Coherent Laplacian Protrusion Segmentation of 3D Articulated Bodies
In motion analysis and understanding it is important to be able to fit a
suitable model or structure to the temporal series of observed data, in order
to describe motion patterns in a compact way, and to discriminate between them.
In an unsupervised context, i.e., no prior model of the moving object(s) is
available, such a structure has to be learned from the data in a bottom-up
fashion. In recent times, volumetric approaches, in which the motion is captured
from a number of cameras and a voxel-set representation of the body is built
from the camera views, have gained ground due to attractive features such as
inherent view-invariance and robustness to occlusions. Automatic, unsupervised
segmentation of moving bodies along entire sequences, in a temporally-coherent
and robust way, has the potential to provide a means of constructing a
bottom-up model of the moving body and of tracking motion cues that may later be
exploited for motion classification. Spectral methods such as locally linear
embedding (LLE) can be useful in this context, as they preserve "protrusions",
i.e., high-curvature regions of the 3D volume, of articulated shapes, while
improving their separation in a lower-dimensional space, thus making them
easier to cluster. In this paper we therefore propose a spectral approach
to unsupervised and temporally-coherent body-protrusion segmentation along time
sequences. Volumetric shapes are clustered in an embedding space, clusters are
propagated in time to ensure coherence, and merged or split to accommodate
changes in the body's topology. Experiments on both synthetic and real
sequences of dense voxel-set data are reported. These support the ability of the
proposed method to cluster body parts consistently over time in a fully
unsupervised fashion, its robustness to sampling density and shape quality, and
its potential for bottom-up model construction.
Comment: 31 pages, 26 figures
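To make the embedding-and-clustering step concrete, the sketch below embeds one frame's voxel set with scikit-learn's locally linear embedding and clusters body parts in the embedding space. The temporal propagation and split/merge handling described in the abstract are omitted, and all parameter values and names are illustrative assumptions, not the paper's settings.

    # Illustrative per-frame step: embed a voxel set with LLE, then cluster
    # in the embedding space, where protrusions are better separated.
    import numpy as np
    from sklearn.manifold import LocallyLinearEmbedding
    from sklearn.cluster import KMeans

    def segment_frame(voxels, n_parts=5, n_neighbors=10, embed_dim=3):
        """voxels: (N, 3) array of occupied voxel centers for one frame.

        Returns an (N,) array of body-part labels for that frame.
        """
        lle = LocallyLinearEmbedding(n_neighbors=n_neighbors,
                                     n_components=embed_dim)
        embedded = lle.fit_transform(voxels)   # protrusions separate here
        return KMeans(n_clusters=n_parts, n_init=10).fit_predict(embedded)

    # Dummy voxel set: two blobs standing in for a torso and a limb.
    frame = np.vstack([np.random.randn(200, 3),
                       np.random.randn(50, 3) + [4.0, 0.0, 0.0]])
    print(segment_frame(frame, n_parts=2))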
Real Time Animation of Virtual Humans: A Trade-off Between Naturalness and Control
Virtual humans are employed in many interactive applications using 3D virtual environments, including (serious) games. The motion of such virtual humans should look realistic (or "natural") and allow interaction with the surroundings and other (virtual) humans. Current animation techniques differ in the trade-off they offer between motion naturalness and the control that can be exerted over the motion. We show mechanisms to parametrize, combine (on different body parts) and concatenate motions generated by different animation techniques. We discuss several aspects of motion naturalness and show how it can be evaluated. We conclude by showing the promise of combinations of different animation paradigms to enhance both naturalness and control.
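As a rough illustration of one such mechanism, the sketch below combines two motions on different body parts with a per-joint mask; the joint indices, pose layout, and function names are invented for illustration and do not reflect the paper's system.

    # Hypothetical sketch: play one motion on selected joints and another
    # motion everywhere else, via a per-joint 0/1 mask.
    import numpy as np

    N_JOINTS = 20
    ARM_JOINTS = [12, 13, 14, 15]   # assumed indices of one arm's joints

    def combine(base, overlay, joints):
        """base, overlay: (frames, N_JOINTS, 3) joint-position sequences."""
        mask = np.zeros((1, N_JOINTS, 1))
        mask[:, joints, :] = 1.0
        return (1.0 - mask) * base + mask * overlay

    walk = np.zeros((60, N_JOINTS, 3))   # dummy walking clip
    wave = np.ones((60, N_JOINTS, 3))    # dummy waving clip
    combined = combine(walk, wave, ARM_JOINTS)

In practice, smooth per-joint blend weights and transition windows would replace the hard 0/1 mask to avoid visible seams between the combined motions.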
Modeling variation of human motion
The synthesis of realistic human motion with large variations and different styles is of growing interest in simulation applications such as the game industry, psychological experiments, and ergonomic analysis. Data-driven motion synthesis approaches are powerful tools for producing high-fidelity character animations. With the development of motion capture technologies, more and more motion data are publicly available. However, efficiently reusing large amounts of motion data to create new motions for arbitrary scenarios remains challenging, especially for unsupervised motion synthesis. This thesis presents a series of works that analyze and model the variations of human motion data. The goal is to learn statistical generative models that can create any number of new human animations with rich variations and styles; in our motion synthesis framework, these generative models are used by motion controllers to create new animations for different scenarios. The work of the thesis is presented in three main chapters. First, we explore how variation is represented in motion data. Learning a compact latent space that can expressively capture motion variation is essential for modeling motion data, and we propose a novel motion latent space learning approach that intrinsically handles the spatio-temporal properties of motion data. Second, we present our Morphable Graph framework for human motion modeling and synthesis in assembly workshop scenarios. A series of studies applies statistical motion modeling and synthesis approaches to complex assembly workshop use cases; learning the distribution of motion data provides a compact representation of motion variations and converts motion synthesis tasks into optimization problems. Finally, we show how the style variations of human activities can be modeled from a limited number of examples. Natural human movements display a rich repertoire of styles and personalities, but it is difficult to obtain enough examples for data-driven approaches. We propose a conditional variational autoencoder (CVAE) to combine the large variations in a neutral motion database with style information from a limited number of examples. We show that our approach can generate an arbitrary number of natural-looking variations of human motion in a style similar to the target.
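A minimal sketch of such a conditional VAE, assuming frame-wise poses and a one-hot style code, is given below; the dimensions and architecture are assumptions for illustration rather than the thesis's actual model.

    # Minimal conditional VAE sketch: condition both encoder and decoder
    # on a style code, then sample new variations from the prior.
    import torch
    import torch.nn as nn

    POSE_DIM, STYLE_DIM, LATENT_DIM = 45, 8, 16   # assumed sizes

    class MotionCVAE(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(POSE_DIM + STYLE_DIM, 128), nn.ReLU(),
                nn.Linear(128, 2 * LATENT_DIM))    # mean and log-variance
            self.decoder = nn.Sequential(
                nn.Linear(LATENT_DIM + STYLE_DIM, 128), nn.ReLU(),
                nn.Linear(128, POSE_DIM))

        def forward(self, pose, style):
            stats = self.encoder(torch.cat([pose, style], dim=-1))
            mu, logvar = stats.chunk(2, dim=-1)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
            return self.decoder(torch.cat([z, style], dim=-1)), mu, logvar

    # Sampling new variations in a target style: draw z from the prior
    # and decode it together with the style code.
    model = MotionCVAE()
    style = torch.zeros(1, STYLE_DIM); style[0, 0] = 1.0  # one-hot style (assumed)
    z = torch.randn(1, LATENT_DIM)
    new_pose = model.decoder(torch.cat([z, style], dim=-1))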