Head Motion Analysis and Synthesis over Different Tasks
Abstract. It is known that subjects vary in their head movements. This paper presents an analysis of this variety over different tasks and speakers and its impact on head motion synthesis. Measured head and articulatory movements, acquired by an ElectroMagnetic Articulograph (EMA) and synchronously recorded with audio, were used. A data set of speech from 12 people recorded on different tasks confirms that head motion varies across tasks and speakers. Experimental results confirmed that the proposed models were capable of learning and synthesising task-dependent head motions from speech. Subjective evaluation of head motion synthesised using task models shows that models trained on the matched task perform better than those trained on a mismatched one, and that models trained on free speech data predict motion that participants prefer over motion from models trained on read speech data.
Articulatory features for speech-driven head motion synthesis
This study investigates the use of articulatory features for speech-driven head motion synthesis, as opposed to prosodic features such as F0 and energy that have mainly been used in the literature. In the proposed approach, multi-stream HMMs are trained jointly on the synchronous streams of speech and head motion data. Articulatory features can be regarded as an intermediate parametrisation of speech that is expected to have a close link with head movement. Measured head and articulatory movements acquired by EMA were synchronously recorded with speech. Measured articulatory data was compared to those predicted from speech using an HMM-based inversion mapping system trained in a semi-supervised fashion. Canonical correlation analysis (CCA) on a data set of free speech of 12 people shows that the articulatory features are more correlated with head rotation than prosodic and/or cepstral speech features. It is also shown that head motion synthesised using articulatory features gave higher correlations with the original head motion than when only prosodic features are used. Index Terms: head motion synthesis, articulatory features, canonical correlation analysis, acoustic-to-articulatory mapping
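As an illustration of the kind of comparison the abstract describes, the following is a minimal sketch of canonical correlation analysis between two feature streams. It is not the authors' code: the function name `canonical_correlations`, the regularisation term `reg`, and the frame-by-feature array layout are assumptions made for this example.

```python
import numpy as np

def canonical_correlations(X, Y, reg=1e-8):
    """Canonical correlations between two feature streams.

    X, Y: arrays of shape (frames, features), e.g. speech-derived
    features and head rotation angles (hypothetical layout).
    reg: small ridge term added for numerical stability (assumption).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten both streams with Cholesky factors: M = Lx^{-1} Sxy Ly^{-T}
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    # Singular values of the whitened cross-covariance are the
    # canonical correlations, sorted from largest to smallest.
    s = np.linalg.svd(M, compute_uv=False)
    return np.clip(s, 0.0, 1.0)
```

Under this sketch, one would call the function once per candidate feature set (articulatory, prosodic, cepstral) against the same head rotation stream and compare the leading correlations.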
Human motion modeling and simulation by anatomical approach
Instantly generating an unlimited range of realistic human motion on demand remains a great challenge in virtual human simulation. In this paper, a novel emotion-affected motion classification and an anatomical motion classification are presented, as well as motion capture and parameterization methods. The framework for a novel anatomical approach to modelling human motion in the HTR (Hierarchical Translations and Rotations) file format is also described. This anatomical approach has the potential to generate an unlimited range of desired human motion from a compact motion database. An architecture for the real-time generation of new motions is also proposed
Speech-driven Animation with Meaningful Behaviors
Conversational agents (CAs) play an important role in human-computer
interaction. Creating believable movements for CAs is challenging, since the
movements have to be meaningful and natural, reflecting the coupling between
gestures and speech. Studies in the past have mainly relied on rule-based or
data-driven approaches. Rule-based methods focus on creating meaningful
behaviors conveying the underlying message, but the gestures cannot be easily
synchronized with speech. Data-driven approaches, especially speech-driven
models, can capture the relationship between speech and gestures. However, they
create behaviors disregarding the meaning of the message. This study proposes
to bridge the gap between these two approaches overcoming their limitations.
The approach builds a dynamic Bayesian network (DBN), where a discrete variable
is added to constrain the behaviors on the underlying constraint. The study
implements and evaluates the approach with two constraints: discourse functions
and prototypical behaviors. By constraining on the discourse functions (e.g.,
questions), the model learns the characteristic behaviors associated with a
given discourse class learning the rules from the data. By constraining on
prototypical behaviors (e.g., head nods), the approach can be embedded in a
rule-based system as a behavior realizer creating trajectories that are timely
synchronized with speech. The study proposes a DBN structure and a training
approach that (1) models the cause-effect relationship between the constraint
and the gestures, (2) initializes the state configuration models increasing the
range of the generated behaviors, and (3) captures the differences in the
behaviors across constraints by enforcing sparse transitions between shared and
exclusive states per constraint. Objective and subjective evaluations
demonstrate the benefits of the proposed approach over an unconstrained model. Comment: 13 pages, 12 figures, 5 tables
MedGAN: Medical Image Translation using GANs
Image-to-image translation is considered a new frontier in the field of
medical image analysis, with numerous potential applications. However, a large
portion of recent approaches offers individualized solutions based on
specialized task-specific architectures or requires refinement through
non-end-to-end training. In this paper, we propose a new framework, named
MedGAN, for medical image-to-image translation which operates on the image
level in an end-to-end manner. MedGAN builds upon recent advances in the field
of generative adversarial networks (GANs) by merging the adversarial framework
with a new combination of non-adversarial losses. We utilize a discriminator
network as a trainable feature extractor which penalizes the discrepancy
between the translated medical images and the desired modalities. Moreover,
style-transfer losses are utilized to match the textures and fine-structures of
the desired target images to the translated images. Additionally, we present a
new generator architecture, titled CasNet, which enhances the sharpness of the
translated medical outputs through progressive refinement via encoder-decoder
pairs. Without any application-specific modifications, we apply MedGAN on three
different tasks: PET-CT translation, correction of MR motion artefacts and PET
image denoising. Perceptual analysis by radiologists and quantitative
evaluations illustrate that MedGAN outperforms other existing translation
approaches. Comment: 16 pages, 8 figures
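The abstract does not specify how its style-transfer losses are computed. A common formulation, used here purely as an assumption, compares Gram matrices of feature maps so that texture statistics are matched regardless of spatial position. The function names and the normalisation constant below are hypothetical choices for this sketch, not MedGAN's definition.

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel correlations of a feature map.

    features: array of shape (channels, height, width), e.g. activations
    from some feature extractor (hypothetical input).
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    # Normalise by the number of entries (one common choice, an assumption).
    return flat @ flat.T / (c * h * w)

def style_loss(feat_translated, feat_target):
    """Squared Frobenius distance between the two Gram matrices."""
    diff = gram_matrix(feat_translated) - gram_matrix(feat_target)
    return float(np.sum(diff ** 2))
```

Because the Gram matrix sums over spatial positions, this loss is insensitive to where a texture appears in the image, which is why such terms are usually combined with other losses that constrain content.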
Exploiting the robot kinematic redundancy for emotion conveyance to humans as a lower priority task
Current approaches do not allow robots to execute a task and simultaneously convey emotions to users using their body motions. This paper explores the capabilities of the Jacobian null space of a humanoid robot to convey emotions. A task priority formulation has been implemented in a Pepper robot which allows the specification of a primary task (waving gesture, transportation of an object, etc.) and exploits the kinematic redundancy of the robot to convey emotions to humans as a lower priority task. The emotions, defined by Mehrabian as points in the pleasure–arousal–dominance space, generate intermediate motion features (jerkiness, activity and gaze) that carry the emotional information. A map from these features to the joints of the robot is presented. A user study has been conducted in which emotional motions were shown to 30 participants. The results show that happiness and sadness are very well conveyed to the user, calm is moderately well conveyed, and fear is not well conveyed. An analysis of the dependencies between the motion features and the emotions perceived by the participants shows that activity correlates positively with arousal, jerkiness is not perceived by the user, and gaze conveys dominance when activity is low. The results indicate a strong influence of the most energetic motions of the emotional task and point out new directions for further research. Overall, the results show that the null space approach can be regarded as a promising means to convey emotions as a lower priority task. Postprint (author's final draft)
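The abstract does not give the exact task priority formulation. The sketch below shows the textbook null-space projection for a redundant manipulator, in which a secondary joint velocity is projected so that it does not disturb the primary task. The function names and the idea that the secondary velocity encodes the emotional motion are assumptions for illustration only.

```python
import numpy as np

def null_space_projector(J):
    """Projector onto the null space of the task Jacobian J (m x n, n > m)."""
    J_pinv = np.linalg.pinv(J)
    return np.eye(J.shape[1]) - J_pinv @ J

def task_priority_velocity(J, x_dot, q_dot_secondary):
    """Joint velocities that track the primary task and, with lower
    priority, follow a secondary joint-space motion.

    J: primary-task Jacobian, shape (m, n).
    x_dot: desired primary-task velocity, shape (m,).
    q_dot_secondary: desired secondary joint velocity, shape (n,)
        (hypothetically, the emotion-conveying motion).
    """
    primary = np.linalg.pinv(J) @ x_dot
    secondary = null_space_projector(J) @ q_dot_secondary
    return primary + secondary
```

When J has full row rank, J applied to the returned velocity reproduces x_dot exactly, so whatever the secondary motion contributes leaves the primary task unaffected.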
Robust visual servoing in 3d reaching tasks
This paper describes a novel approach to the problem of reaching an object in space under visual guidance. The approach is characterized by great robustness to calibration errors, such that virtually no calibration is required. Servoing is based on binocular vision: a continuous measure of the end-effector motion field, derived from real-time computation of the binocular optical flow over the stereo images, is compared with the actual position of the target and the relative error in the end-effector trajectory is continuously corrected. The paper outlines the general framework of the approach, shows how visual measures are obtained and discusses the synthesis of the controller along with its stability analysis. Real-time experiments are presented to show the applicability of the approach in real 3-D applications.