Dynamic Facial Expression of Emotion Made Easy
Facial emotion expression for virtual characters is used in a wide variety of
areas. Often, the primary reason to use emotion expression is not to study
emotion expression generation per se, but to use emotion expression in an
application or research project. What is then needed is an easy-to-use and
flexible, but also validated, mechanism to do so. In this report we present such
a mechanism. It enables developers to build virtual characters with dynamic
affective facial expressions. The mechanism is based on Facial Action Coding.
It is easy to implement, and code is available for download. To show the
validity of the expressions generated with the mechanism we tested the
recognition accuracy for 6 basic emotions (joy, anger, sadness, surprise,
disgust, fear) and 4 blend emotions (enthusiastic, furious, frustrated, and
evil). Additionally, we investigated the effect of virtual character (VC)
distance (z-coordinate),
the effect of the VC's face morphology (male vs. female), the effect of a
lateral versus a frontal presentation of the expression, and the effect of
intensity of the expression. Participants (n=19, Western and Asian subjects)
rated the intensity of each expression for each condition (within-subject
setup) in a non-forced-choice manner. All of the basic emotions were uniquely
perceived as such. Further, the blends and confusion details of basic emotions
are compatible with findings in psychology.
Dynamic Facial Expression Generation on Hilbert Hypersphere with Conditional Wasserstein Generative Adversarial Nets
In this work, we propose a novel approach for generating videos of the six
basic facial expressions given a neutral face image. We propose to exploit the
face geometry by modeling the facial landmarks motion as curves encoded as
points on a hypersphere. By proposing a conditional version of manifold-valued
Wasserstein generative adversarial network (GAN) for motion generation on the
hypersphere, we learn the distribution of facial expression dynamics of
different classes, from which we synthesize new facial expression motions. The
resulting motions can be transformed to sequences of landmarks and then to
image sequences by editing the texture information using another conditional
Generative Adversarial Network. To the best of our knowledge, this is the first
work that explores manifold-valued representations with GAN to address the
problem of dynamic facial expression generation. We evaluate our proposed
approach both quantitatively and qualitatively on two public datasets:
Oulu-CASIA and MUG Facial Expression. Our experimental results demonstrate the
effectiveness of our approach in generating realistic videos with continuous
motion, realistic appearance and identity preservation. We also show the
efficiency of our framework for dynamic facial expression generation, dynamic
facial expression transfer and data augmentation for training improved emotion
recognition models.
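The idea of encoding a landmark motion curve as a point on a hypersphere can be illustrated with a short sketch. The abstract does not say which encoding is used; the square-root velocity function (SRVF), scaled to unit norm, is one common choice and is assumed here. The function names and the toy curves are hypothetical.

```python
import numpy as np

def srvf_on_sphere(curve, dt):
    """Map a curve of shape (frames, coords) to a unit-norm SRVF.

    Assumption: SRVF encoding q(t) = v(t) / sqrt(|v(t)|); the abstract
    does not name the encoding it uses.
    """
    vel = np.gradient(curve, dt, axis=0)                  # finite-difference velocity
    speed = np.linalg.norm(vel, axis=1, keepdims=True)
    q = vel / np.sqrt(np.maximum(speed, 1e-12))           # guard against zero speed
    norm = np.sqrt((q ** 2).sum() * dt)                   # discrete L2 norm of q
    return q / norm                                       # unit norm: a point on the sphere

def geodesic_distance(q1, q2, dt):
    """Arc length between two unit-norm SRVFs on the hypersphere."""
    inner = np.clip((q1 * q2).sum() * dt, -1.0, 1.0)
    return np.arccos(inner)

# Toy "landmark" trajectories with 4 stacked coordinates (hypothetical data).
t = np.linspace(0.0, 1.0, 50)
dt = t[1] - t[0]
smile = np.column_stack([t, t ** 2, np.sin(t), np.cos(t)])
frown = np.column_stack([t, -t ** 2, np.sin(t), np.cos(t)])

q_smile = srvf_on_sphere(smile, dt)
q_frown = srvf_on_sphere(frown, dt)
print(geodesic_distance(q_smile, q_frown, dt))
```

Under this encoding every motion has unit norm, so distances between motions are arc lengths on the sphere; a generative model defined on the sphere would sample points like `q_smile` and map them back to landmark sequences.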
CNN-based Real-time Dense Face Reconstruction with Inverse-rendered Photo-realistic Face Images
With the power of convolutional neural networks (CNNs), CNN-based face
reconstruction has recently shown promising performance in reconstructing
detailed face shape from 2D face images. The success of CNN-based methods
relies on a large amount of labeled data. The state-of-the-art synthesizes such
data using a coarse morphable face model, which, however, has difficulty
generating detailed photo-realistic images of faces (with wrinkles). This paper
presents a novel face data generation method. Specifically, we render a large
number of photo-realistic face images with different attributes based on
inverse rendering. Furthermore, we construct a fine-detailed face image dataset
by transferring different scales of details from one image to another. We also
construct a large number of video-type adjacent frame pairs by simulating the
distribution of real video data. With these nicely constructed datasets, we
propose a coarse-to-fine learning framework consisting of three convolutional
networks. The networks are trained for real-time detailed 3D face
reconstruction from monocular video as well as from a single image. Extensive
experimental results demonstrate that our framework can produce high-quality
reconstruction but with much less computation time compared to the
state-of-the-art. Moreover, our method is robust to pose, expression and
lighting due to the diversity of data.
Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine
Intelligence, 201
The perception of emotion in artificial agents
Given recent technological developments in robotics, artificial intelligence and virtual reality, it is perhaps unsurprising that the arrival of emotionally expressive and reactive artificial agents is imminent. However, if such agents are to become integrated into our social milieu, it is imperative to establish an understanding of whether and how humans perceive emotion in artificial agents. In this review, we incorporate recent findings from social robotics, virtual reality, psychology, and neuroscience to examine how people recognize and respond to emotions displayed by artificial agents. First, we review how people perceive emotions expressed by an artificial agent, such as facial and bodily expressions and vocal tone. Second, we evaluate the similarities and differences in the consequences of perceived emotions in artificial compared to human agents. Besides accurately recognizing the emotional state of an artificial agent, it is critical to understand how humans respond to those emotions. Does interacting with an angry robot induce the same responses in people as interacting with an angry person? Similarly, does watching a robot rejoice when it wins a game elicit similar feelings of elation in the human observer? Here we provide an overview of the current state of emotion expression and perception in social robotics, as well as a clear articulation of the challenges and guiding principles to be addressed as we move ever closer to truly emotional artificial agents
The Uncanny Valley Effect
The Uncanny Valley Effect (UVE) first emerged as a warning against making industrial robots appear so highly human-like that they could unsettle the real humans around them. It proposed a specific pattern of negative emotional responses to entities that were almost but not quite human, and has been proposed as the reason why some entities such as dolls, mannequins and zombies may appear unsettling.
The aim of this thesis was to move beyond an anecdotal explanation to understand more about the perception of near-human faces, and how this compares to the perception of human and non-human faces. The aims were to explore the relationship between the human-likeness of faces and emotional responses to them, to understand reactions to and descriptions of near-human faces, to explore aspects of how near-human faces are processed and to explore whether mismatched emotional expressions might contribute to the perception of some near-human faces as eerie.
Five studies were carried out using face images whose human-likeness was systematically controlled or measured. A non-linear relationship between human-likeness and eeriness was found, but the near-human faces were not always the eeriest images. Near-human faces were found to be subject to the effects of inversion, and inversion was found to heighten perceptions of eeriness. Faces were created which contained mismatched emotional expressions, and the blends combining happy faces with angry or fearful eyes were rated as the most eerie. Incongruities between aspects of appearance or behaviour had been cited as explanations for the UVE in the past but this thesis presents the first evidence that differences in eeriness may result from incongruities between emotional expressions. Directions for future research have been suggested to explore these findings in a wider context and to understand more about the UVE
HeadOn: Real-time Reenactment of Human Portrait Videos
We propose HeadOn, the first real-time source-to-target reenactment approach
for complete human portrait videos that enables transfer of torso and head
motion, face expression, and eye gaze. Given a short RGB-D video of the target
actor, we automatically construct a personalized geometry proxy that embeds a
parametric head, eye, and kinematic torso model. A novel real-time reenactment
algorithm employs this proxy to photo-realistically map the captured motion
from the source actor to the target actor. On top of the coarse geometric
proxy, we propose a video-based rendering technique that composites the
modified target portrait video via view- and pose-dependent texturing, and
creates photo-realistic imagery of the target actor under novel torso and head
poses, facial expressions, and gaze directions. To this end, we propose a
robust tracking of the face and torso of the source actor. We extensively
evaluate our approach and show significant improvements in enabling much
greater flexibility in creating realistic reenacted output videos.
Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at
Siggraph'1
Expressive Modulation of Neutral Visual Speech
The need for animated graphical models of the human face is commonplace in
the movie, video game and television industries, appearing in everything from
low budget advertisements and free mobile apps, to Hollywood blockbusters
costing hundreds of millions of dollars. Generative statistical models of
animation attempt to address some of the drawbacks of industry standard
practices such as labour intensity and creative inflexibility.
This work describes one such method for transforming speech animation curves
between different expressive styles. Beginning with the assumption that
expressive speech animation is a mix of two components, a high-frequency
speech component (the content) and a much lower-frequency expressive
component (the style), we use Independent Component Analysis (ICA) to
identify and manipulate these components independently of one another. Next
we learn how the energy for different speaking styles is distributed in terms of
the low-dimensional independent components model. Transforming the
speaking style involves projecting new animation curves into the low-dimensional
ICA space, redistributing the energy in the independent
components, and finally reconstructing the animation curves by inverting the
projection.
We show that a single ICA model can be used for separating multiple expressive
styles into their component parts. Subjective evaluations show that viewers can
reliably identify the expressive style generated using our approach, and that they
have difficulty in distinguishing transformed animated expressive speech from the
equivalent ground-truth.
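The three steps named in this abstract (project into ICA space, redistribute component energy, invert the projection) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which ICA algorithm is used, so a small symmetric FastICA loop stands in for it, and the helper names, toy curves, and target energies are all hypothetical.

```python
import numpy as np

def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W so that the rows of W are orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W

def fit_ica(X, n_iter=200, seed=0):
    """Fit a square ICA model to X of shape (frames, channels).

    Assumption: symmetric FastICA with a tanh nonlinearity.
    Returns (mean, unmix, mix) with sources S = (X - mean) @ unmix.T.
    """
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    Xc = X - mean
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    K = (evecs / np.sqrt(evals)).T               # whitening matrix
    Z = Xc @ K.T                                 # whitened data
    d = Z.shape[1]
    W = sym_decorrelate(rng.standard_normal((d, d)))
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)
        W_new = (G.T @ Z) / len(Z) - np.diag((1 - G ** 2).mean(axis=0)) @ W
        W = sym_decorrelate(W_new)
    unmix = W @ K
    return mean, unmix, np.linalg.inv(unmix)

def restyle(X, mean, unmix, mix, target_energy):
    """Project curves, rescale per-component energy, and reconstruct."""
    S = (X - mean) @ unmix.T                     # 1. project into ICA space
    energy = (S ** 2).mean(axis=0)
    S_new = S * np.sqrt(target_energy / energy)  # 2. redistribute energy
    return S_new @ mix.T + mean                  # 3. invert the projection

# Toy animation curves: three mixed sources (hypothetical data).
rng = np.random.default_rng(1)
t = np.linspace(0, 10, 500)
sources = np.column_stack([np.sin(2 * np.pi * 3 * t),
                           np.sign(np.sin(2 * np.pi * 0.2 * t)),
                           rng.laplace(size=t.size)])
mixing = np.array([[1.0, 0.5, 0.2], [0.3, 1.0, 0.4], [0.2, 0.1, 1.0]])
curves = sources @ mixing.T + 0.5

mean, unmix, mix = fit_ica(curves)
neutral_energy = (((curves - mean) @ unmix.T) ** 2).mean(axis=0)
target_energy = neutral_energy * np.array([1.0, 4.0, 1.0])  # boost one component
styled = restyle(curves, mean, unmix, mix, target_energy)
```

In practice the target energies would be learned from curves recorded in each speaking style; here they are simply a scaled copy of the neutral energies.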
Artificial Intelligence Tools for Facial Expression Analysis.
Inner emotions show visibly upon the human face and are understood as a basic guide to an individual’s inner world. It is, therefore, possible to determine a person’s attitudes and the effects of others’ behaviour on their deeper feelings through examining facial expressions. In real world applications, machines that interact with people need strong facial expression recognition. This recognition is seen to hold advantages for varied applications in affective computing, advanced human-computer interaction, security, stress and depression analysis, robotic systems, and machine learning. This thesis starts by proposing a benchmark of dynamic versus static methods for facial Action Unit (AU) detection. AU activation is a set of local individual facial muscle parts that occur in unison constituting a natural facial expression event. Detecting AUs automatically can provide explicit benefits since it considers both static and dynamic facial features. For this research, AU occurrence activation detection was conducted by extracting features (static and dynamic) of both nominal hand-crafted and deep learning representation from each static image of a video. This confirmed the superior ability of a pretrained model that leaps in performance. Next, temporal modelling was investigated to detect the underlying temporal variation phases using supervised and unsupervised methods from dynamic sequences. During these processes, the importance of stacking dynamic on top of static was discovered in encoding deep features for learning temporal information when combining the spatial and temporal schemes simultaneously. Also, this study found that fusing both spatial and temporal features will give more long term temporal pattern information. Moreover, we hypothesised that using an unsupervised method would enable the leaching of invariant information from dynamic textures.
Recently, fresh cutting-edge developments have been created by approaches based on Generative Adversarial Networks (GANs). In the second section of this thesis, we propose a model based on the adoption of an unsupervised DCGAN for the facial features’ extraction and classification to achieve the following: the creation of facial expression images under different arbitrary poses (frontal, multi-view, and in the wild), and the recognition of emotion categories and AUs, in an attempt to resolve the problem of recognising the static seven classes of emotion in the wild. Thorough experimentation with the proposed cross-database performance demonstrates that this approach can improve the generalization results. Additionally, we showed that the features learnt by the DCGAN process are poorly suited to encoding facial expressions when observed under multiple views, or when trained from a limited number of positive examples. Finally, this research focuses on disentangling identity from expression for facial expression recognition. A novel technique was implemented for emotion recognition from a single monocular image. A large-scale dataset (Face vid) was created from facial image videos which were rich in variations and distribution of facial dynamics, appearance, identities, expressions, and 3D poses. This dataset was used to train a DCNN (ResNet) to regress the expression parameters from a 3D Morphable Model jointly with a back-end classifier
An Actor-Centric Approach to Facial Animation Control by Neural Networks For Non-Player Characters in Video Games
Game developers increasingly consider the degree to which character animation emulates facial expressions found in cinema. Employing animators and actors to produce cinematic facial animation by mixing motion capture and hand-crafted animation is labor intensive and therefore expensive. Emotion corpora and neural network controllers have shown promise toward developing autonomous animation that does not rely on motion capture. Previous research and practice in disciplines of Computer Science, Psychology and the Performing Arts have provided frameworks on which to build a workflow toward creating an emotion AI system that can animate the facial mesh of a 3d non-player character deploying a combination of related theories and methods. However, past investigations and their resulting production methods largely ignore the emotion generation systems that have evolved in the performing arts for more than a century. We find very little research that embraces the intellectual process of trained actors as complex collaborators from which to understand and model the training of a neural network for character animation. This investigation demonstrates a workflow design that integrates knowledge from the performing arts and the affective branches of the social and biological sciences. Our workflow begins at the stage of developing and annotating a fictional scenario with actors, to producing a video emotion corpus, to designing training and validating a neural network, to analyzing the emotion data annotation of the corpus and neural network, and finally to determining resemblant behavior of its autonomous animation control of a 3d character facial mesh. The resulting workflow includes a method for the development of a neural network architecture whose initial efficacy as a facial emotion expression simulator has been tested and validated as substantially resemblant to the character behavior developed by a human actor
A Review of Dynamic Datasets for Facial Expression Research
Temporal dynamics have been increasingly recognized as an important component of facial expressions. With the need for appropriate stimuli in research and application, a range of databases of dynamic facial stimuli has been developed. The present article reviews the existing corpora and describes the key dimensions and properties of the available sets. This includes a discussion of conceptual features in terms of thematic issues in dataset construction as well as practical features which are of applied interest to stimulus usage. To identify the most influential sets, we further examine their citation rates and usage frequencies in existing studies. General limitations and implications for emotion research are noted and future directions for stimulus generation are outlined