28,415 research outputs found
Every Smile is Unique: Landmark-Guided Diverse Smile Generation
Each smile is unique: one person surely smiles in different ways (e.g.,
closing/opening the eyes or mouth). Given one input image of a neutral face,
can we generate multiple smile videos with distinctive characteristics? To
tackle this one-to-many video generation problem, we propose a novel deep
learning architecture named Conditional Multi-Mode Network (CMM-Net). To better
encode the dynamics of facial expressions, CMM-Net explicitly exploits facial
landmarks for generating smile sequences. Specifically, a variational
auto-encoder is used to learn a facial landmark embedding. This single
embedding is then exploited by a conditional recurrent network which generates
a landmark embedding sequence conditioned on a specific expression (e.g.,
spontaneous smile). Next, the generated landmark embeddings are fed into a
multi-mode recurrent landmark generator, producing a set of landmark sequences
still associated to the given smile class but clearly distinct from each other.
Finally, these landmark sequences are translated into face videos. Our
experimental results demonstrate the effectiveness of our CMM-Net in generating
realistic videos of multiple smile expressions.Comment: Accepted as a poster in Conference on Computer Vision and Pattern
Recognition (CVPR), 201
Automated drowsiness detection for improved driving safety
Several approaches were proposed for the detection and prediction of drowsiness. The approaches can be categorized as estimating the fitness of duty, modeling the sleep-wake rhythms, measuring the vehicle based performance and online operator monitoring. Computer vision based online operator monitoring approach has become prominent due to its predictive ability of detecting drowsiness. Previous studies with this approach detect driver drowsiness primarily by making preassumptions about the relevant behavior, focusing on blink rate, eye closure, and yawning. Here we employ machine learning to datamine actual human behavior during drowsiness episodes. Automatic classifiers
for 30 facial actions from the Facial Action Coding system were developed
using machine learning on a separate database of spontaneous expressions. These facial actions include blinking and yawn motions, as well as a number of other facial movements. In addition, head motion was collected through automatic eye tracking and an accelerometer. These measures were passed to learning-based classifiers such as Adaboost and multinomial ridge regression. The system was able to predict sleep and crash episodes during a driving computer game with 96% accuracy within subjects and above 90% accuracy across subjects. This is the highest prediction rate reported to date for detecting real drowsiness. Moreover, the analysis revealed new information about human behavior during drowsy drivin
Reproducibility of the dynamics of facial expressions in unilateral facial palsy
The aim of this study was to assess the reproducibility of non-verbal facial
expressions in unilateral facial paralysis using dynamic four-dimensional (4D)
imaging. The Di4D system was used to record five facial expressions of 20 adult
patients. The system captured 60 three-dimensional (3D) images per second; each
facial expression took 3–4 seconds which was recorded in real time. Thus a set of
180 3D facial images was generated for each expression. The procedure was
repeated after 30 min to assess the reproducibility of the expressions. A
mathematical facial mesh consisting of thousands of quasi-point ‘vertices’ was
conformed to the face in order to determine the morphological characteristics in a
comprehensive manner. The vertices were tracked throughout the sequence of the
180 images. Five key 3D facial frames from each sequence of images were
analyzed. Comparisons were made between the first and second capture of each
facial expression to assess the reproducibility of facial movements. Corresponding
images were aligned using partial Procrustes analysis, and the root mean square
distance between them was calculated and analyzed statistically (paired Student ttest,
P < 0.05). Facial expressions of lip purse, cheek puff, and raising of eyebrows
were reproducible. Facial expressions of maximum smile and forceful eye closure
were not reproducible. The limited coordination of various groups of facial muscles
contributed to the lack of reproducibility of these facial expressions. 4D imaging is a
useful clinical tool for the assessment of facial expressions
Controllable Image-to-Video Translation: A Case Study on Facial Expression Generation
The recent advances in deep learning have made it possible to generate
photo-realistic images by using neural networks and even to extrapolate video
frames from an input video clip. In this paper, for the sake of both furthering
this exploration and our own interest in a realistic application, we study
image-to-video translation and particularly focus on the videos of facial
expressions. This problem challenges the deep neural networks by another
temporal dimension comparing to the image-to-image translation. Moreover, its
single input image fails most existing video generation methods that rely on
recurrent models. We propose a user-controllable approach so as to generate
video clips of various lengths from a single face image. The lengths and types
of the expressions are controlled by users. To this end, we design a novel
neural network architecture that can incorporate the user input into its skip
connections and propose several improvements to the adversarial training method
for the neural network. Experiments and user studies verify the effectiveness
of our approach. Especially, we would like to highlight that even for the face
images in the wild (downloaded from the Web and the authors' own photos), our
model can generate high-quality facial expression videos of which about 50\%
are labeled as real by Amazon Mechanical Turk workers.Comment: 10 page
Affective Facial Expression Processing via Simulation: A Probabilistic Model
Understanding the mental state of other people is an important skill for
intelligent agents and robots to operate within social environments. However,
the mental processes involved in `mind-reading' are complex. One explanation of
such processes is Simulation Theory - it is supported by a large body of
neuropsychological research. Yet, determining the best computational model or
theory to use in simulation-style emotion detection, is far from being
understood.
In this work, we use Simulation Theory and neuroscience findings on
Mirror-Neuron Systems as the basis for a novel computational model, as a way to
handle affective facial expressions. The model is based on a probabilistic
mapping of observations from multiple identities onto a single fixed identity
(`internal transcoding of external stimuli'), and then onto a latent space
(`phenomenological response'). Together with the proposed architecture we
present some promising preliminary resultsComment: Annual International Conference on Biologically Inspired Cognitive
Architectures - BICA 201
LOMo: Latent Ordinal Model for Facial Analysis in Videos
We study the problem of facial analysis in videos. We propose a novel weakly
supervised learning method that models the video event (expression, pain etc.)
as a sequence of automatically mined, discriminative sub-events (eg. onset and
offset phase for smile, brow lower and cheek raise for pain). The proposed
model is inspired by the recent works on Multiple Instance Learning and latent
SVM/HCRF- it extends such frameworks to model the ordinal or temporal aspect in
the videos, approximately. We obtain consistent improvements over relevant
competitive baselines on four challenging and publicly available video based
facial analysis datasets for prediction of expression, clinical pain and intent
in dyadic conversations. In combination with complimentary features, we report
state-of-the-art results on these datasets.Comment: 2016 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR
- …