Machine Analysis of Facial Expressions
FEAFA: A Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation
Facial expression analysis based on machine learning requires a large amount of
well-annotated data to reflect different changes in facial motion. Publicly
available datasets truly help to accelerate research in this area by providing
a benchmark resource, but all of these datasets, to the best of our knowledge,
are limited to rough annotations for action units, including only their
absence, presence, or a five-level intensity according to the Facial Action
Coding System. To meet the need for videos labeled in great detail, we present
a well-annotated dataset named FEAFA for Facial Expression Analysis and 3D
Facial Animation. One hundred and twenty-two participants, including children,
young adults and elderly people, were recorded in real-world conditions. In
addition, 99,356 frames were manually labeled using the Expression Quantitative Tool we developed to quantify 9 symmetrical FACS action units, 10
asymmetrical (unilateral) FACS action units, 2 symmetrical FACS action
descriptors and 2 asymmetrical FACS action descriptors; each action unit or action descriptor is annotated with a floating-point value between 0 and
1. To provide a baseline for use in future research, a benchmark for the
regression of action unit values based on Convolutional Neural Networks is
presented. We also demonstrate the potential of our FEAFA dataset for 3D facial
animation. Almost all state-of-the-art facial animation algorithms rely on 3D face reconstruction. We hence propose a novel method that drives virtual characters based only on action unit value regression from the 2D video frames of source actors.
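As an illustration of the benchmark task described above, the following minimal sketch regresses the per-frame action unit and action descriptor values (9 + 10 action units plus 2 + 2 descriptors, each in [0, 1]) with a CNN and a sigmoid output. The backbone, input size, and loss are assumptions for illustration, not the paper's exact benchmark configuration.

    # Minimal sketch of continuous AU/AD value regression, assuming pre-cropped face frames.
    import torch
    import torch.nn as nn
    import torchvision.models as models

    NUM_TARGETS = 9 + 10 + 2 + 2   # action units and action descriptors listed in the abstract

    class AUValueRegressor(nn.Module):
        def __init__(self):
            super().__init__()
            backbone = models.resnet18(weights=None)   # assumed backbone, not from the paper
            backbone.fc = nn.Linear(backbone.fc.in_features, NUM_TARGETS)
            self.backbone = backbone

        def forward(self, x):
            # Sigmoid keeps every predicted value in [0, 1], matching the annotations.
            return torch.sigmoid(self.backbone(x))

    model = AUValueRegressor()
    criterion = nn.MSELoss()                 # regression of continuous values
    frames = torch.randn(8, 3, 224, 224)     # stand-in for cropped face frames
    targets = torch.rand(8, NUM_TARGETS)     # stand-in for FEAFA labels
    loss = criterion(model(frames), targets)
    loss.backward()

The predicted value vector could then be mapped to blendshape weights of a virtual character, in the spirit of the animation application the abstract describes.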
Towards a comprehensive 3D dynamic facial expression database
Human faces play an important role in everyday life, including the expression of person identity,
emotion and intentionality, along with a range of biological functions. The human face has also become the
subject of considerable research effort, and there has been a shift towards understanding it using stimuli of
increasingly more realistic formats. In the current work, we outline progress made in the production of a
database of facial expressions in arguably the most realistic format, 3D dynamic. A suitable architecture for
capturing such 3D dynamic image sequences is described and then used to record seven expressions (fear,
disgust, anger, happiness, surprise, sadness and pain) by 10 actors at 3 levels of intensity (mild, normal and
extreme). We also present details of a psychological experiment that was used to formally evaluate the
accuracy of the expressions in a 2D dynamic format. The result is an initial, validated database for researchers
and practitioners. The goal is to scale up the work to include more actors and expression types.
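For reference, the capture design described above amounts to 10 actors x 7 expressions x 3 intensity levels; the short sketch below simply enumerates that grid (identifiers are hypothetical, not the database's naming scheme).

    # Enumerate the recording grid: 10 actors x 7 expressions x 3 intensities.
    from itertools import product

    actors = [f"actor_{i:02d}" for i in range(1, 11)]
    expressions = ["fear", "disgust", "anger", "happiness", "surprise", "sadness", "pain"]
    intensities = ["mild", "normal", "extreme"]

    recordings = [
        {"actor": a, "expression": e, "intensity": i}
        for a, e, i in product(actors, expressions, intensities)
    ]
    print(len(recordings))   # 210 3D dynamic sequences in the initial design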
Capture, Learning, and Synthesis of 3D Speaking Styles
Audio-driven 3D facial animation has been widely explored, but achieving
realistic, human-like performance is still unsolved. This is due to the lack of
available 3D datasets, models, and standard evaluation metrics. To address
this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans
captured at 60 fps and synchronized audio from 12 speakers. We then train a
neural network on our dataset that factors identity from facial motion. The
learned model, VOCA (Voice Operated Character Animation), takes any speech signal as input, even speech in languages other than English, and
realistically animates a wide range of adult faces. Conditioning on subject
labels during training allows the model to learn a variety of realistic
speaking styles. VOCA also provides animator controls to alter speaking style,
identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball
rotations) during animation. To our knowledge, VOCA is the only realistic 3D
facial animation model that is readily applicable to unseen subjects without
retargeting. This makes VOCA suitable for tasks like in-game video, virtual
reality avatars, or any scenario in which the speaker, speech, or language is
not known in advance. We make the dataset and model available for research
purposes at http://voca.is.tue.mpg.de.
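The following sketch illustrates the idea summarized above: a network maps a window of speech features plus a one-hot subject label to per-vertex offsets from a template face, so that speaker identity and style are separated from speech-driven motion. All dimensions, layer choices, and feature sizes are assumptions for illustration; this is not the released VOCA model.

    import torch
    import torch.nn as nn

    NUM_SUBJECTS = 12       # training speakers mentioned in the abstract
    NUM_VERTICES = 5023     # assumed template-mesh resolution, for illustration only
    AUDIO_FEAT_DIM = 29     # assumed per-frame speech feature size
    WINDOW = 16             # assumed temporal window of audio frames

    class SpeechToOffsets(nn.Module):
        def __init__(self):
            super().__init__()
            self.audio_enc = nn.Sequential(
                nn.Flatten(),
                nn.Linear(WINDOW * AUDIO_FEAT_DIM, 256), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.Linear(256 + NUM_SUBJECTS, 256), nn.ReLU(),
                nn.Linear(256, NUM_VERTICES * 3),    # 3D offset per vertex
            )

        def forward(self, audio_window, subject_onehot):
            h = self.audio_enc(audio_window)
            # Conditioning on the subject label lets the model capture per-speaker style.
            h = torch.cat([h, subject_onehot], dim=-1)
            return self.decoder(h).view(-1, NUM_VERTICES, 3)

    model = SpeechToOffsets()
    audio = torch.randn(2, WINDOW, AUDIO_FEAT_DIM)
    subject = torch.eye(NUM_SUBJECTS)[:2]
    offsets = model(audio, subject)   # added to a neutral template mesh to animate it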
End-to-end 3D face reconstruction with deep neural networks
Monocular 3D facial shape reconstruction from a single 2D facial image has
been an active research area due to its wide applications. Inspired by the
success of deep neural networks (DNN), we propose a DNN-based approach for
End-to-End 3D FAce Reconstruction (UH-E2FAR) from a single 2D image. Different
from recent works that reconstruct and refine the 3D face in an iterative
manner using both an RGB image and an initial 3D facial shape rendering, our
DNN model is end-to-end, and thus the complicated 3D rendering process can be
avoided. Moreover, we integrate in the DNN architecture two components, namely
a multi-task loss function and a fusion convolutional neural network (CNN) to
improve facial expression reconstruction. With the multi-task loss function, 3D
face reconstruction is divided into neutral 3D facial shape reconstruction and
expressive 3D facial shape reconstruction. The neutral 3D facial shape is class-specific, so higher-layer features are more useful for it, whereas the expressive 3D facial shape favors lower or intermediate layer features. With
the fusion-CNN, features from different intermediate layers are fused and
transformed for predicting the 3D expressive facial shape. Through extensive
experiments, we demonstrate the superiority of our end-to-end framework in
improving the accuracy of 3D face reconstruction.
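The sketch below illustrates the two ideas summarized above: a multi-task split into neutral and expressive shape regression, with the neutral part read from high-level features and the expressive part from fused lower and intermediate layer features. The backbone, parameter dimensions, and loss weighting are assumptions, not the authors' exact UH-E2FAR network.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    NEUTRAL_DIM, EXPR_DIM = 100, 29   # assumed sizes of neutral/expression shape parameters

    class FusionReconstructor(nn.Module):
        def __init__(self):
            super().__init__()
            self.block1 = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU())    # lower layers
            self.block2 = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())   # intermediate
            self.block3 = nn.Sequential(nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU())  # high-level
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.neutral_head = nn.Linear(128, NEUTRAL_DIM)   # class-specific, high-level features
            self.fusion_head = nn.Linear(32 + 64, EXPR_DIM)   # fused lower/intermediate features

        def forward(self, img):
            f1 = self.block1(img)
            f2 = self.block2(f1)
            f3 = self.block3(f2)
            neutral = self.neutral_head(self.pool(f3).flatten(1))
            fused = torch.cat([self.pool(f1).flatten(1), self.pool(f2).flatten(1)], dim=1)
            expression = self.fusion_head(fused)
            return neutral, expression

    model = FusionReconstructor()
    img = torch.randn(4, 3, 224, 224)
    neutral_pred, expr_pred = model(img)
    neutral_gt, expr_gt = torch.randn(4, NEUTRAL_DIM), torch.randn(4, EXPR_DIM)
    # Multi-task loss: one term per sub-task, combined with an assumed weighting.
    loss = F.mse_loss(neutral_pred, neutral_gt) + 0.5 * F.mse_loss(expr_pred, expr_gt)
    loss.backward()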
Objective Classes for Micro-Facial Expression Recognition
Micro-expressions are brief spontaneous facial expressions that appear on a
face when a person conceals an emotion, making them different to normal facial
expressions in subtlety and duration. Currently, emotion classes within the
CASME II dataset are based on Action Units and self-reports, creating conflicts
during machine learning training. We will show that classifying expressions
using Action Units, instead of predicted emotion, removes the potential bias of
human reporting. The proposed classes are tested using LBP-TOP, HOOF and HOG 3D
feature descriptors. The experiments are evaluated on two benchmark FACS coded
datasets: CASME II and SAMM. The best result achieves 86.35% accuracy when
classifying the proposed 5 classes on CASME II using HOG 3D, outperforming the
result of the state-of-the-art 5-class emotional-based classification in CASME
II. Results indicate that classification based on Action Units provides an
objective method to improve micro-expression recognition.
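As a sketch of the evaluation protocol implied above, the snippet below replaces self-reported emotion labels with classes derived from Action Units and trains a linear classifier on precomputed spatio-temporal descriptor vectors. The AU-to-class grouping and the random stand-in descriptors are purely illustrative; the paper's exact class definitions and LBP-TOP/HOOF/HOG 3D implementations are not reproduced here.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import LinearSVC

    def au_based_class(active_aus):
        # Map a set of active AU codes to an objective class (illustrative grouping only).
        if 12 in active_aus:
            return 0    # e.g., a class built around AU12 (lip corner puller)
        if 4 in active_aus:
            return 1    # e.g., a class built around AU4 (brow lowerer)
        return 2        # remaining AU combinations

    rng = np.random.default_rng(0)
    descriptors = rng.normal(size=(200, 1000))   # stand-in for per-clip HOG 3D vectors
    labels = np.array([au_based_class(set(rng.choice([4, 9, 12], size=2)))
                       for _ in range(200)])

    clf = LinearSVC(C=1.0, max_iter=5000)
    print(cross_val_score(clf, descriptors, labels, cv=5).mean())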
Relative Facial Action Unit Detection
This paper presents a subject-independent facial action unit (AU) detection
method by introducing the concept of relative AU detection, for scenarios where
the neutral face is not provided. We propose a new classification objective
function which analyzes the temporal neighborhood of the current frame to
decide if the expression recently increased, decreased or showed no change.
This approach is a significant change from the conventional absolute method
which decides about AU classification using the current frame, without an
explicit comparison with its neighboring frames. Our proposed method improves
robustness to individual differences such as face scale and shape, age-related
wrinkles, and transitions among expressions (e.g., lower intensity of
expressions). Our experiments on three publicly available datasets (Extended
Cohn-Kanade (CK+), Bosphorus, and DISFA databases) show significant improvement
of our approach over conventional absolute techniques. Keywords: facial action coding system (FACS); relative facial action unit detection; temporal information.
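The short sketch below illustrates the relative labelling idea summarized above: instead of an absolute per-frame AU decision against a neutral face, the current frame is compared with its recent temporal neighborhood and labelled as increased, decreased, or unchanged. The window size and threshold are illustrative assumptions, not the paper's settings.

    import numpy as np

    def relative_labels(intensity, window=3, threshold=0.1):
        # intensity: 1D array of AU intensities over a video.
        # Returns +1 (recently increased), -1 (decreased), or 0 (no change) per frame.
        labels = np.zeros(len(intensity), dtype=int)
        for t in range(len(intensity)):
            past = intensity[max(0, t - window):t]
            if len(past) == 0:
                continue                      # no neighborhood for the first frame
            delta = intensity[t] - past.mean()
            if delta > threshold:
                labels[t] = 1
            elif delta < -threshold:
                labels[t] = -1
        return labels

    track = np.array([0.0, 0.1, 0.4, 0.8, 0.8, 0.5, 0.2, 0.0])
    print(relative_labels(track))   # no neutral-face reference needed, only neighboring frames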