Deep Facial Expression Recognition: A Survey
With the transition of facial expression recognition (FER) from
laboratory-controlled to challenging in-the-wild conditions and the recent
success of deep learning techniques in various fields, deep neural networks
have increasingly been leveraged to learn discriminative representations for
automatic FER. Recent deep FER systems generally focus on two important issues:
overfitting caused by a lack of sufficient training data and
expression-unrelated variations, such as illumination, head pose and identity
bias. In this paper, we provide a comprehensive survey on deep FER, including
datasets and algorithms that provide insights into these intrinsic problems.
First, we describe the standard pipeline of a deep FER system with the related
background knowledge and suggestions of applicable implementations for each
stage. We then introduce the available datasets that are widely used in the
literature and provide accepted data selection and evaluation principles for
these datasets. For the state of the art in deep FER, we review existing novel
deep neural networks and related training strategies that are designed for FER
based on both static images and dynamic image sequences, and discuss their
advantages and limitations. Competitive performances on widely used benchmarks
are also summarized in this section. We then extend our survey to additional
related issues and application scenarios. Finally, we review the remaining
challenges and corresponding opportunities in this field as well as future
directions for the design of robust deep FER systems.
Multi-task, multi-label and multi-domain learning with residual convolutional networks for emotion recognition
Automated emotion recognition in the wild from facial images remains a
challenging problem. Although recent advances in Deep Learning have
represented a significant breakthrough in this topic, strong changes in pose,
orientation and
point of view severely harm current approaches. In addition, the acquisition of
labeled datasets is costly, and current state-of-the-art deep learning
algorithms cannot model all the aforementioned difficulties. In this paper, we
propose to apply a multi-task learning loss function to share a common feature
representation with other related tasks. In particular, we show that emotion
recognition benefits from jointly learning a model with a detector of facial
Action Units (collective muscle movements). The proposed loss function
addresses the problem of learning multiple tasks with heterogeneously labeled
data, improving previous multi-task approaches. We validate the proposal using
two datasets acquired in uncontrolled environments, and an application to
predict compound facial emotion expressions.
Comment: Preprint submitted to IJC
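Masking per-task loss terms is one common way to handle the heterogeneous labels the abstract mentions: samples without annotations for a task simply contribute nothing to that task's term. Below is a minimal NumPy sketch of such a masked two-head loss (emotion classification plus Action Unit detection); the function names, masking scheme, and weighting factor `alpha` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def multitask_loss(emo_logits, au_logits, emo_labels, au_labels,
                   emo_mask, au_mask, alpha=0.5):
    """Toy multi-task loss: softmax cross-entropy for the emotion head,
    binary cross-entropy for the AU head, with per-sample masks so that
    samples lacking one kind of label contribute nothing to that term."""
    # softmax cross-entropy for the emotion head
    e = np.exp(emo_logits - emo_logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    n = emo_logits.shape[0]
    ce = -np.log(probs[np.arange(n), emo_labels] + 1e-12)
    emo_term = (ce * emo_mask).sum() / max(emo_mask.sum(), 1)

    # sigmoid binary cross-entropy for the AU head (multi-label)
    p = 1.0 / (1.0 + np.exp(-au_logits))
    bce = -(au_labels * np.log(p + 1e-12) +
            (1 - au_labels) * np.log(1 - p + 1e-12)).mean(axis=1)
    au_term = (bce * au_mask).sum() / max(au_mask.sum(), 1)

    return alpha * emo_term + (1 - alpha) * au_term
```

With masks of all ones, this reduces to a plain weighted sum of the two task losses.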
Facial Expressions Tracking and Recognition: Database Protocols for Systems Validation and Evaluation
Each human face is unique. It has its own shape, topology, and distinguishing
features. As such, developing and testing facial tracking systems are
challenging tasks. The existing face recognition and tracking algorithms in
Computer Vision mainly specify concrete situations according to particular
goals and applications, requiring validation methodologies with data that fits
their purposes. However, no database covers all possible variations of
external and internal factors, which forces researchers to acquire their own
data or to compile groups of databases.
To address this shortcoming, we propose a methodology for facial data
acquisition through definition of fundamental variables, such as subject
characteristics, acquisition hardware, and performance parameters. Following
this methodology, we also propose two protocols that allow the capturing of
facial behaviors under uncontrolled and real-life situations. As validation, we
executed both protocols, which led to the creation of two sample databases: FdMiee
(Facial database with Multi input, expressions, and environments) and FACIA
(Facial Multimodal database driven by emotional induced acting).
Using different types of hardware, FdMiee captures facial information under
environmental and facial-behavior variations. FACIA is an extension of FdMiee
introducing a pipeline to acquire additional facial behaviors and speech using
an emotion-acting method. This work therefore eases the creation of adaptable
databases according to algorithms' requirements and applications, leading to
simplified validation and testing processes.
Comment: 10 pages, 6 images, Computers & Graphics
Super-realtime facial landmark detection and shape fitting by deep regression of shape model parameters
We present a method for highly efficient landmark detection that combines
deep convolutional neural networks with well-established model-based fitting
algorithms. Motivated by model-based fitting methods such as active shape
models, we use a PCA of the landmark positions to allow generative modeling of
facial landmarks. Instead of computing the model parameters using iterative
optimization, the PCA is included in a deep neural network using a novel layer
type. The network predicts model parameters in a single forward pass, thereby
allowing facial landmark detection at several hundreds of frames per second.
Our architecture allows direct end-to-end training of a model-based landmark
detection method and shows that deep neural networks can be used to reliably
predict model parameters directly without the need for an iterative
optimization. The method is evaluated on different datasets for facial landmark
detection and medical image segmentation. PyTorch code is freely available at
https://github.com/justusschock/shapenet
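The decoding step behind this idea is the linear generative shape model used by active-shape-style methods: landmarks are reconstructed as the mean shape plus a PCA basis applied to the predicted parameters. A hedged NumPy sketch, with random stand-ins for the learned mean and basis (the real ones would come from a PCA over training landmarks):

```python
import numpy as np

# Hypothetical illustration of the "shape layer": decode predicted PCA
# coefficients back into 2D landmark coordinates. The mean shape and
# principal components here are random stand-ins for learned quantities.
rng = np.random.default_rng(0)
n_landmarks, n_params = 68, 10
mean_shape = rng.normal(size=2 * n_landmarks)              # flattened (x, y) pairs
components = rng.normal(size=(2 * n_landmarks, n_params))  # PCA basis

def decode_shape(params):
    """Linear generative model: shape = mean + basis @ params.
    A network predicting `params` in a single forward pass avoids
    iterative model fitting entirely."""
    return (mean_shape + components @ params).reshape(n_landmarks, 2)

landmarks = decode_shape(rng.normal(size=n_params))
```

Because the decoder is a fixed linear map, it can sit inside a network as a differentiable layer, which is what makes end-to-end training of the model-based detector possible.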
Unsupervised Features for Facial Expression Intensity Estimation over Time
The diversity of facial shapes and motions among persons is one of the
greatest challenges for automatic analysis of facial expressions. In this
paper, we propose a feature that describes expression intensity over time
while being invariant to the person and the type of expression performed. Our
feature is
a weighted combination of the dynamics of multiple points adapted to the
overall expression trajectory. We evaluate our method on several tasks all
related to temporal analysis of facial expression. The proposed feature is
compared to a state-of-the-art method for expression intensity estimation,
which it outperforms. We use our proposed feature to temporally align multiple
sequences of recorded 3D facial expressions. Furthermore, we show how our
feature can be used to reveal person-specific differences in performances of
facial expressions. Additionally, we apply our feature to identify the local
changes in face video sequences based on action unit labels. For all the
experiments our feature proves to be robust against noise and outliers, making
it applicable to a variety of applications for analysis of facial movements.
Comment: Accepted for CVPR 2018 Workshop Trac
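As a rough illustration of the kind of feature described above, one can compute a per-frame intensity as a weighted combination of the motion magnitudes of several tracked points. The uniform weights below are a placeholder assumption; the paper adapts its weights to the overall expression trajectory.

```python
import numpy as np

def intensity_over_time(tracks, weights=None):
    """tracks: (T, n_points, 2) landmark positions over T frames.
    Returns a (T-1,) curve of weighted per-frame motion magnitudes."""
    disp = np.linalg.norm(np.diff(tracks, axis=0), axis=2)  # per-point motion
    if weights is None:  # uniform placeholder; the paper learns adapted weights
        weights = np.full(tracks.shape[1], 1.0 / tracks.shape[1])
    return disp @ weights

# Example: a face whose tracked points all translate one unit per frame
tracks = np.stack([np.full((3, 2), float(t)) for t in range(5)])
curve = intensity_over_time(tracks)
```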
A Deep Learning Perspective on the Origin of Facial Expressions
Facial expressions play a significant role in human communication and
behavior. Psychologists have long studied the relationship between facial
expressions and emotions. Paul Ekman et al. devised the Facial Action Coding
System (FACS) to taxonomize human facial expressions and model their behavior.
The ability to recognize facial expressions automatically enables novel
applications in fields like human-computer interaction, social gaming, and
psychological research. There has been tremendously active research in this
field, with several recent papers utilizing convolutional neural networks (CNN)
for feature extraction and inference. In this paper, we employ CNN
understanding methods to study the relation between the features these
computational networks are using, the FACS and Action Units (AU). We verify our
findings on the Extended Cohn-Kanade (CK+), NovaEmotions and FER2013 datasets.
We apply these models to various tasks and tests using transfer learning,
including cross-dataset validation and cross-task performance. Finally, we
exploit the nature of the FER based CNN models for the detection of
micro-expressions and achieve state-of-the-art accuracy using a simple
long short-term memory (LSTM) recurrent neural network (RNN).
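A minimal sketch of that final step, assuming per-frame CNN features have already been extracted: a single LSTM cell consumes the frame features and its final hidden state summarizes the sequence for micro-expression detection. The random weights and dimensions below are stand-ins; in practice they are learned.

```python
import numpy as np

# Toy LSTM cell over a sequence of per-frame feature vectors.
# Not the authors' model; gate layout follows the standard LSTM equations.
rng = np.random.default_rng(1)
feat_dim, hidden = 8, 4
W = rng.normal(scale=0.1, size=(4 * hidden, feat_dim + hidden))
b = np.zeros(4 * hidden)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_summary(frames):
    """frames: (T, feat_dim) per-frame CNN features; returns final hidden state."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in frames:
        z = W @ np.concatenate([x, h]) + b
        i, f, o, g = np.split(z, 4)          # input, forget, output, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

summary = lstm_summary(rng.normal(size=(10, feat_dim)))
```

A classifier head on `summary` would then decide whether the clip contains a micro-expression.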
A Novel Space-Time Representation on the Positive Semidefinite Cone for Facial Expression Recognition
In this paper, we study the problem of facial expression recognition using a
novel space-time geometric representation. We describe the temporal evolution
of facial landmarks as parametrized trajectories on the Riemannian manifold of
positive semidefinite matrices of fixed rank. Our representation has the
advantage of naturally bringing in a second desirable quantity when comparing
shapes -- the spatial covariance -- in addition to the conventional
affine-shape representation. We then derive geometric and computational tools
for
rate-invariant analysis and adaptive re-sampling of trajectories, grounding on
the Riemannian geometry of the manifold. Specifically, our approach involves
three steps: 1) facial landmarks are first mapped into the Riemannian manifold
of positive semidefinite matrices of rank 2, to build time-parameterized
trajectories; 2) a temporal alignment is performed on the trajectories,
providing a geometry-aware (dis-)similarity measure between them; 3) finally,
pairwise proximity function SVM (ppfSVM) is used to classify them,
incorporating the latter (dis-)similarity measure into the kernel function. We
show the effectiveness of the proposed approach on four publicly available
benchmarks (CK+, MMI, Oulu-CASIA, and AFEW). The results of the proposed
approach are comparable to or better than the state-of-the-art methods when
involving only facial landmarks.
Comment: To appear at ICCV 201
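Step 1 of the pipeline can be illustrated with the Gram matrix of centered 2D landmarks, G = L L^T, which is positive semidefinite of rank at most 2 and encodes the spatial covariance while being invariant to rotations of the configuration. A small NumPy sketch, under the assumption that this Gram-matrix construction matches the intended mapping:

```python
import numpy as np

def gram_matrix(landmarks):
    """landmarks: (n, 2) array of 2D landmark coordinates.
    Returns the (n, n) Gram matrix of the centered configuration:
    positive semidefinite, rank <= 2, rotation-invariant."""
    L = landmarks - landmarks.mean(axis=0)   # remove translation
    return L @ L.T

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
G = gram_matrix(pts)
```

Stacking such matrices frame by frame yields the time-parameterized trajectories on the manifold of rank-2 PSD matrices that the subsequent alignment and ppfSVM steps operate on.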
Feature Extraction via Recurrent Random Deep Ensembles and its Application in Group-level Happiness Estimation
This paper presents a novel ensemble framework to extract highly
discriminative feature representations of images, with an application to
group-level happiness intensity prediction in the wild. In order to generate
enough diversity of decisions, n convolutional neural networks are trained by
bootstrapping the training set, and n features are extracted for each image.
A recurrent neural network (RNN) is then used to remember which networks
extract better features and to generate the final feature representation for
each individual image. Several group emotion models (GEMs) aggregate face
features within a group, and a parameter-optimized support vector regressor
(SVR) produces the final results. Through extensive experiments, the
effectiveness of the proposed recurrent random deep ensembles (RRDE) is
demonstrated in both structural and decisional ways. The best result yields a
0.55 root-mean-square error (RMSE) on the validation set of the HAPPEI dataset,
significantly better than the baseline of 0.78.
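The diversity-generation step, bootstrapping the training set once per ensemble member, can be sketched in a few lines of plain Python; the function name and seeding are illustrative assumptions, not the authors' code.

```python
import random

def bootstrap_samples(dataset, n_members, seed=0):
    """Return n_members resamples of `dataset`, each drawn with
    replacement and the same size as the original. Each resample
    trains one CNN of the ensemble, injecting decision diversity."""
    rng = random.Random(seed)
    return [[rng.choice(dataset) for _ in dataset]
            for _ in range(n_members)]

samples = bootstrap_samples(list(range(100)), n_members=5)
```

Because each resample omits roughly a third of the original examples and duplicates others, the n networks see different data and disagree enough for the downstream RNN to exploit.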
Predictive biometrics: A review and analysis of predicting personal characteristics from biometric data
Interest in the exploitation of soft biometrics information has continued to develop over the last decade or so. In comparison with traditional biometrics, which focuses principally on person identification, the idea of soft biometrics processing is to study the utilisation of more general information regarding a system user, which is not necessarily unique. There are increasing indications that this type of data will have great value in providing complementary information for user authentication. However, the authors have also seen a growing interest in broadening the predictive capabilities of biometric data, encompassing both easily definable characteristics such as subject age and, most recently, 'higher level' characteristics such as emotional or mental states. This study presents a selective review of the predictive capabilities, in the widest sense, of biometric data processing, providing an analysis of the key issues still to be adequately addressed if this concept of predictive biometrics is to be fully exploited in the future.
Emotions in Pervasive Computing Environments
The ability of an intelligent environment to connect and adapt to the real
internal states, needs, and behavioral meanings of humans can be made possible
by considering users' emotional states as contextual parameters. In this paper,
we
build on enactive psychology and investigate the incorporation of emotions in
pervasive systems. We define emotions, and discuss the coding of emotional
human markers by smart environments. In addition, we compare some existing
works and identify how emotions can be detected and modeled by a pervasive
system in order to enhance its service and response to users. Finally, we
analyze closely one XML-based language for representing and annotating emotions
known as EARL and raise two important issues which pertain to emotion
representation and modeling in XML-based languages.
Comment: International Journal of Computer Science Issues (IJCSI), Volume 6,
Issue 1, pp. 8-22, November 200