CAKE: Compact and Accurate K-dimensional representation of Emotion
Numerous models describing the human emotional states have been built by the
psychology community. Alongside, Deep Neural Networks (DNN) are reaching
excellent performance and are becoming attractive feature extraction tools
in many computer vision tasks. Inspired by works from the psychology community,
we first study the link between the compact two-dimensional representation of
the emotion known as arousal-valence, and discrete emotion classes (e.g. anger,
happiness, sadness, etc.) used in the computer vision community. This makes it possible to
assess the benefits -- in terms of discrete emotion inference -- of adding an
extra dimension to arousal-valence (usually named dominance). Building on these
observations, we propose CAKE, a 3-dimensional representation of emotion
learned in a multi-domain fashion, achieving accurate emotion recognition on
several public datasets. Moreover, we visualize how emotion boundaries are
organized inside DNN representations and show that DNNs are implicitly learning
arousal-valence-like descriptions of emotions. Finally, we use the CAKE
representation to compare the quality of the annotations of different public
datasets.
Kernelized dense layers for facial expression recognition
The fully connected layer is an essential component of Convolutional Neural
Networks (CNNs), which demonstrates its efficiency in computer vision tasks.
The CNN process usually starts with convolution and pooling layers that first
break down the input images into features, and then analyze them independently.
The result of this process feeds into a fully connected neural network
structure which drives the final classification decision. In this paper, we
propose a Kernelized Dense Layer (KDL) which captures higher order feature
interactions instead of conventional linear relations. We apply this method to
Facial Expression Recognition (FER) and evaluate its performance on RAF,
FER2013 and ExpW datasets. The experimental results demonstrate the benefits of
such a layer and show that our model achieves competitive results with respect to
state-of-the-art approaches.
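As a rough illustration of the idea in the abstract above, the following minimal sketch contrasts a conventional linear neuron with a kernelized one. The abstract does not say which kernel the authors use, so the polynomial kernel here is an assumption, and the function names, the offset `c`, and the `degree` parameter are hypothetical choices made only for this example.

```python
from typing import List, Sequence


def dot(x: Sequence[float], w: Sequence[float]) -> float:
    """Inner product: the linear relation used by a standard dense neuron."""
    return sum(xi * wi for xi, wi in zip(x, w))


def linear_neuron(x: Sequence[float], w: Sequence[float], b: float = 0.0) -> float:
    """Conventional dense neuron (before any activation): w . x + b."""
    return dot(x, w) + b


def polynomial_kernel_neuron(
    x: Sequence[float], w: Sequence[float], c: float = 1.0, degree: int = 2
) -> float:
    """ASSUMPTION: a polynomial kernel (w . x + c) ** degree stands in for
    the unspecified kernel. Expanding the power yields products of input
    features, i.e. higher-order interactions."""
    return (dot(x, w) + c) ** degree


def kernelized_dense_layer(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    c: float = 1.0,
    degree: int = 2,
) -> List[float]:
    """Apply one kernelized neuron per weight vector (hypothetical layer)."""
    return [polynomial_kernel_neuron(x, w, c, degree) for w in weights]


# Toy input and weights, chosen only for illustration.
x = [1.0, 2.0]
weights = [[0.5, 0.25], [1.0, -1.0]]
print(kernelized_dense_layer(x, weights))
```

With `degree=1` and `c` playing the role of a bias, the kernelized neuron reduces to the linear one, which is one way to see the sketch as a generalization of a dense layer.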
Quaternion Orthogonal Transformer for Facial Expression Recognition in the Wild
Facial expression recognition (FER) is a challenging topic in artificial
intelligence. Recently, many researchers have attempted to introduce Vision
Transformer (ViT) to the FER task. However, ViT cannot fully utilize emotional
features extracted from raw images and requires a lot of computing resources.
To overcome these problems, we propose a quaternion orthogonal transformer
(QOT) for FER. Firstly, to reduce redundancy among features extracted from
pre-trained ResNet-50, we use the orthogonal loss to decompose and compact
these features into three sets of orthogonal sub-features. Secondly, three
orthogonal sub-features are integrated into a quaternion matrix, which
maintains the correlations between different orthogonal components. Finally, we
develop a quaternion vision transformer (Q-ViT) for feature classification. The
Q-ViT adopts quaternion operations instead of the original operations in ViT,
which improves the final accuracies with fewer parameters. Experimental results
on three in-the-wild FER datasets show that the proposed QOT outperforms
several state-of-the-art models and reduces the computations.
Comment: This paper has been accepted to ICASSP202
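The two ingredients named in the abstract above can be sketched in a few lines. The abstract does not define the orthogonal loss, so the pairwise squared-inner-product penalty below is an assumed, commonly used form and not necessarily the authors' formulation. The Hamilton product is the standard quaternion multiplication rule. Packing three sub-features into the imaginary parts of a quaternion with zero real part is likewise an assumption made for illustration, and all function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple

Quaternion = Tuple[float, float, float, float]  # (real, i, j, k)


def orthogonality_penalty(sub_features: Sequence[Sequence[float]]) -> float:
    """ASSUMED loss form: sum of squared inner products over all pairs of
    sub-feature vectors. It is zero when the vectors are mutually orthogonal."""
    total = 0.0
    for u, v in combinations(sub_features, 2):
        inner = sum(ui * vi for ui, vi in zip(u, v))
        total += inner ** 2
    return total


def hamilton_product(p: Quaternion, q: Quaternion) -> Quaternion:
    """Standard quaternion multiplication (non-commutative)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def pack_pure_quaternion(f1: float, f2: float, f3: float) -> Quaternion:
    """ASSUMPTION: one scalar from each of the three sub-features fills the
    i, j, k parts; the real part is left at zero."""
    return (0.0, f1, f2, f3)


# Toy values, chosen only for illustration.
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(orthogonality_penalty(features))
weight = (1.0, 0.0, 0.0, 0.0)  # identity quaternion as a stand-in weight
print(hamilton_product(weight, pack_pure_quaternion(0.2, 0.5, 0.3)))
```

Because one quaternion weight mixes all four components of its input through a single Hamilton product, quaternion layers are generally described as needing fewer free parameters than a real-valued layer of the same width, which is consistent with the parameter reduction the abstract reports.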
The Many Moods of Emotion
This paper presents a novel approach to the facial expression generation
problem. Building upon the assumption of the psychological community that
emotion is intrinsically continuous, we first design our own continuous emotion
representation with a 3-dimensional latent space issued from a neural network
trained on discrete emotion classification. The resulting representation can
be used to annotate large in-the-wild datasets and later to train a
Generative Adversarial Network. We first show that our model is able to map
back to discrete emotion classes with objectively and subjectively better
image quality than usual discrete approaches. We also show that we are able
to cover the larger space of possible facial expressions, generating the many
moods of emotion. Moreover, two axes in this space may be found to generate
similar expression changes as in traditional continuous representations such as
arousal-valence. Finally, we show through visual interpretation that the third
remaining dimension is highly related to the well-known dominance dimension
from psychology.