Island Loss for Learning Discriminative Features in Facial Expression Recognition
Over the past few years, Convolutional Neural Networks (CNNs) have shown
promise on facial expression recognition. However, the performance degrades
dramatically under real-world settings due to variations introduced by subtle
facial appearance changes, head pose variations, illumination changes, and
occlusions.
In this paper, a novel island loss (IL) is proposed to enhance the discriminative
power of the deeply learned features. Specifically, the IL is designed to
reduce the intra-class variations while simultaneously enlarging the inter-class
differences. Experimental results on four benchmark expression databases
demonstrate that the CNN with the proposed island loss (IL-CNN)
outperforms baseline CNN models trained with either the traditional softmax loss
or the center loss, and achieves comparable or better performance than
state-of-the-art methods for facial expression recognition.
Comment: 8 pages, 3 figures
Group-level Emotion Recognition using Transfer Learning from Face Identification
In this paper, we describe the algorithmic approach used for our
submissions to the group-level emotion recognition sub-challenge of the fifth
Emotion Recognition in the Wild challenge (EmotiW 2017). We extracted feature
vectors of detected faces using a Convolutional Neural Network trained for the
face identification task, rather than the traditional pre-training on emotion
recognition problems. In the final pipeline, an ensemble of Random Forest
classifiers was trained on the available training set to predict the emotion
score. When no faces are detected in an image, one member of our ensemble
extracts features from the whole image instead. In our experimental study, the
proposed approach showed the lowest error rate among the explored
techniques. In particular, we achieved 75.4% accuracy on the validation data,
20% higher than the handcrafted-feature baseline. The source
code, built on the Keras framework, is publicly available.
Comment: 5 pages, 3 figures, accepted for publication at ICMI17 (EmotiW Grand Challenge)
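The pipeline logic described above, scoring per-face features with an ensemble and falling back to a whole-image feature when no face is detected, can be sketched as follows. All names and the averaging scheme are assumptions; the ensemble members here are plain callables standing in for trained Random Forest regressors.

```python
import numpy as np

def group_emotion_score(face_feats, image_feat, models):
    """Sketch of the group-level scoring pipeline (names hypothetical):
    every ensemble member scores every detected face and the scores are
    averaged; if no face was detected, the whole-image feature is used."""
    if len(face_feats) == 0:
        feats = np.asarray([image_feat])  # fallback: whole-image feature
    else:
        feats = np.asarray(face_feats)
    # Each `model` is a callable mapping a feature vector to a scalar score.
    scores = [m(f) for m in models for f in feats]
    return float(np.mean(scores))
```

A trained `sklearn` forest could be dropped in as `lambda f: rf.predict(f[None])[0]` without changing the surrounding logic.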
EmoCLIP: A Vision-Language Method for Zero-Shot Video Facial Expression Recognition
Facial Expression Recognition (FER) is a crucial task in affective computing,
but its conventional focus on the seven basic emotions limits its applicability
to the complex and expanding emotional spectrum. To address the issue of new
and unseen emotions present in dynamic in-the-wild FER, we propose a novel
vision-language model that utilises sample-level text descriptions (i.e.
captions of the context, expressions or emotional cues) as natural language
supervision, aiming to enhance the learning of rich latent representations for
zero-shot classification. To test this, we evaluate the model trained with
sample-level descriptions via zero-shot classification on four
popular dynamic FER datasets. Our findings show that this approach yields
significant improvements over baseline methods. Specifically, for
zero-shot video FER, we outperform CLIP by over 10% in terms of Weighted
Average Recall and 5% in terms of Unweighted Average Recall on several
datasets. Furthermore, we evaluate the representations obtained from the
network trained using sample-level descriptions on the downstream task of
mental health symptom estimation, achieving performance comparable or superior
to state-of-the-art methods and strong agreement with human experts. Namely, we
achieve a Pearson's Correlation Coefficient of up to 0.85 on schizophrenia
symptom severity estimation, which is comparable to human experts' agreement.
The code is publicly available at: https://github.com/NickyFot/EmoCLIP.
Comment: 10 pages, 3 figures
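The zero-shot classification step described above, matching a video embedding against embeddings of per-class text descriptions, reduces to a cosine-similarity argmax. This is a generic CLIP-style sketch under assumed inputs, not the EmoCLIP implementation; see the linked repository for the actual code.

```python
import numpy as np

def zero_shot_classify(video_emb, class_text_embs):
    """CLIP-style zero-shot sketch: return the index of the class whose
    text-description embedding is most cosine-similar to the video
    embedding. Embeddings are assumed to come from a joint space."""
    v = video_emb / np.linalg.norm(video_emb)
    t = class_text_embs / np.linalg.norm(class_text_embs, axis=1, keepdims=True)
    sims = t @ v  # cosine similarity per class
    return int(np.argmax(sims))
```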
Affect recognition & generation in-the-wild
Affect recognition based on a subject’s facial expressions has been a major research topic in the attempt to build machines that can understand the way subjects feel, act and react. In the past, due to the unavailability of large amounts of data captured in real-life situations, research mainly focused on controlled environments. Recently, however, social media and online platforms have come into wide use, providing large amounts of such in-the-wild data. Moreover, deep learning has emerged as a means to solve visual analysis and recognition problems. This Ph.D. Thesis exploits these advances and makes significant contributions to affect analysis and recognition in-the-wild.
We tackle affect analysis and recognition as a dual knowledge generation problem: i) we create new, large and rich in-the-wild databases and ii) we design and train novel deep neural architectures that are able to analyse affect over these databases and to successfully generalise their performance on other datasets.
First, we present the creation of the Aff-Wild database, annotated in terms of valence-arousal, and an end-to-end CNN-RNN architecture, AffWildNet. We then use AffWildNet as a robust prior for dimensional and categorical affect recognition and extend it by extracting low-, mid- and high-level latent information and analysing it via multiple RNNs. Additionally, we propose a novel loss function for DNN-based categorical affect recognition.
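The CNN-RNN design mentioned above (per-frame CNN features fed to a recurrent network that regresses valence-arousal) can be sketched with a minimal tanh RNN. The function name, weight shapes and the plain-RNN cell are illustrative assumptions; AffWildNet's actual architecture is defined in the thesis.

```python
import numpy as np

def cnn_rnn_va(frame_feats, W_x, W_h, W_out):
    """Sketch of a CNN-RNN affect regressor (hypothetical weights):
    per-frame CNN features pass through a simple tanh recurrence and a
    linear head emits a 2-D (valence, arousal) estimate per frame."""
    h = np.zeros(W_h.shape[0])  # hidden state
    outs = []
    for x in frame_feats:       # one CNN feature vector per video frame
        h = np.tanh(W_x @ x + W_h @ h)
        outs.append(W_out @ h)  # (valence, arousal) for this frame
    return np.asarray(outs)
```

In practice the recurrence would be a GRU or LSTM and the weights would be learned jointly with the CNN; the loop structure stays the same.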
Next, we generate Aff-Wild2, the first database containing annotations for all main behavior tasks: valence-arousal estimation, basic expression classification and action unit detection. We develop multi-task and multi-modal extensions of AffWildNet by fusing these tasks, and propose a novel holistic approach that utilises all existing databases with non-overlapping annotations, coupling them through co-annotation and distribution matching.
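Training one network on databases with non-overlapping annotations, as described above, requires each task's loss to be computed only on samples where that task is actually labelled. A minimal masked multi-task loss sketch (task order, squared-error choice and weights are all assumptions, not the thesis formulation):

```python
import numpy as np

def multitask_loss(preds, labels, masks, weights=(1.0, 1.0, 1.0)):
    """Sketch of a masked multi-task loss over non-overlapping annotations:
    each entry of `preds`/`labels`/`masks` corresponds to one task
    (e.g. valence-arousal, expressions, action units; order hypothetical),
    and mask == 1 marks samples labelled for that task."""
    total = 0.0
    for p, y, m, w in zip(preds, labels, masks, weights):
        m = np.asarray(m, dtype=float)
        if m.sum() == 0:
            continue  # no labelled samples for this task in the batch
        se = np.sum(m * (np.asarray(p) - np.asarray(y)) ** 2) / m.sum()
        total += w * se
    return total
```

Unlabelled samples contribute nothing, so batches can freely mix examples from differently annotated databases.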
Finally, we present an approach to facial affect synthesis driven by valence-arousal values or basic expressions. We generate an image with a given affect, or a sequence of images with evolving affect, by annotating a 4-D database and utilising a 3-D morphable model.