4 research outputs found
FullExpression - Emotion Recognition Software
During human evolution, emotion expression became an important social tool that contributed to the growing complexity of societies. Human-computer interaction is now a common part of daily life, and industry is searching for solutions that can analyze human emotions in an attempt to provide better experiences. The purpose of this study was to understand whether software built using the transfer-learning technique on a deep learning model was capable of classifying human emotions through facial expression analysis. A Convolutional Neural Network model was trained and used in a web application, which is available online. Several tools were created to facilitate the software development process, including the training and validation processes, and these are also available online. The data was collected by combining several facial expression emotion databases, such as KDEF_AKDEF, TFEID, Face_Place, and JAFFE. Software evaluation revealed an accuracy in identifying the correct emotions close to 80%. In addition, a comparison between the software and preliminary data on human performance in recognizing facially expressed emotions suggested that the software performed better. This work can be useful in many different domains, such as marketing (to understand the effect of marketing campaigns on people's emotional states), health (to support the diagnosis of mental illnesses), and Industry 4.0 (to create a better collaborative environment between humans and machines).
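The transfer-learning idea behind this system, reusing a network trained on one task as a frozen feature extractor and training only a new classification head on the target emotion labels, can be sketched as follows. This is a hypothetical numpy illustration, not the authors' implementation: the random projection stands in for a pre-trained CNN backbone, and the toy labels stand in for emotion classes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pre-trained" backbone: a random projection + ReLU standing in
# for convolutional features learned on a large face dataset
# (hypothetical stand-in, not the authors' actual network).
W_frozen = rng.normal(size=(64, 16))

def extract_features(x):
    # The backbone weights are never updated during target training.
    f = np.maximum(x @ W_frozen, 0.0)
    return (f - f.mean(axis=0)) / (f.std(axis=0) + 1e-8)

# Toy target task: 200 "images" whose binary labels are learnable from
# the frozen features, so only the new head needs training.
X = rng.normal(size=(200, 64))
F = extract_features(X)
w_true = rng.normal(size=16)
scores = F @ w_true
y = (scores > np.median(scores)).astype(float)

# Transfer learning, embedding-extraction style: train only a
# logistic-regression head on top of the frozen features.
w, b = np.zeros(16), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
    grad = p - y
    w -= 0.1 * (F.T @ grad) / len(y)
    b -= 0.1 * grad.mean()

acc = (((F @ w + b) > 0) == y.astype(bool)).mean()
print(f"head-only training accuracy: {acc:.2f}")
```

Because the backbone stays fixed, only the small head is optimized, which is what makes training "not start from scratch" cheap and robust when the pre-training domain resembles the target task.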
A Proposal for Multimodal Emotion Recognition Using Aural Transformers and Action Units on RAVDESS Dataset
The work leading to these results was supported by the Spanish Ministry of Science and Innovation through the projects GOMINOLA (PID2020-118112RB-C21 and PID2020-118112RB-C22, funded by MCIN/AEI/10.13039/501100011033), CAVIAR (TEC2017-84593-C2-1-R, funded by MCIN/AEI/10.13039/501100011033/FEDER "Una manera de hacer Europa"), and AMIC-PoC (PDC2021-120846-C42, funded by MCIN/AEI/10.13039/501100011033 and by the European Union "NextGenerationEU/PRTR"). This research also received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No 823907 (http://menhir-project.eu, accessed on 17 November 2021). Furthermore, R.K.'s research was supported by the Spanish Ministry of Education (FPI grant PRE2018-083225).

Emotion recognition is attracting the attention of the research community due to its multiple
applications in different fields, such as medicine or autonomous driving. In this paper, we proposed
an automatic emotion recognizer system that consisted of a speech emotion recognizer (SER) and a
facial emotion recognizer (FER). For the SER, we evaluated a pre-trained xlsr-Wav2Vec2.0 transformer
using two transfer-learning techniques: embedding extraction and fine-tuning. The best accuracy
results were achieved when we fine-tuned the whole model by appending a multilayer perceptron
on top of it, confirming that training was more robust when it did not start from scratch and the
network's prior knowledge was related to the target task. Regarding the facial emotion
recognizer, we extracted the Action Units of the videos and compared the performance of static
models against sequential models. Results showed that sequential models beat static models by a
narrow margin. Error analysis indicated that the visual systems could improve with a detector of
frames carrying a high emotional load, which opens a new line of research into ways to learn
from videos. Finally, combining these two modalities with a late fusion strategy, we
achieved 86.70% accuracy on the RAVDESS dataset on a subject-wise 5-CV evaluation, classifying
eight emotions. Results demonstrated that these modalities carried relevant information to detect
users’ emotional state and that their combination made it possible to improve the final system performance.
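The late-fusion strategy that combines the speech and facial recognizers can be illustrated with a minimal sketch: each modality outputs a probability distribution over the eight RAVDESS emotion classes, and the fused prediction is the argmax of their weighted average. The class names and weights here are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

# The eight emotion classes of the RAVDESS dataset.
EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]

def late_fusion(p_speech, p_face, w_speech=0.5):
    """Weighted average of per-class probabilities, then argmax."""
    p_speech = np.asarray(p_speech, dtype=float)
    p_face = np.asarray(p_face, dtype=float)
    fused = w_speech * p_speech + (1.0 - w_speech) * p_face
    return EMOTIONS[int(np.argmax(fused))]

# Example: speech leans strongly toward "happy", face leans toward
# "surprised"; with equal weights the combined evidence picks "happy".
p_s = [0.05, 0.05, 0.50, 0.05, 0.05, 0.05, 0.05, 0.20]
p_f = [0.05, 0.05, 0.30, 0.05, 0.05, 0.05, 0.05, 0.40]
print(late_fusion(p_s, p_f))
```

Late fusion keeps the two recognizers fully independent and combines them only at the decision level, which is what lets each modality be trained and tuned separately before merging.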
Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning
Emotion Recognition is attracting the attention of the research community due to the
multiple areas where it can be applied, such as in healthcare or in road safety systems. In this paper,
we propose a multimodal emotion recognition system that relies on speech and facial information.
For the speech-based modality, we evaluated several transfer-learning techniques, more specifically
embedding extraction and fine-tuning. The best accuracy results were achieved when we fine-tuned
the CNN-14 of the PANNs framework, confirming that the training was more robust when it did not
start from scratch and the tasks were similar. Regarding the facial emotion recognizers, we propose a
framework that consists of a pre-trained Spatial Transformer Network on saliency maps and facial
images followed by a bi-LSTM with an attention mechanism. The error analysis showed that
frame-based systems can present problems when used directly to solve a video-based task despite
the domain adaptation, which opens a new line of research into ways to correct this mismatch
and take advantage of the embedded knowledge of these pre-trained models.
Finally, from the combination of these two modalities with a late fusion strategy, we achieved 80.08%
accuracy on the RAVDESS dataset on a subject-wise 5-CV evaluation, classifying eight emotions. The
results revealed that these modalities carry relevant information to detect users’ emotional state,
and that their combination improves the overall system performance.
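The subject-wise 5-CV protocol used to report both papers' accuracy figures means that all clips from a given actor land in exactly one test fold, so the model is always evaluated on speakers it never saw in training. A minimal pure-Python sketch of such a splitter (names and data are illustrative, not the papers' code):

```python
def subject_wise_folds(subjects, n_splits=5):
    """Return (train_idx, test_idx) pairs with no subject overlap:
    every sample from a held-out subject goes to the test fold."""
    unique = sorted(set(subjects))
    # Deal subjects round-robin into n_splits disjoint groups.
    fold_subjects = [set(unique[i::n_splits]) for i in range(n_splits)]
    splits = []
    for held_out in fold_subjects:
        test_idx = [i for i, s in enumerate(subjects) if s in held_out]
        train_idx = [i for i, s in enumerate(subjects) if s not in held_out]
        splits.append((train_idx, test_idx))
    return splits

# Ten clips from five hypothetical actors (two clips each).
subjects = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
for train_idx, test_idx in subject_wise_folds(subjects):
    # No actor appears on both sides of the split.
    assert not {subjects[i] for i in train_idx} & {subjects[i] for i in test_idx}
```

Splitting by subject rather than by clip prevents the model from inflating accuracy by memorizing speaker identity, which is why subject-wise CV is the stricter and more honest evaluation for emotion recognition.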