
    FullExpression - Emotion Recognition Software

    During human evolution, the expression of emotion became an important social tool that contributed to the growing complexity of societies. Human-computer interaction is now commonplace in daily life, and industry is searching for solutions that can analyze human emotions in an attempt to provide better experiences. The purpose of this study was to understand whether software built using a transfer-learning technique on a deep learning model could classify human emotions through facial expression analysis. A Convolutional Neural Network model was trained and used in a web application, which is available online. Several tools were created to facilitate the software development process, including the training and validation processes, and these are also available online. The data were collected by combining several facial expression emotion databases, such as KDEF_AKDEF, TFEID, Face_Place, and jaffe. Software evaluation revealed an accuracy in identifying the correct emotions of close to 80%. In addition, a comparison between the software and preliminary data on human performance in recognizing facially expressed emotions suggested that the software performed better. This work can be useful in many different domains, such as marketing (to understand the effect of marketing campaigns on people's emotional states), health (to help diagnose mental illnesses), and Industry 4.0 (to create a better collaborative environment between humans and machines).
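
    The abstract describes transfer learning on a pre-trained CNN with a new emotion-classification head. Below is a minimal sketch of that setup, assuming a MobileNetV2 backbone and a seven-label emotion set; neither detail is given in the abstract.

    # Minimal transfer-learning sketch; the backbone and label count are
    # assumptions, since the abstract does not name them.
    import torch.nn as nn
    from torchvision import models

    NUM_EMOTIONS = 7  # assumed basic-emotion label set

    # Reuse a CNN pre-trained on ImageNet and freeze its convolutional base,
    # so only the new head is learned from the facial-expression data.
    backbone = models.mobilenet_v2(
        weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
    for p in backbone.features.parameters():
        p.requires_grad = False

    # Replace the ImageNet classifier with a small emotion head.
    backbone.classifier = nn.Sequential(
        nn.Dropout(0.2),
        nn.Linear(backbone.last_channel, NUM_EMOTIONS),
    )
    # Training would then optimize only backbone.classifier's parameters
    # with a standard cross-entropy loss over the emotion labels.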

    A Proposal for Multimodal Emotion Recognition Using Aural Transformers and Action Units on RAVDESS Dataset

    The work leading to these results was supported by the Spanish Ministry of Science and Innovation through the projects GOMINOLA (PID2020-118112RB-C21 and PID2020-118112RB-C22, funded by MCIN/AEI/10.13039/501100011033), CAVIAR (TEC2017-84593-C2-1-R, funded by MCIN/AEI/10.13039/501100011033/FEDER "Una manera de hacer Europa"), and AMIC-PoC (PDC2021-120846-C42, funded by MCIN/AEI/10.13039/501100011033 and by the European Union "NextGenerationEU/PRTR"). This research also received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No. 823907 (http://menhir-project.eu, accessed on 17 November 2021). Furthermore, R.K.'s research was supported by the Spanish Ministry of Education (FPI grant PRE2018-083225).

    Emotion recognition is attracting the attention of the research community due to its multiple applications in different fields, such as medicine or autonomous driving. In this paper, we propose an automatic emotion recognition system consisting of a speech emotion recognizer (SER) and a facial emotion recognizer (FER). For the SER, we evaluated a pre-trained xlsr-Wav2Vec2.0 transformer using two transfer-learning techniques: embedding extraction and fine-tuning. The best accuracy was achieved when we fine-tuned the whole model with a multilayer perceptron appended on top of it, confirming that training was more robust when it did not start from scratch and the network's prior knowledge was similar to the target task. For the facial emotion recognizer, we extracted the Action Units of the videos and compared the performance of static models against sequential models. Results showed that sequential models beat static models by a narrow margin. Error analysis indicated that the visual systems could improve with a detector of high-emotional-load frames, which opens a new line of research into new ways of learning from videos. Finally, by combining these two modalities with a late fusion strategy, we achieved 86.70% accuracy on the RAVDESS dataset in a subject-wise 5-CV evaluation, classifying eight emotions. The results demonstrate that these modalities carry relevant information for detecting users' emotional state and that their combination improves the final system performance.
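
    The best SER configuration reported here is the xlsr-Wav2Vec2.0 transformer fine-tuned end-to-end with a multilayer perceptron on top. A hedged sketch of that architecture follows; the pooling strategy and hidden-layer size are assumptions, as the abstract does not specify them.

    # Sketch of fine-tuning xlsr-Wav2Vec2.0 with an appended MLP head.
    # Mean pooling over time and the hidden size of 256 are assumptions.
    import torch
    import torch.nn as nn
    from transformers import Wav2Vec2Model

    class Wav2Vec2EmotionClassifier(nn.Module):
        def __init__(self, num_emotions: int = 8, hidden: int = 256):
            super().__init__()
            # Pre-trained multilingual xlsr backbone; its weights stay
            # trainable so the whole model is fine-tuned, as in the paper.
            self.encoder = Wav2Vec2Model.from_pretrained(
                "facebook/wav2vec2-large-xlsr-53")
            self.mlp = nn.Sequential(
                nn.Linear(self.encoder.config.hidden_size, hidden),
                nn.ReLU(),
                nn.Linear(hidden, num_emotions),
            )

        def forward(self, waveform: torch.Tensor) -> torch.Tensor:
            # waveform: (batch, samples) of 16 kHz mono audio
            hidden_states = self.encoder(waveform).last_hidden_state
            pooled = hidden_states.mean(dim=1)  # assumed mean pooling over time
            return self.mlp(pooled)  # unnormalized logits over the emotions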

    Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning

    Emotion recognition is attracting the attention of the research community due to the multiple areas where it can be applied, such as healthcare or road-safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, more specifically embedding extraction and fine-tuning. The best accuracy was achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that training was more robust when it did not start from scratch and the tasks were similar. For the facial emotion recognizer, we propose a framework consisting of a Spatial Transformer Network pre-trained on saliency maps and facial images, followed by a bi-LSTM with an attention mechanism. Error analysis reported that frame-based systems can present problems when used directly to solve a video-based task despite domain adaptation, which opens a new line of research into ways to correct this mismatch and take advantage of the embedded knowledge of these pre-trained models. Finally, by combining these two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset in a subject-wise 5-CV evaluation, classifying eight emotions. The results reveal that these modalities carry relevant information to detect users' emotional state, and their combination enables improved system performance.
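
    Both RAVDESS papers combine the speech and facial modalities with a late fusion strategy: each modality first produces its own class-probability vector, and the two are merged afterward. A minimal sketch follows; the weighted average and the 0.5 weight are assumptions, since the abstracts do not state the exact fusion rule.

    # Hedged late-fusion sketch: combine per-modality softmax outputs.
    # The weighted-average rule and default weight are assumptions.
    import torch

    def late_fusion(speech_probs: torch.Tensor,
                    face_probs: torch.Tensor,
                    speech_weight: float = 0.5) -> int:
        """Fuse two probability vectors over the 8 emotions, pick a class."""
        fused = speech_weight * speech_probs + (1.0 - speech_weight) * face_probs
        return int(torch.argmax(fused))

    # Example with two unimodal predictions over the 8 RAVDESS emotions.
    speech = torch.tensor([0.05, 0.10, 0.40, 0.10, 0.10, 0.10, 0.10, 0.05])
    face = torch.tensor([0.05, 0.05, 0.25, 0.30, 0.10, 0.10, 0.10, 0.05])
    print(late_fusion(speech, face))  # index of the fused top emotion

    Because fusion happens after each unimodal model has made its prediction, either recognizer can be retrained or swapped independently, which is a common reason to prefer late over early fusion.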