2 research outputs found

A Personalized Affective Memory Neural Model for Improving Emotion Recognition

Authors: Barros Pablo, Parisi German I., Wermter Stefan
Publication date: 31/05/2020

Recent models of emotion recognition strongly rely on supervised deep learning solutions for the distinction of general emotion expressions. However, they are not reliable when recognizing online and personalized facial expressions, e.g., for person-specific affective understanding. In this paper, we present a neural model based on a conditional adversarial autoencoder to learn how to represent and edit general emotion expressions. We then propose Grow-When-Required networks as personalized affective memories to learn individualized aspects of emotion expressions. Our model achieves state-of-the-art performance on emotion recognition when evaluated on in-the-wild datasets. Furthermore, our experiments include ablation studies and neural visualizations in order to explain the behavior of our model. Comment: Accepted by the International Conference on Machine Learning 2019 (ICML2019).

arXiv.org e-Print Archive

Temporal aggregation of audio-visual modalities for emotion recognition

Authors: Birhala Andreea, Dutu Liviu Cristian, Radoi Anamaria, Ristea Catalin Nicolae
Publication date: 08/07/2020

Emotion recognition has a pivotal role in affective computing and in human-computer interaction. Current technological developments have increased the possibilities of collecting data about the emotional state of a person. In general, human perception regarding the emotion transmitted by a subject is based on vocal and visual information collected in the first seconds of interaction with the subject. As a consequence, the integration of verbal (i.e., speech) and non-verbal (i.e., image) information seems to be the preferred choice in most of the current approaches towards emotion recognition. In this paper, we propose a multimodal fusion technique for emotion recognition based on combining audio-visual modalities from a temporal window with different temporal offsets for each modality. We show that our proposed method outperforms other methods from the literature and human accuracy rating. The experiments are conducted over the open-access multimodal dataset CREMA-D.

arXiv.org e-Print Archive