Emotion Recognition in the Wild using Deep Neural Networks and Bayesian Classifiers
Group emotion recognition in the wild is a challenging problem, due to the
unstructured environments in which everyday life pictures are taken. Some of
the obstacles for an effective classification are occlusions, variable lighting
conditions, and image quality. In this work we present a solution based on a
novel combination of deep neural networks and Bayesian classifiers. The neural
network works on a bottom-up approach, analyzing emotions expressed by isolated
faces. The Bayesian classifier estimates a global emotion integrating top-down
features obtained through a scene descriptor. In order to validate the system
we tested the framework on the dataset released for the Emotion Recognition in
the Wild Challenge 2017. Our method achieved an accuracy of 64.68% on the test
set, significantly outperforming the 53.62% competition baseline. Comment:
accepted by the Fifth Emotion Recognition in the Wild (EmotiW) Challenge 2017
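The bottom-up/top-down combination described above can be sketched as a naive-Bayes-style product of per-face emotion probabilities with a scene-level prior. This is a toy illustration only; the function name, class count, and numbers are hypothetical, not the authors' implementation:

```python
import numpy as np

def fuse_group_emotion(face_probs, scene_prior):
    """Combine per-face emotion probabilities (bottom-up) with a
    scene-level prior (top-down) via a naive-Bayes-style product.
    face_probs: (n_faces, n_classes); scene_prior: (n_classes,)."""
    # Work in log space for numerical stability, then renormalize
    log_post = np.log(scene_prior) + np.log(face_probs).sum(axis=0)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

# Toy example: two detected faces, three group-emotion classes
faces = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1]])
prior = np.array([0.3, 0.4, 0.3])   # hypothetical scene-descriptor prior
print(fuse_group_emotion(faces, prior))
```

In this sketch the scene prior can tilt the decision when the face-level evidence is weak, mirroring the top-down role the abstract assigns to the Bayesian classifier.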
Semi-Supervised Speech Emotion Recognition with Ladder Networks
Speech emotion recognition (SER) systems find applications in various fields
such as healthcare, education, and security and defense. A major drawback of
these systems is their lack of generalization across different conditions. This
problem can be solved by training models on large amounts of labeled data from
the target domain, which is expensive and time-consuming. Another approach is
to increase the generalization of the models. An effective way to achieve this
goal is by regularizing the models through multitask learning (MTL), where
auxiliary tasks are learned along with the primary task. These methods often
require labeled data for the auxiliary tasks (gender, speaker identity, age,
or other emotional descriptors), which is expensive to collect. This study
proposes ladder networks for emotion recognition, which utilize an
unsupervised auxiliary task. The primary task is
a regression problem to predict emotional attributes. The auxiliary task is the
reconstruction of intermediate feature representations using a denoising
autoencoder. This auxiliary task does not require labels so it is possible to
train the framework in a semi-supervised fashion with abundant unlabeled data
from the target domain. This study shows that the proposed approach creates a
powerful framework for SER, outperforming fully supervised single-task
learning (STL) and MTL baselines. The approach is
implemented with several acoustic features, showing that ladder networks
generalize significantly better in cross-corpus settings. Compared to the STL
baselines, the proposed approach achieves relative gains in concordance
correlation coefficient (CCC) between 3.0% and 3.5% for within corpus
evaluations, and between 16.1% and 74.1% for cross corpus evaluations,
highlighting the power of the architecture.
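The gains above are reported in concordance correlation coefficient (CCC), which rewards both correlation and agreement in scale and location. A minimal sketch of the standard metric (Lin's CCC; not the paper's exact implementation):

```python
import numpy as np

def ccc(y_true, y_pred):
    """Concordance correlation coefficient:
    2*cov / (var_t + var_p + (mean_t - mean_p)^2)."""
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    return 2 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)

y = np.array([1.0, 2.0, 3.0, 4.0])
print(ccc(y, y))        # perfect agreement -> 1.0
print(ccc(y, y + 1.0))  # same shape but shifted -> penalized below 1.0
```

Unlike Pearson correlation, CCC penalizes the constant shift in the second call, which is why it is a common choice for continuous emotional-attribute prediction.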
M3ER: Multiplicative Multimodal Emotion Recognition Using Facial, Textual, and Speech Cues
We present M3ER, a learning-based method for emotion recognition from
multiple input modalities. Our approach combines cues from multiple
co-occurring modalities (such as face, text, and speech) and is more robust
than other methods to sensor noise in any individual modality. M3ER uses a
novel, data-driven multiplicative fusion method to combine the modalities,
which learns to emphasize the more reliable cues and suppress the others
on a per-sample basis. By introducing a check step that uses Canonical
Correlation Analysis to differentiate between ineffective and effective
modalities, M3ER is robust to sensor noise. M3ER also generates proxy features
in place of the ineffectual modalities. We demonstrate the efficiency of our
network through experimentation on two benchmark datasets, IEMOCAP and
CMU-MOSEI. We report a mean accuracy of 82.7% on IEMOCAP and 89.0% on
CMU-MOSEI, which, collectively, is an improvement of about 5% over prior work.
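The per-sample emphasize/suppress idea can be illustrated with a toy reliability-weighted product over modality probability distributions. The function name, weights, and numbers here are hypothetical stand-ins, not M3ER's learned fusion:

```python
import numpy as np

def multiplicative_fusion(modality_probs, reliability):
    """Toy multiplicative fusion: a reliability-weighted geometric mean
    of per-modality class probabilities. Low-reliability modalities
    (e.g. a noisy sensor) contribute less to the fused distribution."""
    log_p = np.stack([w * np.log(p)
                      for p, w in zip(modality_probs, reliability)])
    fused = np.exp(log_p.sum(axis=0))
    return fused / fused.sum()

face   = np.array([0.6, 0.3, 0.1])
text   = np.array([0.5, 0.4, 0.1])
speech = np.array([0.2, 0.2, 0.6])   # hypothetical noisy channel
# Down-weight the unreliable speech modality on this sample
print(multiplicative_fusion([face, text, speech], [1.0, 1.0, 0.2]))
```

In M3ER the down-weighting is learned per sample rather than hand-set, and flagged-ineffective modalities are replaced by generated proxy features; this sketch only shows the multiplicative-combination principle.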