Emotion recognition based on the energy distribution of plosive syllables
We usually encounter two problems during speech emotion recognition (SER): expression and perception problems, which vary considerably between speakers, languages, and sentence pronunciations. Finding a system that characterizes emotions while overcoming all these differences is therefore a promising prospect. With this in mind, we considered two emotional databases: the Moroccan Arabic dialect emotional database (MADED) and the Ryerson audio-visual database of emotional speech and song (RAVDESS), which differ notably in type (natural vs. acted) and language (Arabic vs. English). We propose a detection process based on 27 acoustic features extracted from consonant-vowel (CV) syllabic units (/ba/, /du/, /ki/, /ta/) common to both databases. We tested two classification strategies: multiclass (all emotions combined: joy, sadness, neutral, anger) and binary (neutral vs. others; positive emotions (joy) vs. negative emotions (sadness, anger); sadness vs. anger). These strategies were tested three times: i) on MADED, ii) on RAVDESS, iii) on MADED and RAVDESS combined. The proposed method gave better recognition accuracy for binary classification: rates average 78% for multiclass classification, 100% for neutral vs. others, 100% for the negative emotions (anger vs. sadness), and 96% for positive vs. negative emotions.
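A minimal sketch of the binary "neutral vs. others" strategy, assuming librosa for feature extraction and an SVM classifier; the feature set below (energy and MFCC statistics) only approximates the paper's 27 features, and the syllable boundaries and dataset loader are hypothetical.

import numpy as np
import librosa
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def syllable_features(wav_path, start, end, sr=16000):
    """Energy and spectral statistics for one CV syllable.

    start/end are hypothetical syllable boundaries in seconds; the paper
    first segments /ba/, /du/, /ki/, /ta/ units, then extracts features.
    """
    y, _ = librosa.load(wav_path, sr=sr, offset=start, duration=end - start)
    rms = librosa.feature.rms(y=y)[0]                 # frame-level energy
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([
        [rms.mean(), rms.std(), rms.max()],           # energy distribution stats
        mfcc.mean(axis=1),                            # spectral envelope summary
    ])

# X: one feature vector per syllable; y: 0 = neutral, 1 = any other emotion.
# X, y = build_dataset(...)  # hypothetical loader for MADED/RAVDESS
# clf = SVC(kernel="rbf")
# print(cross_val_score(clf, X, y, cv=5).mean())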
ArtELingo: A Million Emotion Annotations of WikiArt with Emphasis on Diversity over Language and Culture
This paper introduces ArtELingo, a new benchmark and dataset, designed to
encourage work on diversity across languages and cultures. Following ArtEmis, a
collection of 80k artworks from WikiArt with 0.45M emotion labels and
English-only captions, ArtELingo adds another 0.79M annotations in Arabic and
Chinese, plus 4.8K in Spanish to evaluate "cultural-transfer" performance. More
than 51K artworks have 5 annotations or more in 3 languages. This diversity
makes it possible to study similarities and differences across languages and
cultures. Further, we investigate captioning tasks and find that diversity improves
the performance of baseline models. ArtELingo is publicly available at
https://www.artelingo.org/ with standard splits and baseline models. We hope
our work will help ease future research on multilinguality and culturally-aware
AI.
Comment: 9 pages; accepted at EMNLP 22. For more details, see https://www.artelingo.org.
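As a rough illustration of the cross-lingual analyses the dataset enables, the following sketch computes per-language emotion distributions from a flat annotation table; the file name and column names (language, emotion, art_id) are assumptions about the release format, not the actual schema at https://www.artelingo.org/.

import pandas as pd

ann = pd.read_csv("artelingo_annotations.csv")  # hypothetical export

# Per-language emotion distribution: do annotators working in different
# languages assign different emotions to the same collection?
sizes = ann.groupby(["language", "emotion"]).size()
dist = (sizes / sizes.groupby(level=0).transform("sum")).unstack(fill_value=0)
print(dist.round(3))

# Artworks densely annotated in all three languages (the 51K subset
# with 5 or more annotations in 3 languages mentioned above).
counts = ann.groupby(["art_id", "language"]).size().unstack(fill_value=0)
dense = counts[(counts >= 5).sum(axis=1) >= 3]
print(len(dense), "artworks with 5+ annotations in 3 languages")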
Convolutional Neural Network Architectures for Gender and Emotion Detection from Speech and Speaker Diarization
This paper introduces three system architectures for speaker identification that aim to overcome the limitations of diarization and voice-based biometric systems. Diarization systems use unsupervised algorithms to segment audio data along utterance boundaries, but they do not identify individual speakers. Voice-based biometric systems, on the other hand, can only identify individuals in recordings with a single speaker. Identifying speakers in recordings of natural conversations is challenging, especially when emotional shifts alter voice characteristics and make gender identification difficult. To address this, the proposed architectures combine gender detection, emotion detection, and diarization at either the segment or the group level. The architectures were evaluated on two speech databases, VoxCeleb and RAVDESS (Ryerson audio-visual database of emotional speech and song). The findings reveal that the proposed approach yields better recognition results than the alternative strategy, despite the latter's real-time processing advantage. The proposed architectures effectively address the challenge of identifying multiple speakers in a conversation while accounting for emotional changes that affect speech. The data indicate that gender and emotion classification of the diarized segments achieves an accuracy of over 98 percent. These results suggest that the proposed speech-based approach can achieve highly accurate speaker identification.
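A structural sketch of the segment-level variant described above, assuming hypothetical diarize, gender_model, and emotion_model components; it shows only the control flow, not the paper's actual models trained on VoxCeleb and RAVDESS.

from dataclasses import dataclass

@dataclass
class Segment:
    start: float    # seconds
    end: float
    speaker: str    # diarization label, e.g. "spk0"

def identify_speakers(wav_path, diarize, gender_model, emotion_model):
    """Label each diarized segment with gender and emotion so that
    emotion-induced voice changes do not split one speaker into several."""
    results = []
    for seg in diarize(wav_path):        # unsupervised utterance segmentation
        gender = gender_model.predict(wav_path, seg.start, seg.end)
        emotion = emotion_model.predict(wav_path, seg.start, seg.end)
        results.append((seg.speaker, gender, emotion, seg.start, seg.end))
    return results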
Explaining (Sarcastic) Utterances to Enhance Affect Understanding in Multimodal Dialogues
Conversations emerge as the primary medium for exchanging ideas and
conceptions. From the listener's perspective, identifying various affective
qualities, such as sarcasm, humour, and emotions, is paramount for
comprehending the true connotation of the emitted utterance. However, one of
the major hurdles faced in learning these affect dimensions is the presence of
figurative language, viz. irony, metaphor, or sarcasm. We hypothesize that any
detection system supplied with an exhaustive and explicit representation of the
emitted utterance would improve the overall comprehension of the dialogue. To
this end, we explore the task of Sarcasm Explanation in Dialogues (SED), which aims
to unfold the hidden irony behind sarcastic utterances. We propose MOSES, a
deep neural network, which takes a multimodal (sarcastic) dialogue instance as
an input and generates a natural language sentence as its explanation.
Subsequently, we leverage the generated explanation for various natural
language understanding tasks in a conversational dialogue setup, such as
sarcasm detection, humour identification, and emotion recognition. Our
evaluation shows that MOSES outperforms the state-of-the-art system for SED by
an average of ~2% on different evaluation metrics, such as ROUGE, BLEU, and
METEOR. Further, we observe that leveraging the generated explanation advances
three downstream tasks for affect classification - an average improvement of
~14% F1-score in the sarcasm detection task and ~2% in the humour
identification and emotion recognition tasks. We also perform extensive analyses
to assess the quality of the results.
Comment: Accepted at AAAI 2023. 11 pages; 14 tables; 3 figures.
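The surface-overlap metrics mentioned above (ROUGE, BLEU, METEOR) can be computed, for instance, with the Hugging Face evaluate package; the snippet below is one possible setup, not the authors' evaluation script, and the example strings are invented.

import evaluate  # pip install evaluate

predictions = ["the speaker mocks the idea of arriving on time"]   # model output
references = ["the speaker is being sarcastic about punctuality"]  # gold explanation

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")
meteor = evaluate.load("meteor")  # fetches NLTK wordnet data on first use

print(rouge.compute(predictions=predictions, references=references))
print(bleu.compute(predictions=predictions,
                   references=[[r] for r in references]))
print(meteor.compute(predictions=predictions, references=references))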
An ongoing review of speech emotion recognition
User emotional status recognition is becoming a key feature in advanced human-computer interfaces (HCI). A key source of emotional information is spoken expression, which may be part of the interaction between the human and the machine. Speech emotion recognition (SER) is a very active area of research that involves the application of current machine learning and neural network tools. This ongoing review covers recent and classical approaches to SER reported in the literature. This work has been carried out with the support of project PID2020-116346GB-I00, funded by the Spanish MICIN.
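As a concrete example of the neural tools such reviews survey, here is a minimal CNN baseline over log-mel spectrograms; the architecture and sizes are arbitrary choices for illustration, not drawn from any particular surveyed system.

import torch
import torch.nn as nn

class SERConvNet(nn.Module):
    """Tiny CNN over log-mel spectrograms; sizes are illustrative only."""
    def __init__(self, n_emotions=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),       # pool over time and frequency
        )
        self.classifier = nn.Linear(32, n_emotions)

    def forward(self, x):                  # x: (batch, 1, n_mels, frames)
        return self.classifier(self.features(x).flatten(1))

model = SERConvNet()
dummy = torch.randn(8, 1, 64, 200)         # batch of 8 log-mel spectrograms
print(model(dummy).shape)                  # torch.Size([8, 4])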