
397 research outputs found

Average Jane, where art thou? - Recent avenues in efficient machine learning under subjectivity uncertainty

Author: A Esteva
AS Cowen
B Bakhtiari
BW Schuller
F Eyben
FM Deutsch
G Patterson
J Zhang
KW McCluskey
MA Nicolaou
O Russakovsky
P Morales-Álvarez
PG Ipeirotis
Q Hu
R Gupta
S Liu
S Mariooryad
T-Y Lin
V Vapnik
VC Raykar
Y Kwon
Y Li
Y Liu
Z Shu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

OPUS Augsburg

Crossref

Implicit fusion by joint audiovisual training for emotion recognition in mono modality

Author: Han Jing
Ren Zhao
Schuller Björn
Zhang Zixing
Publication date: 01/01/2019

A paper in ICASSP 201

OPUS Augsburg

Crossref

ZENODO

Simple Model Also Works: A Novel Emotion Recognition Network in Textual Conversation Based on Curriculum Learning Strategy

Author: Li Jiang
Liu Yingjian
Wang Xiaoping
Zeng Zhigang
Zhou Qing
Publication date: 11/08/2023

Emotion Recognition in Conversation (ERC) has emerged as a research hotspot in domains such as conversational robots and question-answer systems. How to efficiently and adequately retrieve contextual emotional cues has been one of the key challenges in the ERC task. Existing efforts do not fully model the context and employ complex network structures, resulting in excessive computational resource overhead without substantial performance improvement. In this paper, we propose a novel Emotion Recognition Network based on Curriculum Learning strategy (ERNetCL). The proposed ERNetCL primarily consists of Temporal Encoder (TE), Spatial Encoder (SE), and Curriculum Learning (CL) loss. We utilize TE and SE to combine the strengths of previous methods in a simplistic manner to efficiently capture temporal and spatial contextual information in the conversation. To simulate the way humans learn curriculum from easy to hard, we apply the idea of CL to the ERC task to progressively optimize the network parameters of ERNetCL. At the beginning of training, we assign lower learning weights to difficult samples. As the epoch increases, the learning weights for these samples are gradually raised. Extensive experiments on four datasets exhibit that our proposed method is effective and dramatically beats other baseline models.

Comment: 12 pages, 9 figures
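The weighting schedule described in this abstract (lower weights for difficult samples early in training, raised as epochs progress) can be illustrated with a minimal sketch. This is not the ERNetCL loss itself: the linear schedule, the difficulty scores in [0, 1], the `min_weight` floor, and the function names `curriculum_weight` and `weighted_loss` are all assumptions made for illustration.

```python
def curriculum_weight(difficulty, epoch, total_epochs, min_weight=0.1):
    """Return a per-sample loss weight (hypothetical linear schedule).

    difficulty: assumed score in [0, 1]; 1.0 means hardest.
    Hard samples start near min_weight and rise linearly to 1.0
    by the final epoch; easy samples stay at 1.0 throughout.
    """
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    # Starting weight: 1.0 for the easiest sample, min_weight for the hardest.
    start = 1.0 - (1.0 - min_weight) * difficulty
    # Interpolate linearly from the starting weight up to 1.0.
    return start + (1.0 - start) * progress


def weighted_loss(losses, difficulties, epoch, total_epochs):
    """Weighted mean of per-sample losses under the schedule above."""
    weights = [curriculum_weight(d, epoch, total_epochs) for d in difficulties]
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


if __name__ == "__main__":
    # One easy sample (difficulty 0.0) and one hard sample (difficulty 1.0).
    for epoch in (0, 5, 9):
        print(epoch, weighted_loss([0.2, 2.0], [0.0, 1.0], epoch, total_epochs=10))
```

Under these assumptions the hard sample contributes little to the averaged loss in the first epoch and contributes equally with the easy sample by the last one.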

arXiv.org e-Print Archive

Knowledge Transfer For On-Device Speech Emotion Recognition with Neural Structured Learning

Author: Chang Yi
Nguyen Thanh Tam
Qian Kun
Ren Zhao
Schuller Björn W.
Publication date: 26/10/2022

Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI). As edge devices are rapidly springing up, applying SER to edge devices is promising for a huge number of HCI applications. Although deep learning has been investigated to improve the performance of SER by training complex models, the memory space and computational capability of edge devices represent a constraint for embedding deep learning models. We propose a neural structured learning (NSL) framework through building synthesized graphs. An SER model is trained on a source dataset and used to build graphs on a target dataset. A lightweight model is then trained with the speech samples and graphs together as the input. Our experiments demonstrate that training a lightweight SER model on the target dataset with speech samples and graphs can not only produce small SER models, but also enhance the model performance over models with speech samples only.

Comment: Submitted to ICASSP 202

arXiv.org e-Print Archive

AVEC 2019 workshop and challenge: state-of-mind, detecting depression with AI, and cross-cultural affect recognition

Author: Alisamir Sina
Amiriparian Shahin
Cowie Roddy
Cummins Nicholas
Liu Shuo
Mallol-Ragolta Adria
Messner Eva-Maria
Pantic Maja
Ren Zhao
Ringeval Fabien
Schmitt Maximilian
Schuller Björn
Soleymani Mohammad
Song Siyang
Tavabi Leili
Valstar Michel
Zhao Ziping
Publication date: 01/01/2019

The Audio/Visual Emotion Challenge and Workshop (AVEC 2019) "State-of-Mind, Detecting Depression with AI, and Cross-cultural Affect Recognition" is the ninth competition event aimed at the comparison of multimedia processing and machine learning methods for automatic audiovisual health and emotion analysis, with all participants competing strictly under the same conditions. The goal of the Challenge is to provide a common benchmark test set for multimodal information processing and to bring together the health and emotion recognition communities, as well as the audiovisual processing communities, to compare the relative merits of various approaches to health and emotion recognition from real-life data. This paper presents the major novelties introduced this year, the challenge guidelines, the data used, and the performance of the baseline systems on the three proposed tasks: state-of-mind recognition, depression assessment with AI, and cross-cultural affect sensing, respectively.

arXiv.org e-Print Archive

OPUS Augsburg

Crossref

ZENODO

University of Twente Research Information

Comprehensive Study of Automatic Speech Emotion Recognition Systems

Author: Jagtap Sonal
Kawade Rupali
Publication venue: Auricle Global Society of Education and Research
Publication date: 31/08/2023

Speech emotion recognition (SER) is the technology that recognizes psychological characteristics and feelings from speech signals. SER is challenging because of considerable variations in arousal and valence levels across different languages. Various technical developments in artificial intelligence and signal processing methods have made it possible to interpret emotions. SER plays a vital role in remote communication. This paper offers a recent survey of SER using machine learning (ML) and deep learning (DL)-based techniques. It focuses on the various feature representation and classification techniques used for SER. Further, it describes the databases and evaluation metrics used for speech emotion recognition.

International Journal on Recent and Innovation Trends in Computing and Communication

LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech

Author: Alisamir Sina
Allauzen Alexandre
Besacier Laurent
Boito Marcely Zanon
Dinarelli Marco
Esteve Yannick
Evain Solene
Le Hang
Lecouteux Benjamin
Mdhaffar Salima
Nguyen Ha
Parcollet Titouan
Portet Francois
Ringeval Fabien
Rossato Solange
Schwab Didier
Tomashenko Natalia
Tong Ziyi
Publication date: 10/06/2021

Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image and natural language processing. Recent works also investigated SSL from speech. They were notably successful in improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient speech systems, their evaluation was mostly made on ASR and using multiple and heterogeneous experimental settings (most of them for English). This calls into question the objective comparison of SSL approaches and the evaluation of their impact on building speech systems. In this paper, we propose LeBenchmark: a reproducible framework for assessing SSL from speech. It not only includes ASR (high and low resource) tasks but also spoken language understanding, speech translation and emotion recognition. We also focus on speech technologies in a language other than English: French. SSL models of different sizes are trained from carefully sourced and documented datasets. Experiments show that SSL is beneficial for most but not all tasks, which confirms the need for exhaustive and reliable benchmarks to evaluate its real impact. LeBenchmark is shared with the scientific community for reproducible research in SSL from speech.

Comment: Will be presented at Interspeech 202

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

Hal-Diderot