    Child and youth affective computing - challenge accepted

    Affective computing has by now been shown to be effective and useful in a range of use cases, including human–computer interaction, emotionally intelligent tutoring, and depression monitoring. While these could be very useful to the younger among us, not least for earlier recognition of developmental disorders, research and even working demonstrators have largely targeted an adult population. Only a few studies, including the first-ever competitive emotion challenge, were based on children’s data. At a time when fairness is a dominant topic in artificial intelligence, it seems timely to include children and youth more broadly as a user group and as beneficiaries of the promises affective computing holds. To best support the corresponding algorithmic and technological development, we summarize here the emotional development of this group over the years, which poses considerable challenges for automatic emotion recognition, generation, and processing engines. We also provide a view on the steps to be taken to cope with these challenges, including drifting-target learning, broadening the “vocabulary” of modeled affective states, and transfer, few-shot, zero-shot, reinforced, and life-long learning in affective computing, besides trustability.

    Recent Advances in Computer Audition for Diagnosing COVID-19: An Overview

    Computer audition (CA) has been demonstrated to be efficient in healthcare domains for speech-affecting disorders (e.g., autism spectrum, depression, or Parkinson's disease) and body-sound-affecting abnormalities (e.g., abnormal bowel sounds, heart murmurs, or snore sounds). Nevertheless, CA has been underestimated among the data-driven technologies considered for fighting the COVID-19 pandemic caused by the SARS-CoV-2 coronavirus. In this light, we summarise the most recent advances in CA for COVID-19 speech and/or sound analysis. While the milestones achieved are encouraging, no solid conclusions can yet be drawn, mostly because data are still sparse, often not sufficiently validated, and lacking systematic comparison with related diseases that affect the respiratory system. In particular, CA-based methods cannot serve as a standalone screening tool for SARS-CoV-2. We hope this brief overview can provide good guidance and attract more attention from the broader artificial intelligence community.

    Embracing and exploiting annotator emotional subjectivity: an affective rater ensemble model

    Automated recognition of continuous emotions in audio-visual data is a growing area of study that aids in understanding human-machine interaction. Training such systems presupposes human annotation of the data. The annotation process, however, is laborious and expensive, given that several human ratings are required for every data sample to compensate for the subjectivity of emotion perception. As a consequence, labelled data for emotion recognition are rare, and the existing corpora are limited compared to other state-of-the-art deep learning datasets. In this study, we explore different ways in which existing emotion annotations can be utilised more effectively to exploit the available labelled information to the fullest. To reach this objective, we exploit individual raters’ opinions by employing an ensemble of rater-specific models, one for each annotator, thereby reducing the loss of information that is a byproduct of annotation aggregation; we find that individual models can indeed infer subjective opinions. Furthermore, we explore the fusion of such ensemble predictions using different fusion techniques. Our ensemble model with only two annotators outperforms the regular arousal baseline on the test set of the MuSe-CaR corpus. While no considerable improvements on valence could be obtained, using all annotators increases the prediction performance for arousal by up to .07 absolute Concordance Correlation Coefficient (CCC) on test, solely trained on rater-specific models and fused by an attention-enhanced Long Short-Term Memory Recurrent Neural Network.
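
    For reference, the Concordance Correlation Coefficient (CCC) reported above has a simple closed form. The Python/NumPy sketch below computes it and fuses hypothetical rater-specific model outputs with a plain mean, which is only a stand-in for the attention-enhanced LSTM fusion the study actually uses; all arrays here are synthetic placeholders.

        import numpy as np

        def ccc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
            """Concordance Correlation Coefficient between two 1-D signals."""
            mean_t, mean_p = y_true.mean(), y_pred.mean()
            cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
            return 2 * cov / (y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2)

        # Hypothetical setup: 3 rater-specific models, 100 time-continuous frames.
        rater_preds = np.random.rand(3, 100)   # one prediction trace per rater model
        gold = np.random.rand(100)             # aggregated gold-standard trace
        fused = rater_preds.mean(axis=0)       # mean late fusion (stand-in for the LSTM fusion)
        print(f"fused CCC: {ccc(gold, fused):.3f}")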

    Domain Adapting Deep Reinforcement Learning for Real-world Speech Emotion Recognition

    Speech emotion recognition (SER) allows computers to understand and then engage with people in an emotionally intelligent way. However, the performance of SER in cross-corpus and real-world live-data-feed scenarios leaves considerable room for improvement. One shortcoming of SER methods is the inability to adapt an existing model to a new domain. To address this challenge, researchers have developed domain adaptation techniques that transfer the knowledge learnt by a model across domains. Although existing domain adaptation techniques have improved performance across domains, they can be further improved for a real-world live data feed, where a model can self-tune while deployed. In this paper, we present a deep reinforcement learning-based strategy (RL-DA) for adapting a pre-trained model in a real-world live-data-feed setting while interacting with the environment and collecting continual feedback. RL-DA is evaluated on SER tasks, including cross-corpus and cross-language domain adaptation schemes. Evaluation results show that in a live-data-feed setting, RL-DA outperforms a baseline strategy by 11% and 14% in cross-corpus and cross-language scenarios, respectively.
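
    The abstract leaves RL-DA's exact algorithm unspecified; as a rough illustration of a model that self-tunes on a live feed from continual feedback, here is a generic contextual-bandit-style loop in Python/NumPy. The class name, the linear scorer, and the reward signal are all assumptions made for the sketch, not the authors' method.

        import numpy as np

        EMOTIONS = ["happy", "sad", "angry", "neutral"]

        class LiveAdapter:
            """Self-tunes a (notionally pre-trained) linear scorer on a live feed."""
            def __init__(self, n_features: int, lr: float = 0.01, eps: float = 0.1):
                self.w = np.zeros((len(EMOTIONS), n_features))  # pre-trained weights would go here
                self.lr, self.eps = lr, eps

            def act(self, x: np.ndarray) -> int:
                if np.random.rand() < self.eps:        # occasionally explore
                    return np.random.randint(len(EMOTIONS))
                return int(np.argmax(self.w @ x))      # otherwise exploit the current model

            def update(self, x: np.ndarray, action: int, reward: float) -> None:
                # Nudge the chosen class's score towards the observed feedback.
                q = self.w[action] @ x
                self.w[action] += self.lr * (reward - q) * x

        agent = LiveAdapter(n_features=40)
        for _ in range(1000):                  # simulated live data feed
            x = np.random.randn(40)            # stand-in acoustic feature vector
            a = agent.act(x)
            agent.update(x, a, reward=float(a == 3))   # synthetic feedback signal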

    Refashioning Emotion Recognition Modelling: The Advent of Generalised Large Models

    Since the inception of emotion recognition, or affective computing, it has increasingly become an active research topic due to its broad applications. Over the past couple of decades, emotion recognition models have gradually migrated from statistical shallow models to neural-network-based deep models, which can significantly boost performance and consistently achieve the best results on different benchmarks. Therefore, in recent years, deep models have been considered the first option for emotion recognition. However, the debut of large language models (LLMs), such as ChatGPT, has remarkably astonished the world due to their emergent capabilities of zero/few-shot learning, in-context learning, chain-of-thought reasoning, and others never shown by previous deep models. In the present paper, we comprehensively investigate how LLMs perform in emotion recognition across diverse aspects, including in-context learning, few-shot learning, accuracy, generalisation, and explanation. Moreover, we offer some insights and pose potential challenges, hoping to ignite broader discussions about enhancing emotion recognition in the new era of advanced and generalised large models.
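
    As a concrete picture of the in-context (few-shot) learning the paper evaluates, the Python snippet below assembles a few-shot emotion-classification prompt; the label set and example utterances are illustrative, not taken from the paper.

        FEW_SHOT = [
            ("I can't believe they cancelled my flight again!", "angry"),
            ("We finally got the grant, I'm over the moon.", "happy"),
        ]

        def emotion_prompt(utterance: str) -> str:
            """Build a few-shot prompt: instruction, labelled examples, then the query."""
            lines = ["Classify the emotion of each utterance as happy, sad, angry, or neutral."]
            for text, label in FEW_SHOT:
                lines.append(f"Utterance: {text}\nEmotion: {label}")
            lines.append(f"Utterance: {utterance}\nEmotion:")
            return "\n\n".join(lines)

        print(emotion_prompt("I guess it doesn't matter anymore."))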

    The influence of pleasant and unpleasant odours on the acoustics of speech

    Olfaction, i.e., the sense of smell, is referred to as the ‘emotional sense’, as it has been shown to elicit affective responses. Yet, its influence on speech production has not been investigated. In this paper, we introduce a novel speech-based smell recognition approach, drawing from the fields of speech emotion recognition and personalised machine learning. In particular, we collected a corpus of 40 female speakers reading 2 short stories while either no scent, an unpleasant odour (fish), or a pleasant odour (peach) was applied through a nose clip. Further, we present a machine learning pipeline for the extraction of data representations, model training, and personalisation of the trained models. In a leave-one-speaker-out cross-validation, our best models, trained on state-of-the-art wav2vec features, achieve a classification rate of 68% when distinguishing between speech produced under the influence of the negative scent and with no applied scent. In addition, we highlight the importance of personalisation approaches, showing that a speaker-based feature normalisation substantially improves performance across the evaluated experiments. In summary, the presented results indicate that odours have a weak but measurable effect on the acoustics of speech.
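
    One common way to realise the speaker-based feature normalisation mentioned above is per-speaker z-normalisation; the Python/NumPy sketch below assumes hypothetical wav2vec-style utterance embeddings and integer speaker IDs.

        import numpy as np

        def speaker_normalise(features: np.ndarray, speaker_ids: np.ndarray) -> np.ndarray:
            """Z-normalise each feature dimension separately per speaker."""
            out = np.empty_like(features, dtype=float)
            for spk in np.unique(speaker_ids):
                rows = speaker_ids == spk
                mu = features[rows].mean(axis=0)
                sd = features[rows].std(axis=0) + 1e-8   # guard against zero variance
                out[rows] = (features[rows] - mu) / sd
            return out

        X = np.random.randn(6, 768)            # hypothetical wav2vec-style embeddings
        spk = np.array([0, 0, 0, 1, 1, 1])     # two speakers, three utterances each
        X_norm = speaker_normalise(X, spk)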

    Holistic Affect Recognition Using PaNDA: Paralinguistic Non-metric Dimensional Analysis

    Humans perceive emotion from each other using a holistic perspective, accounting for diverse personal, non-emotional variables that shape expression. In contrast, today's algorithms are mainly designed to recognize emotion in isolation. In this work, we propose a multi-task learning approach to jointly learn the recognition of affective states from speech along with various speaker attributes. A problem with multi-task learning is that inductive transfer can sometimes negatively impact performance. To mitigate negative transfer, we introduce the Paralinguistic Non-metric Dimensional Analysis (PaNDA) method, which systematically measures task relatedness and also enables visualizing the topology of affective phenomena as a whole. In addition, we present a generic framework that unifies the concepts of single-task and multi-task learning. Using this framework, we construct two models that demonstrate holistic affect recognition: one treats all tasks as equally related, whereas the other incorporates the task correlations between a main task and its supporting tasks, obtained from PaNDA. Both models employ a multi-task deep neural network, in which separate output layers are used to predict discrete and continuous attributes, while hidden layers are shared across different tasks. On average across 18 classification and regression tasks, the weighted multi-task learning with PaNDA significantly improves performance compared to single-task and unweighted multi-task learning.
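
    A minimal PyTorch sketch of the shared-trunk, multi-head architecture described above, with one head for discrete attributes and one for continuous ones; layer sizes and output counts are assumptions, and PaNDA's task-relatedness scores would enter as per-task weights on the combined loss.

        import torch
        import torch.nn as nn

        class MultiTaskNet(nn.Module):
            """Hidden layers shared across tasks; separate discrete/continuous heads."""
            def __init__(self, n_features: int = 88, n_classes: int = 4, n_dims: int = 2):
                super().__init__()
                self.shared = nn.Sequential(
                    nn.Linear(n_features, 256), nn.ReLU(),
                    nn.Linear(256, 128), nn.ReLU(),
                )
                self.cls_head = nn.Linear(128, n_classes)  # e.g. emotion categories
                self.reg_head = nn.Linear(128, n_dims)     # e.g. arousal, valence

            def forward(self, x: torch.Tensor):
                h = self.shared(x)
                return self.cls_head(h), self.reg_head(h)

        logits, dims = MultiTaskNet()(torch.randn(8, 88))
        # Training would minimise e.g. w_cls * CE(logits, y) + w_reg * MSE(dims, z),
        # with the weights w_* derived from PaNDA's task-relatedness estimates.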

    A Novel Policy for Pre-trained Deep Reinforcement Learning for Speech Emotion Recognition

    Deep Reinforcement Learning (deep RL) has achieved tremendous success in gaming, but it has rarely been explored for Speech Emotion Recognition (SER). In the RL literature, the policy used by the RL agent plays a major role in action selection; however, there is no RL policy tailored to SER. Moreover, an extended learning period is a general challenge for deep RL, which can slow down learning for SER. In this paper, we introduce a novel policy, the 'Zeta policy', tailored for SER, and introduce pre-training in deep RL to achieve a faster learning rate. Pre-training with a cross-corpus dataset was also studied to assess the feasibility of pre-training the RL agent with a similar dataset in scenarios where real environmental data are not available. We use the 'IEMOCAP' and 'SAVEE' datasets for evaluation on the problem of recognising four emotions, namely happy, sad, angry, and neutral. The experimental results show that the proposed policy performs better than existing policies. The results also show that pre-training can reduce training time and is robust in a cross-corpus scenario.
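
    The abstract does not give the Zeta policy's form; for orientation, the Python/NumPy sketch below shows standard epsilon-greedy action selection, the canonical RL policy, choosing among the four emotion labels from the Q-values of a notionally pre-trained network, with decaying exploration.

        import numpy as np

        def epsilon_greedy(q_values: np.ndarray, eps: float, step: int,
                           decay: float = 0.999) -> int:
            """Epsilon-greedy action selection with exponentially decayed exploration."""
            if np.random.rand() < eps * decay ** step:
                return np.random.randint(len(q_values))   # explore a random emotion label
            return int(np.argmax(q_values))               # exploit the pre-trained Q-network

        q = np.array([0.1, 0.6, 0.2, 0.1])   # hypothetical Q-values: happy/sad/angry/neutral
        action = epsilon_greedy(q, eps=0.2, step=50)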