59 research outputs found
Controlling for Confounders in Multimodal Emotion Classification via Adversarial Learning
Various psychological factors affect how individuals express emotions. Yet,
when we collect data intended for use in building emotion recognition systems,
we often do so through paradigms designed solely to elicit emotional
behavior. Algorithms trained with these types of data
are unlikely to function outside of controlled environments because our
emotions naturally change as a function of these other factors. In this work,
we study how the multimodal expressions of emotion change when an individual is
under varying levels of stress. We hypothesize that stress produces modulations
that can hide the true underlying emotions of individuals and that we can make
emotion recognition algorithms more generalizable by controlling for variations
in stress. To this end, we use adversarial networks to decorrelate stress
modulations from emotion representations. We study how stress alters acoustic
and lexical emotional predictions, paying special attention to how modulations
due to stress affect the transferability of learned emotion recognition models
across domains. Our results show that stress is indeed encoded in trained
emotion classifiers and that this encoding varies across levels of emotions and
across the lexical and acoustic modalities. Our results also show that emotion
recognition models that control for stress during training have better
generalizability when applied to new domains, compared to models that do not
control for stress during training. We conclude that it is necessary to
consider the effect of extraneous psychological factors when building and
testing emotion recognition models.
Comment: 10 pages, ICMI 201
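As a rough sketch of the decorrelation idea described in this abstract, an adversarial stress branch can be attached to a shared encoder through a gradient reversal layer, so that the encoder is pushed to discard stress information while the emotion head is trained normally. The PyTorch snippet below is a minimal illustration under assumed layer sizes, label counts, and loss weighting; it is not the paper's actual model.

```python
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient sign on the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class StressControlledEmotionNet(nn.Module):
    # Dimensions and label counts are illustrative assumptions.
    def __init__(self, feat_dim=128, n_emotions=4, n_stress_levels=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU())
        self.emotion_head = nn.Linear(64, n_emotions)
        self.stress_head = nn.Linear(64, n_stress_levels)  # adversary

    def forward(self, x, lam=1.0):
        z = self.encoder(x)
        emo = self.emotion_head(z)
        # Gradient reversal: the encoder is trained to *remove* stress information.
        stress = self.stress_head(GradientReversal.apply(z, lam))
        return emo, stress

# One illustrative training step on random data.
model = StressControlledEmotionNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 128)
y_emo = torch.randint(0, 4, (8,))
y_stress = torch.randint(0, 3, (8,))
emo_logits, stress_logits = model(x)
loss = nn.functional.cross_entropy(emo_logits, y_emo) \
     + nn.functional.cross_entropy(stress_logits, y_stress)
opt.zero_grad(); loss.backward(); opt.step()
```

The reversal layer is what makes the stress head adversarial: the head learns to predict stress, while the encoder receives the negated gradient and learns to make that prediction fail.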
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era
Speech is the fundamental mode of human communication, and its synthesis has
long been a core priority in human-computer interaction research. In recent
years, machines have managed to master the art of generating speech that is
understandable by humans. But the linguistic content of an utterance
encompasses only a part of its meaning. Affect, or expressivity, has the
capacity to turn speech into a medium capable of conveying intimate thoughts,
feelings, and emotions -- aspects that are essential for engaging and
naturalistic interpersonal communication. While the goal of imparting
expressivity to synthesised utterances has so far remained elusive, following
recent advances in text-to-speech synthesis, a paradigm shift is well under way
in the fields of affective speech synthesis and conversion as well. Deep
learning, as the technology which underlies most of the recent advances in
artificial intelligence, is spearheading these efforts. In the present
overview, we outline ongoing trends and summarise state-of-the-art approaches
in an attempt to provide a comprehensive picture of this exciting field.
Comment: Submitted to the Proceedings of IEE
Survey of Social Bias in Vision-Language Models
In recent years, the rapid advancement of machine learning (ML) models,
particularly transformer-based pre-trained models, has revolutionized the
Natural Language Processing (NLP) and Computer Vision (CV) fields. However,
researchers
have discovered that these models can inadvertently capture and reinforce
social biases present in their training datasets, leading to potential social
harms, such as uneven resource allocation and unfair representation of specific
social groups. Addressing these biases and ensuring fairness in artificial
intelligence (AI) systems has become a critical concern in the ML community.
The recent introduction of pre-trained vision-and-language (VL) models in the
emerging multimodal field demands attention to the potential social biases
present in these models as well. Although VL models are susceptible to social
bias, understanding of this bias remains limited compared to the extensive
discussions on bias in NLP and CV. This survey aims to provide researchers
with a high-level
insight into the similarities and differences of social bias studies in
pre-trained models across NLP, CV, and VL. By examining these perspectives, the
survey aims to offer valuable guidelines on how to approach and mitigate social
bias in both unimodal and multimodal settings. The findings and recommendations
presented here can benefit the ML community, fostering the development of
fairer and less biased AI models in various applications and research
endeavors.
Toward Data-Driven Digital Therapeutics Analytics: Literature Review and Research Directions
With the advent of Digital Therapeutics (DTx), the development of software as
a medical device (SaMD) for mobile and wearable devices has gained significant
attention in recent years. Existing DTx evaluations, such as randomized
clinical trials, mostly focus on verifying the effectiveness of DTx products.
To acquire a deeper understanding of DTx engagement and behavioral adherence,
beyond efficacy, a large amount of contextual and interaction data from mobile
and wearable devices during field deployment would be required for analysis. In
this work, the overall flow of data-driven DTx analytics is reviewed to help
researchers and practitioners explore DTx datasets, investigate contextual
patterns associated with DTx usage, and establish the (causal) relationship
between DTx engagement and behavioral adherence. This review of the key
components of data-driven analytics provides novel research directions in the
analysis of mobile sensor and interaction datasets, which helps to iteratively
improve the receptivity of existing DTx.
Comment: This paper has been accepted by the IEEE/CAA Journal of Automatica Sinica
Enhancing the Generalization of Convolutional Neural Networks for Speech Emotion Recognition
Human-machine interaction is rapidly gaining significance in our daily lives. While speech recognition has achieved near-human performance in recent years, the intricate details embedded in speech extend beyond the mere arrangement of words. Speech Emotion Recognition (SER) is therefore acquiring a growing role in this field: by decoding not only the linguistic content but also the emotional nuances of human spoken communication, it enables a more complete understanding of the information conveyed by speech signals.
Despite the success that neural networks have already achieved in this task, SER is still challenging due to the variability of emotional expression, especially in real-world scenarios where generalization to unseen speakers and contexts is required. In addition, the high resource demand of SER models, combined with the scarcity of emotion-labelled data, hinders the development and application of effective solutions in this field. In this thesis, we present multiple approaches to overcome the aforementioned difficulties. We first introduce a multiple-time-scale (MTS) convolutional neural network architecture to provide flexibility towards temporal variations when analyzing time-frequency representations of audio data. We show that resilience to speed fluctuations is relevant in SER tasks, since emotion is expressed through complex spectral patterns that can exhibit significant local dilation and compression on the time axis depending on speaker and context. The results indicate that the use of MTS consistently improves the generalization of networks of different capacity and depth, compared to standard convolution.
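One way to realise the multiple-time-scale idea in code is to apply the same convolutional kernel at several time-axis dilations and merge the per-scale responses. The thesis describes MTS in terms of temporal flexibility of the kernels, so the sketch below is only an approximation: the dilation-based scaling, the chosen scales and kernel size, and the max-merge are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTimeScaleConv(nn.Module):
    """One conv kernel applied at several time-axis dilations; a crude stand-in
    for detecting spectrogram patterns that are stretched or compressed in time."""
    def __init__(self, in_ch, out_ch, kernel=(3, 3), scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.conv = nn.Conv2d(in_ch, out_ch, kernel, padding="same")

    def forward(self, x):  # x: (batch, ch, freq, time)
        outs = []
        for s in self.scales:
            # Reuse the same weights with a time-axis dilation of s.
            out = F.conv2d(x, self.conv.weight, self.conv.bias,
                           padding="same", dilation=(1, s))
            outs.append(out)
        # Merge scales; element-wise max keeps the strongest response.
        return torch.stack(outs).max(dim=0).values

x = torch.randn(2, 1, 64, 200)             # e.g. a mel-spectrogram batch
y = MultiTimeScaleConv(1, 16)(x)
print(y.shape)                             # torch.Size([2, 16, 64, 200])
```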
In a second stage, we propose a more general approach to discourage unwanted sensitivity towards specific target properties in CNNs, introducing the novel concept of anti-transfer learning. While transfer learning assumes that the learning process for a target task will benefit from re-using representations learned for another task, anti-transfer avoids the learning of representations that have been learned for an orthogonal task, i.e., one that is not relevant and potentially confounding for the target task, such as speaker identity and speech content for emotion recognition. In anti-transfer learning we penalize similarity between the activations of a network being trained and those of another network previously trained on an orthogonal task. This leads to better generalization and provides a degree of control over correlations that are spurious or undesirable. We show that anti-transfer actually leads to the intended invariance to the orthogonal task and to more appropriate feature maps for the target task at hand. Anti-transfer incurs a computation and memory cost at training time, but it enables the reuse of pre-trained models.
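A minimal sketch of the anti-transfer penalty, under assumed models and layer choice: the activations of the network being trained are compared, via cosine similarity, with those of a frozen network pre-trained on the orthogonal task, and the absolute similarity is added to the task loss with a small weight. The thesis's exact formulation, e.g. how activations are aggregated across channels, may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def anti_transfer_penalty(act_new, act_frozen):
    """Penalty on the similarity between activations of the network being
    trained and a frozen network pre-trained on the orthogonal task."""
    a = act_new.flatten(1)
    b = act_frozen.flatten(1).detach()   # frozen network: no gradients
    return F.cosine_similarity(a, b, dim=1).abs().mean()

# Illustrative use in a training step (models and data are placeholders).
emotion_net = nn.Sequential(nn.Linear(40, 32), nn.ReLU(), nn.Linear(32, 4))
speaker_net = nn.Sequential(nn.Linear(40, 32), nn.ReLU())  # orthogonal task
for p in speaker_net.parameters():
    p.requires_grad_(False)

x, y = torch.randn(8, 40), torch.randint(0, 4, (8,))
hidden = emotion_net[:2](x)              # activations at a chosen layer
logits = emotion_net[2](hidden)
loss = F.cross_entropy(logits, y) \
     + 0.1 * anti_transfer_penalty(hidden, speaker_net(x))
loss.backward()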
In order to avoid the high resource demand of SER models in general and of anti-transfer learning specifically, we propose RH-emo, a novel semi-supervised architecture aimed at extracting quaternion embeddings from real-valued monaural spectrograms, enabling the use of quaternion-valued networks for SER tasks. RH-emo is a hybrid real/quaternion autoencoder network that consists of a real-valued encoder in parallel to a real-valued emotion classifier and a quaternion-valued decoder. We show that the use of RH-emo, combined with quaternion convolutional neural networks, provides a consistent improvement in SER tasks, while requiring far fewer trainable parameters and therefore substantially reducing the resource demand of SER models.
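RH-emo itself is a hybrid convolutional autoencoder, but the parameter saving that quaternion-valued networks offer is easiest to see in a single Hamilton-product layer. The sketch below is a generic quaternion linear layer, not RH-emo's architecture; sizes and initialisation are illustrative. Because the four weight components are shared across the quaternion algebra, the layer maps a 4n-dimensional input to a 4m-dimensional output with 4nm parameters instead of the 16nm of a comparable real-valued layer.

```python
import torch
import torch.nn as nn

class QuaternionLinear(nn.Module):
    """Quaternion-valued linear layer: inputs and weights are split into four
    components (r, i, j, k) that are combined by the Hamilton product."""
    def __init__(self, in_features, out_features):  # per-component sizes
        super().__init__()
        shape = (in_features, out_features)
        self.wr = nn.Parameter(torch.randn(shape) * 0.1)
        self.wi = nn.Parameter(torch.randn(shape) * 0.1)
        self.wj = nn.Parameter(torch.randn(shape) * 0.1)
        self.wk = nn.Parameter(torch.randn(shape) * 0.1)

    def forward(self, x):  # x: (batch, 4 * in_features)
        r, i, j, k = x.chunk(4, dim=-1)
        # Hamilton product of the input with the quaternion weight matrix.
        out_r = r @ self.wr - i @ self.wi - j @ self.wj - k @ self.wk
        out_i = r @ self.wi + i @ self.wr + j @ self.wk - k @ self.wj
        out_j = r @ self.wj - i @ self.wk + j @ self.wr + k @ self.wi
        out_k = r @ self.wk + i @ self.wj - j @ self.wi + k @ self.wr
        return torch.cat([out_r, out_i, out_j, out_k], dim=-1)

z = torch.randn(8, 4 * 64)                  # e.g. a 4-channel embedding
print(QuaternionLinear(64, 16)(z).shape)    # torch.Size([8, 64])
```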
Finally, we apply anti-transfer learning to quaternion-valued neural networks fed with RH-emo embeddings. We demonstrate that the combination of the two approaches maintains the disentanglement properties of anti-transfer, while using a reduced amount of memory, computation, and training time, making this a suitable approach for SER scenarios with limited resources and where context and speaker independence are needed.
Deep Learning Techniques for Electroencephalography Analysis
In this thesis, we design deep learning techniques for training deep neural networks on electroencephalography (EEG) data, in particular on two problems, namely EEG-based motor imagery decoding and EEG-based affect recognition, addressing the challenges associated with them. Regarding the problem of motor imagery (MI) decoding, we first consider the various kinds of domain shifts in the EEG signals caused by inter-individual differences (e.g. brain anatomy, personality and cognitive profile). These domain shifts render multi-subject training a challenging task and impede robust cross-subject generalization. We build a two-stage model ensemble architecture and propose two objectives to train it, combining the strengths of curriculum learning and collaborative training. Our subject-independent experiments on the large datasets of Physionet and OpenBMI verify the effectiveness of our approach. Next, we explore the utilization of the spatial covariance of EEG signals through alignment techniques, with the goal of learning domain-invariant representations. We introduce a Riemannian framework that concurrently performs covariance-based signal alignment and data augmentation while training a convolutional neural network (CNN) on EEG time-series. Experiments on the BCI IV-2a dataset show that our method outperforms traditional alignment by inducing regularization on the weights of the CNN. We also study the problem of EEG-based affect recognition, inspired by works suggesting that emotions can be expressed in relative terms, i.e. through ordinal comparisons between different affective state levels. We propose treating data samples in a pairwise manner to infer the ordinal relation between their corresponding affective state labels, as an auxiliary training objective. We incorporate our objective in a deep network architecture which we jointly train on the tasks of sample-wise classification and pairwise ordinal ranking. We evaluate our method on the affective datasets of DEAP and SEED and obtain performance improvements over deep networks trained without the additional ranking objective.
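As one concrete ingredient of covariance-based alignment, the sketch below implements plain Euclidean alignment: each subject's trials are whitened by the inverse square root of their mean spatial covariance, so that aligned data from different subjects share an identity reference covariance. This is only a baseline alignment step, not the thesis's full Riemannian framework with augmentation; the channel and trial counts are made up.

```python
import numpy as np

def euclidean_alignment(trials):
    """Whiten a subject's EEG trials by the inverse square root of their mean
    spatial covariance, giving all subjects a common reference covariance."""
    covs = np.stack([t @ t.T / t.shape[1] for t in trials])  # (n, ch, ch)
    R = covs.mean(axis=0)
    vals, vecs = np.linalg.eigh(R)
    R_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return [R_inv_sqrt @ t for t in trials]

# Illustrative use: 10 trials of 22-channel EEG, 500 samples each.
rng = np.random.default_rng(0)
trials = [rng.standard_normal((22, 500)) for _ in range(10)]
aligned = euclidean_alignment(trials)
mean_cov = np.mean([t @ t.T / t.shape[1] for t in aligned], axis=0)
print(np.allclose(mean_cov, np.eye(22), atol=1e-8))  # True: identity reference
```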
Gaining Insight into Determinants of Physical Activity using Bayesian Network Learning
Contains fulltext: 228326pre.pdf (preprint version, Open Access) and 228326pub.pdf (publisher's version, Open Access). BNAIC/BeneLearn 202
Proceedings of the 2021 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory
In 2021, the annual joint workshop of the Fraunhofer IOSB and KIT IES was hosted at the IOSB in Karlsruhe. For a week, from the 2nd to the 6th of July, the doctoral students presented extensive reports on the status of their research. The results and ideas presented at the workshop are collected in this book in the form of detailed technical reports.