Best Practices for Noise-Based Augmentation to Improve the Performance of Deployable Speech-Based Emotion Recognition Systems
Speech emotion recognition is an important component of any human-centered
system. But speech characteristics produced and perceived by a person can be
influenced by a multitude of reasons, both desirable such as emotion, and
undesirable such as noise. To train robust emotion recognition models, we need
a large, yet realistic data distribution, but emotion datasets are often small
and hence are augmented with noise. Noise augmentation often makes one
important assumption: that the prediction label should remain the same in the
presence or absence of noise, which is true for automatic speech recognition
but not necessarily true for perception-based tasks. In this paper we make
three novel contributions. We validate through crowdsourcing that the presence
of noise does change the annotation label and hence may alter the original
ground truth label. We then show how disregarding this knowledge and assuming
consistency in ground truth labels propagates to downstream evaluation of ML
models, both for performance evaluation and robustness testing. We end the
paper with a set of recommendations for noise augmentations in speech emotion
recognition datasets.
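
The noise augmentation discussed in this abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' pipeline, so everything here is an assumption: the function name `mix_at_snr`, the use of equal-length lists of samples, and the choice to set noise level by a target signal-to-noise ratio (SNR) in decibels.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the mix has the requested SNR in dB.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise signal has zero power")
    # Solve snr_db = 10 * log10(speech_power / (scale**2 * noise_power))
    # for the gain applied to the noise.
    scale = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


# Toy signals standing in for real audio samples.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

In line with the abstract's finding, the emotion label of the clean utterance should not be assumed to carry over to `noisy` without checking how listeners perceive the augmented sample.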
Privacy Enhanced Multimodal Neural Representations for Emotion Recognition
Many mobile applications and virtual conversational agents now aim to
recognize and adapt to emotions. To enable this, data are transmitted from
users' devices and stored on central servers. Yet, these data contain sensitive
information that could be used by mobile applications without the user's consent
or, maliciously, by an eavesdropping adversary. In this work, we show how
multimodal representations trained for a primary task, here emotion
recognition, can unintentionally leak demographic information, which could
override a selected opt-out option by the user. We analyze how this leakage
differs in representations obtained from textual, acoustic, and multimodal
data. We use an adversarial learning paradigm to unlearn the private
information present in a representation and investigate the effect of varying
the strength of the adversarial component on the primary task and on the
privacy metric, defined here as the inability of an attacker to predict
specific demographic information. We evaluate this paradigm on multiple
datasets and show that we can improve the privacy metric while not
significantly impacting the performance on the primary task. To the best of our
knowledge, this is the first work to analyze how the privacy metric differs
across modalities and how multiple privacy concerns can be tackled while still
maintaining performance on emotion recognition.
Comment: 8 pages
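
The trade-off controlled by the strength of the adversarial component can be sketched with a gradient-reversal style update. The abstract does not say which mechanism the authors use, so gradient reversal is an assumption here, as are the name `encoder_update` and its parameters `strength` and `lr`.

```python
def encoder_update(params, grad_primary, grad_adversary, strength, lr=0.1):
    """Take one gradient step for a shared encoder.

    Hypothetical sketch: the encoder descends the primary (emotion) loss
    and ascends the adversary (demographic) loss, scaled by `strength`.
    """
    return [
        p - lr * (gp - strength * ga)
        for p, gp, ga in zip(params, grad_primary, grad_adversary)
    ]


# With strength 0 the adversary is ignored; larger values push the
# encoder harder toward hiding the demographic attribute.
plain = encoder_update([1.0], [0.5], [0.2], strength=0.0, lr=1.0)
private = encoder_update([1.0], [0.5], [0.2], strength=1.0, lr=1.0)
```

Sweeping `strength` and measuring both emotion accuracy and attacker accuracy at each value is one way to study the trade-off the abstract describes.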
Implicit Design Choices and Their Impact on Emotion Recognition Model Development and Evaluation
Emotion recognition is a complex task due to the inherent subjectivity in both the perception and production of emotions. The subjectivity of emotions poses significant challenges in developing accurate and robust computational models. This thesis examines critical facets of emotion recognition, beginning with the collection of diverse datasets that account for psychological factors in emotion production. To address these complexities, the thesis makes several key contributions.
To handle the challenge of non-representative training data, this work collects the Multimodal Stressed Emotion dataset, which introduces controlled stressors during data collection to better represent real-world influences on emotion production. To address issues with label subjectivity, this research comprehensively analyzes how data augmentation techniques and annotation schemes impact emotion perception and annotator labels. It further handles natural confounding variables and variations by employing adversarial networks to isolate key factors like stress from learned emotion representations during model training. For tackling concerns about leakage of sensitive demographic variables, this work leverages adversarial learning to strip sensitive demographic information from multimodal encodings. Additionally, it proposes optimized sociological evaluation metrics aligned with cost-effective, real-world needs for model testing.
The findings from this research provide valuable insights into the nuances of emotion labeling, modeling techniques, and interpretation frameworks for robust emotion recognition. The novel datasets collected help encapsulate the environmental and personal variability prevalent in real-world emotion expression. The data augmentation and annotation studies improve label consistency by accounting for subjectivity in emotion perception. The stressor-controlled models enhance adaptability and generalizability across diverse contexts and datasets. The bimodal adversarial networks aid in generating representations that avoid leakage of sensitive user information. Finally, the optimized sociological evaluation metrics reduce reliance on extensive, expensive human annotations for model assessment.
This research advances robust, practical emotion recognition through multifaceted studies of challenges in datasets, labels, modeling, demographic and membership variable encoding in representations, and evaluation. The groundwork has been laid for cost-effective, generalizable, and unbiased emotion recognition models.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/192416/1/mimansa_1.pd
Interpreting Multimodal Machine Learning Models Trained for Emotion Recognition to Address Robustness and Privacy Concerns
Many mobile applications and virtual conversational agents now aim to recognize and adapt to emotions. These predicted emotions are used in a variety of downstream applications: (a) generating more human-like dialogues, (b) predicting mental health issues, and (c) hate speech detection and intervention. To enable this, data are transmitted from users' devices and stored on central servers. These data are then processed further, either annotated or used as inputs for training a model for a specific task. Yet, these data contain sensitive information that could be used by mobile applications without the user's consent or, maliciously, by an eavesdropping adversary. My work focuses on two major issues faced while training emotion recognition algorithms: (a) the privacy of the generated representations, and (b) explaining the predictions and ensuring that they are robust to various situations. Tackling these issues would lead to emotion-based algorithms that are deployable and helpful at a larger scale, thus enabling a more human-like experience when interacting with AI.