9 research outputs found

    Best Practices for Noise-Based Augmentation to Improve the Performance of Deployable Speech-Based Emotion Recognition Systems

    Full text link
    Speech emotion recognition is an important component of any human centered system. But speech characteristics produced and perceived by a person can be influenced by a multitude of reasons, both desirable such as emotion, and undesirable such as noise. To train robust emotion recognition models, we need a large, yet realistic data distribution, but emotion datasets are often small and hence are augmented with noise. Often noise augmentation makes one important assumption, that the prediction label should remain the same in presence or absence of noise, which is true for automatic speech recognition but not necessarily true for perception based tasks. In this paper we make three novel contributions. We validate through crowdsourcing that the presence of noise does change the annotation label and hence may alter the original ground truth label. We then show how disregarding this knowledge and assuming consistency in ground truth labels propagates to downstream evaluation of ML models, both for performance evaluation and robustness testing. We end the paper with a set of recommendations for noise augmentations in speech emotion recognition datasets

    Privacy Enhanced Multimodal Neural Representations for Emotion Recognition

    Full text link
    Many mobile applications and virtual conversational agents now aim to recognize and adapt to emotions. To enable this, data are transmitted from users' devices and stored on central servers. Yet, these data contain sensitive information that could be used by mobile applications without user's consent or, maliciously, by an eavesdropping adversary. In this work, we show how multimodal representations trained for a primary task, here emotion recognition, can unintentionally leak demographic information, which could override a selected opt-out option by the user. We analyze how this leakage differs in representations obtained from textual, acoustic, and multimodal data. We use an adversarial learning paradigm to unlearn the private information present in a representation and investigate the effect of varying the strength of the adversarial component on the primary task and on the privacy metric, defined here as the inability of an attacker to predict specific demographic information. We evaluate this paradigm on multiple datasets and show that we can improve the privacy metric while not significantly impacting the performance on the primary task. To the best of our knowledge, this is the first work to analyze how the privacy metric differs across modalities and how multiple privacy concerns can be tackled while still maintaining performance on emotion recognition.Comment: 8 page

    Implicit Design Choices and Their Impact on Emotion Recognition Model Development and Evaluation

    Full text link
    Emotion recognition is a complex task due to the inherent subjectivity in both the perception and production of emotions. The subjectivity of emotions poses significant challenges in developing accurate and robust computational models. This thesis examines critical facets of emotion recognition, beginning with the collection of diverse datasets that account for psychological factors in emotion production. To address these complexities, the thesis makes several key contributions. To handle the challenge of non-representative training data, this work collects the Multimodal Stressed Emotion dataset, which introduces controlled stressors during data collection to better represent real-world influences on emotion production. To address issues with label subjectivity, this research comprehensively analyzes how data augmentation techniques and annotation schemes impact emotion perception and annotator labels. It further handles natural confounding variables and variations by employing adversarial networks to isolate key factors like stress from learned emotion representations during model training. For tackling concerns about leakage of sensitive demographic variables, this work leverages adversarial learning to strip sensitive demographic information from multimodal encodings. Additionally, it proposes optimized sociological evaluation metrics aligned with cost-effective, real-world needs for model testing. The findings from this research provide valuable insights into the nuances of emotion labeling, modeling techniques, and interpretation frameworks for robust emotion recognition. The novel datasets collected help encapsulate the environmental and personal variability prevalent in real-world emotion expression. The data augmentation and annotation studies improve label consistency by accounting for subjectivity in emotion perception. The stressor-controlled models enhance adaptability and generalizability across diverse contexts and datasets. The bimodal adversarial networks aid in generating representations that avoid leakage of sensitive user information. Finally, the optimized sociological evaluation metrics reduce reliance on extensive expensive human annotations for model assessment. This research advances robust, practical emotion recognition through multifaceted studies of challenges in datasets, labels, modeling, demographic and membership variable encoding in representations, and evaluation. The groundwork has been laid for cost-effective, generalizable, and unbiased emotion recognition models.PhDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/192416/1/mimansa_1.pd

    Interpreting Multimodal Machine Learning Models Trained for Emotion Recognition to Address Robustness and Privacy Concerns

    No full text
    Many mobile applications and virtual conversational agents now aim to recognize and adapt to emotions. These predicted emotions are used in variety of downstream applications: (a) generating more human like dialogues, (b) predicting mental health issues, and (c) hate speech detection and intervention. To enable this, data are transmitted from users' devices and stored on central servers. These data are then processed further, either annotated or used as inputs for training a model for a specific task. Yet, these data contain sensitive information that could be used by mobile applications without user's consent or, maliciously, by an eavesdropping adversary. My work focuses on two major issues that are faced while training emotion recognition algorithms: (a) privacy of the generated representations and, (b) explaining and ensuring that the predictions are robust to various situations. Tackling these issues would lead to emotion based algorithms that are deployable and helpful at a larger scale, thus enabling more human like experience when interacting with AI
    corecore