8 research outputs found

    "Of all things the measure is man" - automatic classification of emotions and inter-labeler consistency


    Long Term Suboxone™ Emotional Reactivity As Measured by Automatic Detection in Speech

    Addictions to illicit drugs are among the nation’s most critical public health and societal problems. The current opioid prescription epidemic, the use of buprenorphine/naloxone (Suboxone®; SUBX) as an opioid maintenance substance, and its growing street diversion provided the impetus to determine affective states (“true ground emotionality”) in long-term SUBX patients. Toward the goal of effective monitoring, we utilized emotion detection in speech as a measure of “true” emotionality in 36 SUBX patients compared to 44 individuals from the general population (GP) and 33 members of Alcoholics Anonymous (AA). Other, less objective studies have investigated the emotional reactivity of heroin, methadone, and opioid-abstinent patients. These studies indicate that current opioid users have abnormal emotional experience, characterized by a heightened response to unpleasant stimuli and a blunted response to pleasant stimuli. To our knowledge, however, this is the first study to evaluate “true ground” emotionality in patients on long-term buprenorphine/naloxone combination treatment (Suboxone™). We found that long-term SUBX patients showed significantly flat affect (p<0.01) and had less self-awareness of being happy, sad, and anxious compared to both the GP and AA groups. We caution against definitive interpretation of these seemingly important results until we compare the emotional reactivity of an opioid-abstinent control group using automatic detection in speech. These findings encourage continued research strategies in SUBX patients to target the specific brain regions responsible for relapse prevention of opioid addiction.

    Funding: United States Dept. of Defense, Assistant Secretary of Defense for Research & Engineering (Air Force Contract FA8721-05-C-0002).
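
    As a rough illustration of the kind of analysis such a comparison involves (the study's actual affect measures and statistics are not specified in this abstract), the sketch below treats "flat affect" as low spread in classifier-derived emotion scores and compares two groups with a two-sample test. The group sizes follow the abstract; the flatness measure and the synthetic scores are assumptions.

# Minimal sketch (not the study's actual pipeline): quantify "flat affect" as the
# spread of per-subject emotion scores produced by a speech emotion classifier,
# then compare groups with a two-sample test. Group sizes follow the abstract
# (36 SUBX, 44 GP); the flatness measure itself is an assumption.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def flatness(emotion_probs: np.ndarray) -> float:
    """Lower spread across emotion categories over many utterances = flatter affect."""
    return float(emotion_probs.std(axis=0).mean())

# Placeholder per-subject scores: rows = utterances, cols = emotion categories.
subx = [flatness(rng.dirichlet(np.ones(4) * 5.0, size=50)) for _ in range(36)]
gp   = [flatness(rng.dirichlet(np.ones(4) * 1.0, size=50)) for _ in range(44)]

t, p = stats.ttest_ind(subx, gp, equal_var=False)
print(f"SUBX vs GP flatness: t={t:.2f}, p={p:.4f}")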

    Classifying music by their emotional content by using machine learning

    Project carried out as part of a mobility program with the Haute Ecole d'Ingénierie et Gestion du Canton du Vaud. The aim of this project is to relate emotions in music to a set of features from the field of information theory (i.e., features that measure disorder and complexity in signals). With this purpose we carried out a first study on an emotionally classified non-musical database from a separate project at Vaud University Hospital (CHUV). We found that the features were useful for creating clusters of similar sounds, but we could not find a relation with the emotions. Due to the characteristics of the database (e.g., the strong connotations of the sounds) and its non-musical nature, we did not take these results as conclusive. For that reason we built a new database of musical sounds and invited people to rate the sounds via a web page. The participants characterized each sound with three values corresponding to three emotional components (Happy-Sad, Relaxing-Stressful, and Sleepy-Energetic). Using machine learning methods, specifically artificial neural networks and Kohonen maps, we concluded that some relations exist between the feature values and the emotions.
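
    The abstract does not specify the exact information-theoretic features or network configuration used; the sketch below illustrates the general idea with a single spectral-entropy feature per clip and a small scikit-learn neural network regressing the three rated dimensions, on synthetic data.

# Illustrative sketch, not the project's actual feature set or network: a single
# information-theoretic feature (spectral entropy) per clip, plus a small neural
# network mapping features to the three rated dimensions
# (Happy-Sad, Relaxing-Stressful, Sleepy-Energetic). Data here is synthetic.
import numpy as np
from sklearn.neural_network import MLPRegressor

def spectral_entropy(signal: np.ndarray) -> float:
    """Shannon entropy of the normalized power spectrum (a disorder measure)."""
    psd = np.abs(np.fft.rfft(signal)) ** 2
    p = psd / psd.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(1)
clips = [rng.standard_normal(2048) for _ in range(200)]        # stand-ins for audio clips
X = np.array([[spectral_entropy(c), c.std()] for c in clips])  # feature vector per clip
y = rng.uniform(-1, 1, size=(200, 3))                          # web ratings on the 3 axes

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(X, y)
print("R^2 on training data:", model.score(X, y))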

    Improving the Generalizability of Speech Emotion Recognition: Methods for Handling Data and Label Variability

    Emotion is an essential component in our interaction with others. It transmits information that helps us interpret the content of what others say. Therefore, detecting emotion from speech is an important step towards enabling machine understanding of human behaviors and intentions. Researchers have demonstrated the potential of emotion recognition in areas such as interactive systems in smart homes and mobile devices, computer games, and computational medical assistants. However, emotion communication is variable: individuals may express emotion in a manner that is uniquely their own; different speech content and environments may shape how emotion is expressed and recorded; and individuals may perceive emotional messages differently. Practically, this variability is reflected in both the audio-visual data and the labels used to create speech emotion recognition (SER) systems. SER systems must be robust and generalizable to handle this variability effectively. The focus of this dissertation is the development of speech emotion recognition systems that handle variability in emotion communication. We break the dissertation into three parts, according to the type of variability we address: (I) in the data, (II) in the labels, and (III) in both the data and the labels.

    Part I: The first part of this dissertation focuses on handling variability present in the data. We approximate variations in environmental properties and expression styles by the corpus and gender of the speakers. We find that training on multiple corpora and controlling for the variability in gender and corpus using multi-task learning results in more generalizable models than traditional single-task models that do not take corpus and gender variability into account. Another source of variability present in the recordings used for SER is the phonetic modulation of acoustics. On the other hand, phonemes also provide information about the emotion expressed in speech content. We discover that we can make more accurate predictions of emotion by explicitly considering both roles of phonemes.

    Part II: The second part of this dissertation addresses variability present in emotion labels, including the differences between emotion expression and perception, and the variations in emotion perception. We discover that it is beneficial to jointly model both the perception of others and how one perceives one’s own expression, compared to focusing on either one alone. Further, we show that the variability in emotion perception is a modelable signal that can be captured using probability distributions describing how groups of evaluators perceive emotional messages.

    Part III: The last part of this dissertation presents methods that handle variability in both data and labels. We reduce the data variability due to non-emotional factors using deep metric learning and model the variability in emotion perception using soft labels. We propose a family of loss functions and show that, by pairing examples that potentially vary in expression style and lexical content while preserving the real-valued emotional similarity between them, we develop systems that generalize better across datasets and are more robust to over-training.

    These works demonstrate the importance of considering data and label variability in the creation of robust and generalizable emotion recognition systems. We conclude this dissertation with the following future directions: (1) the development of real-time SER systems; and (2) the personalization of general SER systems.

    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147639/1/didizbq_1.pd
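
    As a minimal sketch of the multi-task idea described in Part I (not the dissertation's actual architecture), the PyTorch snippet below shares an acoustic encoder across an emotion head, a corpus head, and a gender head, and trains on a weighted sum of their losses; the feature dimension, layer sizes, and loss weights are assumptions.

# A minimal sketch of the multi-task idea in Part I (not the dissertation's actual
# architecture): a shared acoustic encoder with separate heads for emotion, corpus,
# and gender, trained with a weighted sum of losses so the model learns to account
# for corpus and gender variability. Feature dimension and sizes are assumptions.
import torch
import torch.nn as nn

class MultiTaskSER(nn.Module):
    def __init__(self, n_feats=88, n_emotions=4, n_corpora=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_feats, 128), nn.ReLU(),
                                     nn.Linear(128, 64), nn.ReLU())
        self.emotion_head = nn.Linear(64, n_emotions)
        self.corpus_head = nn.Linear(64, n_corpora)
        self.gender_head = nn.Linear(64, 2)

    def forward(self, x):
        h = self.encoder(x)
        return self.emotion_head(h), self.corpus_head(h), self.gender_head(h)

model = MultiTaskSER()
ce = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One synthetic training step: utterance-level features with emotion/corpus/gender labels.
x = torch.randn(32, 88)
y_emo, y_cor, y_gen = torch.randint(0, 4, (32,)), torch.randint(0, 3, (32,)), torch.randint(0, 2, (32,))
p_emo, p_cor, p_gen = model(x)
loss = ce(p_emo, y_emo) + 0.3 * ce(p_cor, y_cor) + 0.3 * ce(p_gen, y_gen)
opt.zero_grad(); loss.backward(); opt.step()
print("multi-task loss:", float(loss))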

    Automatic Emotion Recognition: Quantifying Dynamics and Structure in Human Behavior.

    Emotion is a central part of human interaction, one that has a huge influence on its overall tone and outcome. Today's human-centered interactive technology can greatly benefit from automatic emotion recognition, as the extracted affective information can be used to measure, transmit, and respond to user needs. However, developing such systems is challenging due to the complexity of emotional expressions and their dynamics, in terms of both the inherent multimodality between audio and visual expressions and the mixed factors of modulation that arise when a person speaks. To overcome these challenges, this thesis presents data-driven approaches that can quantify the underlying dynamics in audio-visual affective behavior.

    The first set of studies lays the foundation and central motivation of this thesis. We discover that it is crucial to model complex non-linear interactions between audio and visual emotion expressions, and that dynamic emotion patterns can be used in emotion recognition. Next, the understanding of the complex characteristics of emotion from the first set of studies leads us to examine multiple sources of modulation in audio-visual affective behavior. Specifically, we focus on how speech modulates facial displays of emotion. We develop a framework that uses speech signals, which alter the temporal dynamics of individual facial regions, to temporally segment and classify facial displays of emotion. Finally, we present methods to discover regions of emotionally salient events in given audio-visual data. We demonstrate that different modalities, such as the upper face, lower face, and speech, express emotion with different timings and time scales, varying for each emotion type. We further extend this idea to another aspect of human behavior: human action events in videos. We show how transition patterns between events can be used to automatically segment and classify action events.

    Our experimental results on audio-visual datasets show that the proposed systems not only improve performance but also provide descriptions of how affective behaviors change over time. We conclude this dissertation with future directions that will innovate in three main research topics: machine adaptation for personalized technology, human-human interaction assistant systems, and human-centered multimedia content analysis.

    PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/133459/1/yelinkim_1.pd
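
    Purely as a hedged illustration of speech-informed temporal segmentation (not the thesis's framework), the sketch below proposes segment boundaries wherever a crude voiced/unvoiced mask derived from speech energy changes state, then summarizes a facial-feature track within each segment; the threshold, signals, and summaries are all assumptions.

# Hedged illustration only (not the thesis's actual framework): use speech energy to
# propose temporal segment boundaries, then summarize a facial-feature track within
# each segment. The mask, window, and features are assumptions on synthetic data.
import numpy as np

rng = np.random.default_rng(2)
speech_energy = np.abs(rng.standard_normal(500)).cumsum() % 7      # toy frame-level energy
facial_feature = rng.standard_normal(500)                          # toy facial-region signal

voiced = speech_energy > np.median(speech_energy)                  # crude voiced/unvoiced mask
boundaries = np.flatnonzero(np.diff(voiced.astype(int)) != 0) + 1  # segment at state changes
segments = np.split(facial_feature, boundaries)

# Per-segment summaries could feed a classifier of facial displays of emotion.
for i, seg in enumerate(segments[:5]):
    print(f"segment {i}: {len(seg)} frames, mean activation {seg.mean():.2f}")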

    Deep Active Learning Explored Across Diverse Label Spaces

    Deep learning architectures have been widely explored in computer vision and have shown commendable performance in a variety of applications. A fundamental challenge in training deep networks is the requirement of large amounts of labeled training data. While gathering large quantities of unlabeled data is cheap and easy, annotating the data is an expensive process in terms of time, labor, and human expertise. Thus, developing algorithms that minimize the human effort in training deep models is of immense practical importance. Active learning algorithms automatically identify salient and exemplar samples from large amounts of unlabeled data and can provide maximal information to supervised learning models, thereby reducing the human annotation effort in training machine learning models. The goal of this dissertation is to fuse ideas from deep learning and active learning and to design novel deep active learning algorithms. The proposed learning methodologies explore diverse label spaces to solve different computer vision applications. Three major contributions have emerged from this work: (i) a deep active framework for multi-class image classification, (ii) a deep active model, with and without label correlation, for multi-label image classification, and (iii) a deep active paradigm for regression. Extensive empirical studies on a variety of multi-class, multi-label, and regression vision datasets corroborate the potential of the proposed methods for real-world applications. Additional contributions include: (i) a multimodal emotion database consisting of recordings of facial expressions, body gestures, vocal expressions, and physiological signals of actors enacting various emotions, (ii) four multimodal deep belief network models, and (iii) an in-depth analysis of the effect of transferring multimodal emotion features between source and target networks on classification accuracy and training time. These related contributions help comprehend the challenges involved in training deep learning models and motivate the main goal of this dissertation.

    Dissertation/Thesis. Doctoral Dissertation, Electrical Engineering, 201
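
    The dissertation's specific sample-selection criteria are not given in this abstract; the sketch below shows the generic active-learning loop that such work builds on, using entropy-based uncertainty sampling with a scikit-learn classifier on synthetic data.

# Generic active-learning loop for illustration (not the dissertation's specific
# algorithms): train on a small labeled seed set, score the unlabeled pool by
# predictive entropy, query the most uncertain samples, and retrain.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, n_classes=3,
                           n_informative=10, random_state=0)
labeled = list(range(30))                       # small seed set of "annotated" samples
pool = [i for i in range(len(X)) if i not in labeled]

for round_ in range(5):
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    probs = clf.predict_proba(X[pool])
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    query = [pool[i] for i in np.argsort(entropy)[-20:]]    # most uncertain samples
    labeled += query                                         # simulate human annotation
    pool = [i for i in pool if i not in query]
    print(f"round {round_}: {len(labeled)} labeled, acc on remaining pool = "
          f"{clf.score(X[pool], y[pool]):.3f}")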

    Reconnaissance automatique des dimensions affectives dans l'interaction orale homme-machine pour des personnes dépendantes

    Most affective state recognition systems are trained on artificial data collected outside any application context, and evaluations are performed on pre-recorded data of the same quality. This thesis tackles the various challenges that arise when these systems are confronted with real situations and real users.

    To obtain spontaneous emotional data as close to reality as possible, a data-collection system simulating a natural interaction was developed, using an expressive virtual character to conduct the interaction. Two emotional corpora were gathered with this system, with the participation of nearly 80 patients from medical centers in the Montpellier region of France, as part of the French ANR ARMEN collaborative project.

    These data were used to explore approaches to the problem of generalizing the performance of emotion detection systems to other data. Most of this work deals with cross-corpus strategies and the automatic selection of the best features. A hybrid algorithm combining floating selection techniques with similarity measures and multi-scale heuristics was proposed and applied, notably in the context of an InterSpeech 2012 challenge. The results and insights gained with this algorithm suggest ways of distinguishing between emotional corpora based on the features that represent them most relevantly.

    A prototype of the complete dialog system, including the emotion detection module and the virtual agent, was also implemented.
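
    The sketch below shows only the sequential forward floating selection core of such an approach, scored by training on one corpus and testing on another; the thesis's hybrid algorithm additionally combines this with similarity metrics and multi-scale heuristics, and the cross-corpus scoring scheme here is an assumption.

# Sketch of a sequential forward floating selection (SFFS) core only; the thesis's
# hybrid algorithm also uses similarity metrics and multi-scale heuristics. The
# cross-corpus score ("train on corpus A, test on corpus B") is an assumption.
import numpy as np
from sklearn.svm import LinearSVC

def cross_corpus_score(feats, Xa, ya, Xb, yb):
    clf = LinearSVC(max_iter=5000).fit(Xa[:, feats], ya)
    return clf.score(Xb[:, feats], yb)

def sffs(Xa, ya, Xb, yb, k=5):
    selected, best = [], -np.inf
    remaining = list(range(Xa.shape[1]))
    while len(selected) < k:
        # Forward step: add the single best feature.
        gains = [(cross_corpus_score(selected + [f], Xa, ya, Xb, yb), f) for f in remaining]
        score, f = max(gains)
        selected.append(f); remaining.remove(f); best = max(best, score)
        # Floating step: drop any feature whose removal strictly improves the score.
        improved = True
        while improved and len(selected) > 1:
            drops = [(cross_corpus_score([s for s in selected if s != g], Xa, ya, Xb, yb), g)
                     for g in selected]
            score, g = max(drops)
            improved = score > best
            if improved:
                selected.remove(g); remaining.append(g); best = score
    return selected, best

rng = np.random.default_rng(3)
Xa, ya = rng.standard_normal((200, 12)), rng.integers(0, 2, 200)   # corpus A (synthetic)
Xb, yb = rng.standard_normal((150, 12)), rng.integers(0, 2, 150)   # corpus B (synthetic)
print(sffs(Xa, ya, Xb, yb))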

    An evidence-based toolset to capture, measure, analyze & assess emotional health

    This thesis describes the development and validation of an evidence-based toolkit that captures a patient’s emotional state, expressiveness/affect, self-awareness, and empathy during a fifteen-second telephone call, and then measures and analyzes these indicators of Emotional Health based on emotion detection in speech and multilevel regression analysis. An emotion corpus of 8,376 momentary emotional states was collected from 113 participants in three groups: opioid addicts undergoing Suboxone® treatment, the general population, and members of Alcoholics Anonymous. Each collected emotional state includes an emotional recording in response to “How are you feeling?”, a self-assessment of emotional state, and an assessment of an emotionally charged recording. Each recording is labeled with the emotional truth. A method for unsupervised emotional-truth corpus labeling, through automatic audio chunking and unsupervised automatic emotional-truth labeling, is proposed and evaluated experimentally.

    To monitor and analyze the emotional health of a patient, algorithms are developed to accurately measure the emotional state of a patient in their natural environment. Real-time emotion detection in speech provides instantaneous classification of the emotional truth of a speech recording, and a pseudo real-time method improves emotional-truth accuracy as more data becomes available. A new measure of emotional-truth accuracy, the certainty score, is introduced. Measures of self-awareness, empathy, and expressiveness are derived from the collected emotional state.

    The thesis examines several questions: Are there differences in emotional truth, self-assessment, self-awareness, and empathy across groups? Does gender have an effect? Does language have an effect? Does the length of the response, as an indication of emotional expressiveness, vary with emotion or group? Does the confidence of the emotional label, as an indication of affect, vary with emotion or group? Are there differences in call completion rates? Which group would be more likely to continue in data collections?

    Significant results for these questions will provide evidence that capturing and measuring Emotional Health in speech can: assist therapists and patients in Cognitive Behavioural Therapy to become aware of symptoms and make it easier to change thoughts and behaviours; provide evidence of psychotropic medication and psychotherapy effectiveness in mental health and substance abuse treatment programs; accelerate the interview process during monthly assessments by physicians, psychiatrists, and therapists by providing empirical insight into the emotional health of patients in their natural environment; and trigger crisis intervention on conditions such as the detection of isolation from unanswered calls or consecutive days of negative emotions.
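
    The abstract introduces a certainty score but does not give its formula; the sketch below shows one plausible pseudo real-time scheme in which per-chunk emotion posteriors are accumulated as a recording streams in, with the running label's margin over the runner-up reported as a certainty value. The emotion set and the margin-based score are assumptions.

# Minimal sketch of pseudo real-time labeling: accumulate per-chunk emotion
# posteriors as a recording streams in, keep a running label, and report a
# "certainty" value. The averaged-posterior margin used here is an assumption,
# not the thesis's defined certainty score.
import numpy as np

EMOTIONS = ["happy", "sad", "anxious", "neutral"]

def stream_label(chunk_posteriors):
    """Yield (label, certainty) after each chunk; certainty = top minus runner-up probability."""
    running = np.zeros(len(EMOTIONS))
    for i, post in enumerate(chunk_posteriors, start=1):
        running += post
        mean = running / i
        order = np.argsort(mean)
        yield EMOTIONS[order[-1]], float(mean[order[-1]] - mean[order[-2]])

rng = np.random.default_rng(4)
chunks = rng.dirichlet(np.array([4.0, 1.0, 1.0, 1.0]), size=6)   # stand-in classifier outputs
for step, (label, certainty) in enumerate(stream_label(chunks), start=1):
    print(f"after chunk {step}: label={label}, certainty={certainty:.2f}")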