
    Resting-state connectivity reveals a role for sensorimotor systems in vocal emotional processing in children

    Voices are a primary source of emotional information in everyday interactions. Being able to process non-verbal vocal emotional cues, namely those embedded in speech prosody, shapes our behaviour and communication. Extant research has delineated the role of temporal and inferior frontal brain regions in vocal emotional processing. A growing number of studies also suggest the involvement of the motor system, but little is known about the nature of that involvement. Using resting-state fMRI, we ask whether patterns of motor system intrinsic connectivity play a role in emotional prosody recognition in children. Fifty-five 8-year-old children completed an emotional prosody recognition task and a resting-state scan. Better performance in emotion recognition was predicted by stronger connectivity between the inferior frontal gyrus (IFG) and motor regions including primary motor, lateral premotor and supplementary motor sites. This effect was mostly driven by the IFG pars triangularis and cannot be explained by differences in domain-general cognitive abilities. These findings indicate that individual differences in the engagement of sensorimotor systems, and in their coupling with inferior frontal regions, underpin variation in children's emotional speech perception skills. They suggest that sensorimotor and higher-order evaluative processes interact to aid emotion recognition, and have implications for models of vocal emotional communication.
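    As an illustration only, and not the authors' analysis pipeline, the sketch below shows how a resting-state connectivity value (e.g., an IFG-motor correlation) could be related to behavioural emotion-recognition scores while adjusting for a domain-general covariate. All data and variable names are hypothetical.

        # Illustrative sketch with synthetic data; not the study's actual analysis.
        import numpy as np

        rng = np.random.default_rng(0)
        n_children = 55

        # Hypothetical per-child measures
        ifg_motor_connectivity = rng.normal(0.3, 0.1, n_children)  # e.g., Fisher-z correlations
        general_ability = rng.normal(100, 15, n_children)          # domain-general covariate
        emotion_accuracy = (0.5 + 1.2 * ifg_motor_connectivity
                            + 0.001 * general_ability
                            + rng.normal(0, 0.1, n_children))

        # Ordinary least squares: accuracy ~ intercept + connectivity + covariate
        X = np.column_stack([np.ones(n_children), ifg_motor_connectivity, general_ability])
        beta, *_ = np.linalg.lstsq(X, emotion_accuracy, rcond=None)
        print("connectivity slope, adjusted for general ability:", beta[1])

    A positive adjusted slope would correspond to the reported pattern: stronger IFG-motor coupling predicting better emotion recognition over and above domain-general abilities.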

    Facial emotion recognition using min-max similarity classifier

    Recognition of human emotions from imaging templates is useful in a wide variety of human-computer interaction and intelligent systems applications. However, the automatic recognition of facial expressions using image template matching suffers from natural variability in facial features and recording conditions. Despite the progress achieved in facial emotion recognition in recent years, an effective and computationally simple feature selection and classification technique for emotion recognition remains an open problem. In this paper, we propose an efficient and straightforward facial emotion recognition algorithm to reduce the problem of inter-class pixel mismatch during classification. The proposed method applies pixel normalization to remove intensity offsets, followed by a Min-Max metric in a nearest neighbor classifier that is capable of suppressing feature outliers. The results indicate an improvement in recognition performance from 92.85% to 98.57% for the proposed Min-Max classification method when tested on the JAFFE database. The proposed emotion recognition technique outperforms the existing template matching methods.
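    The paper's exact normalization and metric may differ; as a rough sketch, assuming the Min-Max similarity between two non-negative feature vectors is the sum of element-wise minima divided by the sum of element-wise maxima, a 1-nearest-neighbor classifier of this kind could look as follows (function names and the normalization step are illustrative, not the published formulation).

        import numpy as np

        def normalize_pixels(img):
            # Remove the intensity offset and scale to [0, 1]; one plausible
            # reading of "pixel normalization", not necessarily the paper's.
            img = img.astype(float)
            img = img - img.min()
            return img / (img.max() + 1e-12)

        def min_max_similarity(a, b):
            # Assumed Min-Max similarity for non-negative vectors: it lies in
            # [0, 1] and is damped by large outlier values in either vector.
            return np.minimum(a, b).sum() / (np.maximum(a, b).sum() + 1e-12)

        def classify(test_img, train_imgs, train_labels):
            # 1-nearest-neighbor classification using the Min-Max similarity.
            q = normalize_pixels(test_img).ravel()
            sims = [min_max_similarity(q, normalize_pixels(t).ravel()) for t in train_imgs]
            return train_labels[int(np.argmax(sims))]

    Because the metric is bounded and driven by element-wise minima, a few badly mismatched pixels affect the score less than they would under a squared-error template match, which is the intuition behind its outlier suppression.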

    Robust Methods for the Automatic Quantification and Prediction of Affect in Spoken Interactions

    Emotional expression plays a key role in interactions as it communicates the context needed for understanding the behaviors and intentions of individuals. Therefore, a speech-based Artificial Intelligence (AI) system that can recognize and interpret emotional expression has many potential applications with measurable impact on a variety of areas, including human-computer interaction (HCI) and healthcare. However, several factors make speech emotion recognition (SER) a difficult task: variability in speech data, variability in emotion annotations, and data sparsity. This dissertation explores methodologies for improving the robustness of the automatic recognition of emotional expression from speech by addressing the impacts of these factors on various aspects of the SER system pipeline. To address speech data variability in SER, we propose modeling techniques that improve SER performance by leveraging short-term dynamical properties of speech. Furthermore, we demonstrate how data augmentation improves SER robustness to speaker variations. Lastly, we find that we can make more accurate predictions of emotion by considering the fine-grained interactions between the acoustic and lexical components of speech. To address the variability in emotion annotations, we propose SER modeling techniques that account for the behaviors of annotators (i.e., annotators' reaction delay) to improve time-continuous SER robustness. To address data sparsity, we investigate two methods that enable us to learn robust embeddings, which highlight the differences between neutral speech and emotionally expressive speech, without requiring emotion annotations. In the first method, we demonstrate how emotionally charged vocal expressions change speaker characteristics as captured by embeddings extracted from a speaker identification model, and we propose the use of these embeddings in SER applications. In the second method, we propose a framework for learning emotion embeddings using audio-textual data that is not annotated for emotion. The unification of the methods and results presented in this thesis helps enable the development of more robust SER systems, making key advancements toward an interactive speech-based AI system that is capable of recognizing and interpreting human behaviors.
    PhD dissertation, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/166106/1/aldeneh_1.pd
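    As one concrete illustration, and not the dissertation's actual model, compensating for annotator reaction delay in time-continuous labels can be pictured as shifting each annotation trace back in time so that it best aligns with an acoustic feature track. The function below is a hypothetical sketch of that idea; it assumes equal-length, frame-aligned 1-D tracks and a maximum shift smaller than their length.

        import numpy as np

        def best_delay(feature_track, annotation_track, max_shift):
            # Choose the backward shift (in frames) of the annotation trace that
            # maximizes its Pearson correlation with a 1-D acoustic feature track.
            best_shift, best_r = 0, -np.inf
            for shift in range(max_shift + 1):
                if shift == 0:
                    f, a = feature_track, annotation_track
                else:
                    # Annotation delayed by `shift` frames: align feature[t]
                    # with annotation[t + shift].
                    f, a = feature_track[:-shift], annotation_track[shift:]
                r = np.corrcoef(f, a)[0, 1]
                if r > best_r:
                    best_shift, best_r = shift, r
            return best_shift

    In a full time-continuous SER pipeline, labels realigned in this way would then serve as regression targets for frame-level models.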