491 research outputs found

    Audiovisual prosody in interaction


    The Perception of Prosody in English-Speaking Children with Cochlear Implants: A Systematic Review

    Objective: The goal of this paper was to systematically review the literature on the perception of prosody in English-speaking children with cochlear implants. Methods: A comprehensive search of peer-reviewed databases accessible through the City University of New York (CUNY) Graduate Center Library was conducted to identify relevant studies. Inclusion criteria covered studies that examined prosody perception in pre- and post-lingually deafened children with cochlear implants; children using unilateral, bilateral, and bimodal cochlear implant configurations were therefore included. Results: Nine studies met the inclusion criteria for this systematic review. The findings demonstrated both negative and positive outcomes for pediatric cochlear implant users. Of the 9 included studies, 6 (66%) included an outcome measure that assessed emotion perception, and 3 (33%) included an outcome measure that examined specific domains of speech prosody perception. Additionally, 2 of the 9 (22%) specifically investigated the connection between music and the perception of emotional speech prosody. Discussion: The results support the use and continued development of intensive (re)habilitation emphasizing suprasegmental and paralinguistic aspects of speech, using prosody perception measures sensitive to both emotional and linguistic components. Positive effects of music training were also found in audio-only conditions for the perception of emotional prosody. Future research should be based on larger sample sizes and should offer more alternative choices in the identification of emotional or prosodic cues, thereby increasing the difficulty of prosody classification tasks. Incorporating differing levels of background noise and reverberation during prosody perception tasks is also recommended, to better simulate the complex listening situations encountered by pediatric cochlear implant users. Conclusions: Performance on emotion recognition and other aspects of prosody perception, including music perception, is generally poorer in children with cochlear implants than in comparison groups such as normal-hearing children. Specifically, the findings of this systematic review support the use and validation of intensive (re)habilitation measures emphasizing suprasegmental and paralinguistic aspects of speech, as well as emotion and music prosody perception.

    Feature extraction based on bio-inspired model for robust emotion recognition

    Emotional state identification is an important issue for achieving more natural speech-interactive systems. Ideally, these systems should also be able to work in real environments, in which some kind of noise is generally present. Several bio-inspired representations have been applied to artificial systems for speech processing under noise conditions. In this work, an auditory signal representation is used to obtain a novel bio-inspired set of features for emotional speech signals. These characteristics, together with other spectral and prosodic features, are used for emotion recognition under noise conditions. Neural models were trained as classifiers, and results were compared to the well-known mel-frequency cepstral coefficients. Results show that, using the proposed representations, it is possible to significantly improve the robustness of an emotion recognition system. The results were also validated in a speaker-independent scheme and with two emotional speech corpora. Authors: Enrique Marcelo Albornoz, Diego Humberto Milone, Hugo Leonardo Rufiner (CONICET, Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional, Universidad Nacional del Litoral, Facultad de Ingeniería y Ciencias Hídricas; Argentina).
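    For context, a minimal sketch of the kind of conventional baseline this paper compares against: MFCC plus simple prosodic descriptors feeding a small neural classifier. It does not reproduce the authors' bio-inspired representation; the library choices (librosa, scikit-learn) and all parameter values are assumptions made for illustration.

```python
# Hypothetical baseline sketch: MFCC + simple prosodic features -> MLP classifier.
# This is NOT the paper's bio-inspired representation; it only illustrates the
# conventional feature set the authors compare against. Library choices and
# parameters are assumptions.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def extract_features(path, sr=16000):
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # spectral envelope
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)        # pitch contour (prosody)
    energy = librosa.feature.rms(y=y)                    # intensity (prosody)
    # Summarize frame-level features with utterance-level statistics.
    return np.concatenate([
        mfcc.mean(axis=1), mfcc.std(axis=1),
        [np.nanmean(f0), np.nanstd(f0)],
        [energy.mean(), energy.std()],
    ])

# wav_paths and labels are placeholders for an emotional speech corpus:
# X_train = np.stack([extract_features(p) for p in wav_paths])
# clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X_train, labels)
```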

    Automatic Measurement of Affect in Dimensional and Continuous Spaces: Why, What, and How?

    This paper aims to give a brief overview of the current state of the art in automatic measurement of affect signals in dimensional and continuous spaces (e.g., a continuous scale from -1 to +1) by seeking answers to the following questions: i) why has the field shifted towards dimensional and continuous interpretations of affective displays recorded in real-world settings? ii) what are the affect dimensions used and the affect signals measured? and iii) how has the current automatic measurement technology been developed, and how can we advance the field?
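    A minimal sketch of what dimensional, continuous affect measurement means in practice: a regressor maps feature vectors to valence and arousal values on a continuous [-1, +1] scale rather than discrete emotion classes. The model choice and the synthetic data below are illustrative assumptions, not methods from the survey.

```python
# Dimensional, continuous affect prediction sketch: regress valence/arousal in [-1, +1].
# Ridge regression and the synthetic feature/target data are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))                              # e.g. 40-dim per-frame audio/visual features
y = np.tanh(X[:, :2] + 0.1 * rng.normal(size=(200, 2)))     # synthetic valence/arousal targets in (-1, 1)

model = Ridge(alpha=1.0).fit(X, y)                          # Ridge handles multi-output regression
valence, arousal = np.clip(model.predict(X[:1])[0], -1.0, 1.0)
print(f"valence={valence:+.2f}, arousal={arousal:+.2f}")
```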

    Bimodal Emotion Recognition using Speech and Physiological Changes

    With exponentially evolving technology, it is no exaggeration to say that any interface fo…

    Recognition of Emotion from Speech: A Review


    Non-acted multi-view audio-visual dyadic Interactions. Project master thesis: multi-modal local and recurrent non-verbal emotion recognition in dyadic scenarios

    Final project of the Master in Foundations of Data Science, Faculty of Mathematics, Universitat de Barcelona, Year: 2019, Tutors: Sergio Escalera Guerrero and Cristina Palmero. In particular, this master thesis focuses on the development of a baseline emotion recognition system in a dyadic environment, using raw and handcrafted audio features and cropped faces from the videos. The system is analyzed at frame and utterance level, with and without temporal information. To this end, an exhaustive study of the state of the art in emotion recognition techniques has been conducted, paying particular attention to deep learning techniques for emotion recognition. While studying the state of the art from a theoretical point of view, a dataset consisting of videos of dyadic interaction sessions between individuals in different scenarios was recorded. Different attributes were captured and labelled from these videos: body pose, hand pose, emotion, age, gender, etc. Once the architectures for emotion recognition had been trained on another dataset, a proof of concept was carried out with this new database in order to draw conclusions; in addition, this database can help future systems achieve better results. A large number of experiments with audio and video were performed to create the emotion recognition system. The IEMOCAP database is used for the training and evaluation experiments of the emotion recognition system. Once the audio and video models are trained separately with two different architectures, a fusion of both methods is performed. In this work, the importance of preprocessing the data (i.e. face detection, analysis window length, handcrafted features, etc.) and of choosing the correct parameters for the architectures (i.e. network depth, fusion, etc.) has been demonstrated and studied, and experiments on the influence of temporal information are performed using recurrent models for spatiotemporal utterance-level emotion recognition. Finally, the conclusions drawn throughout this work are presented, as well as possible lines of future work, including new systems for emotion recognition and experiments with the database recorded in this work.
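    A highly simplified sketch of the late-fusion step described above, in which separately trained audio and video models contribute per-utterance class probabilities that are then combined; the emotion label set, the fusion weight, and the hard-coded probabilities are placeholder assumptions, not the thesis architecture.

```python
# Late-fusion sketch: combine class probabilities from separately trained audio
# and video emotion models. Classes, weights, and probabilities are placeholders.
import numpy as np

EMOTIONS = ["angry", "happy", "neutral", "sad"]   # assumed IEMOCAP-style label set

def fuse(p_audio: np.ndarray, p_video: np.ndarray, w_audio: float = 0.5) -> str:
    """Weighted average of modality probabilities, then argmax."""
    p = w_audio * p_audio + (1.0 - w_audio) * p_video
    return EMOTIONS[int(np.argmax(p))]

# In the real system these would come from the trained audio and video networks;
# here they are hard-coded only to show the fusion mechanism.
p_audio = np.array([0.10, 0.60, 0.20, 0.10])
p_video = np.array([0.05, 0.30, 0.50, 0.15])
print(fuse(p_audio, p_video, w_audio=0.6))        # -> "happy"
```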

    Non-acted multi-view audio-visual dyadic interactions. Project non-verbal emotion recognition in dyadic scenarios and speaker segmentation

    Final project of the Master in Foundations of Data Science, Faculty of Mathematics, Universitat de Barcelona, Year: 2019, Tutors: Sergio Escalera Guerrero and Cristina Palmero. In particular, this master thesis focuses on the development of a baseline emotion recognition system in a dyadic environment, using raw and handcrafted audio features and cropped faces from the videos. The system is analyzed at frame and utterance level without temporal information. A baseline speaker segmentation system has also been developed to facilitate the annotation task. To this end, an exhaustive study of the state of the art in emotion recognition and speaker segmentation techniques has been conducted, paying particular attention to deep learning techniques for emotion recognition and to clustering for speaker segmentation. While studying the state of the art from a theoretical point of view, a dataset consisting of videos of dyadic interaction sessions between individuals in different scenarios was recorded. Different attributes were captured and labelled from these videos: body pose, hand pose, emotion, age, gender, etc. Once the architectures for emotion recognition had been trained on another dataset, a proof of concept was carried out with this new database in order to draw conclusions; in addition, this database can help future systems achieve better results. A large number of experiments with audio and video were performed to create the emotion recognition system. The IEMOCAP database is used for the training and evaluation experiments of the emotion recognition system. Once the audio and video models are trained separately with two different architectures, a fusion of both methods is performed. In this work, the importance of preprocessing the data (face detection, analysis window length, handcrafted features, etc.) and of choosing the correct parameters for the architectures (network depth, fusion, etc.) has been demonstrated and studied. The experiments for the speaker segmentation system, in turn, are performed on a piece of audio from the IEMOCAP database; the preprocessing steps, the difficulties of an unsupervised approach such as clustering, and the feature representation are studied and discussed. Finally, the conclusions drawn throughout this work are presented, as well as possible lines of future work, including new systems for emotion recognition and experiments with the database recorded in this work.
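    A minimal sketch of unsupervised speaker segmentation by clustering, in the spirit of the clustering approach mentioned above: short audio windows are embedded with MFCC statistics and grouped into speakers. The window length, the features, and the clustering algorithm are illustrative assumptions, not the thesis implementation.

```python
# Unsupervised speaker-segmentation sketch: slice the audio into short windows,
# embed each window with MFCC statistics, and cluster the windows into speakers.
# Window length, features, and the clustering algorithm are illustrative assumptions.
import numpy as np
import librosa
from sklearn.cluster import AgglomerativeClustering

def segment_speakers(path, n_speakers=2, win_s=1.0, sr=16000):
    y, sr = librosa.load(path, sr=sr)
    hop = int(win_s * sr)
    # One MFCC-mean embedding per non-overlapping window.
    embeddings = [
        librosa.feature.mfcc(y=y[i:i + hop], sr=sr, n_mfcc=20).mean(axis=1)
        for i in range(0, len(y) - hop, hop)
    ]
    labels = AgglomerativeClustering(n_clusters=n_speakers).fit_predict(np.stack(embeddings))
    # labels[k] is the speaker id assigned to the window starting at k * win_s seconds.
    return labels

# labels = segment_speakers("dyadic_session.wav")   # file name is a placeholder
```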