2,782 research outputs found
Feature extraction based on bio-inspired model for robust emotion recognition
Emotional state identification is an important issue to achieve more natural speech interactive systems. Ideally, these systems should also be able to work in real environments in which generally exist some kind of noise. Several bio-inspired representations have been applied to artificial systems for speech processing under noise conditions. In this work, an auditory signal representation is used to obtain a novel bio-inspired set of features for emotional speech signals. These characteristics, together with other spectral and prosodic features, are used for emotion recognition under noise conditions. Neural models were trained as classifiers and results were compared to the well-known mel-frequency cepstral coefficients. Results show that using the proposed representations, it is possible to significantly improve the robustness of an emotion recognition system. The results were also validated in a speaker independent scheme and with two emotional speech corpora.Fil: Albornoz, Enrique Marcelo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; ArgentinaFil: Milone, Diego Humberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; ArgentinaFil: Rufiner, Hugo Leonardo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentin
Comprehensive Study of Automatic Speech Emotion Recognition Systems
Speech emotion recognition (SER) is the technology that recognizes psychological characteristics and feelings from the speech signals through techniques and methodologies. SER is challenging because of more considerable variations in different languages arousal and valence levels. Various technical developments in artificial intelligence and signal processing methods have encouraged and made it possible to interpret emotions.SER plays a vital role in remote communication. This paper offers a recent survey of SER using machine learning (ML) and deep learning (DL)-based techniques. It focuses on the various feature representation and classification techniques used for SER. Further, it describes details about databases and evaluation metrics used for speech emotion recognition
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era
Speech is the fundamental mode of human communication, and its synthesis has
long been a core priority in human-computer interaction research. In recent
years, machines have managed to master the art of generating speech that is
understandable by humans. But the linguistic content of an utterance
encompasses only a part of its meaning. Affect, or expressivity, has the
capacity to turn speech into a medium capable of conveying intimate thoughts,
feelings, and emotions -- aspects that are essential for engaging and
naturalistic interpersonal communication. While the goal of imparting
expressivity to synthesised utterances has so far remained elusive, following
recent advances in text-to-speech synthesis, a paradigm shift is well under way
in the fields of affective speech synthesis and conversion as well. Deep
learning, as the technology which underlies most of the recent advances in
artificial intelligence, is spearheading these efforts. In the present
overview, we outline ongoing trends and summarise state-of-the-art approaches
in an attempt to provide a comprehensive overview of this exciting field.Comment: Submitted to the Proceedings of IEE
Experimental Approaches to Socio‐Linguistics: Usage and Interpretation of Non‐Verbal and Verbal Expressions in Cross‐ Cultural Communication
Social context shapes our behavior in interpersonal communication. In this chapter, I will address how experimental psychology contributes to the study of socio-linguistic processes, focusing on nonverbal and verbal processing in a cross-cultural or cross-linguistic communicative setting. A systematic review of the most up-to-date empirical studies will show: 1) the culturally-universal and culturally-specific encoding of emotion in speech. The acoustic cues that are commonly involved in discriminating basic emotions in vocal expressions across languages and the cross-linguistic variations in such encoding will be demonstrated; 2) the modulation of in-group and out-group status (e.g. inferred from speaker’s dialect, familiarity towards a language) on the encoding and decoding of speaker’s meaning; 3) the impact of cultural orientation and cultural learning on the interpretation of social and affective meaning, focusing on how immigration process shapes one’s language use and comprehension. I will highlight the significance of combining the research paradigms from experimental psychology with cognitive (neuro)science methodologies such as electrophysiological recording and functional magnetic resonance imaging, to address the relevant questions in cross-cultural communicative settings. The chapter is concluded by a future direction to study the socio-cultural bases of language and linguistic underpinnings of cultural behaviour
Recommended from our members
Computational Approaches to Modeling Speaker State in the Medical Domain
Recently, researchers in computer science and engineering have begun to explore the possibility of finding speech-based correlates of various medical conditions using automatic, computational methods. If such language cues can be identified and quantified automatically, this information can be used to support diagnosis and treatment of medical conditions in clinical settings and to further fundamental research in understanding cognition. This chapter reviews computational approaches that explore communicative patterns of patients who suffer from medical conditions such as depression, autism spectrum disorders, schizophrenia, and cancer. There are two main approaches discussed: research that explores features extracted from the acoustic signal and research that focuses on lexical and semantic features. We also present some applied research that uses computational methods to develop assistive technologies. In the final sections we discuss issues related to and the future of this emerging field of research
Analyzing Prosody with Legendre Polynomial Coefficients
This investigation demonstrates the effectiveness of Legendre polynomial coefficients representing prosodic contours within the context of two different tasks: nativeness classification and sarcasm detection. By making use of accurate representations of prosodic contours to answer fundamental linguistic questions, we contribute significantly to the body of research focused on analyzing prosody in linguistics as well as modeling prosody for machine learning tasks. Using Legendre polynomial coefficient representations of prosodic contours, we answer prosodic questions about differences in prosody between native English speakers and non-native English speakers whose first language is Mandarin. We also learn more about prosodic qualities of sarcastic speech. We additionally perform machine learning classification for both tasks, (achieving an accuracy of 72.3% for nativeness classification, and achieving 81.57% for sarcasm detection). We recommend that linguists looking to analyze prosodic contours make use of Legendre polynomial coefficients modeling; the accuracy and quality of the resulting prosodic contour representations makes them highly interpretable for linguistic analysis
- …