
    DEMoS: an Italian emotional speech corpus - elicitation methods, machine learning, and perception

    DEMoS (Database of Elicited Mood in Speech) is a corpus of induced emotional speech in Italian. DEMoS encompasses 9,365 emotional and 332 neutral samples produced by 68 native speakers (23 females, 45 males) in seven emotional states: the ‘big six’ anger, sadness, happiness, fear, surprise, and disgust, plus the secondary emotion guilt. To obtain more realistic productions than acted speech, DEMoS contains emotional speech elicited by combinations of Mood Induction Procedures (MIPs). Three elicitation methods are presented, each combining at least three MIPs and drawing on six different MIPs in total. To select samples ‘typical’ of each emotion, evaluation strategies based on self- and external assessment were applied. The selected part of the corpus encompasses 1,564 prototypical samples produced by 59 speakers (21 females, 38 males). DEMoS has been published in the journal Language Resources and Evaluation: Emilia Parada-Cabaleiro, Giovanni Costantini, Anton Batliner, Maximilian Schmitt, and Björn Schuller (2019), DEMoS: An Italian emotional speech corpus. Elicitation methods, machine learning, and perception, Language Resources and Evaluation, Feb 2019. https://rdcu.be/bn7o

    Identifying emotions in opera singing: implications of adverse acoustic conditions

    The expression of emotion is an inherent aspect of singing, especially of the operatic voice. Yet adverse acoustic conditions, such as an open-air performance or a noisy analog recording, may affect its perception. State-of-the-art methods for emotional speech evaluation have been applied to the operatic voice, such as perception experiments, acoustic analyses, and machine learning techniques. Still, the extent to which adverse acoustic conditions may impair listeners’ and machines’ identification of emotion in vocal cues has only been investigated in the realm of speech. For our study, 132 listeners evaluated 390 nonsense operatic sung instances of five basic emotions, degraded by three noise types (brown, pink, and white), each at four Signal-to-Noise Ratios (-1 dB, -0.5 dB, +1 dB, and +3 dB); the performance of state-of-the-art automatic recognition methods was evaluated as well. Our findings show that the three noise types affect female and male singers similarly and that listeners’ gender did not play a role. Human perception and automatic classification display similar confusion and recognition patterns: sadness is identified best, fear worst; low-aroused emotions display higher confusion.
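    A minimal sketch (not the authors' exact pipeline) of how a clean recording can be mixed with a noise masker at a fixed Signal-to-Noise Ratio such as those listed above; the use of numpy/soundfile and all file names are assumptions of this illustration.

```python
# Illustrative sketch: add a noise masker to a clean recording at a target SNR.
# numpy/soundfile and the file names are assumptions of this example.
import numpy as np
import soundfile as sf

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals snr_db, then mix."""
    noise = np.resize(noise, clean.shape)                  # loop/trim to match length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    target_noise_power = p_clean / (10 ** (snr_db / 10.0))
    return clean + noise * np.sqrt(target_noise_power / p_noise)

# Example: white-noise masker at +1 dB SNR.
clean, sr = sf.read("sung_emotion.wav")                    # placeholder file
degraded = mix_at_snr(clean, np.random.randn(len(clean)), snr_db=1.0)
sf.write("sung_emotion_white_+1dB.wav", degraded / np.max(np.abs(degraded)), sr)
```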

    The Emotion Probe: On the Universality of Cross-Linguistic and Cross-Gender Speech Emotion Recognition via Machine Learning

    Machine Learning (ML) algorithms within a human–computer framework are the leading force in speech emotion recognition (SER). However, few studies explore cross-corpora aspects of SER; this work aims to explore the feasibility and characteristics of cross-linguistic, cross-gender SER. Three ML classifiers (SVM, Naïve Bayes and MLP) are applied to acoustic features obtained through a procedure based on Kononenko’s discretization and correlation-based feature selection. The system encompasses five emotions (disgust, fear, happiness, anger and sadness), using the Emofilm database, comprising short clips from English movies and their Italian and Spanish dubbed versions, for a total of 1115 annotated utterances. The results show MLP to be the most effective classifier, with accuracies higher than 90% for single-language approaches, while the cross-language classifier still yields accuracies higher than 80%. The results show cross-gender tasks to be more difficult than those involving two languages, suggesting greater differences between emotions expressed by male versus female subjects than between different languages. Four feature domains, namely RASTA, F0, MFCC and spectral energy, are algorithmically assessed as the most effective, refining existing literature and approaches based on standard sets. To our knowledge, this is one of the first studies encompassing cross-gender and cross-linguistic assessments of SER.
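    As an illustration of the kind of pipeline described above, the following hedged sketch combines feature scaling, a feature-selection step, and an MLP classifier in scikit-learn; Kononenko's discretization and correlation-based feature selection are not available there, so SelectKBest stands in for that step, and the feature matrix is a random placeholder.

```python
# Hedged sketch only; SelectKBest stands in for the paper's feature-selection step,
# and the feature matrix is a random placeholder of roughly eGeMAPS size.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1115, 88))          # acoustic features per utterance (placeholder)
y = rng.integers(0, 5, size=1115)        # five emotion classes

clf = make_pipeline(
    StandardScaler(),
    SelectKBest(mutual_info_classif, k=40),                    # feature-selection stand-in
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
)
print("mean CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```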

    Can Spontaneous Emotions be Detected from Speech on TV Political Debates?

    Decoding emotional states from multimodal signals is an increasingly active domain within the framework of affective computing, which aims at a better understanding of Human-Human Communication as well as at improving Human-Computer Interaction. But the automatic recognition of spontaneous emotions from speech is a very complex task, due to the lack of certainty about the speaker's states as well as to the difficulty of identifying a variety of emotions in real scenarios. In this work we explore the extent to which emotional states can be decoded from speech signals extracted from TV political debates. The labelling procedure was supported by perception experiments in which only a small set of emotions was identified. In addition, scaled judgements of valence, arousal and dominance were also provided. In this framework the paper presents meaningful comparisons between the dimensional and the categorical models of emotion, which is a new contribution when dealing with spontaneous emotions. To this end, Support Vector Machines (SVM) as well as Feedforward Neural Networks (FNN) have been proposed to develop classifiers and predictors. The experimental evaluation over a Spanish corpus has shown that both models can be identified in speech segments by the proposed artificial systems. This work has been partially funded by the Spanish Government under grant TIN2017-85854-C4-3-R (AEI/FEDER, UE) and conducted in the project EMPATHIC (Grant No. 769872) funded by the European Union's H2020 research and innovation program.
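    A hedged sketch of the two modelling views compared above: a categorical classifier (SVM) alongside a dimensional valence/arousal/dominance regressor (a small feedforward network). Features, labels and hyperparameters are placeholders, not the study's configuration.

```python
# Placeholder features/labels; SVC for the categorical view, a small feedforward
# regressor for the dimensional (valence/arousal/dominance) view.
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))           # acoustic features (placeholder)
y_cat = rng.integers(0, 4, size=500)     # categorical emotion labels
y_dim = rng.random(size=(500, 3))        # valence, arousal, dominance in [0, 1]

X_tr, X_te, yc_tr, yc_te, yd_tr, yd_te = train_test_split(X, y_cat, y_dim, random_state=0)

svm = SVC(kernel="rbf").fit(X_tr, yc_tr)                                  # categorical model
fnn = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500,
                   random_state=0).fit(X_tr, yd_tr)                       # dimensional model
print("category accuracy:", svm.score(X_te, yc_te))
print("VAD R^2:", fnn.score(X_te, yd_te))
```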

    Analysis and automatic identification of spontaneous emotions in speech from human-human and human-machine communication

    This research mainly focuses on improving our understanding of human-human and human-machine interactions by analysing participants' emotional status. For this purpose, we have developed and enhanced Speech Emotion Recognition (SER) systems for both interactions in real-life scenarios, explicitly emphasising the Spanish language. In this framework, we have conducted an in-depth analysis of how humans express emotions using speech when communicating with other persons or machines in actual situations. Thus, we have analysed and studied the way in which emotional information is expressed in a variety of true-to-life environments, which is a crucial aspect for the development of SER systems. This study aimed to comprehensively understand the challenge we wanted to address: identifying emotional information in speech using machine learning technologies. Neural networks have been demonstrated to be adequate tools for identifying events in speech and language. Most of the experiments aimed to make local comparisons between some specific aspects; thus, the experimental conditions were tailored to each particular analysis. The experiments across the different articles (from P1 to P19) are hardly comparable, due to our continuous learning of how to deal with the difficult task of identifying emotions in speech. In order to make a fair comparison, additional unpublished results are presented in the Appendix. These experiments were carried out under identical and rigorous conditions. This general comparison offers an overview of the advantages and disadvantages of the different methodologies for the automatic recognition of emotions in speech.

    Perception and classification of emotions in nonsense speech: humans versus machines

    This article contributes to a more adequate modelling of emotions encoded in speech, by addressing four fallacies prevalent in traditional affective computing: First, studies concentrate on few emotions and disregard all other ones (‘closed world’). Second, studies use clean (lab) data or real-life ones but do not compare clean and noisy data in a comparable setting (‘clean world’). Third, machine learning approaches need large amounts of data; however, their performance has not yet been assessed by systematically comparing different approaches and different sizes of databases (‘small world’). Fourth, although human annotations of emotion constitute the basis for automatic classification, human perception and machine classification have not yet been compared on a strict basis (‘one world’). Finally, we deal with the intrinsic ambiguities of emotions by interpreting the confusions between categories (‘fuzzy world’). We use acted nonsense speech from the GEMEP corpus, emotional ‘distractors’ as categories not entailed in the test set, real-life noises that mask the clear recordings, and different sizes of the training set for machine learning. We show that machine learning based on state-of-the-art feature representations (wav2vec2) is able to mirror the main emotional categories (‘pillars’) present in perceptual emotional constellations even in degraded acoustic conditions.
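    The following sketch, assuming the HuggingFace transformers and torch packages, shows one common way to obtain wav2vec2 utterance representations on which an emotion classifier could be trained; the checkpoint name and mean-pooling are choices of this illustration, not necessarily those of the study.

```python
# Sketch assuming `transformers` and `torch`; extracts a fixed-size wav2vec2
# embedding per utterance by mean-pooling the last hidden states.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

model_name = "facebook/wav2vec2-base"                       # checkpoint chosen for the example
extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
model = Wav2Vec2Model.from_pretrained(model_name).eval()

def utterance_embedding(waveform_16k):
    """Return one 768-dimensional vector for a mono waveform sampled at 16 kHz."""
    inputs = extractor(waveform_16k, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state           # (1, frames, 768)
    return hidden.mean(dim=1).squeeze(0)                     # (768,)

# The resulting vectors can then be fed to any downstream classifier
# (e.g. an SVM or a small MLP) trained on the emotion labels.
```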

    Gender-related differences in the prevalence of voice disorders and awareness of dysphonia

    Objective: Considering the impact of dysphonia on public health and the increasing attention to patient-centred care, we evaluated sex-related differences in the prevalence of benign voice disorders, awareness of dysphonia and voice therapy (VT) results. Methods: One hundred and seventy-one patients, 129 females and 42 males, with functional or organic benign dysphonia underwent Voice Handicap Index (VHI), auditory-perceptual dysphonia severity scoring (GRBAS) and acoustic analysis (Jitter%, Shimmer%, NHR) before and after VT. Results: The prevalence of each voice disorder was significantly higher among females. Mean time-to-diagnosis (time elapsed until medical consultation) did not differ between males and females. Refusal of therapy and VT adherence (mean number of absences and premature dropout) were similar in the two groups. Pre-VT VHI and "G" parameter were worse in women. The percentage of women with abnormal acoustic analysis was significantly higher. Post-VT VHI gain was higher in women, whereas "G" parameter improvement did not differ by sex. Conclusions: Our study showed a higher prevalence of voice disorders in females. Awareness of dysphonia was not gender-related. Females started with worse subjective voice perception and acoustic analysis, but they perceived greater improvement after therapy.
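    For readers unfamiliar with the acoustic measures mentioned above (Jitter%, Shimmer%, NHR/HNR), the following sketch, assuming the praat-parselmouth package, shows how such perturbation and harmonicity measures are commonly extracted; the parameter values are Praat defaults and the file name is a placeholder, not necessarily the clinical settings used here.

```python
# Sketch assuming the praat-parselmouth package; parameter values are Praat defaults.
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("sustained_vowel.wav")               # placeholder file
point_process = call(snd, "To PointProcess (periodic, cc)", 75, 500)

jitter_local = call(point_process, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
shimmer_local = call([snd, point_process], "Get shimmer (local)",
                     0, 0, 0.0001, 0.02, 1.3, 1.6)
hnr_db = call(snd.to_harmonicity_cc(), "Get mean", 0, 0)     # harmonics-to-noise ratio (dB)

print(f"Jitter: {jitter_local:.2%}  Shimmer: {shimmer_local:.2%}  HNR: {hnr_db:.1f} dB")
```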

    Dissociation and interpersonal autonomic physiology in psychotherapy research: an integrative view encompassing psychodynamic and neuroscience theoretical frameworks

    Interpersonal autonomic physiology is an interdisciplinary research field assessing the relational interdependence of two (or more) interacting individuals at both the behavioural and psychophysiological levels. Despite its quite long tradition, only eight studies since 1955 have focused on the interaction of psychotherapy dyads, and none of them has focused on the shared processual level, assessing dynamic phenomena such as dissociation. We longitudinally observed two brief psychodynamic psychotherapies, entirely audio- and video-recorded (16 sessions, weekly frequency, 45 min.). Autonomic nervous system measures were continuously collected during each session. Personality, empathy, dissociative features and clinical progress measures were collected before and after therapy, and after each clinical session. Two independent judges, trained psychotherapists, coded the interactions’ micro-processes. Time-series-based analyses were performed to assess interpersonal synchronization and de-synchronization in the patient’s and therapist’s physiological activity. Psychophysiological synchrony revealed a clear association with empathic attunement, while desynchronization phases (ranging in length from 30 to 150 sec.) showed a linkage with dissociative processes, usually associated with the patient’s narrative of core relational trauma. Our findings are discussed from the perspective of the psychodynamic models of Stern (“present moment”), Sander, Beebe and Lachmann (dyad system model of interaction), and Lanius (trauma model), and the neuroscientific frameworks proposed by Thayer (neurovisceral integration model) and Porges (polyvagal theory). The collected data allow us to attempt an integration of these theoretical approaches in the light of Complex Dynamic Systems. The rich theoretical work and the encouraging clinical results might represent a new and fascinating frontier of research in psychotherapy.
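    A hedged sketch of one common way to quantify patient-therapist physiological synchrony, windowed correlation of two equally sampled autonomic signals; it illustrates the general idea only, not the specific time-series method used in the study, and the surrogate data are invented for the example.

```python
# Surrogate data and a plain windowed Pearson correlation; an illustration of the
# general synchrony idea, not the study's actual time-series method.
import numpy as np

def windowed_synchrony(sig_a, sig_b, fs, win_s=30.0, step_s=5.0):
    """Correlation of two equally sampled signals in sliding windows."""
    win, step = int(win_s * fs), int(step_s * fs)
    scores = []
    for start in range(0, len(sig_a) - win + 1, step):
        a, b = sig_a[start:start + win], sig_b[start:start + win]
        scores.append(np.corrcoef(a, b)[0, 1])
    return np.array(scores)          # near +1: synchrony; near 0 or negative: de-synchrony

# Example: two 45-minute signals sampled at 4 Hz (e.g. skin conductance level).
fs = 4
t = np.arange(0, 45 * 60, 1 / fs)
patient = np.sin(2 * np.pi * t / 120) + 0.3 * np.random.randn(len(t))
therapist = np.sin(2 * np.pi * t / 120) + 0.3 * np.random.randn(len(t))
print(windowed_synchrony(patient, therapist, fs)[:5])
```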