    Towards Emotion Recognition: A Persistent Entropy Application

    Emotion recognition and classification is a very active area of research. In this paper, we present a first approach to emotion classification using persistent entropy and support vector machines. A topology-based model is applied to obtain a single real number from each raw signal. These values are then used as input to a support vector machine that classifies the signals into 8 different emotions (neutral, calm, happy, sad, angry, fearful, disgust and surprised).
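    The abstract does not spell out how the single real number is computed, so the following is only a minimal sketch of the standard persistent entropy formula. It assumes a persistence barcode (a list of finite (birth, death) bars) has already been computed from the signal; the filtration that produces it, and the example bars, are hypothetical. With bar lengths l_i and total length L, the entropy is E = -sum_i (l_i / L) log(l_i / L).

```python
import math
from typing import Iterable, Tuple


def persistent_entropy(barcode: Iterable[Tuple[float, float]]) -> float:
    """Shannon entropy of the normalised bar lengths of a persistence barcode.

    Each bar is a (birth, death) pair; bars must have finite, positive length.
    """
    lengths = [death - birth for birth, death in barcode]
    if not lengths or any(length <= 0 or math.isinf(length) for length in lengths):
        raise ValueError("need at least one bar, each with finite positive length")
    total = sum(lengths)
    # p_i = l_i / L and E = -sum_i p_i * log(p_i)
    return -sum((length / total) * math.log(length / total) for length in lengths)


# Hypothetical barcode, e.g. from a filtration built on one raw signal.
bars = [(0.0, 1.0), (0.0, 1.0), (0.5, 1.5), (2.0, 3.0)]
print(persistent_entropy(bars))  # four equal-length bars give log(4), about 1.386
```

    Each signal is thus reduced to one scalar, which is the kind of input the abstract describes feeding to the support vector machine.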

    A combined cepstral distance method for emotional speech recognition

    Affective computing is not only a direction of reform in artificial intelligence but also an embodiment of advanced intelligent machines. Emotion is the biggest difference between humans and machines; if a machine behaves with emotion, it will be accepted by more people. Voice is the most natural means of daily communication and is easily understood and accepted. The recognition of emotional speech is therefore an important field of artificial intelligence. However, in emotion recognition, certain pairs of emotions are particularly prone to confusion. This article presents a combined cepstral distance method within a two-group, multi-class emotion classification scheme for emotional speech recognition. Cepstral distance combined with speech energy is widely used for speech endpoint detection in speech recognition; in this work, cepstral distance is instead used to measure the similarity between frames of emotional signals and frames of neutral signals. These features are the input to a directed acyclic graph support vector machine (DAG-SVM) classifier. Finally, a two-group classification strategy is adopted to resolve confusion in multi-emotion recognition. The experiments use a Chinese Mandarin emotion database, and a large training set (1134 + 378 utterances) ensures strong modelling capability for emotion prediction. The experimental results show that cepstral distance increases the recognition rate for sadness and balances the recognition results while eliminating overfitting. On the German Berlin Emotional Speech Database, the recognition rate between sadness and boredom, which are very difficult to distinguish, reaches 95.45%.
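    As an illustration of the frame-similarity measure, here is a minimal sketch using the conventional dB-scaled cepstral distance from endpoint detection, d = (10 / ln 10) * sqrt((c0 - c0')^2 + 2 * sum_{n>=1} (cn - cn')^2). The article does not state its cepstral representation, so the FFT-based real cepstrum, the Hamming window, the 13-coefficient order and the synthetic frames are all assumptions, as is the pairing of one emotional frame with one neutral frame.

```python
import numpy as np


def real_cepstrum(frame: np.ndarray, n_coeffs: int = 13) -> np.ndarray:
    """First n_coeffs coefficients (c0, c1, ...) of the real cepstrum of one frame."""
    windowed = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    log_magnitude = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)  # floor avoids log(0)
    return np.fft.irfft(log_magnitude, n=len(frame))[:n_coeffs]


def cepstral_distance(c_a: np.ndarray, c_b: np.ndarray) -> float:
    """dB-scaled cepstral distance:
    (10 / ln 10) * sqrt((c0 - c0')^2 + 2 * sum_{n>=1} (cn - cn')^2)."""
    d = np.asarray(c_a, dtype=float) - np.asarray(c_b, dtype=float)
    return float(10.0 / np.log(10.0) * np.sqrt(d[0] ** 2 + 2.0 * np.sum(d[1:] ** 2)))


# Hypothetical 25 ms frames at 16 kHz: a "neutral" and an "emotional" frame.
sr = 16000
t = np.arange(400) / sr
neutral_frame = np.sin(2 * np.pi * 120 * t)
emotional_frame = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)
c_neutral = real_cepstrum(neutral_frame)
c_emotional = real_cepstrum(emotional_frame)
print(cepstral_distance(c_emotional, c_neutral))
```

    How per-frame distances are aggregated into features for the DAG-SVM is not described in the abstract and is left open here.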

    A system for recognizing human emotions based on speech analysis and facial feature extraction: applications to Human-Robot Interaction

    With advances in artificial intelligence, humanoid robots are beginning to interact with ordinary people, drawing on a growing understanding of psychological processes. Accumulating evidence in Human-Robot Interaction (HRI) suggests that research is focusing on emotional communication between humans and robots in order to create social perception, cognition, desired interaction and sensation. Furthermore, robots need to perceive human emotion and optimize their behavior in order to help and interact with people in various environments. The most natural way to recognize basic emotions is to extract sets of features from human speech, facial expressions and body gestures. A system that recognizes emotions from speech analysis and facial feature extraction can therefore have interesting applications in Human-Robot Interaction. The Human-Robot Interaction ontology explains how knowledge from these fundamental sciences is applied in physics (sound analysis), mathematics (face detection and perception), philosophical theory (behavior) and robotics. In this project, we carry out a study to recognize basic emotions (sadness, surprise, happiness, anger, fear and disgust). We also propose a methodology and a software program for classifying emotions based on speech analysis and facial feature extraction. The speech analysis phase investigated the appropriateness of using acoustic properties (pitch value, pitch peak, pitch range, intensity and formants) and phonetic properties (speech rate) of emotive speech, measured with the freeware program PRAAT, and consists of generating and analyzing a graph of the speech signal. The proposed architecture investigated the appropriateness of analyzing emotive speech with minimal use of signal processing algorithms.
Thirty participants were asked to repeat five English sentences (typically 0.40 s to 2.5 s long) so that data on pitch (value, range and peak) and rising-falling intonation could be extracted. Pitch alignments (peak, value and range) were evaluated, and the results were compared with intensity and speech rate. The facial feature extraction phase uses a mathematical formulation (Bézier curves) and geometric analysis of the facial image, based on measurements of a set of Action Units (AUs), to classify the emotion. The proposed technique consists of three steps: (i) detecting the facial region within the image, (ii) extracting and classifying the facial features, and (iii) recognizing the emotion. The new data are then merged with reference data in order to recognize the basic emotion. Finally, we combined the two proposed algorithms (speech analysis and facial expression) into a hybrid technique for emotion recognition. This technique has been implemented in a software program that can be employed in Human-Robot Interaction. The efficiency of the methodology was evaluated through experimental tests on 30 individuals (15 female and 15 male, aged 20 to 48) from different ethnic groups: (i) ten European adults, (ii) ten Asian (Middle Eastern) adults and (iii) ten American adults. In the end, the proposed technique made it possible to recognize the basic emotion in most cases.
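    The pitch measures named above (value, peak, range) can be illustrated with a deliberately minimal sketch. The study itself used PRAAT, whose pitch tracker is considerably more elaborate; the simple autocorrelation-peak estimator below, its 75 to 500 Hz search range, and the helper names `estimate_f0` and `pitch_summary` are assumptions for illustration only.

```python
import numpy as np


def estimate_f0(frame, sr, fmin=75.0, fmax=500.0):
    """Estimate F0 (Hz) of a voiced frame from the strongest autocorrelation
    peak among lags corresponding to fmin..fmax."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0, 1, 2, ...
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(x) - 1)
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / best_lag


def pitch_summary(f0_track):
    """Pitch value (mean), peak and range (Hz) of a contour of voiced-frame F0s."""
    f0 = np.asarray(f0_track, dtype=float)
    return {"value": float(f0.mean()), "peak": float(f0.max()),
            "range": float(f0.max() - f0.min())}


sr = 16000
t = np.arange(640) / sr  # one 40 ms frame
print(estimate_f0(np.sin(2 * np.pi * 200 * t), sr))  # 200.0
```

    In practice `estimate_f0` would be applied frame by frame over the voiced parts of each sentence, and `pitch_summary` to the resulting contour.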

    Recognizing the Emotional State Changes in Human Utterance by a Learning Statistical Method based on Gaussian Mixture Model

    Speech is one of the richest and most immediate ways for human beings to express emotion, and it conveys cognitive and semantic concepts between people. In this study, a statistics-based method for emotion recognition from speech signals is proposed, together with a learning approach, built on the statistical model, that classifies the internal feelings conveyed by an utterance. The approach analyzes and tracks the trend of the speaker's emotional state changes during speech. The proposed method classifies utterance emotions into six standard classes: boredom, fear, anger, neutral, disgust and sadness. The well-known EmoDB speech corpus is used for the training phase. After pre-processing, meaningful speech patterns and attributes are extracted with MFCCs and carefully selected by sequential forward selection (SFS). A statistical classification approach is then adapted for use as part of the method; this approach, named LGMM, is used to categorize the obtained features. The classification results are then used to illustrate the trend of the speaker's emotional state changes, revealing the speaker's feelings. The proposed model has also been compared with recent emotional speech classification models that use similar methods and materials. Experimental results show an acceptable overall recognition rate and stable classification of utterances into the six emotional states, and the proposed algorithm outperforms the other similar models in classification accuracy.
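    The abstract does not describe how LGMM modifies a plain Gaussian mixture model, so the sketch below shows only the standard baseline it builds on: one GMM per emotion (via scikit-learn's `GaussianMixture`), with each segment assigned to the class whose model gives the highest mean log-likelihood. MFCC extraction and SFS selection are assumed to have happened already (scikit-learn's `SequentialFeatureSelector` with `direction="forward"` is one off-the-shelf option); the class name, component count and synthetic frames are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


class GMMEmotionClassifier:
    """One Gaussian mixture model per emotion class. A segment of feature
    frames is assigned to the class whose model gives the highest mean
    per-frame log-likelihood."""

    def __init__(self, n_components=4, random_state=0):
        self.n_components = n_components
        self.random_state = random_state
        self.models = {}

    def fit(self, features_by_emotion):
        # features_by_emotion: {label: array of shape (n_frames, n_features)},
        # e.g. MFCC frames after feature selection.
        for label, feats in features_by_emotion.items():
            gmm = GaussianMixture(n_components=self.n_components,
                                  covariance_type="diag",
                                  random_state=self.random_state)
            self.models[label] = gmm.fit(feats)
        return self

    def predict_segment(self, frames):
        # GaussianMixture.score returns the mean log-likelihood per sample.
        scores = {label: gmm.score(frames) for label, gmm in self.models.items()}
        return max(scores, key=scores.get)

    def emotion_trend(self, segments):
        """Label consecutive segments of one utterance; the resulting sequence
        shows how the predicted emotional state changes over time."""
        return [self.predict_segment(seg) for seg in segments]


# Synthetic stand-ins for 13-dimensional MFCC frames of two emotions.
rng = np.random.default_rng(0)
train = {"anger": rng.normal(3.0, 1.0, (200, 13)),
         "sadness": rng.normal(-3.0, 1.0, (200, 13))}
clf = GMMEmotionClassifier(n_components=2).fit(train)
utterance = [rng.normal(-3.0, 1.0, (40, 13)), rng.normal(3.0, 1.0, (40, 13))]
print(clf.emotion_trend(utterance))
```

    Splitting an utterance into consecutive segments and labelling each one is one simple way to obtain the kind of emotional-state trend the abstract describes.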

    Acoustic features of voice in adults suffering from depression

    To examine differences between people suffering from depression (EG, N=18), healthy controls (CG1, N=24) and people with diagnosed psychogenic voice disorder (CG2, N=9), nine acoustic voice features were assessed in a total of 51 participants using the MDVP software programme (“Kay Elemetrics” Corp., model 4300). The nine acoustic parameters were analysed from sustained phonation of the vowel /a/. The results revealed that the mean values of all acoustic parameters in the EG differed from those of both CG1 and CG2: the parameters indicating frequency variability (Jitt, PPQ), amplitude variability (Shim, vAm, APQ), and noise and tremor (NHR, VTI) were higher, while only fundamental frequency (F0) and the soft phonation index (SPI) were lower (F0 compared to CG1, and SPI compared to CG1 and CG2). Only the difference in PPQ was not significant. vAm and APQ had the highest discriminant value for depression. The acoustic features of voice, analysed here from sustained vowel phonation, were different and discriminant in the EG compared to CG1 and CG2. In voice analysis, vAm and APQ could potentially serve as markers of depression. These results point to the importance of the voice, that is, its acoustic indicators, in recognizing depression. The parameters most likely to help in building a programme for automatic depression recognition are those related to variation in voice intensity.
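    For readers unfamiliar with the perturbation measures, the sketch below computes textbook-style versions from a sequence of cycle-to-cycle values (pitch periods or peak amplitudes) extracted from the sustained /a/. These are approximations of the MDVP parameters, not MDVP's exact implementation; the function names and the example period sequence are hypothetical, and cycle extraction itself is assumed to be done already.

```python
import numpy as np


def relative_mean_perturbation(x):
    """Mean absolute difference between consecutive cycles relative to the mean (%).
    On pitch periods this is a jitter percentage (cf. Jitt); on peak amplitudes,
    a shimmer percentage (cf. Shim)."""
    x = np.asarray(x, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(x))) / np.mean(x)


def perturbation_quotient(x, k):
    """k-cycle perturbation quotient (%): mean absolute deviation of each cycle
    from the k-cycle moving average centred on it, relative to the overall mean.
    k=5 on periods corresponds to PPQ, k=11 on amplitudes to APQ."""
    x = np.asarray(x, dtype=float)
    if k % 2 == 0 or k > len(x):
        raise ValueError("k must be odd and no larger than the number of cycles")
    half = k // 2
    smoothed = np.convolve(x, np.ones(k) / k, mode="valid")  # length len(x) - k + 1
    centre = x[half:len(x) - half]  # cycles that have a full window around them
    return 100.0 * np.mean(np.abs(smoothed - centre)) / np.mean(x)


def amplitude_variation(a):
    """Coefficient of variation of peak amplitudes (%), cf. vAm."""
    a = np.asarray(a, dtype=float)
    return 100.0 * np.std(a) / np.mean(a)


# Hypothetical pitch periods (ms) from a sustained /a/.
periods = [10.0, 12.0, 10.0, 12.0]
print(relative_mean_perturbation(periods))  # 100 * 2 / 11, about 18.18
```

    The moving-average form of `perturbation_quotient` is why APQ (11 cycles) is less sensitive than shimmer to slow amplitude drift: a smooth trend is absorbed by the average and only cycle-level irregularity remains.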

    Towards a Technology of Nonverbal Communication: Vocal Behavior in Social and Affective Phenomena

    Nonverbal communication is the main channel through which we experience the inner life of others, including their emotions, feelings, moods and social attitudes. It attracts the interest of the computing community because nonverbal communication is based on cues such as facial expressions, vocalizations, gestures and postures, which we perceive with our senses and which can be (and often are) detected, analyzed and synthesized with automatic approaches. In other words, nonverbal communication can serve as a viable interface between computers and some of the most important aspects of human psychology, such as emotions and social attitudes. As a result, a new computing domain seems to be emerging, which we can call “technology of nonverbal communication”. This chapter outlines some of the most salient aspects of this potentially new domain and discusses some of its most important perspectives for the future.

    Emotion Recognition from Speech Signals and Perception of Music

    This thesis deals with emotion recognition from speech signals and aims to improve the feature extraction step by drawing on the perception of music. In music theory, different pitch intervals (consonant, dissonant) and chords are believed to evoke different feelings in listeners. The question is whether a similar mechanism underlies both the perception of music and the perception of emotional speech. Our research will follow three stages. First, the relationship between speech and music at the segmental and supra-segmental levels will be analyzed. Second, the encoding of emotions in music will be investigated. Third, a description of the most common features used for emotion recognition from speech will be provided. We will additionally derive new high-level musical features, which will lead to an improved recognition rate for the basic spoken emotions.
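    The thesis does not yet define its musical features, so the following is only one hypothetical example of how the consonant/dissonant distinction could be turned into a speech feature: measure the interval between consecutive pitch values in equal-tempered semitones, 12 * log2(f2 / f1), and count how often it falls in a conventionally consonant interval class. The consonant set, the rounding to the nearest semitone and the `consonance_ratio` feature are assumptions.

```python
import math

# Interval classes (semitones mod 12) conventionally treated as consonant in
# Western music theory: unison/octave, thirds, fourth, fifth, sixths.
# Whether the perfect fourth (5) counts as consonant varies by convention.
CONSONANT_CLASSES = {0, 3, 4, 5, 7, 8, 9}


def semitone_interval(f1_hz, f2_hz):
    """Signed interval from f1 to f2 in equal-tempered semitones."""
    return 12.0 * math.log2(f2_hz / f1_hz)


def is_consonant(f1_hz, f2_hz):
    """Round the interval to the nearest semitone and test its interval class."""
    return round(abs(semitone_interval(f1_hz, f2_hz))) % 12 in CONSONANT_CLASSES


def consonance_ratio(f0_track):
    """Hypothetical feature: share of consecutive pitch pairs forming consonant intervals."""
    pairs = list(zip(f0_track, f0_track[1:]))
    if not pairs:
        return 0.0
    return sum(is_consonant(a, b) for a, b in pairs) / len(pairs)


print(consonance_ratio([100.0, 200.0, 300.0, 318.0]))
```

    Here the octave (100 to 200 Hz) and the fifth (200 to 300 Hz) count as consonant, while the roughly one-semitone step from 300 to 318 Hz does not.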

    Stress recognition from speech signal

    This doctoral thesis focuses on the development of algorithms for detecting psychological stress in the speech signal. Its novelty lies in two different analyses of the speech signal: the analysis of vowel polygons and the analysis of glottal pulses. A series of experiments showed that both of these fundamental analyses can be used to detect stress in speech. The best results were achieved with the Closing-To-Opening phase ratio feature under the Top-To-Bottom criterion (glottal pulse analysis in the amplitude domain), combined with a suitably chosen classifier. Stress detection based on this analysis can be regarded as language- and phoneme-independent, which was also confirmed by the results obtained, reaching up to 95% accuracy in some cases. All experiments were performed on a newly developed Czech database containing real stress, and some experiments were also carried out on the English stress database SUSAS. The variety of potentially effective approaches to stress recognition in speech suggests that combining them could achieve very high recognition accuracy, and that they might also be used to detect other speaker states; this has yet to be tested and verified on appropriate databases.
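    The abstract does not define how the vowel polygons are constructed. One common reading, assumed here, is the polygon spanned by the corner vowels in the F1-F2 formant plane, whose area changes as articulation changes. A minimal sketch of that area via the shoelace formula follows; the vertex ordering requirement is part of the formula, while the use of the area as a stress feature is an assumption.

```python
def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula.

    vertices: (F1, F2) points in Hz, listed in order around the polygon
    boundary (either direction); for a vowel polygon these would be the
    mean formants of the corner vowels.
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


print(polygon_area([(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]))  # 6.0
```

    Comparing the area (or shape) of a speaker's polygon in neutral versus stressed recordings would be one way to turn the vowel-polygon analysis into a classifier input.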