Towards Emotion Recognition: A Persistent Entropy Application
Emotion recognition and classification is a very active area of research. In this paper, we present a first approach to emotion classification using persistent entropy and support vector machines. A topology-based model is applied to obtain a single real number from each raw signal. These data are used as input to a support vector machine that classifies signals into eight different emotions (calm, happy, sad, angry, fearful, disgust and surprised).
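Persistent entropy has a compact definition: given the bars of a persistence barcode, it is the Shannon entropy of the normalized bar lifetimes. A minimal sketch of that definition (the barcode values below are made up for illustration):

```python
import math

def persistent_entropy(intervals):
    """Shannon entropy of the normalized lifetimes of a persistence barcode.

    `intervals` is a list of (birth, death) pairs; bars with zero or
    negative length are ignored.
    """
    lifetimes = [death - birth for birth, death in intervals if death > birth]
    total = sum(lifetimes)
    return -sum((l / total) * math.log(l / total) for l in lifetimes)

# Two equal bars: the entropy is maximal and equals log(2).
print(persistent_entropy([(0.0, 1.0), (0.0, 1.0)]))  # ≈ 0.693
```

The single real number the paper extracts per signal would be the value this function returns for that signal's barcode.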
A combined cepstral distance method for emotional speech recognition
Affective computing is not only a direction of reform in artificial intelligence but also an exemplification of advanced intelligent machines. Emotion is the biggest difference between humans and machines: if a machine behaves with emotion, it will be accepted by more people. Voice is the most natural, and the most easily understood and accepted, manner of daily communication, and the recognition of emotional voice is an important field of artificial intelligence. In emotion recognition, however, certain pairs of emotions are particularly vulnerable to confusion. This article presents a combined cepstral distance method for two-group multi-class emotion classification in emotional speech recognition. Cepstral distance combined with speech energy is widely used for speech-signal endpoint detection in speech recognition. In this work, cepstral distance is used to measure the similarity between frames in emotional signals and frames in neutral signals. These features are the input to a directed acyclic graph support vector machine classifier. Finally, a two-group classification strategy is adopted to resolve confusion in multi-emotion recognition. In the experiments, a Chinese Mandarin emotion database is used, and a large training set (1134 + 378 utterances) ensures a powerful modelling capability for predicting emotion. The experimental results show that cepstral distance increases the recognition rate of the emotion "sad" and balances the recognition results while eliminating overfitting. For the German Berlin emotional speech database, the recognition rate between "sad" and "boring", which are very difficult to distinguish, reaches 95.45%.
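The frame-to-frame similarity measure named above can be sketched as a plain Euclidean distance between cepstral coefficient vectors; the coefficients below are invented toy values, not data from the paper:

```python
import math

def cepstral_distance(c1, c2):
    """Euclidean distance between two cepstral coefficient vectors,
    used as a frame-to-frame similarity measure (smaller = more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

# Toy cepstra: one frame of emotional speech vs. a neutral reference frame.
emotional = [1.2, -0.4, 0.3]
neutral = [1.0, -0.5, 0.1]
print(cepstral_distance(emotional, neutral))  # ≈ 0.3
```

In the paper's setting, such distances between emotional and neutral frames become the features fed to the classifier.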
A system for recognizing human emotions based on speech analysis and facial feature extraction: applications to Human-Robot Interaction
With the advance of Artificial Intelligence, humanoid robots have started to interact with ordinary people based on a growing understanding of psychological processes. Accumulating evidence in Human-Robot Interaction (HRI) suggests that research is focusing on establishing emotional communication between human and robot, creating social perception, cognition, desired interaction and sensation.
Furthermore, robots need to perceive human emotion and optimize their behavior to help and interact with human beings in various environments. The most natural way to recognize basic emotions is to extract sets of features from human speech, facial expression and body gesture. A system for recognizing emotions based on speech analysis and facial feature extraction can have interesting applications in Human-Robot Interaction. Thus, the Human-Robot Interaction ontology explains how the knowledge of these fundamental sciences is applied in physics (sound analysis), mathematics (face detection and perception), philosophical theory (behavior) and the context of robotic science.
In this project, we carry out a study to recognize basic emotions (sadness, surprise, happiness, anger, fear and disgust), and we propose a methodology and a software program for classifying emotions based on speech analysis and facial feature extraction.
The speech analysis phase investigated the appropriateness of using acoustic (pitch value, pitch peak, pitch range, intensity and formant) and phonetic (speech rate) properties of emotive speech with the freeware program PRAAT, and consists of generating and analyzing a graph of the speech signals. The proposed architecture investigated the appropriateness of analyzing emotive speech with minimal use of signal-processing algorithms. Thirty participants in the experiment had to repeat five sentences in English (with durations typically between 0.40 s and 2.5 s) in order to extract data on pitch (value, range and peak) and rising-falling intonation. Pitch alignments (peak, value and range) were evaluated and the results were compared with intensity and speech rate.
The facial feature extraction phase uses a mathematical formulation (Bézier curves) and geometric analysis of the facial image, based on measurements of a set of Action Units (AUs), for classifying the emotion. The proposed technique consists of three steps: (i) detecting the facial region within the image, (ii) extracting and classifying the facial features, and (iii) recognizing the emotion. The new data are then merged with reference data in order to recognize the basic emotion.
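The Bézier-curve formulation mentioned above can be illustrated with a standard cubic Bézier evaluation; the control points below are hypothetical, not the project's actual facial-feature data:

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1].

    Each point is an (x, y) pair; p0 and p3 are the curve endpoints
    (e.g. mouth corners), while p1 and p2 shape the contour between them.
    """
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return x, y

# Hypothetical control points for an upper-lip contour (image coordinates).
print(cubic_bezier((0, 0), (1, 2), (3, 2), (4, 0), 0.5))  # contour midpoint
```

Sampling t over [0, 1] traces the fitted contour, whose geometry can then be measured against the Action Unit reference data.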
Finally, we combined the two proposed algorithms (speech analysis and facial expression) to design a hybrid technique for emotion recognition. This technique has been implemented in a software program, which can be employed in Human-Robot Interaction.
The efficiency of the methodology was evaluated by experimental tests on 30 individuals (15 female and 15 male, 20 to 48 years old) from different ethnic groups, namely: (i) ten European adults, (ii) ten Asian (Middle Eastern) adults and (iii) ten American adults.
Ultimately, the proposed technique made it possible to recognize the basic emotion in most cases.
Recognizing the Emotional State Changes in Human Utterance by a Learning Statistical Method based on Gaussian Mixture Model
Speech is one of the richest and most immediate ways for human beings to express emotion, conveying cognitive and semantic concepts among humans. In this study, a statistics-based method for emotion recognition from speech signals is proposed, and a learning approach is introduced that uses the statistical model to classify the internal feelings of the utterance. This approach analyzes and tracks the trend of the speaker's emotional state changes during speech. The proposed method classifies utterance emotions into six standard classes: boredom, fear, anger, neutral, disgust and sadness. For this purpose, the well-known speech corpus EmoDB is used for the training phase of the proposed approach. In this process, once the pre-processing tasks are done, meaningful speech patterns and attributes are extracted by the MFCC method and carefully selected by the SFS method. Then, a statistical classification approach, referred to as LGMM, is adapted and employed as part of the method to categorize the obtained features. Afterwards, the classification results are used to illustrate the trend of emotional state changes, revealing the speaker's feelings. The proposed model has also been compared with some recent models of emotional speech classification that use similar methods and materials. Experimental results show an admissible overall recognition rate and stability in classifying uttered speech into six emotional states, and the proposed algorithm outperforms the other similar models in classification accuracy.
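The Gaussian-mixture classification idea can be sketched in a deliberately simplified form: a single diagonal-covariance Gaussian per emotion class stands in for a full mixture model, and the "features" are synthetic toy points rather than MFCCs from EmoDB:

```python
import math
import random

def fit_gaussian(samples):
    """Fit per-dimension means and variances (a one-component, diagonal 'GMM')."""
    dims = len(samples[0])
    means = [sum(s[d] for s in samples) / len(samples) for d in range(dims)]
    varis = [sum((s[d] - means[d]) ** 2 for s in samples) / len(samples) + 1e-6
             for d in range(dims)]
    return means, varis

def log_likelihood(x, means, varis):
    """Log-density of x under the diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, means, varis))

def classify(x, models):
    """Pick the emotion whose class model assigns x the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(x, *models[label]))

# Synthetic 2-D feature clusters standing in for per-class training data.
random.seed(0)
def toy_class(cx, cy):
    return [(random.gauss(cx, 0.5), random.gauss(cy, 0.5)) for _ in range(50)]

models = {"anger": fit_gaussian(toy_class(3, 3)),
          "sadness": fit_gaussian(toy_class(-3, -3))}
print(classify((2.8, 3.1), models))  # anger
```

A real system would fit multi-component mixtures per class and score sequences of feature frames, but the decision rule (maximum class likelihood) is the same.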
Acoustic features of voice in adults suffering from depression
In order to examine the differences in people suffering from depression (EG, N=18)
compared to the healthy controls (CG1, N=24) and people with the diagnosed
psychogenic voice disorder (CG2, N=9), nine acoustic features of voice were
assessed among the total of 51 participants using the MDVP software programme
(“Kay Elemetrics” Corp., model 4300). Nine acoustic parameters were analysed on
the basis of the sustained phonation of the vowel /a/. The results revealed that the
mean values of all acoustic parameters differed in the EG compared to both the
CG1 and CG2 as follows: the parameters which indicate frequency variability (Jitt,
PPQ), amplitude variability (Shim, vAm, APQ) and noise and tremor parameters
(NHR, VTI) were higher; only the parameters of fundamental frequency (F0) and
soft phonation index (SPI) were lower (F0 compared to CG1, and SPI compared to
CG1 and CG2). Only the PPQ parameter was not significant. vAm and APQ had the
highest discriminant value for depression. The acoustic features of voice, analysed
in this study with regard to the sustained phonation of a vowel, were different
and discriminant in the EG compared to CG1 and CG2. In voice analysis, the parameters vAm and APQ could potentially be the markers indicative of depression.
The results of this research point to the importance of the voice, that is, its acoustic
indicators, in recognizing depression. Important parameters that could help create a
programme for the automatic recognition of depression are those from the domain
of voice intensity variation.
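The perturbation parameters above (Jitt, Shim) have simple relative-perturbation definitions; a sketch using the common mean-absolute-difference formula, with invented period values rather than actual MDVP output:

```python
def relative_jitter(periods):
    """Jitter (Jitt, %): mean absolute difference between consecutive
    pitch periods, relative to the mean period."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return 100 * (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def relative_shimmer(amplitudes):
    """Shimmer (Shim, %): the same formula applied to peak amplitudes."""
    return relative_jitter(amplitudes)

# Toy period track (seconds) from a sustained /a/ phonation.
periods = [0.0100, 0.0102, 0.0099, 0.0101]
print(round(relative_jitter(periods), 2))  # ≈ 2.32 %
```

Healthy sustained phonation typically yields low values; the study found these frequency- and amplitude-variability measures elevated in the depression group.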
Towards a Technology of Nonverbal Communication: Vocal Behavior in Social and Affective Phenomena
Nonverbal communication is the main channel through which we experience the inner life of others, including their emotions, feelings, moods, social attitudes, etc. This attracts the interest of the computing community because nonverbal communication is based on cues like facial expressions, vocalizations, gestures, postures, etc. that we can perceive with our senses and that can be (and often are) detected, analyzed and synthesized with automatic approaches. In other words, nonverbal communication can be used as a viable interface between computers and some of the most important aspects of human psychology, such as emotions and social attitudes. As a result, a new computing domain seems to emerge that we can call the "technology of nonverbal communication". This chapter outlines some of the most salient aspects of this potentially new domain and some of its most important perspectives for the future.
Emotion Recognition from Speech Signals and Perception of Music
This thesis deals with emotion recognition from speech signals. The feature extraction step shall be improved by looking at the perception of music. In music theory, different pitch intervals (consonant, dissonant) and chords are believed to evoke different feelings in listeners. The question is whether there is a similar mechanism between the perception of music and the perception of emotional speech. Our research will follow three stages. First, the relationship between speech and music at the segmental and supra-segmental levels will be analyzed. Secondly, the encoding of emotions through music will be investigated. In the third stage, a description of the most common features used for emotion recognition from speech will be provided. We will additionally derive new high-level musical features, which will lead to an improvement of the recognition rate for the basic spoken emotions.
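The pitch intervals mentioned above can be quantified directly from fundamental frequencies using the standard equal-temperament semitone formula; the frequencies below are toy values:

```python
import math

def interval_semitones(f1, f2):
    """Musical interval between two frequencies, in semitones
    (12-tone equal temperament)."""
    return 12 * math.log2(f2 / f1)

# A 3:2 frequency ratio is a perfect fifth (~7 semitones), a consonant interval.
print(round(interval_semitones(220.0, 330.0), 2))  # ≈ 7.02
```

Applied to F0 tracks of emotional speech, such interval measures are one way to derive the "high-level musical features" the thesis proposes.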
Stress recognition from speech signal
This doctoral thesis focuses on the development of algorithms for psychological stress detection in the speech signal. The novelty of the thesis lies in two different analyses of the speech signal: the analysis of vowel polygons and the analysis of glottal pulses. A series of experiments showed that both fundamental analyses can be used for psychological stress detection in speech. The best results were achieved with the Closing-To-Opening phase ratio feature under the Top-To-Bottom criterion in combination with a properly chosen classifier; stress detection based on this analysis can be regarded as language- and phoneme-independent, which the obtained results also confirm, reaching up to 95% accuracy in some cases. All experiments were performed on a newly developed Czech database containing real stress, and some experiments were also carried out on the English stress database SUSAS. The variety of potentially effective ways of recognizing stress in speech suggests that very high recognition accuracy could be reached by combining them, or by using them to detect other speaker states, which has to be further tested and verified on appropriate databases.
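The vowel-polygon analysis can be illustrated with a standard shoelace-area computation over formant coordinates; the formant values below are hypothetical, not taken from the thesis database:

```python
def polygon_area(vertices):
    """Area of a vowel polygon in the F1-F2 plane (shoelace formula);
    vertices are (F1, F2) pairs in Hz, listed in order around the polygon."""
    area2 = 0.0
    for i in range(len(vertices)):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % len(vertices)]
        area2 += x1 * y2 - x2 * y1
    return abs(area2) / 2

# Hypothetical /a/, /i/, /u/ vowel triangle (F1, F2 in Hz).
print(polygon_area([(800, 1200), (300, 2300), (350, 800)]))  # Hz^2
```

Shrinking or shifting of this polygon under stress is the kind of change such an analysis would track.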