Optimization of automatic speech emotion recognition systems
The basis for the successful integration of emotional intelligence into sophisticated artificial intelligence systems is the reliable recognition of emotional states, with the paralinguistic content of speech standing out as a particularly significant carrier of information about the speaker's emotional state.
In this paper, a comparative analysis is performed of the speech signal features and classification methods most often used for automatic recognition of speakers' emotional states, after which possibilities for improving the performance of automatic speech emotion recognition systems are considered. Discrete hidden Markov models were improved by using the QQ plot to determine the codevectors for vector quantization, and further model improvements were also considered. The possibilities for a more faithful representation of the speech signal were examined, with the analysis extended to a large number of features from different groups. Forming large feature sets imposes the need for dimensionality reduction, so an alternative method based on the Fibonacci sequence of numbers was analyzed alongside well-known methods. Finally, the possibilities of integrating the advantages of different approaches into a single automatic speech emotion recognition system are considered: a parallel multi-classifier structure is proposed with a combination rule that, in addition to the classification results of the individual ensemble classifiers, uses information about the classifiers' characteristics. A proposal is also given for automatically forming an ensemble of classifiers of arbitrary size by using dimensionality reduction based on the Fibonacci sequence of numbers.
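The abstract does not specify how the Fibonacci sequence drives the dimensionality reduction; one plausible reading is that, given features already ranked by importance, only the features at Fibonacci-numbered positions are retained. The function names below (`fibonacci_indices`, `reduce_features`) are illustrative, not taken from the work:

```python
def fibonacci_indices(n_features):
    """Fibonacci-sequence positions that fall inside [0, n_features)."""
    a, b = 1, 2
    idx = [0]  # always keep the top-ranked feature
    while a < n_features:
        idx.append(a)
        a, b = b, a + b
    return sorted(set(idx))

def reduce_features(ranked_vector):
    """Keep only the components at Fibonacci positions of a ranked feature vector."""
    keep = fibonacci_indices(len(ranked_vector))
    return [ranked_vector[i] for i in keep]
```

Because Fibonacci gaps widen geometrically, this keeps most of the highest-ranked features while sampling the tail ever more sparsely, so ensembles of arbitrary size can be formed by shifting or rescaling the index set.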
On automatic emotion classification using acoustic features
In this thesis, we describe extensive experiments on the classification of emotions from speech using acoustic features. This area of research has important applications in human-computer interaction. We have thoroughly reviewed the current literature and present our results on some of the contemporary emotional speech databases. The principal focus is on creating a large set of acoustic features descriptive of different emotional states, and on finding methods for selecting a subset of the best-performing features by using feature selection methods. In this thesis we examine several traditional feature selection methods and propose a novel scheme which employs a preferential Borda voting strategy for ranking features. The comparative results show that our proposed scheme can strike a balance between accurate but computationally intensive wrapper methods and less accurate but computationally cheaper filter methods for feature selection. Using the selected features, several schemes for extending binary classifiers to multiclass classification are tested. Some of these classifiers form serial combinations of binary classifiers, while others use a hierarchical structure to perform this task. We describe a new hierarchical classification scheme, which we call Data-Driven Dimensional Emotion Classification (3DEC), whose decision hierarchy is based on non-metric multidimensional scaling (NMDS) of the data. This method of creating a hierarchical structure for the classification of emotion classes gives significant improvements over the other methods tested. The NMDS representation of emotional speech data can be interpreted in terms of the well-known valence-arousal model of emotion. We find that this model does not give a particularly good fit to the data: although the arousal dimension can be identified easily, valence is not well represented in the transformed data.
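The preferential Borda voting the abstract mentions is not spelled out; a minimal sketch of the classic Borda count applied to feature ranking is given below, assuming each filter method contributes one best-first ranking of the same feature set (the name `borda_rank` is illustrative):

```python
from collections import defaultdict

def borda_rank(rankings):
    """Fuse several best-first feature rankings with a Borda count.

    A feature at position p in an n-item ranking earns n - p points;
    features are returned ordered by total points, best first
    (ties broken alphabetically for determinism).
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for pos, feat in enumerate(ranking):
            scores[feat] += n - pos
    return sorted(scores, key=lambda f: (-scores[f], f))
```

Fusing cheap filter rankings this way costs only a few sorts, which is how such a scheme can sit between filter and wrapper methods in the accuracy/cost trade-off.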
From the recognition results on these two dimensions, we conclude that the valence and arousal dimensions are not orthogonal to each other. In the last part of this thesis, we deal with the difficult but important topic of improving the generalisation capabilities of speech emotion recognition (SER) systems across different speakers and recording environments. This topic has been generally overlooked in current research in this area. First, we try the traditional methods used in automatic speech recognition (ASR) systems for improving the generalisation of SER in intra- and inter-database emotion classification. These traditional methods do improve the average accuracy of the emotion classifier. In this thesis, we identify the differences in the training and test data, due to speakers and acoustic environments, as a covariate shift. This shift is minimised by using importance weighting algorithms from the emerging field of transfer learning to guide the learning algorithm towards the training data that better represents the test data. Our results show that importance weighting algorithms can be used to minimise the differences between the training and test data. We also test the effectiveness of importance weighting algorithms on inter-database and cross-lingual emotion recognition. From these results, we draw conclusions about the universal nature of emotions across different languages.
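The abstract does not say which importance-weighting algorithms were used (established choices include KMM and KLIEP); the principle, though, is to weight each training sample by the density ratio p_test(x)/p_train(x). The crude one-dimensional histogram estimator below (the name `histogram_weights` is illustrative) is only meant to show that principle:

```python
def histogram_weights(train, test, n_bins=10):
    """Estimate p_test(x)/p_train(x) for each training sample via 1-D histograms.

    Training samples lying in regions dense under the test distribution
    receive weights > 1 and thus dominate a weighted learning objective.
    """
    lo = min(train + test)
    hi = max(train + test)
    width = (hi - lo) / n_bins or 1.0  # guard against zero range

    def bin_of(x):
        return min(int((x - lo) / width), n_bins - 1)

    def density(data):
        counts = [0] * n_bins
        for x in data:
            counts[bin_of(x)] += 1
        return [c / len(data) for c in counts]

    p_tr, p_te = density(train), density(test)
    eps = 1e-12  # avoid division by zero in empty training bins
    return [p_te[bin_of(x)] / (p_tr[bin_of(x)] + eps) for x in train]
```

Real SER features are high-dimensional, so practical estimators model the ratio directly rather than the two densities, but the resulting per-sample weights are consumed by the classifier in the same way.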
Automotive emotions: a human-centred approach towards the measurement and understanding of drivers' emotions and their triggers
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.

The automotive industry is facing significant technological and sociological shifts, calling for an improved understanding of driver and passenger behaviours, emotions and needs, and a transformation of the traditional automotive design process. This research takes a human-centred approach to automotive research, investigating users' emotional states during automobile driving, with the goal of developing a framework for automotive emotion research and thus enabling the integration of technological advances into the driving environment. A literature review of human emotion and emotion in an automotive context was conducted, followed by three driving studies investigating emotion through Facial-Expression Analysis (FEA). An exploratory study investigated whether emotion elicitation can be applied in driving simulators and whether FEA can detect the emotions triggered. The results gave confidence that emotion elicitation in a lab-based environment can trigger emotional responses and that FEA can detect them. An on-road driving study was then conducted in a natural setting to investigate whether the natures and frequencies of emotion events could be measured automatically, and whether triggers could be assigned to them. Overall, 730 emotion events were detected during a total driving time of 440 minutes, and event triggers were assigned to 92% of the emotion events. A similar second on-road study was conducted in a partially controlled setting on a planned road circuit; in 840 minutes, 1947 emotion events were measured, and triggers were successfully assigned to 94% of those. The differences in the natures, frequencies and causes of emotions on different road types were investigated, and comparison of emotion events across roads demonstrated substantial variation in the natures, frequencies and triggers of emotions on different road types. The results showed that emotions play a significant role during automobile driving, and the possibility of assigning triggers can be used to build a better understanding of the causes of emotions in the automotive habitat. The two on-road studies were compared through statistical analysis to investigate the influence of the different study settings; certain conditions (e.g. driving setting, social interaction) showed a significant influence on emotions during driving. This research establishes and validates a methodology for studying emotions and their causes in the driving environment, through which systems and factors causing positive and negative emotional effects can be identified. The methodology and results can be applied to design and research processes, allowing the identification of issues and opportunities in current automotive design to address challenges of future automotive design. Suggested future research includes investigating a wider variety of road types and situations, testing with different automobiles, and combining multiple measurement techniques.