15 research outputs found

    Automatic Speech Emotion Recognition Using Machine Learning

    This chapter presents a comparative study of speech emotion recognition (SER) systems. It first reviews the theoretical definition and categorization of affective states and the modalities of emotion expression. For the study, an SER system based on several classifiers and several feature extraction methods was developed. Mel-frequency cepstrum coefficients (MFCC) and modulation spectral (MS) features are extracted from the speech signals and used to train the classifiers. Feature selection (FS) is applied to find the most relevant feature subset. Several machine learning paradigms are used for the emotion classification task. A recurrent neural network (RNN) classifier is first used to classify seven emotions. Its performance is then compared with multivariate linear regression (MLR) and support vector machine (SVM) techniques, which are widely used for emotion recognition in spoken audio signals. The Berlin and Spanish databases serve as the experimental data sets. The study shows that, for the Berlin database, all classifiers achieve an accuracy of 83% when speaker normalization (SN) and feature selection are applied to the features. For the Spanish database, the best accuracy (94%) is achieved by the RNN classifier without SN and with FS.
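The abstract above names speaker normalization (SN) but does not say how it is computed. A common reading, assumed here and not confirmed by the source, is a per-speaker z-score of each feature dimension. The function name `speaker_normalize` and the toy data are hypothetical.

```python
from statistics import mean, pstdev

def speaker_normalize(features, speakers):
    """Z-score each feature dimension separately within each speaker.

    Assumption: SN means per-speaker zero mean and unit variance.
    features: list of equal-length feature vectors (lists of floats)
    speakers: list of speaker labels, one per feature vector
    """
    normalized = [None] * len(features)
    for spk in set(speakers):
        idx = [i for i, s in enumerate(speakers) if s == spk]
        dims = len(features[idx[0]])
        mus = [mean(features[i][d] for i in idx) for d in range(dims)]
        # Fall back to 1.0 when a dimension is constant for this speaker.
        sigmas = [pstdev([features[i][d] for i in idx]) or 1.0 for d in range(dims)]
        for i in idx:
            normalized[i] = [(features[i][d] - mus[d]) / sigmas[d] for d in range(dims)]
    return normalized

# Toy data (hypothetical): two speakers, two utterances each, two features.
features = [[1.0, 10.0], [3.0, 14.0], [10.0, 0.0], [20.0, 4.0]]
speakers = ["a", "a", "b", "b"]
normalized = speaker_normalize(features, speakers)
```

After this step each speaker's features are centered and scaled independently, so a classifier sees variation relative to the speaker's own baseline instead of absolute levels.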

    Analyse acoustique de la voix pour la détection des émotions du locuteur

    The aim of this thesis is to propose a speech emotion recognition (SER) system for use in an educational setting of classroom orchestration. The system is built on novel features obtained by amplitude and frequency demodulation of the voice, which is treated as a non-stationary, multi-component amplitude- and frequency-modulated (AM-FM) signal produced by a non-linear system. The demodulation relies on the joint use of empirical mode decomposition (EMD) and the Teager-Kaiser energy operator (TKEO). The discrete (or categorical) emotion theory was chosen to represent the six basic emotions (sadness, anger, joy, disgust, fear and surprise) and the neutral emotion. Automatic recognition was optimized by finding the best combination of features, selecting the most relevant ones and comparing different classification approaches. Two reference emotional speech databases, in German and Spanish, were used to train and evaluate the system. A new database in French, more appropriate for the educational context, was built, tested and validated.
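As a minimal sketch of the Teager-Kaiser energy operator mentioned in this abstract, the standard discrete form is psi[n] = x[n]^2 - x[n-1] * x[n+1]. The test tone, its parameters, and the function name `tkeo` are illustrative assumptions. The thesis applies the operator together with EMD, which is not shown here.

```python
import math

def tkeo(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    # Defined for interior samples only, so the output is two samples shorter.
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Illustrative input (an assumption): a pure tone A * cos(omega * n).
A, omega = 2.0, 0.3
signal = [A * math.cos(omega * n) for n in range(200)]
energy = tkeo(signal)

# For a pure tone the operator output is constant: A^2 * sin(omega)^2.
expected = (A * math.sin(omega)) ** 2
```

Because the output depends on both the amplitude and the frequency of the tone, the operator is a natural building block for AM-FM demodulation of a single oscillatory component.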

    Detection and analysis of human emotions through voice

    The aim of this thesis is to propose a speech emotion recognition (SER) system for use in an educational setting of classroom orchestration. The system is built on novel features obtained by amplitude and frequency demodulation of the voice, which is treated as a non-stationary, multi-component amplitude- and frequency-modulated (AM-FM) signal produced by a non-linear system. The demodulation relies on the joint use of empirical mode decomposition (EMD) and the Teager-Kaiser energy operator (TKEO). The discrete (or categorical) emotion theory was chosen to represent the six basic emotions (sadness, anger, joy, disgust, fear and surprise) and the neutral emotion. Automatic recognition was optimized by finding the best combination of features, selecting the most relevant ones and comparing different classification approaches. Two reference emotional speech databases, in German and Spanish, were used to train and evaluate the system. A new database in French, more appropriate for the educational context, was built, tested and validated.

    Speech Emotion Recognition: Recurrent Neural Networks compared to SVM and Linear Regression

    Proceedings of the 26th International Conference on Artificial Neural Networks, Alghero, Italy, September 11-14, 2017. Emotion recognition in spoken dialogues has been gaining increasing interest in recent years. Speech emotion recognition (SER) is a challenging research area in the field of Human Computer Interaction (HCI). It refers to the ability to detect the current emotional state of a human being from his or her voice. SER has potentially wide applications, such as interfaces with robots, banking, call centers, car board systems, computer games, etc. In our research we are interested in how emotion recognition can enhance the quality of teaching for both classroom orchestration and e-learning. Integrating SER into an aided teaching system can guide the teacher in deciding what subjects can be taught and in developing strategies for managing emotions within the learning environment. In linguistic activity, information about students' emotional state can be extracted from their interaction and articulation. That is why the learner's emotional state should be considered in the language classroom. In general, SER is a computational task consisting of two major parts: feature extraction and emotion classification. The questions that arise here are: What acoustic features are needed for the most robust automatic recognition of a speaker's emotion? Which method is most appropriate for classification? How does the database used influence the recognition of emotion in speech?

    Biocompatible titanate nanotubes with high loading capacity of genistein: cytotoxicity study and anti-migratory effect on U87-MG cancer cell lines

    Titanate nanotubes (Ti-Nts) have proved to be a potential candidate for drug delivery due to their large surface change and higher cellular uptake as a direct consequence of their tubular shape. Ti-Nts were assessed for their safety, their kinetics of cellular uptake on the U87-MG cell line and their genistein loading efficiency. No cytotoxic effect was observed for empty Ti-Nts at concentrations up to 100 µg mL⁻¹. The multiwalled tubular morphology was found to be an important parameter promoting high drug loading. The Ti-Nts achieved a high genistein drug-loading content (25.2%) and entrapment efficiency (51.2%), leading to controlled drug release as well as higher cellular uptake of genistein-loaded Ti-Nts, which induces higher cytotoxicity and a significant anti-migratory effect on U87-MG human glioblastoma astrocytoma cells, promising efficient antitumor activity.