
    Innovative Approach to Detect Mental Disorder Using Multimodal Technique

    Humans display their emotions through facial expressions, so for more effective human-computer interaction, recognizing emotion from the human face could prove to be an invaluable tool. In this work, an automatic facial emotion recognition system based on video is described. The main aim is to detect the human face in the video and classify the emotion on the basis of facial features. Human facial expressions have been studied extensively, including in preliterate cultures, and much commonality has been found in how emotions are expressed and recognized on the face; the expressions considered here represent happiness, sadness, anger, fear, surprise and disgust. Emotion detection from speech also has many important applications: in human-computer systems, emotion recognition lets services be adapted to the user's emotional state. The body of work on detecting emotion in speech, however, is still quite limited. Researchers are still debating which features affect emotion identification in speech, and there is no agreement on the best algorithm for classifying emotion or on which emotions to class together
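
    As a rough illustration of the video pipeline this abstract describes (detect the face in each frame, then classify the expression), the following Python sketch uses OpenCV's bundled Haar cascade face detector with a placeholder classifier; the cascade choice and the classify_emotion stub are assumptions, not the paper's method.

```python
# Minimal sketch of a face-detection-then-classification video pipeline.
# The Haar cascade detector and the placeholder classifier are illustrative
# assumptions; the paper does not specify its actual algorithms.
import cv2

EMOTIONS = ["happiness", "sadness", "anger", "fear", "surprise", "disgust"]

def classify_emotion(face_img):
    """Placeholder: a real system would run a trained classifier here."""
    return "neutral"

def emotions_from_video(path):
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(path)
    labels = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
            labels.append(classify_emotion(gray[y:y + h, x:x + w]))
    cap.release()
    return labels
```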

    Sentiment Analysis for Customer’s Reviews using Hybrid Approach

    One of the greatest challenges in human-machine interaction is estimating the speaker's emotion. The need for clearer, more accurate information about consumer preferences has led to increasing interest in high-level analysis of online media content. In this paper, an approach for emotion recognition based on both speech and media content is proposed. Most existing approaches to sentiment analysis focus on either audio or text sentiment alone; the novelty of this approach is that text sentiments and audio sentiments are generated separately and then blended to obtain better accuracy
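
    A minimal sketch of the blending idea mentioned above, assuming the two modalities each yield a sentiment score in [-1, 1]; the weights and threshold are illustrative, since the paper does not publish its exact fusion rule.

```python
# Illustrative blend of separately computed text and audio sentiment scores.
# The 0.6/0.4 weights, the score range and the threshold are assumptions.
def blend_sentiment(text_score, audio_score, w_text=0.6, w_audio=0.4):
    """Combine two sentiment scores in [-1, 1] into one weighted score."""
    return w_text * text_score + w_audio * audio_score

def label(score, threshold=0.1):
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

print(label(blend_sentiment(0.8, -0.2)))   # -> "positive"
```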

    Hybrid Approach for Emotion Classification of Audio Conversation Based on Text and Speech Mining

    One of the greatest challenges in speech technology is estimating the speaker's emotion. Most of the existing approaches concentrate on either audio or text features. In this work, we propose a novel approach to emotion classification of audio conversations based on both speech and text. The novelty of this approach lies in the choice of features and in the generation of a single feature vector for classification. Our main intention is to increase the accuracy of emotion classification of speech by considering both audio and text features. We use standard methods such as Natural Language Processing, Support Vector Machines, WordNet Affect and SentiWordNet. The dataset for this work has been taken from SemEval-2007 and the eNTERFACE'05 EMOTION Database
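
    The "single feature vector" idea can be sketched as concatenating audio-derived and text-derived features before an SVM; the feature dimensions and dummy data below are assumptions, not the authors' actual SemEval/eNTERFACE features.

```python
# Sketch of fusing audio and text features into one vector per utterance and
# training an SVM. Feature contents and dimensions are invented placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

n, n_audio, n_text = 200, 20, 5               # hypothetical sample/feature counts
audio_feats = rng.normal(size=(n, n_audio))   # e.g. prosodic/MFCC statistics
text_feats = rng.normal(size=(n, n_text))     # e.g. SentiWordNet-style scores
X = np.hstack([audio_feats, text_feats])      # one fused vector per utterance
y = rng.integers(0, 6, size=n)                # six emotion classes

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
print(clf.predict(X[:3]))
```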

    Classification of stress based on speech features

    Contemporary life is filled with challenges, hassles, deadlines, disappointments, and endless demands, the consequence of which may be stress. Stress has become a global phenomenon experienced in our modern daily lives, and it can play a significant role in psychological and behavioural disorders such as anxiety or depression. Hence, early detection of the signs and symptoms of stress is an antidote for reducing its harmful effects and the high cost of stress management efforts. This research work therefore presents an Automatic Speech Recognition (ASR) based technique for stress detection as a better alternative to approaches such as chemical analysis, skin conductance and electrocardiograms, which are obtrusive, intrusive and costly. Two sets of voice data were recorded from ten Arab students at Universiti Utara Malaysia (UUM) in neutral and stressed modes. Speech features of fundamental frequency (f0), formants (F1, F2 and F3), energy and Mel-Frequency Cepstral Coefficients (MFCC) were extracted and classified with K-Nearest Neighbour, Linear Discriminant Analysis and Artificial Neural Network classifiers. Results from the average fundamental frequency reveal that stress is highly correlated with an increase in fundamental frequency. Of the three classifiers, K-Nearest Neighbour (KNN) performed best, followed by Linear Discriminant Analysis (LDA), while the Artificial Neural Network (ANN) showed the weakest performance. Stress level classification into low, medium and high was done based on the classification results of KNN. This research shows the viability of ASR as a better means of stress detection and classification
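
    A minimal sketch of this kind of feature extraction and KNN classification, using librosa and scikit-learn; the parameter values are assumptions, and formant tracking is omitted because librosa has no built-in formant estimator.

```python
# Rough sketch of the feature-extraction-plus-KNN pipeline the abstract
# describes (f0, energy and MFCC shown). Sampling rate, pitch range and
# mean-pooling are assumptions, not the study's actual settings.
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def stress_features(path):
    y, sr = librosa.load(path, sr=16000)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)        # fundamental frequency
    energy = librosa.feature.rms(y=y)[0]                  # frame energy
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # 13 MFCCs per frame
    # Summarise frame-level tracks with their means to get one fixed vector.
    return np.hstack([f0.mean(), energy.mean(), mfcc.mean(axis=1)])

# Hypothetical file lists for the two recording conditions:
# X = np.vstack([stress_features(p) for p in neutral_files + stressed_files])
# y = np.array([0] * len(neutral_files) + [1] * len(stressed_files))
# KNeighborsClassifier(n_neighbors=3).fit(X, y)
```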

    A Comparison of Machine Learning Algorithms and Feature Sets for Automatic Vocal Emotion Recognition in Speech

    Vocal emotion recognition (VER) in natural speech, often referred to as speech emotion recognition (SER), remains challenging for both humans and computers. Applied fields including clinical diagnosis and intervention, social interaction research and Human-Computer Interaction (HCI) increasingly benefit from efficient VER algorithms. Several feature sets have been used with machine-learning (ML) algorithms for discrete emotion classification, but there is no consensus on which low-level descriptors and classifiers are optimal. Therefore, we aimed to compare the performance of machine-learning algorithms with several different feature sets. Concretely, seven ML algorithms were compared on the Berlin Database of Emotional Speech: Multilayer Perceptron Neural Network (MLP), J48 Decision Tree (DT), Support Vector Machine with Sequential Minimal Optimization (SMO), Random Forest (RF), k-Nearest Neighbor (KNN), Simple Logistic Regression (LOG) and Multinomial Logistic Regression (MLR), with 10-fold cross-validation using four openSMILE feature sets (IS-09, emobase, GeMAPS and eGeMAPS). Results indicated that SMO, MLP and LOG performed better (reaching 87.85%, 84.00% and 83.74% accuracy, respectively) than RF, DT, MLR and KNN (with minimum accuracies of 73.46%, 53.08%, 70.65% and 58.69%, respectively). Overall, the emobase feature set performed best. We discuss the implications of these findings for applications in diagnosis, intervention and HCI
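
    A sketch of such a comparison using scikit-learn counterparts of the WEKA learners named above, evaluated with 10-fold cross-validation; the dummy feature matrix stands in for the openSMILE vectors and is not the study's data.

```python
# Compare several classifiers with 10-fold cross-validation. X would hold
# openSMILE features (e.g. eGeMAPS) and y the emotion labels; both are
# random placeholders here, and the learners only approximate the WEKA ones.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 88))        # e.g. 88-dim eGeMAPS vectors (dummy data)
y = rng.integers(0, 7, size=300)      # seven emotion classes as in EMO-DB

models = {
    "MLP": MLPClassifier(max_iter=500),
    "DT": DecisionTreeClassifier(),
    "SVM (SMO-like)": SVC(kernel="linear"),
    "RF": RandomForestClassifier(),
    "KNN": KNeighborsClassifier(),
    "LOG": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```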

    Analisis Koefisien Cepstral Emosi Berdasarkan Suara (Analysis of Emotion Cepstral Coefficients Based on Voice)

    The speech signal carries several kinds of information: the intent to be conveyed, the identity of the speaker, and emotional information that reflects the emotional state of the utterance. One of the characteristics of the human voice is its fundamental frequency. In this study, the selection of features and of classification and recognition methods is important for recognizing the emotion classes (anger, sadness, fear, pleasure and neutral) contained in the dataset; the research proposes a design built around two main processes, training and recognition. Experiments were conducted using an Indonesian emotional voice dataset, and the Mel-Frequency Cepstrum Coefficients (MFCC) algorithm was used to extract features from the emotional speech. MFCC produces 13 cepstral coefficients for each emotional speech signal, and these coefficients are used as the input for classifying the emotional data of 250 samples
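
    A minimal sketch of the feature step described here, producing one 13-coefficient MFCC vector per recording (mean-pooled over frames, which is an assumption) to build the dataset matrix.

```python
# 13 MFCCs per recording, stacked into a 250-row dataset matrix.
# File paths and mean-pooling are assumptions for illustration.
import numpy as np
import librosa

def mfcc13(path):
    y, sr = librosa.load(path, sr=None)
    coeffs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # (13, n_frames)
    return coeffs.mean(axis=1)                              # one 13-dim vector

# paths = [...]                                # the 250 emotional speech samples
# X = np.vstack([mfcc13(p) for p in paths])    # shape (250, 13)
```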

    Convolutional Neural Networks for Emotion Recognition

    Convolutional neural networks are used in many areas today, but above all in machine learning, where they show great success. This work first introduces existing frameworks and other recognition algorithms, and then describes how our own dataset was created and how the emotion recognition model was trained. The resulting model has a classification accuracy of 60%. The model is then used to gather emotion statistics from movie trailers, a genre recognition model is built from those statistics, and that model is finally used in our application to determine the genre of an input trailer with an accuracy of up to 47%.
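
    An illustrative small CNN for image-based emotion classification, roughly the kind of model described; the layer sizes, 48x48 grayscale input and seven classes are assumptions, not the thesis architecture.

```python
# Small convolutional classifier for facial emotion images (sketch only).
import tensorflow as tf

def build_emotion_cnn(num_classes=7, input_shape=(48, 48, 1)):
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_emotion_cnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```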

    On the development of an automatic voice pleasantness classification and intensity estimation system

    In the last few years, the number of systems and devices that use voice-based interaction has grown significantly. For continued use of these systems, the interface must be reliable and pleasant in order to provide an optimal user experience. However, there are currently very few studies that try to evaluate how pleasant a voice is from a perceptual point of view when the final application is a speech-based interface. In this paper we present an objective definition of voice pleasantness based on the composition of a representative feature subset, and a new automatic voice pleasantness classification and intensity estimation system. Our study is based on a database of European Portuguese female voices, but the methodology can be extended to male voices or to other languages. In the objective performance evaluation the system achieved a 9.1% error rate for voice pleasantness classification and a 15.7% error rate for voice pleasantness intensity estimation. Work partially supported by ERDF funds, the Spanish Government (TEC2009-14094-C04-04), and Xunta de Galicia (CN2011/019, 2009/062).
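
    The two tasks named in the abstract, pleasantness classification and intensity estimation, can be sketched as a classifier plus a regressor over an acoustic feature matrix; the features, models and data below are placeholders rather than the paper's actual feature subset and system.

```python
# Pleasantness classification (discrete label) plus intensity estimation
# (continuous value) over acoustic features. All data here is synthetic.
import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 30))          # hypothetical acoustic feature vectors
is_pleasant = rng.integers(0, 2, 150)   # binary pleasantness labels
intensity = rng.uniform(0, 1, 150)      # pleasantness intensity in [0, 1]

classifier = SVC().fit(X, is_pleasant)   # pleasant vs. not pleasant
regressor = SVR().fit(X, intensity)      # how pleasant, as a scalar
print(classifier.predict(X[:2]), regressor.predict(X[:2]))
```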

    MULTIVARIATE ANALYSIS FOR UNDERSTANDING COGNITIVE SPEECH PROCESSING


    Multi-Sensory Emotion Recognition with Speech and Facial Expression

    Emotion plays an important role in human beings' daily lives. Understanding emotions and recognizing how to react to others' feelings are fundamental to engaging in successful social interactions. Emotion recognition is not only significant in daily life but also a hot topic in academic research, as new techniques such as emotion recognition from speech context show how emotions relate to the content we are uttering. The demand for, and importance of, emotion recognition have increased greatly in many applications in recent years, such as video games, human-computer interaction, cognitive computing, and affective computing. Emotion recognition can be done from many sources, including text, speech, hand and body gestures, and facial expression. Presently, most emotion recognition methods use only one of these sources. Human emotion changes from moment to moment, and using a single modality may not reflect it correctly. This research is motivated by the desire to understand and evaluate human emotion from multiple sources, such as speech and facial expressions. In this dissertation, multi-sensory emotion recognition has been exploited. The proposed framework can recognize emotion from speech, facial expression, or both. There are three important parts in the design of the system: the facial emotion recognizer, the speech emotion recognizer, and the information fusion. The information fusion part takes the results from the speech emotion recognition and the facial emotion recognition, integrates them with a novel weighted method, and gives a final decision on the emotion after fusion. The experiments show that with the weighted fusion method, accuracy improves by an average of 3.66% compared to fusion without weighting. The improvement of the recognition rate reaches 18.27% and 5.66% compared to speech emotion recognition and facial expression recognition alone, respectively. By improving emotion recognition accuracy, the proposed multi-sensory emotion recognition system can help improve the naturalness of human-computer interaction
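
    A minimal sketch of weighted decision-level fusion of the two recognizers' outputs; the class probabilities and weights are invented for illustration, and the dissertation's actual weighting scheme is more elaborate.

```python
# Weighted fusion of per-class scores from a speech recognizer and a facial
# recognizer. Weights, class set and example scores are assumptions.
import numpy as np

EMOTIONS = ["happiness", "sadness", "anger", "fear", "surprise", "disgust"]

def fuse(speech_probs, face_probs, w_speech=0.4, w_face=0.6):
    """Blend per-class probabilities from each modality and pick the argmax."""
    fused = w_speech * np.asarray(speech_probs) + w_face * np.asarray(face_probs)
    return EMOTIONS[int(np.argmax(fused))], fused

speech = [0.10, 0.05, 0.60, 0.10, 0.10, 0.05]   # speech recognizer output
face   = [0.60, 0.05, 0.10, 0.05, 0.15, 0.05]   # facial recognizer output
print(fuse(speech, face))   # the weighted facial evidence wins: "happiness"
```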