Novel Cascaded Gaussian Mixture Model-Deep Neural Network Classifier for Speaker Identification in Emotional Talking Environments
This research presents an effective approach to enhancing text-independent
speaker identification performance in emotional talking environments, based on
a novel classifier: the cascaded Gaussian Mixture Model-Deep Neural Network
(GMM-DNN). Our work focuses on proposing, implementing, and evaluating this
cascaded architecture for speaker identification in emotional talking
environments. The results indicate that the cascaded GMM-DNN classifier
improves speaker identification performance under various emotions on two
distinct speech databases: the Emirati speech database (an Arabic United Arab
Emirates dataset) and the Speech Under Simulated and Actual Stress (SUSAS)
English dataset. The proposed classifier outperforms classical classifiers
such as the Multilayer Perceptron (MLP) and the Support Vector Machine (SVM)
on each dataset. Speaker identification performance attained by the cascaded
GMM-DNN is comparable to that acquired from subjective assessment by human
listeners. Comment: 15 pages
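A minimal sketch of one plausible reading of such a cascade, assuming the
per-speaker GMM log-likelihoods are fed as input features to a DNN that makes
the final decision (the exact coupling in the paper may differ; the speaker
count and the toy MFCC arrays are illustrative only):

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    n_speakers, n_mfcc = 4, 13

    # Toy MFCC frames per speaker; a real system extracts these from audio.
    train = {s: rng.normal(loc=s, size=(200, n_mfcc)) for s in range(n_speakers)}

    # Stage 1: one GMM per speaker, trained on that speaker's frames.
    gmms = {s: GaussianMixture(n_components=8, random_state=0).fit(X)
            for s, X in train.items()}

    def gmm_scores(frames):
        # Feature vector = average per-frame log-likelihood under each GMM.
        return np.array([gmms[s].score(frames) for s in range(n_speakers)])

    # Stage 2: a DNN consumes the stacked GMM scores and outputs the speaker.
    X = np.stack([gmm_scores(train[s][i:i + 20])
                  for s in range(n_speakers) for i in range(0, 200, 20)])
    y = np.repeat(np.arange(n_speakers), 10)
    dnn = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                        random_state=0).fit(X, y)
    print(dnn.predict(X[:3]))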
Sentiment Identification in Code-Mixed Social Media Text
Sentiment analysis is the Natural Language Processing (NLP) task dealing with
the detection and classification of sentiment in text. While some tasks
identify the presence of sentiment in a text (subjectivity analysis), others
determine its polarity, categorizing the text as positive, negative, or
neutral. Whenever sentiment is present in a text, it has a source (a person,
group of people, or other entity), and it is directed towards some entity,
object, event, or person. Sentiment analysis tasks therefore aim to determine
the subject, the target, and the polarity or valence of the sentiment. In our
work, we automatically extract sentiment (positive or negative) from Facebook
posts using a machine learning approach. While prior work exists on code-mixed
social media data and on sentiment analysis separately, ours is, to our
knowledge, the first attempt at sentiment analysis of code-mixed social media
text. We use extensive pre-processing to remove noise from the raw text, and a
Multilayer Perceptron model to determine the polarity of the sentiment. We
have also developed a corpus for this task by manually labeling Facebook posts
with their associated sentiments.
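A minimal sketch of an MLP polarity classifier of this kind, assuming
character n-gram TF-IDF features, a common choice for code-mixed text (the
paper's actual features and pre-processing are not specified here; the sample
posts and labels are invented for illustration):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline

    # Tiny illustrative code-mixed (Hindi-English) posts with manual labels.
    posts = ["movie bahut accha tha, loved it!",
             "kya bakwas service, totally disappointed",
             "great khana, will come again",
             "worst experience ever, bilkul pasand nahi aaya"]
    labels = ["positive", "negative", "positive", "negative"]

    # Character n-grams cope with the spelling variation typical of
    # code-mixed social media text.
    clf = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000, random_state=0),
    )
    clf.fit(posts, labels)
    print(clf.predict(["khana accha tha but service slow"]))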
Exploiting Synchronized Lyrics And Vocal Features For Music Emotion Detection
One of the key points in music recommendation is authoring engaging playlists
according to sentiment and emotion. While previous works were mostly based on
audio for music discovery and playlist generation, we take advantage of our
synchronized lyrics dataset to combine text representations and music features
in a novel way; we therefore introduce the Synchronized Lyrics Emotion
Dataset. Unlike other approaches that exploited randomly chosen audio samples
and the whole text, our data is split according to the temporal information
provided by the synchronization between lyrics and audio. This work compares
text-based and audio-based deep learning classification models using
techniques from the Natural Language Processing and Music Information
Retrieval domains. From the experiments on audio, we conclude that using the
vocals only, instead of the whole audio data, improves the overall performance
of the audio classifier. In the lyrics experiments we exploit state-of-the-art
word representations applied to the main Deep Learning architectures available
in the literature. In our benchmarks, the Bilinear LSTM classifier with
Attention based on fastText word embeddings performs better than the CNN
applied to audio. Comment: 8 pages, 5 figures, 9 tables
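A compact sketch of an attention-pooled bidirectional LSTM text classifier of
the kind described, assuming pre-computed fastText-style word vectors as
input (the 300-dimensional embedding size, hidden width, and class count are
illustrative assumptions, not the paper's exact configuration):

    import torch
    import torch.nn as nn

    class BiLSTMAttention(nn.Module):
        def __init__(self, emb_dim=300, hidden=128, n_classes=4):
            super().__init__()
            self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                                bidirectional=True)
            self.attn = nn.Linear(2 * hidden, 1)   # scores each time step
            self.out = nn.Linear(2 * hidden, n_classes)

        def forward(self, x):                       # x: (batch, time, emb_dim)
            h, _ = self.lstm(x)                     # (batch, time, 2*hidden)
            w = torch.softmax(self.attn(h), dim=1)  # attention over time
            ctx = (w * h).sum(dim=1)                # weighted-sum context
            return self.out(ctx)

    # Toy batch: 2 lyric segments of 10 words, each a fastText-style vector.
    model = BiLSTMAttention()
    logits = model(torch.randn(2, 10, 300))
    print(logits.shape)  # torch.Size([2, 4])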
SignsWorld; Deeping Into the Silence World and Hearing Its Signs (State of the Art)
Automatic speech processing systems are employed more and more often in real
environments. Although the underlying speech technology is mostly
language-independent, differences between languages with respect to their
structure and grammar have a substantial effect on recognition system
performance. In this paper, we present a review of the latest developments in
sign language recognition research in general and in Arabic sign language
(ArSL) in particular. This paper also presents SignsWorld, a general framework
for improving communication between the deaf community and hearing people.
The overall goal of the SignsWorld project is to develop a vision-based
technology for recognizing and translating continuous Arabic sign language
(ArSL). Comment: 20 pages; a state-of-the-art paper, so it contains many
references
Talking Condition Recognition in Stressful and Emotional Talking Environments Based on CSPHMM2s
This work aims at exploiting Second-Order Circular Suprasegmental Hidden
Markov Models (CSPHMM2s) as classifiers to enhance talking condition
recognition in stressful and emotional talking environments (two completely
separate environments). The stressful talking environment used in this work is
based on the Speech Under Simulated and Actual Stress (SUSAS) database, while
the emotional talking environment is based on the Emotional Prosody Speech and
Transcripts (EPST) database. The results of this work using Mel-Frequency
Cepstral Coefficients (MFCCs) demonstrate that CSPHMM2s outperform each of
Hidden Markov Models (HMMs), Second-Order Circular Hidden Markov Models
(CHMM2s), and Suprasegmental Hidden Markov Models (SPHMMs) in enhancing
talking condition recognition in both environments. The results also show
that, based on CSPHMM2s, talking condition recognition performance in
stressful talking environments exceeds that in emotional talking environments
by 3.67%. The results obtained in subjective evaluation by human judges fall
within 2.14% and 3.08% of those obtained based on CSPHMM2s in the stressful
and emotional talking environments, respectively.
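The CSPHMM2 machinery itself is not available in common toolkits; as a rough
stand-in, the sketch below shows the generic HMM-based recognition recipe such
systems share: train one model per talking condition on MFCC frames and pick
the condition whose model scores a test utterance highest. hmmlearn's
first-order GaussianHMM replaces the second-order circular suprasegmental
models, and the condition names and toy frames are assumptions:

    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    rng = np.random.default_rng(0)
    conditions = ["neutral", "angry", "loud"]

    # Toy MFCC-like frame sequences per condition; in practice these would
    # come from e.g. librosa.feature.mfcc on SUSAS/EPST recordings.
    train = {c: rng.normal(loc=i, size=(300, 13))
             for i, c in enumerate(conditions)}

    # One HMM per talking condition, trained on that condition's frames.
    models = {c: GaussianHMM(n_components=5, covariance_type="diag",
                             random_state=0).fit(X)
              for c, X in train.items()}

    def recognize(frames):
        # Maximum-likelihood decision over the per-condition models.
        return max(conditions, key=lambda c: models[c].score(frames))

    test = rng.normal(loc=1, size=(80, 13))  # frames drawn near "angry"
    print(recognize(test))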
Annotating and Modeling Empathy in Spoken Conversations
Empathy, as defined in behavioral sciences, expresses the ability of human
beings to recognize, understand and react to emotions, attitudes and beliefs of
others. The lack of an operational definition of empathy, however, makes it
difficult to measure. In this paper, we address two related problems in automatic
affective behavior analysis: the design of the annotation protocol and the
automatic recognition of empathy from spoken conversations. We propose and
evaluate an annotation scheme for empathy inspired by the modal model of
emotions. The annotation scheme was evaluated on a corpus of real-life, dyadic
spoken conversations. In the context of behavioral analysis, we designed an
automatic segmentation and classification system for empathy. Given the
different speech and language levels of representation where empathy may be
communicated, we investigated features derived from the lexical and acoustic
spaces. The feature development process was designed to support both the fusion
and automatic selection of relevant features from high dimensional space. The
automatic classification system was evaluated on call center conversations
where it showed significantly better performance than the baseline. Comment: Journal of Computer Speech and Language
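A minimal sketch of the kind of fusion-and-selection pipeline the abstract
describes, assuming lexical features arrive as bag-of-words-style counts over
transcripts and acoustic features as a pre-computed numeric block (both
hypothetical stand-ins for the paper's actual features, as are the dimensions
and labels):

    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    n = 200

    # Hypothetical pre-extracted features per conversation segment.
    lexical = rng.normal(size=(n, 50))    # e.g. bag-of-words counts
    acoustic = rng.normal(size=(n, 30))   # e.g. pitch, energy, speaking rate
    y = rng.integers(0, 2, size=n)        # 1 = empathic segment, 0 = not

    # Early fusion: concatenate the two spaces, then keep the most
    # discriminative dimensions before the SVM classifier.
    X = np.hstack([lexical, acoustic])
    clf = make_pipeline(StandardScaler(),
                        SelectKBest(f_classif, k=20),
                        LinearSVC())
    clf.fit(X, y)
    print(clf.predict(X[:5]))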
Multimodal Affect Recognition using Kinect
Affect (emotion) recognition has gained significant attention from researchers
in the past decade. Emotion-aware computer systems and devices have many
applications, ranging from interactive robots and intelligent online tutors to
emotion-based navigation assistants. In this research, data from multiple
modalities, such as face, head, hand, body, and speech, were utilized for
affect recognition. The research used a color and depth sensing device, the
Kinect, for facial feature extraction and tracking of human body joints.
Temporal features across multiple frames were used for affect recognition.
Event-driven decision-level fusion with majority voting was used to combine
the results from the individual modalities to recognize the emotions. The
study also implemented affect recognition by matching the features to
rule-based emotion templates per modality. Experiments showed that multimodal
affect recognition rates using a combination of emotion templates and
supervised learning were better than recognition rates based on supervised
learning alone. Recognition rates obtained using temporal features were higher
than recognition rates obtained using position-based features only. Comment: 9
pages, 2 tables, 1 figure; peer reviewed in ACM TIS
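A minimal sketch of decision-level fusion by majority voting as described
above; the modality names and the tie-breaking rule are illustrative
assumptions:

    from collections import Counter

    def fuse_majority(predictions):
        """Decision-level fusion: each modality votes with its own label."""
        votes = Counter(predictions.values())
        label, count = votes.most_common(1)[0]
        return label if count > len(predictions) // 2 else "undecided"

    # Hypothetical per-modality classifier outputs for one time window.
    frame_predictions = {
        "face":   "happy",
        "head":   "happy",
        "hands":  "neutral",
        "body":   "happy",
        "speech": "neutral",
    }
    print(fuse_majority(frame_predictions))  # -> happy (3 of 5 votes)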
Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History, Trends, and Affect-related Applications
Facial expressions are an important way through which humans interact
socially. Building a system capable of automatically recognizing facial
expressions from images and video has been an intense field of study in recent
years. Interpreting such expressions remains challenging and much research is
needed about the way they relate to human affect. This paper presents a general
overview of automatic RGB, 3D, thermal and multimodal facial expression
analysis. We define a new taxonomy for the field, encompassing all steps from
face detection to facial expression recognition, and describe and classify the
state-of-the-art methods accordingly. We also present the important datasets
and the benchmarking of the most influential methods. We conclude with a
general discussion of trends, important questions, and future lines of
research.
Speaker Identification in the Shouted Environment Using Suprasegmental Hidden Markov Models
In this paper, Suprasegmental Hidden Markov Models (SPHMMs) have been used to
enhance the recognition performance of text-dependent speaker identification in
the shouted environment. Our speech database consists of two databases: our
collected database and the Speech Under Simulated and Actual Stress (SUSAS)
database. Our results show that SPHMMs significantly enhance speaker
identification performance compared to Second-Order Circular Hidden Markov
Models (CHMM2s) in the shouted environment. Using our collected database,
speaker identification performance in this environment is 68% based on CHMM2s
and 75% based on SPHMMs. Using the SUSAS database, speaker identification
performance in the same environment is 71% and 79% based on CHMM2s and
SPHMMs, respectively.
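SPHMMs augment conventional acoustic HMMs with suprasegmental models of
prosody; a common formulation in this line of work combines the two
log-likelihoods with a weighting factor. The sketch below illustrates that
scoring rule only (the weight alpha = 0.5 and the stand-in per-speaker scores
are assumptions, not values from the paper):

    def sphmm_score(acoustic_loglik, suprasegmental_loglik, alpha=0.5):
        """Weighted combination of segmental and suprasegmental evidence.

        alpha = 0 reduces to a conventional HMM; alpha = 1 uses prosody only.
        """
        return (1 - alpha) * acoustic_loglik + alpha * suprasegmental_loglik

    # Hypothetical per-speaker log-likelihoods for one shouted test utterance.
    scores = {
        "spk1": sphmm_score(-1200.0, -310.0),
        "spk2": sphmm_score(-1150.0, -420.0),
        "spk3": sphmm_score(-1300.0, -290.0),
    }
    print(max(scores, key=scores.get))  # identified speaker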
Deep Learning for Digital Text Analytics: Sentiment Analysis
In today's scenario, imagining a world without negativity is very unrealistic,
as bad news spreads more virally than good news. Though it seems impractical
in real life, this could be addressed by building a system that uses Machine
Learning and Natural Language Processing techniques to identify news items
with a negative shade and filter them out, passing only the news with a
positive shade (good news) to the end user. In this work, around two lakh
(200,000) news items have been used for training and testing with a
combination of rule-based and data-driven approaches. VADER, along with a
filtration method, has been used as an annotating tool, followed by a
statistical Machine Learning approach that used a Document-Term Matrix
(representation) and a Support Vector Machine (classification). Deep Learning
algorithms then came into the picture to make the system more reliable
(Doc2Vec), finally ending with a Convolutional Neural Network (CNN) that
yielded better results than the other experimented modules. It achieved a
training accuracy of 96%, while a test accuracy above 85% was obtained on
internal and external news items. Comment: 8 pages
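A minimal sketch of the rule-based annotation plus document-term-matrix/SVM
stage described above, assuming VADER's compound score thresholded at 0 as
the filtration rule (the threshold and the sample headlines are illustrative
assumptions):

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    headlines = ["Local charity raises record funds for children's hospital",
                 "Stock market crashes amid fears of global recession",
                 "Community garden project brings neighbors together",
                 "Massive layoffs announced at major tech firm"]

    # Rule-based annotation: VADER compound score in [-1, 1], thresholded at 0.
    vader = SentimentIntensityAnalyzer()
    labels = ["positive" if vader.polarity_scores(h)["compound"] >= 0
              else "negative" for h in headlines]

    # Data-driven stage: document-term matrix representation + SVM classifier.
    clf = make_pipeline(CountVectorizer(), LinearSVC())
    clf.fit(headlines, labels)
    print(clf.predict(["Volunteers rebuild homes destroyed by the storm"]))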