Novel Cascaded Gaussian Mixture Model-Deep Neural Network Classifier for Speaker Identification in Emotional Talking Environments
This research presents an effective approach to enhancing text-independent
speaker identification performance in emotional talking environments, based on
a novel classifier: the cascaded Gaussian Mixture Model-Deep Neural Network
(GMM-DNN). Our work focuses on proposing, implementing, and evaluating this
cascaded architecture for speaker identification in emotional talking
environments. The results indicate that the cascaded GMM-DNN classifier
improves speaker identification performance under various emotions on two
distinct speech databases: the Emirati speech database (an Arabic United Arab
Emirates dataset) and the Speech Under Simulated and Actual Stress (SUSAS)
English dataset. The proposed classifier outperforms classical classifiers
such as the Multilayer Perceptron (MLP) and the Support Vector Machine (SVM)
on each dataset. Speaker identification performance attained by the cascaded
GMM-DNN is comparable to that acquired from subjective assessment by human
listeners. Comment: 15 pages
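A minimal sketch of one plausible reading of such a cascade, assuming the
per-speaker GMM log-likelihoods are fed as input features to a DNN that makes
the final decision (the exact coupling in the paper may differ; the speaker
count and the toy MFCC arrays are illustrative only):

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    n_speakers, n_mfcc = 4, 13

    # Toy MFCC frames per speaker; a real system extracts these from audio.
    train = {s: rng.normal(loc=s, size=(200, n_mfcc)) for s in range(n_speakers)}

    # Stage 1: one GMM per speaker, trained on that speaker's frames.
    gmms = {s: GaussianMixture(n_components=8, random_state=0).fit(X)
            for s, X in train.items()}

    def gmm_scores(frames):
        # Feature vector = average per-frame log-likelihood under each GMM.
        return np.array([gmms[s].score(frames) for s in range(n_speakers)])

    # Stage 2: a DNN consumes the stacked GMM scores and outputs the speaker.
    X = np.stack([gmm_scores(train[s][i:i + 20])
                  for s in range(n_speakers) for i in range(0, 200, 20)])
    y = np.repeat(np.arange(n_speakers), 10)
    dnn = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                        random_state=0).fit(X, y)
    print(dnn.predict(X[:3]))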
Sentiment Identification in Code-Mixed Social Media Text
Sentiment analysis is the Natural Language Processing (NLP) task dealing with
the detection and classification of sentiment in text. While some tasks
identify the presence of sentiment in a text (subjectivity analysis), others
determine its polarity, categorizing the text as positive, negative, or
neutral. Whenever sentiment is present in a text, it has a source (a person,
group of people, or other entity), and it is directed towards some entity,
object, event, or person. Sentiment analysis tasks therefore aim to determine
the subject, the target, and the polarity or valence of the sentiment. In our
work, we automatically extract sentiment (positive or negative) from Facebook
posts using a machine learning approach. While prior work exists on code-mixed
social media data and on sentiment analysis separately, ours is, to our
knowledge, the first attempt at sentiment analysis of code-mixed social media
text. We use extensive pre-processing to remove noise from the raw text, and a
Multilayer Perceptron model to determine the polarity of the sentiment. We
have also developed a corpus for this task by manually labeling Facebook posts
with their associated sentiments.
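A minimal sketch of an MLP polarity classifier of this kind, assuming
character n-gram TF-IDF features, a common choice for code-mixed text (the
paper's actual features and pre-processing are not specified here; the sample
posts and labels are invented for illustration):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline

    # Tiny illustrative code-mixed (Hindi-English) posts with manual labels.
    posts = ["movie bahut accha tha, loved it!",
             "kya bakwas service, totally disappointed",
             "great khana, will come again",
             "worst experience ever, bilkul pasand nahi aaya"]
    labels = ["positive", "negative", "positive", "negative"]

    # Character n-grams cope with the spelling variation typical of
    # code-mixed social media text.
    clf = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000, random_state=0),
    )
    clf.fit(posts, labels)
    print(clf.predict(["khana accha tha but service slow"]))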
Exploiting Synchronized Lyrics And Vocal Features For Music Emotion Detection
One of the key points in music recommendation is authoring engaging playlists
according to sentiment and emotion. While previous works were mostly based on
audio for music discovery and playlist generation, we take advantage of our
synchronized lyrics dataset to combine text representations and music features
in a novel way; we therefore introduce the Synchronized Lyrics Emotion
Dataset. Unlike other approaches that exploited randomly chosen audio samples
and the whole text, our data is split according to the temporal information
provided by the synchronization between lyrics and audio. This work compares
text-based and audio-based deep learning classification models using
techniques from the Natural Language Processing and Music Information
Retrieval domains. From the experiments on audio, we conclude that using the
vocals only, instead of the whole audio data, improves the overall performance
of the audio classifier. In the lyrics experiments we exploit state-of-the-art
word representations applied to the main Deep Learning architectures available
in the literature. In our benchmarks, the Bilinear LSTM classifier with
Attention based on fastText word embeddings performs better than the CNN
applied to audio. Comment: 8 pages, 5 figures, 9 tables
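A compact sketch of an attention-pooled bidirectional LSTM text classifier of
the kind described, assuming pre-computed fastText-style word vectors as
input (the 300-dimensional embedding size, hidden width, and class count are
illustrative assumptions, not the paper's exact configuration):

    import torch
    import torch.nn as nn

    class BiLSTMAttention(nn.Module):
        def __init__(self, emb_dim=300, hidden=128, n_classes=4):
            super().__init__()
            self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                                bidirectional=True)
            self.attn = nn.Linear(2 * hidden, 1)   # scores each time step
            self.out = nn.Linear(2 * hidden, n_classes)

        def forward(self, x):                       # x: (batch, time, emb_dim)
            h, _ = self.lstm(x)                     # (batch, time, 2*hidden)
            w = torch.softmax(self.attn(h), dim=1)  # attention over time
            ctx = (w * h).sum(dim=1)                # weighted-sum context
            return self.out(ctx)

    # Toy batch: 2 lyric segments of 10 words, each a fastText-style vector.
    model = BiLSTMAttention()
    logits = model(torch.randn(2, 10, 300))
    print(logits.shape)  # torch.Size([2, 4])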
SignsWorld; Deeping Into the Silence World and Hearing Its Signs (State of the Art)
Automatic speech processing systems are employed more and more often in real
environments. Although the underlying speech technology is mostly
language-independent, differences between languages with respect to their
structure and grammar have a substantial effect on recognition system
performance. In this paper, we present a review of the latest developments in
sign language recognition research in general and in Arabic sign language
(ArSL) in particular. This paper also presents SignsWorld, a general framework
for improving communication between the deaf community and hearing people.
The overall goal of the SignsWorld project is to develop a vision-based
technology for recognizing and translating continuous Arabic sign language
(ArSL). Comment: 20 pages; a state-of-the-art paper, so it contains many
references
Talking Condition Recognition in Stressful and Emotional Talking Environments Based on CSPHMM2s
This work aims at exploiting Second-Order Circular Suprasegmental Hidden
Markov Models (CSPHMM2s) as classifiers to enhance talking condition
recognition in stressful and emotional talking environments (two completely
separate environments). The stressful talking environment used in this work is
based on the Speech Under Simulated and Actual Stress (SUSAS) database, while
the emotional talking environment is based on the Emotional Prosody Speech and
Transcripts (EPST) database. The results of this work using Mel-Frequency
Cepstral Coefficients (MFCCs) demonstrate that CSPHMM2s outperform each of
Hidden Markov Models (HMMs), Second-Order Circular Hidden Markov Models
(CHMM2s), and Suprasegmental Hidden Markov Models (SPHMMs) in enhancing
talking condition recognition in both environments. The results also show
that, based on CSPHMM2s, talking condition recognition performance in
stressful talking environments exceeds that in emotional talking environments
by 3.67%. The results obtained in subjective evaluation by human judges fall
within 2.14% and 3.08% of those obtained based on CSPHMM2s in the stressful
and emotional talking environments, respectively.
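The CSPHMM2 machinery itself is not available in common toolkits; as a rough
stand-in, the sketch below shows the generic HMM-based recognition recipe such
systems share: train one model per talking condition on MFCC frames and pick
the condition whose model scores a test utterance highest. hmmlearn's
first-order GaussianHMM replaces the second-order circular suprasegmental
models, and the condition names and toy frames are assumptions:

    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    rng = np.random.default_rng(0)
    conditions = ["neutral", "angry", "loud"]

    # Toy MFCC-like frame sequences per condition; in practice these would
    # come from e.g. librosa.feature.mfcc on SUSAS/EPST recordings.
    train = {c: rng.normal(loc=i, size=(300, 13))
             for i, c in enumerate(conditions)}

    # One HMM per talking condition, trained on that condition's frames.
    models = {c: GaussianHMM(n_components=5, covariance_type="diag",
                             random_state=0).fit(X)
              for c, X in train.items()}

    def recognize(frames):
        # Maximum-likelihood decision over the per-condition models.
        return max(conditions, key=lambda c: models[c].score(frames))

    test = rng.normal(loc=1, size=(80, 13))  # frames drawn near "angry"
    print(recognize(test))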
Annotating and Modeling Empathy in Spoken Conversations
Empathy, as defined in behavioral sciences, expresses the ability of human
beings to recognize, understand and react to emotions, attitudes and beliefs of
others. The lack of an operational definition of empathy, however, makes it
difficult to measure. In this paper, we address two related problems in automatic
affective behavior analysis: the design of the annotation protocol and the
automatic recognition of empathy from spoken conversations. We propose and
evaluate an annotation scheme for empathy inspired by the modal model of
emotions. The annotation scheme was evaluated on a corpus of real-life, dyadic
spoken conversations. In the context of behavioral analysis, we designed an
automatic segmentation and classification system for empathy. Given the
different speech and language levels of representation where empathy may be
communicated, we investigated features derived from the lexical and acoustic
spaces. The feature development process was designed to support both the fusion
and automatic selection of relevant features from high dimensional space. The
automatic classification system was evaluated on call center conversations
where it showed significantly better performance than the baseline. Comment: Journal of Computer Speech and Language
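A minimal sketch of the kind of fusion-and-selection pipeline the abstract
describes, assuming lexical features arrive as bag-of-words-style counts over
transcripts and acoustic features as a pre-computed numeric block (both
hypothetical stand-ins for the paper's actual features, as are the dimensions
and labels):

    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    n = 200

    # Hypothetical pre-extracted features per conversation segment.
    lexical = rng.normal(size=(n, 50))    # e.g. bag-of-words counts
    acoustic = rng.normal(size=(n, 30))   # e.g. pitch, energy, speaking rate
    y = rng.integers(0, 2, size=n)        # 1 = empathic segment, 0 = not

    # Early fusion: concatenate the two spaces, then keep the most
    # discriminative dimensions before the SVM classifier.
    X = np.hstack([lexical, acoustic])
    clf = make_pipeline(StandardScaler(),
                        SelectKBest(f_classif, k=20),
                        LinearSVC())
    clf.fit(X, y)
    print(clf.predict(X[:5]))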
Multimodal Affect Recognition using Kinect
Affect (emotion) recognition has gained significant attention from researchers
in the past decade. Emotion-aware computer systems and devices have many
applications, ranging from interactive robots and intelligent online tutors to
emotion-based navigation assistants. In this research, data from multiple
modalities, such as face, head, hand, body, and speech, were utilized for
affect recognition. The research used a color and depth sensing device, the
Kinect, for facial feature extraction and tracking of human body joints.
Temporal features across multiple frames were used for affect recognition.
Event-driven decision-level fusion with majority voting was used to combine
the results from the individual modalities to recognize the emotions. The
study also implemented affect recognition by matching the features to
rule-based emotion templates per modality. Experiments showed that multimodal
affect recognition rates using a combination of emotion templates and
supervised learning were better than recognition rates based on supervised
learning alone. Recognition rates obtained using temporal features were higher
than recognition rates obtained using position-based features only. Comment: 9
pages, 2 tables, 1 figure; peer reviewed in ACM TIS
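A minimal sketch of decision-level fusion by majority voting as described
above; the modality names and the tie-breaking rule are illustrative
assumptions:

    from collections import Counter

    def fuse_majority(predictions):
        """Decision-level fusion: each modality votes with its own label."""
        votes = Counter(predictions.values())
        label, count = votes.most_common(1)[0]
        return label if count > len(predictions) // 2 else "undecided"

    # Hypothetical per-modality classifier outputs for one time window.
    frame_predictions = {
        "face":   "happy",
        "head":   "happy",
        "hands":  "neutral",
        "body":   "happy",
        "speech": "neutral",
    }
    print(fuse_majority(frame_predictions))  # -> happy (3 of 5 votes)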
Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History, Trends, and Affect-related Applications
Facial expressions are an important way through which humans interact
socially. Building a system capable of automatically recognizing facial
expressions from images and video has been an intense field of study in recent
years. Interpreting such expressions remains challenging and much research is
needed about the way they relate to human affect. This paper presents a general
overview of automatic RGB, 3D, thermal and multimodal facial expression
analysis. We define a new taxonomy for the field, encompassing all steps from
face detection to facial expression recognition, and describe and classify the
state-of-the-art methods accordingly. We also present the important datasets
and the benchmarking of the most influential methods. We conclude with a
general discussion of trends, important questions, and future lines of
research.
Speaker Identification in the Shouted Environment Using Suprasegmental Hidden Markov Models
In this paper, Suprasegmental Hidden Markov Models (SPHMMs) have been used to
enhance the recognition performance of text-dependent speaker identification in
the shouted environment. Our speech database consists of two databases: our
collected database and the Speech Under Simulated and Actual Stress (SUSAS)
database. Our results show that SPHMMs significantly enhance speaker
identification performance compared to Second-Order Circular Hidden Markov
Models (CHMM2s) in the shouted environment. Using our collected database,
speaker identification performance in this environment is 68% based on CHMM2s
and 75% based on SPHMMs. Using the SUSAS database, speaker identification
performance in the same environment is 71% and 79% based on CHMM2s and
SPHMMs, respectively.
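SPHMMs augment conventional acoustic HMMs with suprasegmental models of
prosody; a common formulation in this line of work combines the two
log-likelihoods with a weighting factor. The sketch below illustrates that
scoring rule only (the weight alpha = 0.5 and the stand-in per-speaker scores
are assumptions, not values from the paper):

    def sphmm_score(acoustic_loglik, suprasegmental_loglik, alpha=0.5):
        """Weighted combination of segmental and suprasegmental evidence.

        alpha = 0 reduces to a conventional HMM; alpha = 1 uses prosody only.
        """
        return (1 - alpha) * acoustic_loglik + alpha * suprasegmental_loglik

    # Hypothetical per-speaker log-likelihoods for one shouted test utterance.
    scores = {
        "spk1": sphmm_score(-1200.0, -310.0),
        "spk2": sphmm_score(-1150.0, -420.0),
        "spk3": sphmm_score(-1300.0, -290.0),
    }
    print(max(scores, key=scores.get))  # identified speaker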
Deep Learning for Digital Text Analytics: Sentiment Analysis
In today's scenario, imagining a world without negativity is very unrealistic,
as bad news spreads more virally than good news. Though it seems impractical
in real life, this could be addressed by building a system that uses Machine
Learning and Natural Language Processing techniques to identify news items
with a negative shade and filter them out, passing only the news with a
positive shade (good news) to the end user. In this work, around two lakh
(200,000) news items have been used for training and testing with a
combination of rule-based and data-driven approaches. VADER, along with a
filtration method, has been used as an annotating tool, followed by a
statistical Machine Learning approach that used a Document-Term Matrix
(representation) and a Support Vector Machine (classification). Deep Learning
algorithms then came into the picture to make the system more reliable
(Doc2Vec), finally ending with a Convolutional Neural Network (CNN) that
yielded better results than the other experimented modules. It achieved a
training accuracy of 96%, while a test accuracy above 85% was obtained on
internal and external news items. Comment: 8 pages
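A minimal sketch of the rule-based annotation plus document-term-matrix/SVM
stage described above, assuming VADER's compound score thresholded at 0 as
the filtration rule (the threshold and the sample headlines are illustrative
assumptions):

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    headlines = ["Local charity raises record funds for children's hospital",
                 "Stock market crashes amid fears of global recession",
                 "Community garden project brings neighbors together",
                 "Massive layoffs announced at major tech firm"]

    # Rule-based annotation: VADER compound score in [-1, 1], thresholded at 0.
    vader = SentimentIntensityAnalyzer()
    labels = ["positive" if vader.polarity_scores(h)["compound"] >= 0
              else "negative" for h in headlines]

    # Data-driven stage: document-term matrix representation + SVM classifier.
    clf = make_pipeline(CountVectorizer(), LinearSVC())
    clf.fit(headlines, labels)
    print(clf.predict(["Volunteers rebuild homes destroyed by the storm"]))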