Emotion Recognition based on Third-Order Circular Suprasegmental Hidden Markov Model
This work focuses on recognizing unknown emotions using the Third-Order
Circular Suprasegmental Hidden Markov Model (CSPHMM3) as a classifier. Our
approach has been tested on the Emotional Prosody Speech and Transcripts (EPST)
database, with Mel-Frequency Cepstral Coefficients (MFCCs) as the extracted
features. Our results give an average emotion recognition accuracy of 77.8%
based on the CSPHMM3. The results of this work demonstrate that CSPHMM3 is superior
to the Third-Order Hidden Markov Model (HMM3), Gaussian Mixture Model (GMM),
Support Vector Machine (SVM), and Vector Quantization (VQ) by 6.0%, 4.9%, 3.5%,
and 5.4%, respectively, for emotion recognition. The average emotion
recognition accuracy achieved with the CSPHMM3 is comparable to that found
using subjective assessment by human judges.

Comment: Accepted at The 2019 IEEE Jordan International Joint Conference on
Electrical Engineering and Information Technology (JEEIT), Jordan
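
As an illustration of the MFCC front end described above, here is a minimal
per-utterance feature extraction sketch, assuming the librosa library; the file
name and the 13-coefficient setting are illustrative assumptions, not the
paper's actual configuration.

    # Minimal MFCC extraction sketch (librosa assumed); parameter values
    # are illustrative, not the configuration used in the paper.
    import librosa

    def extract_mfccs(wav_path, n_mfcc=13):
        # Load the waveform at its native sampling rate.
        signal, sr = librosa.load(wav_path, sr=None)
        # One n_mfcc-dimensional MFCC vector per analysis frame.
        return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T

    features = extract_mfccs("utterance.wav")  # shape: (num_frames, 13)
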
Emotion Recognition Using Speaker Cues
This research aims at identifying unknown emotions using speaker cues through a
two-stage framework. The first stage identifies the speaker who uttered the
emotional speech, while the second stage identifies the emotion itself,
restricted to the speaker recognized in the first stage. The proposed framework has been
evaluated on an Arabic Emirati-accented speech database uttered by fifteen
speakers per gender. Mel-Frequency Cepstral Coefficients (MFCCs) have been used
as the extracted features, and a Hidden Markov Model (HMM) has been used as
the classifier. Our findings demonstrate that emotion recognition
accuracy based on the two-stage framework is greater than that based on the
one-stage approach and the state-of-the-art classifiers and models such as
Gaussian Mixture Model (GMM), Support Vector Machine (SVM), and Vector
Quantization (VQ). The average emotion recognition accuracy based on the
two-stage approach is 67.5%, while the accuracy reaches 61.4%, 63.3%, 64.5%,
and 61.5% based on the one-stage approach, GMM, SVM, and VQ, respectively. The
results achieved with the two-stage framework are very close to those
attained in subjective assessment by human listeners.

Comment: 5 pages
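
To make the two-stage idea concrete, the sketch below scores an utterance's
MFCC frames against per-speaker models first, then against only the winning
speaker's per-emotion models. The dictionaries of pre-trained hmmlearn
GaussianHMM models are hypothetical stand-ins, not the paper's exact HMM setup.

    # Two-stage recognition sketch: identify the speaker, then the emotion.
    # speaker_models: {speaker_id: trained hmmlearn GaussianHMM}
    # emotion_models: {speaker_id: {emotion: trained hmmlearn GaussianHMM}}
    # Both are assumed pre-trained on MFCC frames of shape (n_frames, n_mfcc).
    def recognize_emotion(mfccs, speaker_models, emotion_models):
        # Stage 1: pick the speaker whose model gives the highest log-likelihood.
        speaker = max(speaker_models, key=lambda s: speaker_models[s].score(mfccs))
        # Stage 2: pick the emotion among that speaker's models only.
        candidates = emotion_models[speaker]
        return max(candidates, key=lambda e: candidates[e].score(mfccs))

Restricting stage 2 to one speaker's models is what lets speaker cues sharpen
the emotion decision relative to a single speaker-independent classifier.
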
Depression Severity Estimation from Multiple Modalities
Depression is a major debilitating disorder that can affect people of all
ages. With a continuous increase in the number of annual cases of depression,
there is a need to develop automatic techniques for detecting the
presence and extent of depression. In this AVEC challenge, we explore different
modalities (speech, language, and visual features extracted from the face) to design
and develop automatic methods for the detection of depression. In psychology
literature, the PHQ-8 questionnaire is well established as a tool for measuring
the severity of depression. In this paper we aim to automatically predict the
PHQ-8 scores from features extracted from the different modalities. We show
that visual features extracted from facial landmarks achieve the best
performance in estimating PHQ-8 scores, with a mean absolute error
(MAE) of 4.66 on the development set. Behavioral characteristics from speech
provide an MAE of 4.73. Language features yield a slightly higher MAE of 5.17.
When switching to the test set, our Turn Features derived from audio
transcriptions achieve the best performance, scoring an MAE of 4.11
(corresponding to an RMSE of 4.94), which makes our system the winner of the
AVEC 2017 depression sub-challenge.

Comment: 8 pages, 1 figure
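
For reference, the two error metrics quoted above can be computed as in the
sketch below; the score arrays are made-up toy values, not challenge data.

    # MAE and RMSE over predicted vs. reference PHQ-8 scores (toy values).
    import numpy as np

    def mae(y_true, y_pred):
        # Mean absolute error: average magnitude of the prediction errors.
        return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

    def rmse(y_true, y_pred):
        # Root mean squared error: penalizes large errors more than MAE.
        return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

    ref, pred = [10, 4, 15, 7], [8, 5, 12, 9]
    print(mae(ref, pred), rmse(ref, pred))
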