54 research outputs found
Speaker Verification in Emotional Talking Environments based on Third-Order Circular Suprasegmental Hidden Markov Model
Speaker verification accuracy in emotional talking environments is not as high
as it is in neutral ones. This work aims at accepting or rejecting the claimed
speaker using his/her voice in emotional environments based on the Third-Order
Circular Suprasegmental Hidden Markov Model (CSPHMM3) as a classifier. An
Emirati-accented (Arabic) speech database with Mel-Frequency Cepstral
Coefficients as the extracted features has been used to evaluate our work. Our
results demonstrate that speaker verification accuracy based on CSPHMM3 is
greater than that based on the state-of-the-art classifiers and models such as
Gaussian Mixture Model (GMM), Support Vector Machine (SVM), and Vector
Quantization (VQ).
Comment: 6 pages, accepted in The International Conference on Electrical and Computing Technologies and Applications, 2019 (ICECTA 2019).
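The accept/reject decision described above can be sketched as a log-likelihood-ratio threshold test. This is a minimal illustration only, not the paper's CSPHMM3 scoring; the function name, scores, and threshold are hypothetical:

```python
def verify_speaker(claimed_score: float, background_score: float,
                   threshold: float) -> bool:
    """Accept the claimed speaker if the log-likelihood ratio between the
    claimed-speaker model and a background model clears the threshold."""
    llr = claimed_score - background_score  # log p(X|claimed) - log p(X|background)
    return llr >= threshold

# Hypothetical log-likelihoods for one test utterance under the two models.
accepted = verify_speaker(claimed_score=-1050.0, background_score=-1100.0,
                          threshold=20.0)
print(accepted)  # llr = 50.0 >= 20.0, so True
```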
Emirati-Accented Speaker Identification in each of Neutral and Shouted Talking Environments
This work is devoted to collecting an Emirati-accented speech database (Arabic
United Arab Emirates database) in each of the neutral and shouted talking
environments in order to study and enhance text-independent Emirati-accented
speaker identification performance in the shouted environment based on each of
First-Order Circular Suprasegmental Hidden Markov Models (CSPHMM1s),
Second-Order Circular Suprasegmental Hidden Markov Models (CSPHMM2s), and
Third-Order Circular Suprasegmental Hidden Markov Models (CSPHMM3s) as
classifiers. In this research, our database was collected from fifty Emirati
native speakers (twenty-five per gender) uttering eight common Emirati
sentences in each of the neutral and shouted talking environments. The features
extracted from our collected database are Mel-Frequency Cepstral
Coefficients (MFCCs). Our results show that the average Emirati-accented speaker
identification performance in the neutral environment is 94.0%, 95.2%, and 95.9%
based on CSPHMM1s, CSPHMM2s, and CSPHMM3s, respectively. On the other hand, the
average performance in the shouted environment is 51.3%, 55.5%, and 59.3% based,
respectively, on CSPHMM1s, CSPHMM2s, and CSPHMM3s. The achieved average speaker
identification performance in the shouted environment based on CSPHMM3s is very
similar to that obtained in subjective assessment by human listeners.
Comment: 14 pages, 3 figures.
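Closed-set speaker identification, as studied above, reduces to picking the speaker model that scores the test utterance highest. A minimal sketch; the speaker labels and log-likelihoods are hypothetical, not taken from the collected database:

```python
def identify_speaker(scores: dict) -> str:
    """Closed-set identification: return the speaker whose model assigns the
    highest log-likelihood to the test utterance."""
    return max(scores, key=scores.get)

# Hypothetical per-speaker model log-likelihoods for one shouted utterance.
scores = {"spk01": -980.4, "spk02": -1012.7, "spk03": -949.1}
print(identify_speaker(scores))  # spk03 (highest log-likelihood)
```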
Emotion Recognition based on Third-Order Circular Suprasegmental Hidden Markov Model
This work focuses on recognizing the unknown emotion based on the Third-Order
Circular Suprasegmental Hidden Markov Model (CSPHMM3) as a classifier. Our work
has been tested on Emotional Prosody Speech and Transcripts (EPST) database.
The extracted features of EPST database are Mel-Frequency Cepstral Coefficients
(MFCCs). Our results give an average emotion recognition accuracy of 77.8% based
on the CSPHMM3. The results of this work demonstrate that CSPHMM3 is superior
to the Third-Order Hidden Markov Model (HMM3), Gaussian Mixture Model (GMM),
Support Vector Machine (SVM), and Vector Quantization (VQ) by 6.0%, 4.9%, 3.5%,
and 5.4%, respectively, for emotion recognition. The average emotion
recognition accuracy achieved based on the CSPHMM3 is comparable to that found
using subjective assessment by human judges.
Comment: Accepted at The 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT), Jordan.
Speaker Identification in a Shouted Talking Environment Based on Novel Third-Order Circular Suprasegmental Hidden Markov Models
It is well known that speaker identification yields very high performance in
a neutral talking environment; on the other hand, performance declines
sharply in a shouted talking environment. This work aims at proposing,
implementing, and evaluating novel Third-Order Circular Suprasegmental Hidden
Markov Models (CSPHMM3s) to improve the low performance of text-independent
speaker identification in a shouted talking environment. CSPHMM3s possess
combined characteristics of Circular Hidden Markov Models (CHMMs), Third-Order
Hidden Markov Models (HMM3s), and Suprasegmental Hidden Markov Models (SPHMMs).
Our results show that CSPHMM3s are superior to each of First-Order
Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM1s), Second-Order
Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM2s), Third-Order
Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM3s), First-Order
Circular Suprasegmental Hidden Markov Models (CSPHMM1s), and Second-Order
Circular Suprasegmental Hidden Markov Models (CSPHMM2s) in a shouted talking
environment. Using our collected speech database, average speaker
identification performance in a shouted talking environment based on
LTRSPHMM1s, LTRSPHMM2s, LTRSPHMM3s, CSPHMM1s, CSPHMM2s, and CSPHMM3s is 74.6%,
78.4%, 81.7%, 78.7%, 83.4%, and 85.8%, respectively. The speaker identification
performance achieved based on CSPHMM3s is close to that attained
in subjective assessment by human listeners.
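The "circular" topology that CSPHMMs inherit from CHMMs can be illustrated with a first-order transition matrix whose last state wraps around to the first. This is a deliberately simplified sketch (a first-order, two-edge topology with an assumed self-loop probability), not the paper's third-order suprasegmental model:

```python
def circular_transitions(n_states: int, p_stay: float = 0.6) -> list:
    """Build a first-order circular HMM transition matrix: each state either
    stays put or advances to the next state, and the last state wraps back
    to the first, closing the circle."""
    mat = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        mat[i][i] = p_stay
        mat[i][(i + 1) % n_states] = 1.0 - p_stay  # wrap-around edge
    return mat

A = circular_transitions(4)
print(A[3][0])  # last state loops back to the first: 0.4
```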
Talking Condition Recognition in Stressful and Emotional Talking Environments Based on CSPHMM2s
This work is aimed at exploiting Second-Order Circular Suprasegmental Hidden
Markov Models (CSPHMM2s) as classifiers to enhance talking condition
recognition in stressful and emotional talking environments (two completely
separate environments). The stressful talking environment used in this work is
based on the Speech Under Simulated and Actual Stress (SUSAS) database, while
the emotional talking environment uses the Emotional Prosody Speech and
Transcripts (EPST) database. The results of this work using Mel-Frequency Cepstral
Coefficients (MFCCs) demonstrate that CSPHMM2s outperform each of Hidden Markov
Models (HMMs), Second-Order Circular Hidden Markov Models (CHMM2s), and
Suprasegmental Hidden Markov Models (SPHMMs) in enhancing talking condition
recognition in the stressful and emotional talking environments. The results
also show that the performance of talking condition recognition in stressful
talking environments leads that in emotional talking environments by 3.67%
based on CSPHMM2s. Our results obtained in subjective evaluation by human
judges fall within 2.14% and 3.08% of those obtained, respectively, in
stressful and emotional talking environments based on CSPHMM2s.
Emotion Recognition Using Speaker Cues
This research aims at identifying the unknown emotion using speaker cues. In
this study, we identify the unknown emotion using a two-stage framework. The
first stage focuses on identifying the speaker who uttered the unknown emotion,
while the next stage focuses on identifying the unknown emotion uttered by the
recognized speaker in the prior stage. This proposed framework has been
evaluated on an Arabic Emirati-accented speech database uttered by fifteen
speakers per gender. Mel-Frequency Cepstral Coefficients (MFCCs) have been used
as the extracted features and Hidden Markov Model (HMM) has been utilized as
the classifier in this work. Our findings demonstrate that emotion recognition
accuracy based on the two-stage framework is greater than that based on the
one-stage approach and the state-of-the-art classifiers and models such as
Gaussian Mixture Model (GMM), Support Vector Machine (SVM), and Vector
Quantization (VQ). The average emotion recognition accuracy based on the
two-stage approach is 67.5%, while the accuracy reaches 61.4%, 63.3%, 64.5%,
and 61.5%, based on the one-stage approach, GMM, SVM, and VQ, respectively. The
achieved results based on the two-stage framework are very close to those
attained in subjective assessment by human listeners.
Comment: 5 pages.
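The two-stage framework described above can be sketched as: identify the speaker first, then score only that speaker's emotion models. The labels and scores below are hypothetical placeholders, not the HMM likelihoods used in the paper:

```python
def two_stage_emotion(speaker_scores: dict, emotion_scores: dict) -> tuple:
    """Stage 1: identify the speaker who uttered the test sample.
    Stage 2: recognize the emotion using only that speaker's emotion models."""
    speaker = max(speaker_scores, key=speaker_scores.get)
    emotions = emotion_scores[speaker]
    return speaker, max(emotions, key=emotions.get)

# Hypothetical model log-likelihoods for one test utterance.
speaker_scores = {"spk01": -880.0, "spk02": -910.0}
emotion_scores = {
    "spk01": {"neutral": -410.0, "angry": -395.0, "happy": -430.0},
    "spk02": {"neutral": -400.0, "angry": -420.0, "happy": -390.0},
}
print(two_stage_emotion(speaker_scores, emotion_scores))  # ('spk01', 'angry')
```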
Studying and Enhancing Talking Condition Recognition in Stressful and Emotional Talking Environments Based on HMMs, CHMM2s and SPHMMs
The work of this research is devoted to studying and enhancing talking
condition recognition in stressful and emotional talking environments
(two completely separate environments) based on three different and separate
classifiers. The three classifiers are: Hidden Markov Models (HMMs),
Second-Order Circular Hidden Markov Models (CHMM2s) and Suprasegmental Hidden
Markov Models (SPHMMs). The stressful talking environments that have been used
in this work are composed of neutral, shouted, slow, loud, soft and fast
talking conditions, while the emotional talking environments are made up of
neutral, angry, sad, happy, disgust and fear emotions. The achieved results in
the current work show that SPHMMs lead each of HMMs and CHMM2s in improving
talking condition recognition in stressful and emotional talking environments.
The results also demonstrate that talking condition recognition in stressful
talking environments outperforms that in emotional talking environments by
2.7%, 1.8% and 3.3% based on HMMs, CHMM2s and SPHMMs, respectively. Based on
subjective assessment by human judges, the recognition performance of stressful
talking conditions leads that of emotional ones by 5.2%.
Employing both Gender and Emotion Cues to Enhance Speaker Identification Performance in Emotional Talking Environments
Speaker recognition performance in emotional talking environments is not as
high as it is in neutral talking environments. This work focuses on proposing,
implementing, and evaluating a new approach to enhance the performance in
emotional talking environments. The new proposed approach is based on
identifying the unknown speaker using both his/her gender and emotion cues.
Both Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models
(SPHMMs) have been used as classifiers in this work. This approach has been
tested on our collected emotional speech database which is composed of six
emotions. The results of this work show that speaker identification performance
based on using both gender and emotion cues is higher than that based on using
gender cues only, emotion cues only, and neither gender nor emotion cues by
7.22%, 4.45%, and 19.56%, respectively. This work also shows that the optimum
speaker identification performance occurs when the classifiers are
completely biased towards the suprasegmental models, with no impact from the
acoustic models, in the emotional talking environments. The achieved average speaker
identification performance based on the new proposed approach falls within
2.35% of that obtained in subjective evaluation by human judges.
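The bias between acoustic and suprasegmental models mentioned above is commonly expressed as a weighted sum of the two log-likelihoods, where a weight of 1 biases the classifier entirely towards the suprasegmental models. A sketch under that assumption (the exact weighting scheme used in the paper may differ):

```python
def sphmm_score(acoustic_ll: float, supra_ll: float, alpha: float) -> float:
    """Combine acoustic (HMM) and suprasegmental (SPHMM) log-likelihoods.
    alpha = 1.0 biases the score entirely towards the suprasegmental models,
    matching the optimum reported above; alpha = 0.0 uses acoustics only."""
    return (1.0 - alpha) * acoustic_ll + alpha * supra_ll

print(sphmm_score(-500.0, -300.0, alpha=1.0))  # -300.0: acoustic term vanishes
```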
Speaker Identification in each of the Neutral and Shouted Talking Environments based on Gender-Dependent Approach Using SPHMMs
It is well known that speaker identification performs extremely well in
neutral talking environments; however, the identification performance
declines sharply in shouted talking environments. This work aims at
proposing, implementing and testing a new approach to enhance the declined
performance in the shouted talking environments. The new proposed approach is
based on gender-dependent speaker identification using Suprasegmental Hidden
Markov Models (SPHMMs) as classifiers. This proposed approach has been tested
on two different and separate speech databases: our collected database and the
Speech Under Simulated and Actual Stress (SUSAS) database. The results of this
work show that gender-dependent speaker identification based on SPHMMs
outperforms gender-independent speaker identification based on the same models
and gender-dependent speaker identification based on Hidden Markov Models
(HMMs) by about 6% and 8%, respectively. The results obtained based on the
proposed approach are close to those obtained in subjective evaluation by human
judges.
Novel Cascaded Gaussian Mixture Model-Deep Neural Network Classifier for Speaker Identification in Emotional Talking Environments
This research presents an effective approach to enhance
text-independent speaker identification performance in emotional talking
environments based on a novel classifier: the cascaded Gaussian Mixture
Model-Deep Neural Network (GMM-DNN). Our work focuses on proposing,
implementing, and evaluating this cascaded classifier for speaker
identification in emotional talking
environments. The results point out that the cascaded GMM-DNN
classifier improves speaker identification performance at various emotions
using two distinct speech databases: Emirati speech database (Arabic United
Arab Emirates dataset) and Speech Under Simulated and Actual Stress (SUSAS)
English dataset. The proposed classifier outperforms classical classifiers such
as Multilayer Perceptron (MLP) and Support Vector Machine (SVM) in each
dataset. Speaker identification performance that has been attained based on the
cascaded GMM-DNN is similar to that acquired from subjective assessment by
human listeners.
Comment: 15 pages.
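The cascade described above can be sketched as a GMM front end whose per-component posteriors feed a DNN back end. The sketch below shows only the posterior-feature stage for a toy two-component diagonal GMM; all numbers are illustrative, and the paper's actual GMM-DNN configuration is not reproduced here:

```python
import math

def gmm_posteriors(x, means, variances, weights):
    """Stage 1 of a cascaded GMM-DNN (sketch): map a feature vector to
    per-component GMM posteriors, which the DNN stage would then classify."""
    log_ps = []
    for m, v, w in zip(means, variances, weights):
        ll = math.log(w)
        for xi, mi, vi in zip(x, m, v):
            # Diagonal-covariance Gaussian log-density, one dimension at a time.
            ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_ps.append(ll)
    # Softmax over component log-likelihoods (numerically stabilized).
    mx = max(log_ps)
    exp = [math.exp(p - mx) for p in log_ps]
    s = sum(exp)
    return [e / s for e in exp]

# Toy two-component GMM over 2-D features (illustrative numbers only).
post = gmm_posteriors([0.1, -0.2],
                      means=[[0.0, 0.0], [3.0, 3.0]],
                      variances=[[1.0, 1.0], [1.0, 1.0]],
                      weights=[0.5, 0.5])
print(post)  # point near the first component, so post[0] dominates
```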