Speaker Identification in a Shouted Talking Environment Based on Novel Third-Order Circular Suprasegmental Hidden Markov Models
It is well known that speaker identification yields very high performance in
a neutral talking environment; the performance, however, declines sharply in a
shouted talking environment. This work aims at proposing,
implementing, and evaluating novel Third-Order Circular Suprasegmental Hidden
Markov Models (CSPHMM3s) to improve the low performance of text-independent
speaker identification in a shouted talking environment. CSPHMM3s possess
combined characteristics of: Circular Hidden Markov Models (CHMMs), Third-Order
Hidden Markov Models (HMM3s), and Suprasegmental Hidden Markov Models (SPHMMs).
Our results show that CSPHMM3s are superior to each of: First-Order
Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM1s), Second-Order
Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM2s), Third-Order
Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM3s), First-Order
Circular Suprasegmental Hidden Markov Models (CSPHMM1s), and Second-Order
Circular Suprasegmental Hidden Markov Models (CSPHMM2s) in a shouted talking
environment. Using our collected speech database, average speaker
identification performance in a shouted talking environment based on
LTRSPHMM1s, LTRSPHMM2s, LTRSPHMM3s, CSPHMM1s, CSPHMM2s, and CSPHMM3s is 74.6%,
78.4%, 81.7%, 78.7%, 83.4%, and 85.8%, respectively. The speaker
identification performance achieved based on CSPHMM3s is close to that
attained in subjective assessment by human listeners.
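The abstract above contrasts first-, second-, and third-order models but does not spell out the mechanics. As a rough illustration (not the authors' implementation), in a third-order HMM the transition probability is conditioned on the three preceding states rather than on one alone; the toy sketch below uses a hypothetical transition table and notes the usual reduction to a first-order model over state triples.

```python
from itertools import product

# Toy illustration: in a third-order HMM the next state depends on the
# previous three states, so transitions are indexed by state triples
# rather than by single states.
states = ["s0", "s1"]

# Hypothetical third-order transition table: P(next | three previous states).
# For each history triple, the probabilities over `states` sum to 1.
trans3 = {hist: {"s0": 0.7, "s1": 0.3} for hist in product(states, repeat=3)}

def sequence_prob(seq, trans3, init_prob=1.0):
    """P(seq) under a third-order transition model; the first three
    states are folded into init_prob for simplicity."""
    p = init_prob
    for t in range(3, len(seq)):
        hist = tuple(seq[t - 3:t])
        p *= trans3[hist][seq[t]]
    return p

# 0.7 * 0.3 for the two scored transitions in this five-state sequence.
print(sequence_prob(["s0", "s1", "s0", "s0", "s1"], trans3))
```

In practice a higher-order HMM is often reduced to a first-order one by expanding each state into the tuple of its recent history, which lets the standard forward-backward and Viterbi algorithms run unchanged.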
Emirati-Accented Speaker Identification in each of Neutral and Shouted Talking Environments
This work is devoted to capturing an Emirati-accented speech database (Arabic
United Arab Emirates database) in each of neutral and shouted talking
environments in order to study and enhance text-independent Emirati-accented
speaker identification performance in the shouted environment based on each of
First-Order Circular Suprasegmental Hidden Markov Models (CSPHMM1s),
Second-Order Circular Suprasegmental Hidden Markov Models (CSPHMM2s), and
Third-Order Circular Suprasegmental Hidden Markov Models (CSPHMM3s) as
classifiers. In this research, our database was collected from fifty Emirati
native speakers (twenty-five per gender) uttering eight common Emirati
sentences in each of neutral and shouted talking environments. The extracted
features of our collected database are called Mel-Frequency Cepstral
Coefficients (MFCCs). Our results show that average Emirati-accented speaker
identification performance in the neutral environment is 94.0%, 95.2%, and
95.9% based on CSPHMM1s, CSPHMM2s, and CSPHMM3s, respectively. On the other
hand, the average performance in the shouted environment is 51.3%, 55.5%, and
59.3% based, respectively, on CSPHMM1s, CSPHMM2s, and CSPHMM3s. The achieved
average speaker identification performance in the shouted environment based on
CSPHMM3s is very similar to that obtained in subjective assessment by human
listeners.
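Every study in this listing extracts MFCCs, though none of the abstracts details the feature pipeline. For readers unfamiliar with the features, here is a minimal sketch of the standard mel-scale mapping that underlies MFCC filter banks (textbook formulas, not taken from these papers; the filter-bank size below is illustrative).

```python
import math

def hz_to_mel(f_hz):
    """Standard mel-scale mapping used when building MFCC filter banks."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mapping, used to place triangular filter edges back in Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Center frequencies for a small illustrative filter bank (10 filters
# between 0 Hz and 8 kHz): spaced evenly in mel, not in Hz, so low
# frequencies are sampled more densely, mimicking human hearing.
n_filters = 10
lo, hi = hz_to_mel(0.0), hz_to_mel(8000.0)
centers_hz = [mel_to_hz(lo + (hi - lo) * (i + 1) / (n_filters + 1))
              for i in range(n_filters)]
print([round(c) for c in centers_hz])
```

A full MFCC front end would apply these filters to short-time power spectra and then take a discrete cosine transform of the log filter-bank energies.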
Talking Condition Recognition in Stressful and Emotional Talking Environments Based on CSPHMM2s
This work is aimed at exploiting Second-Order Circular Suprasegmental Hidden
Markov Models (CSPHMM2s) as classifiers to enhance talking condition
recognition in stressful and emotional talking environments (two completely
separate environments). The stressful talking environment used in this work is
based on the Speech Under Simulated and Actual Stress (SUSAS) database, while
the emotional talking environment uses the Emotional Prosody Speech and
Transcripts (EPST) database. The achieved results of this work using
Mel-Frequency Cepstral
Coefficients (MFCCs) demonstrate that CSPHMM2s outperform each of Hidden Markov
Models (HMMs), Second-Order Circular Hidden Markov Models (CHMM2s), and
Suprasegmental Hidden Markov Models (SPHMMs) in enhancing talking condition
recognition in the stressful and emotional talking environments. The results
also show that the performance of talking condition recognition in stressful
talking environments leads that in emotional talking environments by 3.67%
based on CSPHMM2s. Our results obtained in subjective evaluation by human
judges fall within 2.14% and 3.08% of those obtained, respectively, in
stressful and emotional talking environments based on CSPHMM2s.
Emotion Recognition based on Third-Order Circular Suprasegmental Hidden Markov Model
This work focuses on recognizing the unknown emotion based on the Third-Order
Circular Suprasegmental Hidden Markov Model (CSPHMM3) as a classifier. Our work
has been tested on the Emotional Prosody Speech and Transcripts (EPST)
database. The extracted features of the EPST database are Mel-Frequency
Cepstral Coefficients
(MFCCs). Our results give average emotion recognition accuracy of 77.8% based
on the CSPHMM3. The results of this work demonstrate that CSPHMM3 is superior
to the Third-Order Hidden Markov Model (HMM3), Gaussian Mixture Model (GMM),
Support Vector Machine (SVM), and Vector Quantization (VQ) by 6.0%, 4.9%, 3.5%,
and 5.4%, respectively, for emotion recognition. The average emotion
recognition accuracy achieved based on the CSPHMM3 is comparable to that found
using subjective assessment by human judges.
Comment: Accepted at The 2019 IEEE Jordan International Joint Conference on
Electrical Engineering and Information Technology (JEEIT), Jordan.
Speaker Identification in Shouted Talking Environments Based on Novel Third-Order Hidden Markov Models
In this work we propose, implement, and evaluate novel models called
Third-Order Hidden Markov Models (HMM3s) to enhance the low performance of
text-independent speaker identification in shouted talking environments. The
proposed models have been tested on our collected speech database using
Mel-Frequency Cepstral Coefficients (MFCCs). Our results demonstrate that HMM3s
significantly improve speaker identification performance in such talking
environments by 11.3% and 166.7% compared to second-order hidden Markov models
(HMM2s) and first-order hidden Markov models (HMM1s), respectively. The
achieved results based on the proposed models are close to those obtained in
subjective assessment by human listeners.
Comment: The 4th International Conference on Audio, Language and Image
Processing (ICALIP 2014), Shanghai, China, 2014.
Speaker Identification in each of the Neutral and Shouted Talking Environments based on Gender-Dependent Approach Using SPHMMs
It is well known that speaker identification performs extremely well in
neutral talking environments; however, the identification performance declines
sharply in shouted talking environments. This work aims at
proposing, implementing and testing a new approach to enhance the declined
performance in the shouted talking environments. The new proposed approach is
based on gender-dependent speaker identification using Suprasegmental Hidden
Markov Models (SPHMMs) as classifiers. This proposed approach has been tested
on two different and separate speech databases: our collected database and the
Speech Under Simulated and Actual Stress (SUSAS) database. The results of this
work show that gender-dependent speaker identification based on SPHMMs
outperforms gender-independent speaker identification based on the same models
and gender-dependent speaker identification based on Hidden Markov Models
(HMMs) by about 6% and 8%, respectively. The results obtained based on the
proposed approach are close to those obtained in subjective evaluation by human
judges.
Speaker Verification in Emotional Talking Environments based on Third-Order Circular Suprasegmental Hidden Markov Model
Speaker verification accuracy in emotional talking environments is not as high
as it is in neutral ones. This work aims at accepting or rejecting the claimed
speaker using his/her voice in emotional environments based on the Third-Order
Circular Suprasegmental Hidden Markov Model (CSPHMM3) as a classifier. An
Emirati-accented (Arabic) speech database with Mel-Frequency Cepstral
Coefficients as the extracted features has been used to evaluate our work. Our
results demonstrate that speaker verification accuracy based on CSPHMM3 is
greater than that based on the state-of-the-art classifiers and models such as
Gaussian Mixture Model (GMM), Support Vector Machine (SVM), and Vector
Quantization (VQ).
Comment: Accepted in The International Conference on Electrical and Computing
Technologies and Applications, 2019 (ICECTA 2019).
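The abstract above frames verification as accepting or rejecting a claimed identity. A log-likelihood-ratio decision rule is the standard way to phrase that accept/reject step; the sketch below uses hypothetical scores and a generic threshold, and is not the paper's exact CSPHMM3 scoring.

```python
def verify(claimed_score, background_score, threshold=0.0):
    """Accept the claimed identity when the log-likelihood ratio between the
    claimed speaker's model and a background model exceeds a threshold.
    Both inputs are assumed to be log-likelihoods of the test utterance."""
    llr = claimed_score - background_score
    return llr > threshold

# Hypothetical log-likelihood scores for one test utterance.
print(verify(-120.5, -130.0))  # True: the claimed speaker's model fits better
print(verify(-135.0, -130.0))  # False: the background model fits better
```

The threshold trades off false acceptances against false rejections; in emotional environments the score distributions shift, which is one reason verification accuracy drops relative to neutral speech.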
Emotion Recognition Using Speaker Cues
This research aims at identifying the unknown emotion using speaker cues. In
this study, we identify the unknown emotion using a two-stage framework. The
first stage focuses on identifying the speaker who uttered the unknown emotion,
while the next stage focuses on identifying the unknown emotion uttered by the
recognized speaker in the prior stage. This proposed framework has been
evaluated on an Arabic Emirati-accented speech database uttered by fifteen
speakers per gender. Mel-Frequency Cepstral Coefficients (MFCCs) have been used
as the extracted features and Hidden Markov Model (HMM) has been utilized as
the classifier in this work. Our findings demonstrate that emotion recognition
accuracy based on the two-stage framework is greater than that based on the
one-stage approach and the state-of-the-art classifiers and models such as
Gaussian Mixture Model (GMM), Support Vector Machine (SVM), and Vector
Quantization (VQ). The average emotion recognition accuracy based on the
two-stage approach is 67.5%, while the accuracy reaches 61.4%, 63.3%, 64.5%,
and 61.5%, based on the one-stage approach, GMM, SVM, and VQ, respectively. The
achieved results based on the two-stage framework are very close to those
attained in subjective assessment by human listeners.
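The two-stage framework described above can be sketched generically: stage one picks the most likely speaker, and stage two scores only that speaker's emotion models. All names and numbers below are hypothetical stand-ins for model log-likelihoods, not values from the paper.

```python
def identify_speaker(speaker_scores):
    """Stage 1: return the speaker whose model scores highest on the utterance."""
    return max(speaker_scores, key=speaker_scores.get)

def identify_emotion(speaker, emotion_scores):
    """Stage 2: return the best-scoring emotion among the recognized
    speaker's own emotion models."""
    return max(emotion_scores[speaker], key=emotion_scores[speaker].get)

# Hypothetical log-likelihoods of one test utterance under each model.
speaker_scores = {"spk1": -110.0, "spk2": -95.0}
emotion_scores = {
    "spk1": {"neutral": -50.0, "angry": -45.0},
    "spk2": {"neutral": -40.0, "angry": -55.0},
}
spk = identify_speaker(speaker_scores)
print(spk, identify_emotion(spk, emotion_scores))  # spk2 neutral
```

Restricting stage two to the recognized speaker's models is what lets the framework exploit speaker cues: emotion models trained per speaker face far less inter-speaker variability than a single pooled model.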
Employing both Gender and Emotion Cues to Enhance Speaker Identification Performance in Emotional Talking Environments
Speaker recognition performance in emotional talking environments is not as
high as it is in neutral talking environments. This work focuses on proposing,
implementing, and evaluating a new approach to enhance the performance in
emotional talking environments. The new proposed approach is based on
identifying the unknown speaker using both his/her gender and emotion cues.
Both Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models
(SPHMMs) have been used as classifiers in this work. This approach has been
tested on our collected emotional speech database which is composed of six
emotions. The results of this work show that speaker identification performance
based on using both gender and emotion cues is higher than that based on using
gender cues only, emotion cues only, and neither gender nor emotion cues by
7.22%, 4.45%, and 19.56%, respectively. This work also shows that the optimum
speaker identification performance occurs when the classifiers are completely
biased towards suprasegmental models, with no contribution from acoustic
models, in the emotional talking environments. The achieved average speaker
identification performance based on the new proposed approach falls within
2.35% of that obtained in subjective evaluation by human judges.
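The "completely biased towards suprasegmental models" finding is commonly expressed in the SPHMM literature as a weighting factor that mixes the acoustic and suprasegmental log-likelihoods; the linear form and the symbol alpha below are an assumption for illustration, not quoted from this abstract.

```python
def combined_score(acoustic_ll, supra_ll, alpha):
    """Weighted mix of acoustic and suprasegmental log-likelihoods.
    alpha = 0.0 uses only the acoustic model; alpha = 1.0 biases the
    decision entirely towards the suprasegmental model."""
    return (1.0 - alpha) * acoustic_ll + alpha * supra_ll

# With alpha = 1.0 (the fully biased setting the abstract describes),
# the acoustic term drops out entirely.
print(combined_score(-100.0, -80.0, 1.0))  # -80.0
print(combined_score(-100.0, -80.0, 0.5))  # -90.0
```

Sweeping alpha over [0, 1] on held-out data is the natural way to locate the optimum the abstract reports at the suprasegmental extreme.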
Emirati-Accented Speaker Identification in Stressful Talking Conditions
This research is dedicated to improving text-independent Emirati-accented
speaker identification performance in stressful talking conditions using three
distinct classifiers: First-Order Hidden Markov Models (HMM1s), Second-Order
Hidden Markov Models (HMM2s), and Third-Order Hidden Markov Models (HMM3s). The
database that has been used in this work was collected from 25 Emirati native
speakers per gender uttering eight widespread Emirati sentences in each of
neutral, shouted, slow, loud, soft, and fast talking conditions. The extracted
features of the captured database are called Mel-Frequency Cepstral
Coefficients (MFCCs). Based on HMM1s, HMM2s, and HMM3s, average
Emirati-accented speaker identification accuracy in stressful conditions is
58.6%, 61.1%, and 65.0%, respectively. The achieved average speaker
identification accuracy in stressful conditions based on HMM3s is very close
to that attained in subjective assessment by human listeners.
Comment: This work has been accepted in The International Conference on
Electrical and Computing Technologies and Applications, 2019 (ICECTA 2019).