5,746 research outputs found
Talking Condition Identification Using Second-Order Hidden Markov Models
This work focuses on enhancing the performance of text-dependent and
speaker-dependent talking condition identification systems using second-order
hidden Markov models (HMM2s). Our results show that talking condition
identification performance based on HMM2s is significantly improved compared
to that based on first-order hidden Markov models (HMM1s). Our talking conditions in
this work are neutral, shouted, loud, angry, happy, and fear.
Comment: 3rd International Conference on Information & Communication Technologies: from Theory to Applications, Damascus, Syria, 2008. arXiv admin note: text overlap with arXiv:1706.09691, arXiv:1706.0971
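A second-order HMM conditions each transition on the two previous states, P(q_t | q_{t-1}, q_{t-2}), rather than on one. The sketch below is illustrative only (random parameters, not the papers' trained models); it shows the standard trick of expanding a second-order model into an equivalent first-order HMM over ordered state pairs, so the usual first-order algorithms still apply.

```python
import numpy as np

# Illustrative sketch, not the authors' implementation: expand a
# second-order HMM into a first-order HMM over ordered state pairs.

N = 3                                # number of original states
rng = np.random.default_rng(0)

# Second-order transitions: A2[i, j, k] = P(q_t = k | q_{t-2} = i, q_{t-1} = j)
A2 = rng.random((N, N, N))
A2 /= A2.sum(axis=2, keepdims=True)  # normalize over the next state

# Equivalent first-order matrix over pair-states: (i, j) -> (j, k).
# A transition from pair (i, j) to pair (j', k) is only allowed when j' == j.
A1 = np.zeros((N * N, N * N))
for i in range(N):
    for j in range(N):
        for k in range(N):
            A1[i * N + j, j * N + k] = A2[i, j, k]

# Each row of A1 is still a proper distribution, so standard first-order
# algorithms (forward-backward, Viterbi, Baum-Welch) apply unchanged.
assert np.allclose(A1.sum(axis=1), 1.0)
```

The price of the extra memory is a quadratic blow-up in the state space (N states become N^2 pair-states), which is why higher-order models cost more to train and decode.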
Enhancing speaker identification performance under the shouted talking condition using second-order circular hidden Markov models
It is known that the performance of speaker identification systems is high
under the neutral talking condition; however, the performance deteriorates
under the shouted talking condition. In this paper, second-order circular
hidden Markov models (CHMM2s) have been proposed and implemented to enhance the
performance of isolated-word text-dependent speaker identification systems
under the shouted talking condition. Our results show that CHMM2s significantly
improve speaker identification performance under such a condition compared to
the first-order left-to-right hidden Markov models (LTRHMM1s), second-order
left-to-right hidden Markov models (LTRHMM2s), and the first-order circular
hidden Markov models (CHMM1s). Under the shouted talking condition, our results
show that the average speaker identification performance is 23% based on
LTRHMM1s, 59% based on LTRHMM2s, and 60% based on CHMM1s. On the other hand,
the average speaker identification performance under the same talking condition
based on CHMM2s is 72%.
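The "circular" topology these abstracts contrast with left-to-right models can be made concrete with a small sketch. The transition probabilities below are illustrative assumptions, not the papers' parameters; the point is the wrap-around edge from the last state back to the first, which a left-to-right model lacks.

```python
import numpy as np

# Hedged sketch: a ring-topology (circular) HMM transition matrix.
# p_stay is an arbitrary illustrative value, not a trained parameter.

def circular_transitions(n_states, p_stay=0.6):
    """Each state either stays or advances to the next state,
    with state n-1 wrapping around to state 0 (the ring closure)."""
    A = np.zeros((n_states, n_states))
    for s in range(n_states):
        A[s, s] = p_stay
        A[s, (s + 1) % n_states] = 1.0 - p_stay  # wrap-around edge
    return A

A = circular_transitions(5)
print(A[4, 0])  # -> 0.4, the edge a left-to-right model would set to zero
```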
Speaker Identification in a Shouted Talking Environment Based on Novel Third-Order Circular Suprasegmental Hidden Markov Models
It is well known that speaker identification yields very high performance in
a neutral talking environment; on the other hand, performance declines
sharply in a shouted talking environment. This work aims at proposing,
implementing, and evaluating novel Third-Order Circular Suprasegmental Hidden
Markov Models (CSPHMM3s) to improve the low performance of text-independent
speaker identification in a shouted talking environment. CSPHMM3s possess
combined characteristics of: Circular Hidden Markov Models (CHMMs), Third-Order
Hidden Markov Models (HMM3s), and Suprasegmental Hidden Markov Models (SPHMMs).
Our results show that CSPHMM3s are superior to each of: First-Order
Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM1s), Second-Order
Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM2s), Third-Order
Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM3s), First-Order
Circular Suprasegmental Hidden Markov Models (CSPHMM1s), and Second-Order
Circular Suprasegmental Hidden Markov Models (CSPHMM2s) in a shouted talking
environment. Using our collected speech database, average speaker
identification performance in a shouted talking environment based on
LTRSPHMM1s, LTRSPHMM2s, LTRSPHMM3s, CSPHMM1s, CSPHMM2s, and CSPHMM3s is 74.6%,
78.4%, 81.7%, 78.7%, 83.4%, and 85.8%, respectively. Speaker identification
performance that has been achieved based on CSPHMM3s is close to that attained
based on subjective assessment by human listeners.
Comment: arXiv admin note: substantial text overlap with arXiv:1706.09722, arXiv:1707.0013
Speaker Identification in Shouted Talking Environments Based on Novel Third-Order Hidden Markov Models
In this work we propose, implement, and evaluate novel models called
Third-Order Hidden Markov Models (HMM3s) to enhance the low performance of
text-independent speaker identification in shouted talking environments. The
proposed models have been tested on our collected speech database using
Mel-Frequency Cepstral Coefficients (MFCCs). Our results demonstrate that HMM3s
significantly improve speaker identification performance in such talking
environments by 11.3% and 166.7% compared to second-order hidden Markov models
(HMM2s) and first-order hidden Markov models (HMM1s), respectively. The
achieved results based on the proposed models are close to those obtained in
subjective assessment by human listeners.
Comment: The 4th International Conference on Audio, Language and Image Processing (ICALIP2014), Shanghai, China, 2014
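The large 166.7% figure quoted above is a relative improvement, not an absolute accuracy gain. A worked example makes the arithmetic explicit; the absolute accuracies used below are hypothetical placeholders, only the formula (new - old) / old is standard.

```python
# Hedged worked example of a relative-improvement figure.
# The 80% and 30% accuracies are illustrative stand-ins, not the papers' numbers.

def relative_improvement(new, old):
    """Relative gain of `new` over `old`, as a percentage."""
    return (new - old) / old * 100.0

# If a baseline scored 30% and the proposed model scored 80%,
# the relative improvement is (80 - 30) / 30 = 166.7%.
print(round(relative_improvement(80.0, 30.0), 1))  # -> 166.7
```

This is why a modest absolute gain over an already-weak baseline (such as HMM1s in a shouted environment) can translate into a triple-digit relative percentage.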
Speaker Identification in the Shouted Environment Using Suprasegmental Hidden Markov Models
In this paper, Suprasegmental Hidden Markov Models (SPHMMs) have been used to
enhance the recognition performance of text-dependent speaker identification in
the shouted environment. Our speech database consists of two databases: our
collected database and the Speech Under Simulated and Actual Stress (SUSAS)
database. Our results show that SPHMMs significantly enhance speaker
identification performance compared to Second-Order Circular Hidden Markov
Models (CHMM2s) in the shouted environment. Using our collected database,
speaker identification performance in this environment is 68% and 75% based on
CHMM2s and SPHMMs, respectively. Using the SUSAS database, speaker
identification performance in the same environment is 71% and 79% based on
CHMM2s and SPHMMs, respectively.
Talking Condition Recognition in Stressful and Emotional Talking Environments Based on CSPHMM2s
This work is aimed at exploiting Second-Order Circular Suprasegmental Hidden
Markov Models (CSPHMM2s) as classifiers to enhance talking condition
recognition in stressful and emotional talking environments (two completely
separate environments). The stressful talking environment used in this work
is based on the Speech Under Simulated and Actual Stress (SUSAS) database,
while the emotional talking environment is based on the Emotional Prosody
Speech and Transcripts (EPST) database. The achieved results of this work
using Mel-Frequency Cepstral
Coefficients (MFCCs) demonstrate that CSPHMM2s outperform each of Hidden Markov
Models (HMMs), Second-Order Circular Hidden Markov Models (CHMM2s), and
Suprasegmental Hidden Markov Models (SPHMMs) in enhancing talking condition
recognition in the stressful and emotional talking environments. The results
also show that the performance of talking condition recognition in stressful
talking environments exceeds that in emotional talking environments by 3.67%
based on CSPHMM2s. Our results obtained in subjective evaluation by human
judges fall within 2.14% and 3.08% of those obtained, respectively, in
stressful and emotional talking environments based on CSPHMM2s.
Using Second-Order Hidden Markov Model to Improve Speaker Identification Recognition Performance under Neutral Condition
In this paper, the second-order hidden Markov model (HMM2) has been used and
implemented to improve the recognition performance of text-dependent speaker
identification systems under the neutral talking condition. Our results show
that HMM2 improves the recognition performance under the neutral talking
condition compared to the first-order hidden Markov model (HMM1). The
recognition performance has been improved by 9%.
Emotion Recognition based on Third-Order Circular Suprasegmental Hidden Markov Model
This work focuses on recognizing unknown emotions based on the Third-Order
Circular Suprasegmental Hidden Markov Model (CSPHMM3) as a classifier. Our
work has been tested on the Emotional Prosody Speech and Transcripts (EPST)
database. The extracted features of the EPST database are Mel-Frequency
Cepstral Coefficients
(MFCCs). Our results give average emotion recognition accuracy of 77.8% based
on the CSPHMM3. The results of this work demonstrate that CSPHMM3 is superior
to the Third-Order Hidden Markov Model (HMM3), Gaussian Mixture Model (GMM),
Support Vector Machine (SVM), and Vector Quantization (VQ) by 6.0%, 4.9%, 3.5%,
and 5.4%, respectively, for emotion recognition. The average emotion
recognition accuracy achieved based on the CSPHMM3 is comparable to that found
using subjective assessment by human judges.
Comment: Accepted at The 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT), Jordan
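Among the baselines above, Vector Quantization (VQ) is the simplest to illustrate. The sketch below is a generic VQ classifier under stated assumptions (a plain k-means codebook per class and synthetic Gaussian stand-ins for MFCC frames); it is not the papers' implementation. Each class trains its own codebook, and a test utterance is assigned to the class whose codebook quantizes its frames with the least average distortion.

```python
import numpy as np

# Hedged sketch of a generic VQ classifier. Codebook size, iteration count,
# and the synthetic "MFCC" data are illustrative assumptions.

def train_codebook(frames, k=4, iters=20, seed=0):
    """Plain k-means codebook over feature frames (one frame per row)."""
    rng = np.random.default_rng(seed)
    codebook = frames[rng.choice(len(frames), k, replace=False)]
    for _ in range(iters):
        # assign each frame to its nearest codeword, then recompute centroids
        d = np.linalg.norm(frames[:, None] - codebook[None], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                codebook[c] = frames[labels == c].mean(axis=0)
    return codebook

def distortion(frames, codebook):
    """Average distance from each frame to its nearest codeword."""
    d = np.linalg.norm(frames[:, None] - codebook[None], axis=2)
    return d.min(axis=1).mean()

rng = np.random.default_rng(1)
train = {"happy": rng.normal(0, 1, (200, 12)),
         "sad":   rng.normal(3, 1, (200, 12))}    # stand-ins for MFCC frames
books = {emo: train_codebook(x) for emo, x in train.items()}

test_frames = rng.normal(3, 1, (50, 12))          # drawn near the "sad" cluster
pred = min(books, key=lambda emo: distortion(test_frames, books[emo]))
print(pred)  # -> sad
```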
Emirati-Accented Speaker Identification in each of Neutral and Shouted Talking Environments
This work is devoted to capturing an Emirati-accented speech database (Arabic
United Arab Emirates database) in each of the neutral and shouted talking
environments in order to study and enhance text-independent Emirati-accented
speaker identification performance in the shouted environment based on each of
First-Order Circular Suprasegmental Hidden Markov Models (CSPHMM1s),
Second-Order Circular Suprasegmental Hidden Markov Models (CSPHMM2s), and
Third-Order Circular Suprasegmental Hidden Markov Models (CSPHMM3s) as
classifiers. In this research, our database was collected from fifty Emirati
native speakers (twenty-five per gender) uttering eight common Emirati
sentences in each of the neutral and shouted talking environments. The
extracted features of our collected database are Mel-Frequency Cepstral
Coefficients (MFCCs). Our results show that the average Emirati-accented
speaker identification performance in the neutral environment is 94.0%,
95.2%, and 95.9% based on CSPHMM1s, CSPHMM2s, and CSPHMM3s, respectively. On
the other hand, the average performance in the shouted environment is 51.3%,
55.5%, and 59.3% based, respectively, on CSPHMM1s, CSPHMM2s, and CSPHMM3s.
The achieved average speaker identification performance in the shouted
environment based on CSPHMM3s is very similar to that obtained in subjective
assessment by human listeners.
Comment: 14 pages, 3 figures. arXiv admin note: text overlap with arXiv:1707.0068
Emirati Speaker Verification Based on HMM1s, HMM2s, and HMM3s
This work focuses on Emirati speaker verification systems in neutral talking
environments based on each of First-Order Hidden Markov Models (HMM1s),
Second-Order Hidden Markov Models (HMM2s), and Third-Order Hidden Markov Models
(HMM3s) as classifiers. These systems have been evaluated on our collected
Emirati speech database which is comprised of 25 male and 25 female Emirati
speakers using Mel-Frequency Cepstral Coefficients (MFCCs) as extracted
features. Our results show that HMM3s outperform each of HMM1s and HMM2s for
text-independent Emirati speaker verification. The obtained results based on
HMM3s are close to those achieved in subjective assessment by human listeners.
Comment: 13th International Conference on Signal Processing, Chengdu, China, 201
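Speaker verification, unlike identification, is a binary accept/reject decision on a claimed identity. A generic formulation (an assumption here, since the abstracts do not spell out their scoring rule) compares the claimed speaker model's log-likelihood against a background model and thresholds the ratio:

```python
# Hedged sketch of a generic log-likelihood-ratio verification decision.
# The scores and threshold below are illustrative values, not measured data.

def llr_decision(ll_claimed, ll_background, threshold=0.0):
    """Accept iff log P(X | claimed model) - log P(X | background) >= threshold."""
    return (ll_claimed - ll_background) >= threshold

print(llr_decision(-40.0, -55.0))  # -> True  (genuine-like trial)
print(llr_decision(-52.0, -48.0))  # -> False (impostor-like trial)
```

Sweeping the threshold trades false acceptances against false rejections, which is how operating points such as the equal-error rate are chosen in practice.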