Speaker Identification in Shouted Talking Environments Based on Novel Third-Order Hidden Markov Models
In this work, we propose, implement, and evaluate novel models called
Third-Order Hidden Markov Models (HMM3s) to improve the low performance of
text-independent speaker identification in shouted talking environments. The
proposed models have been tested on our collected speech database using
Mel-Frequency Cepstral Coefficients (MFCCs). Our results demonstrate that HMM3s
significantly improve speaker identification performance in such talking
environments by 11.3% and 166.7% compared to second-order hidden Markov models
(HMM2s) and first-order hidden Markov models (HMM1s), respectively. The
achieved results based on the proposed models are close to those obtained in
subjective assessment by human listeners.
Comment: The 4th International Conference on Audio, Language and Image
Processing (ICALIP2014), Shanghai, China, 201
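In the usual definition, a third-order HMM conditions each state transition on the three preceding states. One way to compute the likelihood of an observation sequence under such a model is to run the ordinary forward algorithm over state triples. The sketch below assumes discrete emission symbols and a joint distribution over the first three states (`init3`); the work above uses continuous MFCC observations, and all function names and parameters here are hypothetical. A brute-force enumeration is included as a correctness check.

```python
import itertools
import math
import random


def forward_hmm3(obs, states, init3, trans, emit):
    """Return P(obs) under a third-order HMM (discrete-emission sketch).

    init3[(i, j, k)]: joint probability of the first three states.
    trans[(i, j, k)][l]: P(q_t = l | q_{t-3} = i, q_{t-2} = j, q_{t-1} = k).
    emit[s][o]: P(o | q_t = s).
    """
    if len(obs) < 3:
        raise ValueError("this sketch needs at least three observations")
    # alpha[(i, j, k)] = P(o_1..o_t, q_{t-2} = i, q_{t-1} = j, q_t = k):
    # the third-order chain is run as a first-order chain over state triples.
    alpha = {}
    for triple in itertools.product(states, repeat=3):
        p = init3.get(triple, 0.0)
        for s, o in zip(triple, obs[:3]):
            p *= emit[s][o]
        alpha[triple] = p
    for o in obs[3:]:
        new_alpha = {}
        for j, k, l in itertools.product(states, repeat=3):
            # Sum out the oldest state i, which leaves the history window.
            total = sum(alpha[(i, j, k)] * trans[(i, j, k)][l] for i in states)
            new_alpha[(j, k, l)] = total * emit[l][o]
        alpha = new_alpha
    return sum(alpha.values())


def brute_force_hmm3(obs, states, init3, trans, emit):
    """Reference: sum the joint probability over every state sequence."""
    total = 0.0
    for seq in itertools.product(states, repeat=len(obs)):
        p = init3.get(seq[:3], 0.0)
        for t in range(3, len(obs)):
            p *= trans[seq[t - 3:t]][seq[t]]
        for s, o in zip(seq, obs):
            p *= emit[s][o]
        total += p
    return total


def random_model(states, symbols, seed=0):
    """Build a random, properly normalized toy model (hypothetical parameters)."""
    rng = random.Random(seed)

    def normalize(weights):
        z = sum(weights)
        return [w / z for w in weights]

    triples = list(itertools.product(states, repeat=3))
    init3 = dict(zip(triples, normalize([rng.random() for _ in triples])))
    trans = {t: dict(zip(states, normalize([rng.random() for _ in states])))
             for t in triples}
    emit = {s: dict(zip(symbols, normalize([rng.random() for _ in symbols])))
            for s in states}
    return init3, trans, emit


if __name__ == "__main__":
    states, symbols = ["A", "B"], [0, 1]
    init3, trans, emit = random_model(states, symbols, seed=1)
    obs = [0, 1, 1, 0, 1]
    fast = forward_hmm3(obs, states, init3, trans, emit)
    slow = brute_force_hmm3(obs, states, init3, trans, emit)
    print(fast, slow, math.isclose(fast, slow))
```

The forward pass costs on the order of N^4 operations per frame for N states, versus N^T for brute-force enumeration over T frames.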
Speaker Identification in a Shouted Talking Environment Based on Novel Third-Order Circular Suprasegmental Hidden Markov Models
It is well known that speaker identification yields very high performance in
a neutral talking environment; however, its performance declines sharply in a
shouted talking environment. This work aims at proposing,
implementing, and evaluating novel Third-Order Circular Suprasegmental Hidden
Markov Models (CSPHMM3s) to improve the low performance of text-independent
speaker identification in a shouted talking environment. CSPHMM3s combine the
characteristics of Circular Hidden Markov Models (CHMMs), Third-Order
Hidden Markov Models (HMM3s), and Suprasegmental Hidden Markov Models (SPHMMs).
Our results show that CSPHMM3s outperform each of First-Order
Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM1s), Second-Order
Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM2s), Third-Order
Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM3s), First-Order
Circular Suprasegmental Hidden Markov Models (CSPHMM1s), and Second-Order
Circular Suprasegmental Hidden Markov Models (CSPHMM2s) in a shouted talking
environment. Using our collected speech database, average speaker
identification performance in a shouted talking environment based on
LTRSPHMM1s, LTRSPHMM2s, LTRSPHMM3s, CSPHMM1s, CSPHMM2s, and CSPHMM3s is 74.6%,
78.4%, 81.7%, 78.7%, 83.4%, and 85.8%, respectively. Speaker identification
performance achieved with CSPHMM3s is close to that attained in subjective
assessment by human listeners.
Comment: arXiv admin note: substantial text overlap with arXiv:1706.09722,
arXiv:1707.0013
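The contrast between the left-to-right (LTR) and circular topologies named in this abstract can be illustrated with first-order transition matrices. This is a simplified illustration, not the authors' construction: it assumes the usual description of a circular HMM, in which the final state connects back to the first, whereas a left-to-right model can only stay or advance and ends in an absorbing state. The self-transition probability `stay` is a hypothetical value, and the higher-order and suprasegmental parts of CSPHMMs are not modeled.

```python
def left_to_right(n, stay=0.6):
    """Left-to-right topology: each state either stays put or advances by one;
    the final state is absorbing."""
    if n < 2:
        raise ValueError("need at least two states")
    a = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        a[i][i] = stay
        a[i][i + 1] = 1.0 - stay
    a[n - 1][n - 1] = 1.0
    return a


def circular(n, stay=0.6):
    """Circular topology: as above, but the final state wraps back to the
    first state, so no state is absorbing."""
    if n < 2:
        raise ValueError("need at least two states")
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = stay
        a[i][(i + 1) % n] = 1.0 - stay
    return a


def is_stochastic(a, tol=1e-12):
    """Every row of a valid transition matrix sums to one."""
    return all(abs(sum(row) - 1.0) < tol for row in a)


if __name__ == "__main__":
    for row in circular(4):
        print(row)
```

Because no state in the circular model is absorbing, the model can cycle through its states repeatedly while decoding a long utterance.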
Emirati-Accented Speaker Identification in each of Neutral and Shouted Talking Environments
This work is devoted to collecting an Emirati-accented speech database (Arabic
United Arab Emirates database) in both neutral and shouted talking
environments in order to study and improve text-independent Emirati-accented
speaker identification performance in a shouted environment using
First-Order Circular Suprasegmental Hidden Markov Models (CSPHMM1s),
Second-Order Circular Suprasegmental Hidden Markov Models (CSPHMM2s), and
Third-Order Circular Suprasegmental Hidden Markov Models (CSPHMM3s) as
classifiers. Our database was collected from fifty native Emirati
speakers (twenty-five per gender) uttering eight common Emirati
sentences in both neutral and shouted talking environments. Mel-Frequency
Cepstral Coefficients (MFCCs) were extracted from the collected database as
features. Our results show that the average Emirati-accented speaker
identification performance in a neutral environment is 94.0%, 95.2%, and 95.9%
based on CSPHMM1s, CSPHMM2s, and CSPHMM3s, respectively. In contrast, the
average performance in a shouted environment is 51.3%, 55.5%, and 59.3% based,
respectively, on CSPHMM1s, CSPHMM2s, and CSPHMM3s. The achieved average speaker
identification performance in a shouted environment based on CSPHMM3s is very
similar to that obtained in subjective assessment by human listeners.
Comment: 14 pages, 3 figures. arXiv admin note: text overlap with
arXiv:1707.0068
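The abstracts in this listing all use MFCCs as features but do not spell out the extraction settings. Two standard steps of MFCC extraction can be sketched as follows: placing triangular filter edges equally spaced on the mel scale, and taking a DCT-II of the log filterbank energies. The HTK-style mel formula, the filter count, and the frequency range are assumptions, not the configuration used in this work; framing, windowing, the FFT, and filter weighting are omitted.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale (an assumption; other variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_filters, f_min, f_max):
    """n_filters + 2 frequencies (Hz), equally spaced on the mel scale.
    Triangular filter m spans edges[m]..edges[m + 2] and peaks at edges[m + 1]."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]


def cepstrum(log_energies, n_ceps):
    """DCT-II of log filterbank energies, the final step of MFCC extraction."""
    k_total = len(log_energies)
    return [
        sum(e * math.cos(math.pi * n * (k + 0.5) / k_total)
            for k, e in enumerate(log_energies))
        for n in range(n_ceps)
    ]


if __name__ == "__main__":
    print([round(f) for f in mel_band_edges(8, 0.0, 4000.0)])
```

Equal spacing in mel produces filters that widen with frequency in Hz, which is why the low-frequency region receives finer resolution.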
Emotion Recognition based on Third-Order Circular Suprasegmental Hidden Markov Model
This work focuses on recognizing the unknown emotion based on the Third-Order
Circular Suprasegmental Hidden Markov Model (CSPHMM3) as a classifier. Our work
has been tested on Emotional Prosody Speech and Transcripts (EPST) database.
The features extracted from the EPST database are Mel-Frequency Cepstral Coefficients
(MFCCs). Our results show an average emotion recognition accuracy of 77.8% based
on the CSPHMM3. The results of this work demonstrate that CSPHMM3 is superior
to the Third-Order Hidden Markov Model (HMM3), Gaussian Mixture Model (GMM),
Support Vector Machine (SVM), and Vector Quantization (VQ) by 6.0%, 4.9%, 3.5%,
and 5.4%, respectively, for emotion recognition. The average emotion
recognition accuracy achieved based on the CSPHMM3 is comparable to that found
using subjective assessment by human judges.
Comment: Accepted at The 2019 IEEE Jordan International Joint Conference on
Electrical Engineering and Information Technology (JEEIT), Jorda
Emirati Speaker Verification Based on HMM1s, HMM2s, and HMM3s
This work focuses on Emirati speaker verification systems in neutral talking
environments using First-Order Hidden Markov Models (HMM1s),
Second-Order Hidden Markov Models (HMM2s), and Third-Order Hidden Markov Models
(HMM3s) as classifiers. These systems have been evaluated on our collected
Emirati speech database, which comprises 25 male and 25 female Emirati
speakers, using Mel-Frequency Cepstral Coefficients (MFCCs) as extracted
features. Our results show that HMM3s outperform both HMM1s and HMM2s for
text-independent Emirati speaker verification. The results obtained with
HMM3s are close to those achieved in subjective assessment by human listeners.
Comment: 13th International Conference on Signal Processing, Chengdu, China,
201
Emirati-Accented Speaker Identification in Stressful Talking Conditions
This research is dedicated to improving text-independent Emirati-accented
speaker identification performance in stressful talking conditions using three
distinct classifiers: First-Order Hidden Markov Models (HMM1s), Second-Order
Hidden Markov Models (HMM2s), and Third-Order Hidden Markov Models (HMM3s). The
database used in this work was collected from fifty native Emirati speakers
(25 per gender) uttering eight common Emirati sentences in each of the
neutral, shouted, slow, loud, soft, and fast talking conditions. Mel-Frequency
Cepstral Coefficients (MFCCs) were extracted from the captured database as
features. Based on HMM1s, HMM2s, and HMM3s, the average
Emirati-accented speaker identification accuracy in stressful conditions is
58.6%, 61.1%, and 65.0%, respectively. The average speaker identification
accuracy achieved in stressful conditions with HMM3s is very similar to
that attained in subjective assessment by human listeners.
Comment: 6 pages, this work has been accepted in The International Conference
on Electrical and Computing Technologies and Applications, 2019 (ICECTA 2019)
Speaker Verification in Emotional Talking Environments based on Third-Order Circular Suprasegmental Hidden Markov Model
Speaker verification accuracy in emotional talking environments is not as high
as it is in neutral ones. This work aims at accepting or rejecting the claimed
speaker using his/her voice in emotional environments based on the Third-Order
Circular Suprasegmental Hidden Markov Model (CSPHMM3) as a classifier. An
Emirati-accented (Arabic) speech database with Mel-Frequency Cepstral
Coefficients as the extracted features has been used to evaluate our work. Our
results demonstrate that speaker verification accuracy based on CSPHMM3 is
greater than that based on the state-of-the-art classifiers and models such as
Gaussian Mixture Model (GMM), Support Vector Machine (SVM), and Vector
Quantization (VQ).
Comment: 6 pages, accepted in The International Conference on Electrical and
Computing Technologies and Applications, 2019 (ICECTA 2019). arXiv admin
note: text overlap with arXiv:1903.0980
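The abstract does not state the decision rule used to accept or reject a claim. A common rule in speaker verification, shown here only as an assumed illustration, compares the claimed speaker's model against a background (impostor) model through an average per-frame log-likelihood ratio and accepts when it reaches a threshold. The per-frame log-likelihoods would come from a generative model such as a CSPHMM3 or GMM; the numbers, the zero threshold, and all function names below are hypothetical.

```python
from statistics import fmean


def llr_score(claimed_frame_ll, background_frame_ll):
    """Average per-frame log-likelihood ratio: claimed-speaker model vs background model."""
    if not claimed_frame_ll or len(claimed_frame_ll) != len(background_frame_ll):
        raise ValueError("need equal-length, non-empty per-frame score lists")
    return fmean([c - b for c, b in zip(claimed_frame_ll, background_frame_ll)])


def verify(claimed_frame_ll, background_frame_ll, threshold=0.0):
    """Accept the identity claim when the ratio reaches the (hypothetical) threshold."""
    return llr_score(claimed_frame_ll, background_frame_ll) >= threshold


def error_rates(genuine_scores, impostor_scores, threshold):
    """False acceptance rate and false rejection rate at a given threshold."""
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


if __name__ == "__main__":
    # Made-up per-frame log-likelihoods, for illustration only.
    print(verify([-40.0, -42.0], [-45.0, -44.0]))
```

Raising the threshold trades false acceptances for false rejections; `error_rates` makes that trade-off visible for a set of genuine and impostor trial scores.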
Three-Stage Speaker Verification Architecture in Emotional Talking Environments
Speaker verification performance in a neutral talking environment is usually
high, while it decreases sharply in emotional talking environments. This
performance degradation in emotional environments is due to the mismatch
between training in a neutral environment and testing in emotional
environments. In this work, a three-stage speaker verification architecture has
been proposed to enhance speaker verification performance in emotional
environments. This architecture comprises three cascaded stages: a gender
identification stage, followed by an emotion identification stage, followed by a
speaker verification stage. The proposed framework has been evaluated on two
distinct and independent emotional speech datasets: an in-house dataset and the
Emotional Prosody Speech and Transcripts dataset. Our results show that speaker
verification based on both gender information and emotion information is
superior to speaker verification based on gender information only, on
emotion information only, and on neither gender information nor emotion
information. The average speaker verification performance attained with the
proposed framework is very similar to that attained in subjective assessment by
human listeners.
Comment: 18 pages. arXiv admin note: substantial text overlap with
arXiv:1804.00155, arXiv:1707.0013
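The cascade described in this abstract can be sketched as three successive decisions, each narrowing the set of models used by the next. In this sketch every model is reduced to a hypothetical scoring function (`Scorer`), the background-model normalization in the last stage is an assumption, and `toy_scorer` and `toy_models` are placeholders rather than trained classifiers.

```python
from typing import Callable, Dict, Sequence, Tuple

# A scorer maps a feature sequence to a log-likelihood-like score (hypothetical interface).
Scorer = Callable[[Sequence[float]], float]


def three_stage_verify(
    features: Sequence[float],
    claimed_speaker: str,
    gender_models: Dict[str, Scorer],
    emotion_models: Dict[str, Dict[str, Scorer]],
    speaker_models: Dict[Tuple[str, str], Dict[str, Scorer]],
    background_models: Dict[Tuple[str, str], Scorer],
    threshold: float = 0.0,
) -> Tuple[str, str, bool]:
    # Stage 1: identify the gender with the best-scoring gender model.
    gender = max(gender_models, key=lambda g: gender_models[g](features))
    # Stage 2: identify the emotion, using only that gender's emotion models.
    per_gender = emotion_models[gender]
    emotion = max(per_gender, key=lambda e: per_gender[e](features))
    # Stage 3: verify the claim with models specific to (gender, emotion).
    key = (gender, emotion)
    score = speaker_models[key][claimed_speaker](features) - background_models[key](features)
    return gender, emotion, score >= threshold


def toy_scorer(center: float) -> Scorer:
    """Negative squared distance to a center; a stand-in for a trained model."""
    return lambda xs: -sum((x - center) ** 2 for x in xs)


def toy_models() -> dict:
    """Hypothetical toy model set covering only the branches used in the demo."""
    return dict(
        gender_models={"female": toy_scorer(1.0), "male": toy_scorer(-1.0)},
        emotion_models={
            "female": {"neutral": toy_scorer(0.0), "angry": toy_scorer(1.0)},
            "male": {"neutral": toy_scorer(-1.0), "angry": toy_scorer(-2.0)},
        },
        speaker_models={("female", "angry"): {"spk1": toy_scorer(1.0),
                                              "spk2": toy_scorer(3.0)}},
        background_models={("female", "angry"): toy_scorer(2.0)},
    )


if __name__ == "__main__":
    print(three_stage_verify([1.0, 1.1, 0.9], "spk1", **toy_models()))
```

A design consequence of cascading is that an error in stage 1 or 2 routes the utterance to mismatched speaker models in stage 3.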
Novel Cascaded Gaussian Mixture Model-Deep Neural Network Classifier for Speaker Identification in Emotional Talking Environments
This research presents an effective approach to enhancing
text-independent speaker identification performance in emotional talking
environments based on a novel classifier called cascaded Gaussian Mixture
Model-Deep Neural Network (GMM-DNN). Our current work focuses on proposing,
implementing, and evaluating this classifier for speaker identification in
emotional talking environments. The results show that the cascaded GMM-DNN
classifier improves speaker identification performance across various emotions
using two distinct speech databases: Emirati speech database (Arabic United
Arab Emirates dataset) and Speech Under Simulated and Actual Stress (SUSAS)
English dataset. The proposed classifier outperforms classical classifiers such
as Multilayer Perceptron (MLP) and Support Vector Machine (SVM) in each
dataset. Speaker identification performance attained with the
cascaded GMM-DNN is similar to that obtained from subjective assessment by
human listeners.
Comment: 15 pages
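The abstract does not detail how the GMM and the DNN are connected. One plausible cascade, sketched here purely as an assumption, scores an utterance against each enrolled speaker's GMM and passes the resulting score vector to a feedforward network that outputs speaker posteriors. Diagonal covariances, the layer layout, and the untrained weights are hypothetical, and training of both stages is omitted.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gmm_avg_loglik(frames, gmm):
    """Average per-frame log-likelihood; gmm = (weights, means, variances)."""
    weights, means, variances = gmm
    total = sum(
        logsumexp([math.log(w) + log_gaussian_diag(x, mu, var)
                   for w, mu, var in zip(weights, means, variances)])
        for x in frames)
    return total / len(frames)


def dense(x, weight_rows, bias, relu):
    """One fully connected layer, optionally followed by ReLU."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight_rows, bias)]
    return [max(0.0, o) for o in out] if relu else out


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def gmm_dnn_posteriors(frames, speaker_gmms, layers):
    """Stage 1: one GMM score per enrolled speaker. Stage 2: a feedforward
    network maps that score vector to speaker posteriors."""
    h = [gmm_avg_loglik(frames, g) for g in speaker_gmms]
    for idx, (weight_rows, bias) in enumerate(layers):
        h = dense(h, weight_rows, bias, relu=idx < len(layers) - 1)
    return softmax(h)
```

With a single identity layer the network reduces to a softmax over GMM scores; a trained network could instead learn to reweight those scores under emotional mismatch.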
Emotion Recognition Using Speaker Cues
This research aims at identifying the unknown emotion using speaker cues. In
this study, we identify the unknown emotion using a two-stage framework. The
first stage focuses on identifying the speaker who uttered the unknown emotion,
while the second stage focuses on identifying the unknown emotion uttered by the
speaker recognized in the first stage. The proposed framework has been
evaluated on an Arabic Emirati-accented speech database recorded from fifteen
speakers per gender. Mel-Frequency Cepstral Coefficients (MFCCs) have been used
as the extracted features and Hidden Markov Model (HMM) has been utilized as
the classifier in this work. Our findings demonstrate that emotion recognition
accuracy based on the two-stage framework is greater than that based on the
one-stage approach and the state-of-the-art classifiers and models such as
Gaussian Mixture Model (GMM), Support Vector Machine (SVM), and Vector
Quantization (VQ). The average emotion recognition accuracy based on the
two-stage approach is 67.5%, compared with 61.4%, 63.3%, 64.5%,
and 61.5% based on the one-stage approach, GMM, SVM, and VQ, respectively. The
results achieved with the two-stage framework are very close to those
attained in subjective assessment by human listeners.
Comment: 5 pages
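The two-stage framework described in this abstract can be sketched as follows, assuming the second stage chooses only among emotion models trained for the speaker identified in the first stage, while the one-stage baseline uses speaker-independent emotion models. In the work above each model is an HMM over MFCCs; here every model is reduced to a hypothetical scoring function, and `toy_scorer` is a placeholder.

```python
from typing import Callable, Dict, Sequence, Tuple

# A scorer maps a feature sequence to a log-likelihood-like score (hypothetical interface).
Scorer = Callable[[Sequence[float]], float]


def two_stage_emotion(features: Sequence[float],
                      speaker_models: Dict[str, Scorer],
                      emotion_models: Dict[str, Dict[str, Scorer]]) -> Tuple[str, str]:
    # Stage 1: identify the speaker.
    speaker = max(speaker_models, key=lambda s: speaker_models[s](features))
    # Stage 2: choose among that speaker's emotion-dependent models only.
    own = emotion_models[speaker]
    emotion = max(own, key=lambda e: own[e](features))
    return speaker, emotion


def one_stage_emotion(features: Sequence[float],
                      emotion_models: Dict[str, Scorer]) -> str:
    """Baseline: speaker-independent emotion models, no speaker cue."""
    return max(emotion_models, key=lambda e: emotion_models[e](features))


def toy_scorer(center: float) -> Scorer:
    """Negative squared distance to a center; a stand-in for a trained model."""
    return lambda xs: -sum((x - center) ** 2 for x in xs)


if __name__ == "__main__":
    speakers = {"spk1": toy_scorer(0.0), "spk2": toy_scorer(5.0)}
    emotions = {
        "spk1": {"neutral": toy_scorer(-0.5), "happy": toy_scorer(0.5)},
        "spk2": {"neutral": toy_scorer(4.5), "happy": toy_scorer(5.5)},
    }
    print(two_stage_emotion([0.4, 0.7], speakers, emotions))
```

The speaker cue lets the second stage compare emotions within one speaker's vocal characteristics instead of across all speakers at once.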