5 research outputs found
Emirati Speaker Verification Based on HMM1s, HMM2s, and HMM3s
This work focuses on Emirati speaker verification systems in neutral talking
environments based on each of First-Order Hidden Markov Models (HMM1s),
Second-Order Hidden Markov Models (HMM2s), and Third-Order Hidden Markov Models
(HMM3s) as classifiers. These systems have been evaluated on our collected
Emirati speech database, which comprises 25 male and 25 female Emirati
speakers using Mel-Frequency Cepstral Coefficients (MFCCs) as extracted
features. Our results show that HMM3s outperform both HMM1s and HMM2s for
text-independent Emirati speaker verification. The obtained results based on
HMM3s are close to those achieved in subjective assessment by human listeners.
Comment: 13th International Conference on Signal Processing, Chengdu, China,
201
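The difference between the first- and second-order models named in this abstract lies in the transition term: an HMM1 conditions the next state on the previous state only, while an HMM2 conditions it on the previous two. The following is a minimal sketch of the forward likelihood computation for both, not the authors' implementation. It assumes a small discrete-observation model for readability (the paper uses MFCC features, which are continuous), and the names `forward_hmm1`, `forward_hmm2`, `A2`, and all parameter layouts are hypothetical.

```python
def forward_hmm1(pi, A, B, obs):
    """Likelihood P(obs) under a first-order HMM (standard forward pass).

    pi[i]   : initial probability of state i
    A[i][j] : P(q_t = j | q_{t-1} = i)
    B[i][o] : probability of emitting symbol o in state i
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def forward_hmm2(pi, A, A2, B, obs):
    """Likelihood P(obs) under a second-order HMM.

    A2[i][j][k] : P(q_t = k | q_{t-2} = i, q_{t-1} = j)
    The very first transition has only one predecessor, so it still
    uses the first-order matrix A (an assumption of this sketch).
    """
    n = len(pi)
    if len(obs) == 1:
        return sum(pi[i] * B[i][obs[0]] for i in range(n))
    # alpha[i][j]: joint probability of the observations seen so far
    # and of the last two states being (i, j).
    alpha = [[pi[i] * B[i][obs[0]] * A[i][j] * B[j][obs[1]]
              for j in range(n)] for i in range(n)]
    for o in obs[2:]:
        # New table is indexed by the new last two states (j, k).
        alpha = [[sum(alpha[i][j] * A2[i][j][k] for i in range(n)) * B[k][o]
                  for k in range(n)] for j in range(n)]
    return sum(sum(row) for row in alpha)
```

When `A2[i][j][k]` does not depend on `i`, the second-order model reduces to the first-order one and both functions return the same likelihood, which is a convenient sanity check. A third-order model extends the same idea by tracking the last three states.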
Emirati-Accented Speaker Identification in each of Neutral and Shouted Talking Environments
This work is devoted to capturing an Emirati-accented speech database (Arabic
United Arab Emirates database) in each of neutral and shouted talking
environments in order to study and enhance text-independent Emirati-accented
speaker identification performance in the shouted environment based on each of
First-Order Circular Suprasegmental Hidden Markov Models (CSPHMM1s),
Second-Order Circular Suprasegmental Hidden Markov Models (CSPHMM2s), and
Third-Order Circular Suprasegmental Hidden Markov Models (CSPHMM3s) as
classifiers. In this research, our database was collected from fifty native
Emirati speakers (twenty-five per gender) uttering eight common Emirati
sentences in each of neutral and shouted talking environments. The features
extracted from our collected database are Mel-Frequency Cepstral
Coefficients (MFCCs). Our results show that average Emirati-accented speaker
identification performance in the neutral environment is 94.0%, 95.2%, and 95.9%
based on CSPHMM1s, CSPHMM2s, and CSPHMM3s, respectively. In contrast, the
average performance in the shouted environment is 51.3%, 55.5%, and 59.3% based
on CSPHMM1s, CSPHMM2s, and CSPHMM3s, respectively. The achieved average speaker
identification performance in the shouted environment based on CSPHMM3s is very
similar to that obtained in subjective assessment by human listeners.
Comment: 14 pages, 3 figures. arXiv admin note: text overlap with
arXiv:1707.0068
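The figures reported in this abstract can be compared directly. The short sketch below uses only the six percentages quoted above and computes the drop from neutral to shouted speech for each classifier, plus the relative gain of one classifier over another. The helper names `drop` and `relative_gain` are hypothetical and introduced here for illustration only.

```python
# Average identification rates (%) as quoted in the abstract above.
neutral = {"CSPHMM1s": 94.0, "CSPHMM2s": 95.2, "CSPHMM3s": 95.9}
shouted = {"CSPHMM1s": 51.3, "CSPHMM2s": 55.5, "CSPHMM3s": 59.3}


def drop(model):
    """Absolute loss (percentage points) from neutral to shouted speech."""
    return round(neutral[model] - shouted[model], 1)


def relative_gain(new, base, scores):
    """Relative improvement (%) of classifier `new` over classifier `base`."""
    return round(100.0 * (scores[new] - scores[base]) / scores[base], 1)


if __name__ == "__main__":
    for model in neutral:
        print(model, "drop:", drop(model), "points")
    print("CSPHMM3s vs CSPHMM1s, shouted:",
          relative_gain("CSPHMM3s", "CSPHMM1s", shouted), "%")
```

On these numbers the neutral-to-shouted drop shrinks from 42.7 points (CSPHMM1s) to 39.7 (CSPHMM2s) and 36.6 (CSPHMM3s), and CSPHMM3s is about 15.6% better than CSPHMM1s in relative terms in the shouted environment.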
Emirati-Accented Speaker Identification in Stressful Talking Conditions
This research is dedicated to improving text-independent Emirati-accented
speaker identification performance in stressful talking conditions using three
distinct classifiers: First-Order Hidden Markov Models (HMM1s), Second-Order
Hidden Markov Models (HMM2s), and Third-Order Hidden Markov Models (HMM3s). The
database used in this work was collected from native Emirati speakers (25 per
gender) uttering eight widespread Emirati sentences in each of
neutral, shouted, slow, loud, soft, and fast talking conditions. The features
extracted from the captured database are Mel-Frequency Cepstral
Coefficients (MFCCs). Based on HMM1s, HMM2s, and HMM3s, average
Emirati-accented speaker identification accuracy in stressful conditions is
58.6%, 61.1%, and 65.0%, respectively. The achieved average speaker
identification accuracy in stressful conditions based on HMM3s is very similar to
that attained in subjective assessment by human listeners.
Comment: 6 pages, this work has been accepted in The International Conference
on Electrical and Computing Technologies and Applications, 2019 (ICECTA 2019
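In closed-set speaker identification with one model per enrolled speaker, a common decision rule is to pick the speaker whose model assigns the highest likelihood to the test utterance. The sketch below illustrates that rule only. It assumes the per-speaker log-likelihoods have already been computed by some classifier, and the function name `identify_speaker` and the speaker labels are hypothetical, not taken from the paper.

```python
def identify_speaker(log_likelihoods):
    """Return the label of the speaker model with the highest log-likelihood.

    log_likelihoods: dict mapping a speaker label to log P(O | model of
    that speaker) for one test utterance O (values assumed precomputed).
    """
    if not log_likelihoods:
        raise ValueError("at least one speaker model is required")
    return max(log_likelihoods, key=log_likelihoods.get)


if __name__ == "__main__":
    # Made-up scores for three hypothetical enrolled speakers.
    scores = {"speaker_a": -1200.5, "speaker_b": -1180.2, "speaker_c": -1302.0}
    print(identify_speaker(scores))
```

Working in the log domain avoids numerical underflow, since the likelihood of a long utterance is a product of many small probabilities.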
Speaker Verification in Emotional Talking Environments based on Third-Order Circular Suprasegmental Hidden Markov Model
Speaker verification accuracy in emotional talking environments is not as high
as it is in neutral ones. This work aims at accepting or rejecting the claimed
speaker using his/her voice in emotional environments based on the Third-Order
Circular Suprasegmental Hidden Markov Model (CSPHMM3) as a classifier. An
Emirati-accented (Arabic) speech database with Mel-Frequency Cepstral
Coefficients as the extracted features has been used to evaluate our work. Our
results demonstrate that speaker verification accuracy based on CSPHMM3 is
greater than that based on the state-of-the-art classifiers and models such as
Gaussian Mixture Model (GMM), Support Vector Machine (SVM), and Vector
Quantization (VQ).
Comment: 6 pages, accepted in The International Conference on Electrical and
Computing Technologies and Applications, 2019 (ICECTA 2019). arXiv admin
note: text overlap with arXiv:1903.0980
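Accepting or rejecting a claimed speaker is commonly framed as a threshold test on a log-likelihood ratio: the score of the claimed speaker's model is compared with the score of an alternative (impostor or background) model. The sketch below shows that generic decision rule, not the paper's specific procedure. The function name `verify_speaker`, the default threshold of 0.0, and the example scores are all assumptions made for illustration.

```python
def verify_speaker(log_lik_claimed, log_lik_alternative, threshold=0.0):
    """Accept or reject an identity claim with a log-likelihood ratio test.

    log_lik_claimed     : log P(O | model of the claimed speaker)
    log_lik_alternative : log P(O | alternative/impostor model)
    threshold           : decision threshold (hypothetical default; in
                          practice it is tuned on held-out data)
    Returns (accepted, score), where score is the log-likelihood ratio.
    """
    score = log_lik_claimed - log_lik_alternative
    return score >= threshold, score


if __name__ == "__main__":
    accepted, score = verify_speaker(-1180.0, -1195.5, threshold=5.0)
    print("accepted" if accepted else "rejected", score)
```

Raising the threshold lowers the false-acceptance rate at the cost of more false rejections, so the chosen value reflects the trade-off an application can tolerate.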
Emotion Recognition based on Third-Order Circular Suprasegmental Hidden Markov Model
This work focuses on recognizing the unknown emotion based on the Third-Order
Circular Suprasegmental Hidden Markov Model (CSPHMM3) as a classifier. Our work
has been tested on the Emotional Prosody Speech and Transcripts (EPST) database.
The features extracted from the EPST database are Mel-Frequency Cepstral Coefficients
(MFCCs). Our results give average emotion recognition accuracy of 77.8% based
on the CSPHMM3. The results of this work demonstrate that CSPHMM3 is superior
to the Third-Order Hidden Markov Model (HMM3), Gaussian Mixture Model (GMM),
Support Vector Machine (SVM), and Vector Quantization (VQ) by 6.0%, 4.9%, 3.5%,
and 5.4%, respectively, for emotion recognition. The average emotion
recognition accuracy achieved based on the CSPHMM3 is comparable to that found
using subjective assessment by human judges.
Comment: Accepted at The 2019 IEEE Jordan International Joint Conference on
Electrical Engineering and Information Technology (JEEIT), Jorda