3 research outputs found
Emirati-Accented Speaker Identification in each of Neutral and Shouted Talking Environments
This work is devoted to capturing an Emirati-accented speech database (Arabic
United Arab Emirates database) in each of neutral and shouted talking
environments in order to study and enhance text-independent Emirati-accented
speaker identification performance in the shouted environment based on each of
First-Order Circular Suprasegmental Hidden Markov Models (CSPHMM1s),
Second-Order Circular Suprasegmental Hidden Markov Models (CSPHMM2s), and
Third-Order Circular Suprasegmental Hidden Markov Models (CSPHMM3s) as
classifiers. In this research, our database was collected from fifty Emirati
native speakers (twenty-five per gender) uttering eight common Emirati
sentences in each of neutral and shouted talking environments. The features
extracted from our collected database are Mel-Frequency Cepstral
Coefficients (MFCCs). Our results show that average Emirati-accented speaker
identification performance in the neutral environment is 94.0%, 95.2%, and 95.9%
based on CSPHMM1s, CSPHMM2s, and CSPHMM3s, respectively. In the shouted
environment, the average performance drops to 51.3%, 55.5%, and 59.3% based on
CSPHMM1s, CSPHMM2s, and CSPHMM3s, respectively. The achieved average speaker
identification performance in the shouted environment based on CSPHMM3s is very
similar to that obtained in subjective assessment by human listeners.
Comment: 14 pages, 3 figures. arXiv admin note: text overlap with
arXiv:1707.0068
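The identification step described in this abstract, choosing one enrolled speaker for an utterance, can be sketched as picking the speaker whose model scores the observation sequence highest. The sketch below is only an illustration under stated assumptions: it uses a plain first-order discrete HMM with the standard forward algorithm, not the circular suprasegmental models (CSPHMM1s, CSPHMM2s, CSPHMM3s) of the paper, and the model parameters, the quantized observation symbols, and the names `log_likelihood` and `identify` are hypothetical.

```python
import math


def _log(p):
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else float("-inf")


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    peak = max(values)
    if peak == float("-inf"):
        return peak
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def log_likelihood(model, observations):
    """Forward algorithm in the log domain for a first-order discrete HMM.

    `model` is a dict with "start" (initial state probabilities), "trans"
    (state transition matrix) and "emit" (per-state symbol probabilities).
    """
    start, trans, emit = model["start"], model["trans"], model["emit"]
    n_states = len(start)
    alpha = [_log(start[s]) + _log(emit[s][observations[0]])
             for s in range(n_states)]
    for symbol in observations[1:]:
        alpha = [
            log_sum_exp([alpha[p] + _log(trans[p][s]) for p in range(n_states)])
            + _log(emit[s][symbol])
            for s in range(n_states)
        ]
    return log_sum_exp(alpha)


def identify(models, observations):
    """Return the enrolled speaker whose model gives the highest score."""
    return max(models, key=lambda name: log_likelihood(models[name], observations))


# Toy, hand-written models (assumed values, not trained on any database).
models = {
    "speaker_a": {
        "start": [0.6, 0.4],
        "trans": [[0.7, 0.3], [0.4, 0.6]],
        "emit": [[0.8, 0.2], [0.6, 0.4]],
    },
    "speaker_b": {
        "start": [0.5, 0.5],
        "trans": [[0.5, 0.5], [0.5, 0.5]],
        "emit": [[0.2, 0.8], [0.3, 0.7]],
    },
}

print(identify(models, [0, 0, 1, 0]))
print(identify(models, [1, 1, 1, 0]))
```

In a real system the observations would be MFCC feature vectors scored by continuous-density models; the discrete symbols here only keep the sketch short.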
Emirati-Accented Speaker Identification in Stressful Talking Conditions
This research is dedicated to improving text-independent Emirati-accented
speaker identification performance in stressful talking conditions using three
distinct classifiers: First-Order Hidden Markov Models (HMM1s), Second-Order
Hidden Markov Models (HMM2s), and Third-Order Hidden Markov Models (HMM3s). The
database used in this work was collected from Emirati native speakers (25 per
gender) uttering eight widespread Emirati sentences in each of
neutral, shouted, slow, loud, soft, and fast talking conditions. The features
extracted from the captured database are Mel-Frequency Cepstral
Coefficients (MFCCs). Based on HMM1s, HMM2s, and HMM3s, average
Emirati-accented speaker identification accuracy in stressful conditions is
58.6%, 61.1%, and 65.0%, respectively. The achieved average speaker
identification accuracy in stressful conditions based on HMM3s is very similar to
that attained in subjective assessment by human listeners.
Comment: 6 pages, this work has been accepted in The International Conference
on Electrical and Computing Technologies and Applications, 2019 (ICECTA 2019)
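The abstract above does not define what separates the three classifiers. As a reminder, under the usual textbook definition the order of an HMM is the number of previous states on which the next state depends. The notation below (states $s_i$, state at time $t$ written $q_t$) is an assumption and is not taken from the paper.

```latex
\begin{align*}
\text{HMM1:}\quad a_{ij}   &= P(q_t = s_j \mid q_{t-1} = s_i) \\
\text{HMM2:}\quad a_{ijk}  &= P(q_t = s_k \mid q_{t-1} = s_j,\ q_{t-2} = s_i) \\
\text{HMM3:}\quad a_{ijkl} &= P(q_t = s_l \mid q_{t-1} = s_k,\ q_{t-2} = s_j,\ q_{t-3} = s_i)
\end{align*}
```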
Speaker Verification in Emotional Talking Environments based on Third-Order Circular Suprasegmental Hidden Markov Model
Speaker verification accuracy in emotional talking environments is not as high
as it is in neutral ones. This work aims at accepting or rejecting the claimed
speaker using his/her voice in emotional environments based on the Third-Order
Circular Suprasegmental Hidden Markov Model (CSPHMM3) as a classifier. An
Emirati-accented (Arabic) speech database with Mel-Frequency Cepstral
Coefficients as the extracted features has been used to evaluate our work. Our
results demonstrate that speaker verification accuracy based on CSPHMM3 is
greater than that based on state-of-the-art classifiers and models such as
Gaussian Mixture Model (GMM), Support Vector Machine (SVM), and Vector
Quantization (VQ).
Comment: 6 pages, accepted in The International Conference on Electrical and
Computing Technologies and Applications, 2019 (ICECTA 2019). arXiv admin
note: text overlap with arXiv:1903.0980
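The accept-or-reject decision mentioned in this abstract can be illustrated with a common thresholded log-likelihood-ratio test. This is a minimal sketch under assumptions, not the paper's CSPHMM3 procedure: the background (impostor) score, the threshold value, and the function name `verify` are hypothetical and do not appear in the abstract.

```python
def verify(claimed_log_likelihood, background_log_likelihood, threshold=0.0):
    """Accept the identity claim when the log-likelihood ratio reaches the threshold.

    claimed_log_likelihood: score of the utterance under the claimed speaker's model.
    background_log_likelihood: score under an assumed background/impostor model.
    threshold: assumed operating point; raising it rejects more claims.
    """
    ratio = claimed_log_likelihood - background_log_likelihood
    return ratio >= threshold


# Hypothetical scores for two trials against the same background score.
accepted = verify(-10.0, -12.0)   # claimed model fits better than background
rejected = verify(-15.0, -12.0)   # background fits better than claimed model
print(accepted, rejected)
```

Moving the threshold trades false acceptances against false rejections, so it is normally tuned on held-out data.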