Speaker Identification in Shouted Talking Environments Based on Novel Third-Order Hidden Markov Models

Author: Shahin Ismail
Publication date: 01/07/2017

In this work we propose, implement, and evaluate novel models called Third-Order Hidden Markov Models (HMM3s) to improve the low performance of text-independent speaker identification in shouted talking environments. The proposed models have been tested on our collected speech database using Mel-Frequency Cepstral Coefficients (MFCCs). Our results demonstrate that HMM3s significantly improve speaker identification performance in such talking environments, by 11.3% and 166.7% compared to second-order hidden Markov models (HMM2s) and first-order hidden Markov models (HMM1s), respectively. The results achieved with the proposed models are close to those obtained in subjective assessment by human listeners.

Comment: The 4th International Conference on Audio, Language and Image Processing (ICALIP2014), Shanghai, China, 201

arXiv.org e-Print Archive
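
To make the "third-order" idea in the abstract above concrete, here is a minimal Python sketch, not the author's implementation. It assumes the usual definition in which the next hidden state depends on the three preceding states, and it scores a state sequence under a toy transition table. The state count, the invented probabilities, and the names `build_toy_transitions` and `sequence_log_prob` are illustrative assumptions; emission probabilities over MFCC frames are left out.

```python
import math
from itertools import product

N_STATES = 2  # toy state count, assumed for illustration


def build_toy_transitions(n_states=N_STATES):
    """Map each three-state history to a distribution over the next state.

    trans[(i, j, k)][l] = P(q_t = l | q_{t-3} = i, q_{t-2} = j, q_{t-1} = k)
    The numbers are invented: the chance of staying in state k grows with
    how often k appears in the three-state history.
    """
    trans = {}
    for history in product(range(n_states), repeat=3):
        k = history[-1]
        stay = 0.5 + 0.1 * history.count(k)  # between 0.6 and 0.8
        row = [(1.0 - stay) / (n_states - 1)] * n_states
        row[k] = stay
        trans[history] = row
    return trans


def sequence_log_prob(states, trans):
    """Log-probability of states[3:] conditioned on the first three states."""
    if len(states) < 4:
        raise ValueError("need at least four states for a third-order transition")
    logp = 0.0
    for t in range(3, len(states)):
        history = tuple(states[t - 3:t])  # the three preceding states
        logp += math.log(trans[history][states[t]])
    return logp


trans = build_toy_transitions()
print(sequence_log_prob([0, 0, 0, 0, 1], trans))
```

A first-order model would index the table by one previous state only; the three-element history key is what makes this sketch third-order.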

Speaker Identification in a Shouted Talking Environment Based on Novel Third-Order Circular Suprasegmental Hidden Markov Models

Author: Shahin Ismail
Publication venue: Springer Science and Business Media LLC
Publication date: 02/07/2017

It is well known that speaker identification yields very high performance in a neutral talking environment; in a shouted talking environment, however, performance declines sharply. This work aims at proposing, implementing, and evaluating novel Third-Order Circular Suprasegmental Hidden Markov Models (CSPHMM3s) to improve the low performance of text-independent speaker identification in a shouted talking environment. CSPHMM3s combine the characteristics of Circular Hidden Markov Models (CHMMs), Third-Order Hidden Markov Models (HMM3s), and Suprasegmental Hidden Markov Models (SPHMMs). Our results show that, in a shouted talking environment, CSPHMM3s are superior to each of First-Order Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM1s), Second-Order Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM2s), Third-Order Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM3s), First-Order Circular Suprasegmental Hidden Markov Models (CSPHMM1s), and Second-Order Circular Suprasegmental Hidden Markov Models (CSPHMM2s). Using our collected speech database, average speaker identification performance in a shouted talking environment based on LTRSPHMM1s, LTRSPHMM2s, LTRSPHMM3s, CSPHMM1s, CSPHMM2s, and CSPHMM3s is 74.6%, 78.4%, 81.7%, 78.7%, 83.4%, and 85.8%, respectively. The speaker identification performance achieved with CSPHMM3s is close to that attained in subjective assessment by human listeners.

Comment: arXiv admin note: substantial text overlap with arXiv:1706.09722, arXiv:1707.0013

arXiv.org e-Print Archive
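
The six accuracies reported in the abstract above can be compared directly. The following Python sketch is only arithmetic on those published figures: it computes the gain of CSPHMM3s over each other model in percentage points and as a relative percentage. The helper name `gain_over` is an assumption made for illustration.

```python
# Average speaker identification accuracy (%) in the shouted environment,
# copied from the abstract above.
ACCURACY = {
    "LTRSPHMM1s": 74.6,
    "LTRSPHMM2s": 78.4,
    "LTRSPHMM3s": 81.7,
    "CSPHMM1s": 78.7,
    "CSPHMM2s": 83.4,
    "CSPHMM3s": 85.8,
}


def gain_over(baseline, best="CSPHMM3s", table=ACCURACY):
    """Return (percentage-point gain, relative gain in %) of best over baseline."""
    points = table[best] - table[baseline]
    relative = 100.0 * points / table[baseline]
    return points, relative


for name in ACCURACY:
    if name != "CSPHMM3s":
        points, relative = gain_over(name)
        print(f"{name}: +{points:.1f} points, +{relative:.1f}% relative")
```

Keeping the two measures separate matters because an abstract that says "improved by X%" may mean either one.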

Emirati-Accented Speaker Identification in each of Neutral and Shouted Talking Environments

Authors: Bahutair Mohammed; Nassif Ali Bou; Shahin Ismail
Publication venue: Springer Science and Business Media LLC
Publication date: 31/03/2018

This work is devoted to capturing an Emirati-accented speech database (Arabic United Arab Emirates database) in both neutral and shouted talking environments in order to study and enhance text-independent Emirati-accented speaker identification performance in the shouted environment, using each of First-Order Circular Suprasegmental Hidden Markov Models (CSPHMM1s), Second-Order Circular Suprasegmental Hidden Markov Models (CSPHMM2s), and Third-Order Circular Suprasegmental Hidden Markov Models (CSPHMM3s) as classifiers. Our database was collected from fifty Emirati native speakers (twenty-five per gender) uttering eight common Emirati sentences in each of the neutral and shouted talking environments. The features extracted from the collected database are Mel-Frequency Cepstral Coefficients (MFCCs). Our results show that average Emirati-accented speaker identification performance in the neutral environment is 94.0%, 95.2%, and 95.9% based on CSPHMM1s, CSPHMM2s, and CSPHMM3s, respectively. In the shouted environment, by contrast, the average performance is 51.3%, 55.5%, and 59.3% based on CSPHMM1s, CSPHMM2s, and CSPHMM3s, respectively. The average speaker identification performance achieved in the shouted environment with CSPHMM3s is very similar to that obtained in subjective assessment by human listeners.

Comment: 14 pages, 3 figures. arXiv admin note: text overlap with arXiv:1707.0068

arXiv.org e-Print Archive

Emotion Recognition based on Third-Order Circular Suprasegmental Hidden Markov Model

Author: Shahin Ismail
Publication date: 23/03/2019

This work focuses on recognizing the unknown emotion using the Third-Order Circular Suprasegmental Hidden Markov Model (CSPHMM3) as a classifier. Our work has been tested on the Emotional Prosody Speech and Transcripts (EPST) database. The features extracted from the EPST database are Mel-Frequency Cepstral Coefficients (MFCCs). Our results give an average emotion recognition accuracy of 77.8% based on the CSPHMM3. The results of this work demonstrate that CSPHMM3 is superior to the Third-Order Hidden Markov Model (HMM3), Gaussian Mixture Model (GMM), Support Vector Machine (SVM), and Vector Quantization (VQ) by 6.0%, 4.9%, 3.5%, and 5.4%, respectively, for emotion recognition. The average emotion recognition accuracy achieved with the CSPHMM3 is comparable to that found using subjective assessment by human judges.

Comment: Accepted at The 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT), Jorda

arXiv.org e-Print Archive

Emirati Speaker Verification Based on HMM1s, HMM2s, and HMM3s

Author: Shahin Ismail
Publication date: 02/07/2017

This work focuses on Emirati speaker verification systems in neutral talking environments based on each of First-Order Hidden Markov Models (HMM1s), Second-Order Hidden Markov Models (HMM2s), and Third-Order Hidden Markov Models (HMM3s) as classifiers. These systems have been evaluated on our collected Emirati speech database, which comprises 25 male and 25 female Emirati speakers, using Mel-Frequency Cepstral Coefficients (MFCCs) as the extracted features. Our results show that HMM3s outperform each of HMM1s and HMM2s for text-independent Emirati speaker verification. The results obtained with HMM3s are close to those achieved in subjective assessment by human listeners.

Comment: 13th International Conference on Signal Processing, Chengdu, China, 201

arXiv.org e-Print Archive

Emirati-Accented Speaker Identification in Stressful Talking Conditions

Authors: Nassif Ali Bou; Shahin Ismail
Publication date: 29/10/2019

This research is dedicated to improving text-independent Emirati-accented speaker identification performance in stressful talking conditions using three distinct classifiers: First-Order Hidden Markov Models (HMM1s), Second-Order Hidden Markov Models (HMM2s), and Third-Order Hidden Markov Models (HMM3s). The database used in this work was collected from 25 Emirati native speakers per gender uttering eight widespread Emirati sentences in each of neutral, shouted, slow, loud, soft, and fast talking conditions. The features extracted from the captured database are Mel-Frequency Cepstral Coefficients (MFCCs). Based on HMM1s, HMM2s, and HMM3s, average Emirati-accented speaker identification accuracy in stressful conditions is 58.6%, 61.1%, and 65.0%, respectively. The average speaker identification accuracy achieved in stressful conditions with HMM3s is very similar to that attained in subjective assessment by human listeners.

Comment: 6 pages, this work has been accepted in The International Conference on Electrical and Computing Technologies and Applications, 2019 (ICECTA 2019

arXiv.org e-Print Archive

Speaker Verification in Emotional Talking Environments based on Third-Order Circular Suprasegmental Hidden Markov Model

Authors: Nassif Ali Bou; Shahin Ismail
Publication date: 29/10/2019

Speaker verification accuracy in emotional talking environments is not as high as it is in neutral ones. This work aims at accepting or rejecting the claimed speaker using his/her voice in emotional environments based on the Third-Order Circular Suprasegmental Hidden Markov Model (CSPHMM3) as a classifier. An Emirati-accented (Arabic) speech database with Mel-Frequency Cepstral Coefficients as the extracted features has been used to evaluate our work. Our results demonstrate that speaker verification accuracy based on CSPHMM3 is greater than that based on state-of-the-art classifiers and models such as Gaussian Mixture Model (GMM), Support Vector Machine (SVM), and Vector Quantization (VQ).

Comment: 6 pages, accepted in The International Conference on Electrical and Computing Technologies and Applications, 2019 (ICECTA 2019). arXiv admin note: text overlap with arXiv:1903.0980

arXiv.org e-Print Archive

Three-Stage Speaker Verification Architecture in Emotional Talking Environments

Authors: Nassif Ali Bou; Shahin Ismail
Publication venue: Springer Science and Business Media LLC
Publication date: 03/09/2018

Speaker verification performance in a neutral talking environment is usually high, while it decreases sharply in emotional talking environments. This degradation is due to the mismatch between training in a neutral environment and testing in emotional environments. In this work, a three-stage speaker verification architecture is proposed to enhance speaker verification performance in emotional environments. The architecture comprises three cascaded stages: a gender identification stage, followed by an emotion identification stage, followed by a speaker verification stage. The proposed framework has been evaluated on two distinct and independent emotional speech datasets: an in-house dataset and the Emotional Prosody Speech and Transcripts dataset. Our results show that speaker verification based on both gender information and emotion information is superior to speaker verification based on gender information only, emotion information only, or neither gender information nor emotion information. The average speaker verification performance attained with the proposed framework is very similar to that attained in subjective assessment by human listeners.

Comment: 18 pages. arXiv admin note: substantial text overlap with arXiv:1804.00155, arXiv:1707.0013

arXiv.org e-Print Archive
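
The cascade described in the abstract above (gender identification, then emotion identification, then speaker verification) can be sketched as follows. This is a hedged illustration in Python, not the authors' system: the function `verify_three_stage`, the three callables, the choice to pass the detected gender into the emotion stage, and the threshold rule are all assumptions, and the toy stand-ins at the bottom exist only to make the sketch runnable.

```python
from typing import Callable, Sequence

Features = Sequence[float]  # e.g. a feature vector per utterance (assumed layout)


def verify_three_stage(
    features: Features,
    claimed_speaker: str,
    identify_gender: Callable[[Features], str],
    identify_emotion: Callable[[Features, str], str],
    score_claim: Callable[[Features, str, str, str], float],
    threshold: float,
) -> bool:
    """Accept or reject a claimed identity using a three-stage cascade."""
    gender = identify_gender(features)              # stage 1
    emotion = identify_emotion(features, gender)    # stage 2, given gender
    score = score_claim(features, claimed_speaker, gender, emotion)  # stage 3
    return score >= threshold


# Toy stand-ins (hypothetical); a real system would use trained models.
def toy_gender(features: Features) -> str:
    return "female" if features[0] > 0 else "male"


def toy_emotion(features: Features, gender: str) -> str:
    return "angry" if features[1] > 1.0 else "neutral"


def toy_score(features: Features, speaker: str, gender: str, emotion: str) -> float:
    # Pretend the gender- and emotion-matched model of "spk_a" fits well.
    return 0.9 if speaker == "spk_a" else 0.1


print(verify_three_stage([0.5, 1.5], "spk_a", toy_gender, toy_emotion, toy_score, 0.5))
```

Passing the stages in as callables keeps the cascade order explicit while leaving the choice of classifier for each stage open.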

Novel Cascaded Gaussian Mixture Model-Deep Neural Network Classifier for Speaker Identification in Emotional Talking Environments

Authors: Hamsa Shibani; Nassif Ali Bou; Shahin Ismail
Publication venue: Springer Science and Business Media LLC
Publication date: 11/10/2018

This research presents an approach to enhance text-independent speaker identification performance in emotional talking environments based on a novel classifier called the cascaded Gaussian Mixture Model-Deep Neural Network (GMM-DNN). Our current work focuses on proposing, implementing, and evaluating this classifier for speaker identification in emotional talking environments. The results indicate that the cascaded GMM-DNN classifier improves speaker identification performance across various emotions on two distinct speech databases: the Emirati speech database (Arabic United Arab Emirates dataset) and the Speech Under Simulated and Actual Stress (SUSAS) English dataset. The proposed classifier outperforms classical classifiers such as Multilayer Perceptron (MLP) and Support Vector Machine (SVM) on each dataset. The speaker identification performance attained with the cascaded GMM-DNN is similar to that obtained from subjective assessment by human listeners.

Comment: 15 page

arXiv.org e-Print Archive

Emotion Recognition Using Speaker Cues

Author: Shahin Ismail
Publication date: 04/02/2020

This research aims at identifying the unknown emotion using speaker cues. In this study, we identify the unknown emotion using a two-stage framework. The first stage identifies the speaker who uttered the unknown emotion, and the second stage identifies the unknown emotion uttered by the speaker recognized in the first stage. The proposed framework has been evaluated on an Arabic Emirati-accented speech database uttered by fifteen speakers per gender. Mel-Frequency Cepstral Coefficients (MFCCs) have been used as the extracted features and the Hidden Markov Model (HMM) has been used as the classifier in this work. Our findings demonstrate that emotion recognition accuracy based on the two-stage framework is greater than that based on the one-stage approach and on state-of-the-art classifiers and models such as Gaussian Mixture Model (GMM), Support Vector Machine (SVM), and Vector Quantization (VQ). The average emotion recognition accuracy based on the two-stage approach is 67.5%, compared with 61.4%, 63.3%, 64.5%, and 61.5% based on the one-stage approach, GMM, SVM, and VQ, respectively. The results achieved with the two-stage framework are very close to those attained in subjective assessment by human listeners.

Comment: 5 page

arXiv.org e-Print Archive
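
The two-stage framework in the abstract above (identify the speaker first, then identify the emotion for that speaker) can be sketched as below. This is a minimal Python illustration under stated assumptions, not the author's code: the scorer dictionaries stand in for per-speaker and per-speaker-per-emotion model likelihoods, and the names `recognize_emotion_two_stage`, `speaker_scorers`, and `emotion_scorers` are hypothetical.

```python
def recognize_emotion_two_stage(features, speaker_scorers, emotion_scorers):
    """Pick the best-scoring speaker, then the best emotion for that speaker.

    speaker_scorers: {speaker: callable(features) -> score}
    emotion_scorers: {speaker: {emotion: callable(features) -> score}}
    Higher scores mean a better match (for example, a log-likelihood).
    """
    # Stage 1: identify the speaker.
    speaker = max(speaker_scorers, key=lambda s: speaker_scorers[s](features))
    # Stage 2: identify the emotion using only that speaker's emotion models.
    per_emotion = emotion_scorers[speaker]
    emotion = max(per_emotion, key=lambda e: per_emotion[e](features))
    return speaker, emotion


# Toy scorers (invented numbers) so the sketch runs end to end.
speaker_scorers = {
    "spk_a": lambda f: -abs(f[0] - 1.0),
    "spk_b": lambda f: -abs(f[0] + 1.0),
}
emotion_scorers = {
    "spk_a": {
        "neutral": lambda f: -abs(f[1]),
        "angry": lambda f: -abs(f[1] - 2.0),
    },
    "spk_b": {
        "neutral": lambda f: -abs(f[1] - 2.0),
        "angry": lambda f: -abs(f[1]),
    },
}

print(recognize_emotion_two_stage([0.9, 1.8], speaker_scorers, emotion_scorers))
```

In this toy setup the same features map to different emotions depending on which speaker is chosen in stage one, which is the dependency the two-stage design relies on; it also means a stage-one error can propagate to stage two.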