Three-Stage Speaker Verification Architecture in Emotional Talking Environments
Speaker verification performance is usually high in neutral talking
environments but drops sharply in emotional talking environments. This
degradation is caused by the mismatch between training in a neutral environment
and testing in emotional environments. In this work, a three-stage speaker
verification architecture is proposed to enhance speaker verification
performance in emotional environments. The architecture comprises three
cascaded stages: a gender identification stage, followed by an emotion
identification stage, followed by a speaker verification stage. The proposed
framework has been evaluated on two distinct and independent emotional speech
datasets: an in-house dataset and the Emotional Prosody Speech and Transcripts
dataset. Our results show that speaker verification based on both gender and
emotion information is superior to speaker verification based on gender
information only, on emotion information only, and on neither gender nor
emotion information. The average speaker verification performance attained
with the proposed framework is very close to that attained in subjective
assessment by human listeners.
Comment: 18 pages. arXiv admin note: substantial text overlap with
arXiv:1804.00155, arXiv:1707.0013
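The three-stage cascade described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the classifiers, so the sketch assumes a hypothetical diagonal-Gaussian scorer per class, emotion models trained separately for each gender, one model per (speaker, gender, emotion), and acceptance when the log-likelihood ratio against a gender- and emotion-specific background model reaches a threshold. All class names, function names, and toy values (`DiagGaussian`, `CascadeModels`, `verify`, `build_toy_models`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Frames = Sequence[Sequence[float]]  # one feature vector per frame


@dataclass
class DiagGaussian:
    """Hypothetical stand-in for a trained acoustic model (not the paper's model)."""
    mean: List[float]
    var: List[float]

    def avg_log_likelihood(self, frames: Frames) -> float:
        """Average per-frame log-likelihood under a diagonal Gaussian."""
        total = 0.0
        for x in frames:
            for xi, mu, v in zip(x, self.mean, self.var):
                total += -0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
        return total / len(frames)


@dataclass
class CascadeModels:
    # Assumed model inventory; the abstract does not specify how models are organized.
    gender: Dict[str, DiagGaussian]                    # gender -> model
    emotion: Dict[str, Dict[str, DiagGaussian]]        # gender -> emotion -> model
    speaker: Dict[Tuple[str, str, str], DiagGaussian]  # (speaker, gender, emotion) -> model
    background: Dict[Tuple[str, str], DiagGaussian]    # (gender, emotion) -> model


def pick_best(models: Dict[str, DiagGaussian], frames: Frames) -> str:
    """Return the label whose model gives the highest average log-likelihood."""
    return max(models, key=lambda label: models[label].avg_log_likelihood(frames))


def verify(frames: Frames, claimed: str, m: CascadeModels,
           threshold: float = 0.0) -> Tuple[str, str, bool]:
    """Run the three cascaded stages and return (gender, emotion, accepted)."""
    gender = pick_best(m.gender, frames)            # Stage 1: gender identification
    emotion = pick_best(m.emotion[gender], frames)  # Stage 2: emotion identification (gender-specific models)
    key = (claimed, gender, emotion)                # Stage 3: speaker verification
    if key not in m.speaker:
        return gender, emotion, False  # no model for the claimed speaker under this gender/emotion
    # Log-likelihood ratio: claimed-speaker model vs. background model of the same gender/emotion.
    score = (m.speaker[key].avg_log_likelihood(frames)
             - m.background[(gender, emotion)].avg_log_likelihood(frames))
    return gender, emotion, score >= threshold


def build_toy_models() -> CascadeModels:
    """Made-up one-dimensional models, for illustration only."""
    g = DiagGaussian
    return CascadeModels(
        gender={"female": g([2.0], [1.0]), "male": g([-2.0], [1.0])},
        emotion={
            "female": {"neutral": g([1.5], [0.5]), "angry": g([3.0], [0.5])},
            "male": {"neutral": g([-1.5], [0.5]), "angry": g([-3.0], [0.5])},
        },
        speaker={
            ("alice", "female", "neutral"): g([1.4], [0.2]),
            ("alice", "female", "angry"): g([3.2], [0.2]),
        },
        background={
            ("female", "neutral"): g([1.5], [1.0]),
            ("female", "angry"): g([3.0], [1.0]),
            ("male", "neutral"): g([-1.5], [1.0]),
            ("male", "angry"): g([-3.0], [1.0]),
        },
    )
```

For example, `verify([[3.1], [3.3], [2.9]], "alice", build_toy_models())` returns `("female", "angry", True)`: the utterance is routed to the female, angry models before the claimed identity is scored. The point of the cascade is that the final decision uses a speaker model matched to the detected gender and emotion rather than a single neutral-trained model.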