
    Singing speaker clustering based on subspace learning in the GMM mean supervector space

    Abstract In this study, we propose algorithms based on subspace learning in the GMM mean supervector space to improve the performance of speaker clustering with speech from both reading and singing. As a speaking style, singing introduces changes in the time-frequency structure of a speaker's voice. The purpose of this study is to introduce advancements for speech systems such as speech indexing and retrieval that improve robustness to intrinsic variations in speech production. Speaker clustering techniques such as k-means and hierarchical clustering are explored to analyze acoustic space differences in a corpus containing both read and sung lyrics for each speaker. Furthermore, a distance based on fuzzy c-means membership degrees is proposed to more accurately measure clustering difficulty, or speaker confusability. Two categories of subspace learning methods are studied: unsupervised, based on LPP, and supervised, based on PLDA. Our proposed PLDA-based clustering method is a two-stage algorithm: first, initial clusters are obtained using full-dimension supervectors; next, each cluster is refined in a PLDA subspace, resulting in a more speaker-dependent representation that is less sensitive to speaking style. It is shown that LPP improves average clustering accuracy by 5.1% absolute over a hierarchical baseline for a mixture of reading and singing, and PLDA-based clustering increases accuracy by 9.6% absolute over a k-means baseline. These advancements offer novel techniques to improve model formulation for speech applications including speaker ID, audio search, and audio content analysis.
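    As a rough illustration of the two-stage idea in this abstract, the sketch below clusters full-dimension supervectors first and then re-clusters each initial cluster in a lower-dimensional supervised subspace. It is a minimal sketch only: scikit-learn's LinearDiscriminantAnalysis stands in for PLDA, and the supervector arrays, dimensions, and cluster counts are placeholder assumptions rather than the paper's actual setup.

```python
# Minimal two-stage clustering sketch (assumed data; LDA as a stand-in for PLDA).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
dev_sv = rng.normal(size=(200, 512))     # labeled development supervectors (placeholder)
dev_spk = rng.integers(0, 10, size=200)  # speaker labels used to train the subspace
test_sv = rng.normal(size=(60, 512))     # unlabeled supervectors to cluster (placeholder)

# Stage 1: initial clusters in the full supervector space.
initial = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(test_sv)

# Supervised subspace trained offline on labeled data (stand-in for PLDA).
subspace = LinearDiscriminantAnalysis(n_components=8).fit(dev_sv, dev_spk)

# Stage 2: refine each initial cluster in the lower-dimensional subspace.
refined = np.empty_like(initial)
for c in np.unique(initial):
    idx = np.where(initial == c)[0]
    if len(idx) < 4:                     # too small to split further
        refined[idx] = c * 10
        continue
    proj = subspace.transform(test_sv[idx])
    sub = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(proj)
    refined[idx] = c * 10 + sub
print(refined)
```

    In practice the subspace projection and the second-stage cluster counts would be driven by labeled development data and the target application, not the fixed values used here.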

    A Hybrid Approach to Scalable and Robust Spoken Language Understanding in Enterprise Virtual Agents

    Spoken language understanding (SLU) extracts the intended meaning from a user utterance and is a critical component of conversational virtual agents. In enterprise virtual agents (EVAs), language understanding is especially challenging. First, the users are infrequent callers who are unfamiliar with the expectations of a pre-designed conversation flow. Second, the users are paying customers of an enterprise who demand a reliable, consistent, and efficient user experience when resolving their issues. In this work, we describe a general and robust framework for intent and entity extraction utilizing a hybrid of statistical and rule-based approaches. Our framework includes confidence modeling that incorporates information from all components in the SLU pipeline, a critical addition for EVAs to ensure accuracy. Our focus is on creating accurate and scalable SLU that can be deployed rapidly for a large class of EVA applications with little need for human intervention.
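    The hybrid intent extraction described in this abstract can be sketched, under assumptions, as a rule-based pass backed by a statistical classifier whose posterior doubles as a confidence score with a rejection threshold. The regular-expression patterns, intent names, training utterances, and threshold below are illustrative placeholders, not the paper's grammar, model, or confidence-modeling pipeline.

```python
# Hedged sketch of a hybrid (rule-based + statistical) intent classifier.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

RULES = [  # deterministic patterns, assumed for illustration
    (re.compile(r"\b(pay|payment)\b.*\bbill\b"), "pay_bill"),
    (re.compile(r"\breset\b.*\bpassword\b"), "reset_password"),
]

train_utts = ["i want to pay my bill", "pay the bill please",
              "reset my password", "i forgot my password",
              "talk to an agent", "connect me to a person"]
train_intents = ["pay_bill", "pay_bill", "reset_password",
                 "reset_password", "agent", "agent"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_utts, train_intents)

def classify(utterance, reject_threshold=0.5):
    # Rule-based pass: a matching pattern returns its intent with full confidence.
    for pattern, intent in RULES:
        if pattern.search(utterance.lower()):
            return intent, 1.0
    # Statistical fallback: the class posterior serves as the confidence score.
    probs = model.predict_proba([utterance])[0]
    best = probs.argmax()
    intent, conf = model.classes_[best], float(probs[best])
    return (intent, conf) if conf >= reject_threshold else ("reroute_to_agent", conf)

print(classify("can you help me pay my bill"))
print(classify("my cat is stuck in a tree"))
```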

    Automatic language analysis and identification based on speech production knowledge

    In this paper, a language analysis and classification system that leverages knowledge of speech production is proposed. The proposed scheme automatically extracts key production traits (or “hot-spots”) that are strongly tied to the underlying language structure. Particularly, the speech utterance is first parsed into consonant and vowel clusters. Subsequently, the production traits for each cluster are represented by the corresponding temporal evolution of speech articulatory states. It is hypothesized that a selection of these production traits is strongly tied to the underlying language and can be exploited for language ID. The new scheme is evaluated on our South Indian Languages (SInL) corpus, which consists of 5 closely related languages spoken in India, namely, Kannada, Tamil, Telugu, Malayalam, and Marathi. Good accuracy is achieved, with a rate of 65% obtained in a difficult 5-way classification task using about 4 seconds of training and test speech data per utterance. Furthermore, the proposed scheme is also able to automatically identify key production traits of each language (e.g., dominant vowels, stop-consonants, fricatives, etc.).
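    The consonant/vowel parsing step in this abstract can be illustrated with a minimal sketch that groups an aligned phone sequence into maximal C and V runs and summarizes each run with simple duration traits. The phone inventory, alignment, and trait choices below are assumptions for illustration; the paper's articulatory-state trajectories are not reproduced here.

```python
# Hedged sketch: group aligned (phone, duration) pairs into maximal C/V clusters.
from itertools import groupby

VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "o"}  # assumed toy inventory

def cv_clusters(phones, durations):
    """Yield per-cluster traits: class (C/V), phones, run length, total duration."""
    tagged = [("V" if p in VOWELS else "C", p, d) for p, d in zip(phones, durations)]
    for cls, run in groupby(tagged, key=lambda x: x[0]):
        run = list(run)
        yield {"class": cls,
               "phones": [p for _, p, _ in run],
               "length": len(run),
               "duration": round(sum(d for _, _, d in run), 3)}

# Toy aligned utterance (phones with per-phone durations in seconds).
utterance = ["k", "a", "n", "n", "a", "d", "a"]
durs = [0.05, 0.09, 0.06, 0.06, 0.08, 0.05, 0.10]
for cluster in cv_clusters(utterance, durs):
    print(cluster)
```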