
2,795 research outputs found

IMAGINE Final Report

Author: Arana C
Dattani I
Pick R
Recio I
Schmidt P
Publication venue: s.n.
Publication date: 01/09/2003

Factor analysis modelling for speaker verification with short utterances

Author: Lustri Christopher
Sridharan Sridha
Vogt Robert
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008

This paper examines combining relevance MAP and subspace speaker adaptation to train GMM speaker models for speaker verification, with a particular focus on short utterances. The subspace speaker adaptation method models a speaker GMM mean supervector as the sum of a speaker-independent prior distribution and a speaker-dependent offset constrained to lie within a low-rank subspace, and has been shown to improve accuracy over ordinary relevance MAP when the amount of training data is limited. Testing on NIST SRE data shows that combining the two processes provides speaker models that lead to modest improvements in verification accuracy in limited-data situations, in addition to improving the performance of the speaker verification system when a larger amount of training data is available.
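
As a rough illustration of the two adaptation ideas named in the abstract above, the sketch below composes a mean supervector as a prior mean plus a low-rank offset, and separately applies the standard relevance MAP mean update. The function names, the row-wise list layout of the subspace matrix, and the relevance factor of 16 are assumptions made for illustration. The abstract does not say how the paper combines the two processes, so this should not be read as the authors' method.

```python
def compose_supervector(prior_mean, subspace, factors):
    """Return m + V y: a prior mean supervector plus a low-rank offset.

    prior_mean: list of floats (speaker-independent supervector m)
    subspace:   list of rows, one per supervector dimension (matrix V)
    factors:    list of floats (low-dimensional speaker factors y)
    """
    return [
        m_i + sum(v_ij * y_j for v_ij, y_j in zip(row, factors))
        for m_i, row in zip(prior_mean, subspace)
    ]


def relevance_map_mean(prior_mean, data_mean, count, relevance=16.0):
    """Standard relevance MAP update for one Gaussian mean.

    The adapted mean interpolates between the data mean and the prior
    mean with weight alpha = count / (count + relevance), so a component
    that saw little data stays close to the prior.
    """
    alpha = count / (count + relevance)
    return [alpha * x + (1.0 - alpha) * m for x, m in zip(data_mean, prior_mean)]
```

With little data, `count` is small and the relevance MAP mean barely moves from the prior, whereas the subspace offset needs only a handful of factors to be estimated. That difference is one way to see why the subspace approach can help for short utterances.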

Queensland University of Technology ePrints Archive

Effectiveness of Single-Channel BLSTM Enhancement for Language Identification

Author: Dehak Najim
Sibbern Frederiksen Peter
Tan Zheng-Hua
Villalba Jesus
Watanabe Shinji
Publication venue: 'International Speech Communication Association'
Publication date: 01/09/2018

Crossref

VBN

Arabic Dialectical Speech Recognition in Mobile Communication Services

Author: Imed Zitouni
Qiru Zhou
Publication venue: 'IntechOpen'
Publication date: 01/11/2008

IntechOpen

Crossref

The Microsoft 2017 Conversational Speech Recognition System

Author: Alleva F.
Droppo J.
Huang X.
Stolcke A.
Wu L.
Xiong W.
Publication date: 24/08/2017

We describe the 2017 version of Microsoft's conversational speech recognition system, in which we update our 2016 system with recent developments in neural-network-based acoustic and language modeling to further advance the state of the art on the Switchboard speech recognition task. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog-session-aware LSTM language models in rescoring. For system combination we adopt a two-stage approach, whereby subsets of acoustic models are first combined at the senone/frame level, followed by word-level voting via confusion networks. We also added a confusion network rescoring step after system combination. The resulting system yields a 5.1% word error rate on the 2000 Switchboard evaluation set.
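
For readers unfamiliar with the metric quoted in the abstract above, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a hypothesis and a reference transcript, divided by the number of reference words. The sketch below is a minimal version of that definition. The function name and the whitespace tokenization are assumptions, and official evaluations apply text normalization rules that are not reproduced here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a 5.1% word error rate corresponds to roughly one word error per twenty reference words.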

arXiv.org e-Print Archive

Crossref