External Language Model Integration for Factorized Neural Transducers
We propose an adaptation method for factorized neural transducers (FNT) with
external language models. We demonstrate that both neural and n-gram external
LMs add significantly more value when linearly interpolated with the predictor
output than when combined via shallow fusion, confirming that FNT forces the
predictor to act like a regular language model. Further, we propose a method to
integrate class-based n-gram language models into the FNT framework, resulting in
accuracy gains similar to a hybrid setup. We show average gains of 18% WERR
with lexical adaptation across various scenarios and additive gains of up to
60% WERR in one entity-rich scenario through a combination of class-based
n-gram and neural LMs.
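As a rough illustration of the distinction drawn above, the sketch below contrasts shallow fusion (adding scaled external-LM log-scores to the overall model score) with linear interpolation of an external LM against the FNT vocabulary predictor's output distribution. The function names, interpolation weight, and toy distributions are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def shallow_fusion_scores(log_p_model, log_p_ext_lm, lam=0.3):
    """Shallow fusion: add scaled external-LM log-probabilities to the
    full transducer score at decoding time (lam is illustrative)."""
    return log_p_model + lam * log_p_ext_lm

def predictor_interpolation_scores(log_p_predictor, log_p_ext_lm, lam=0.3):
    """Linear interpolation in probability space between the FNT vocabulary
    predictor (which behaves like a standalone LM) and the external LM."""
    p = (1.0 - lam) * np.exp(log_p_predictor) + lam * np.exp(log_p_ext_lm)
    return np.log(p)

# Toy usage with random, normalized distributions over a 10-word vocabulary.
rng = np.random.default_rng(0)
log_p_pred = np.log(rng.dirichlet(np.ones(10)))
log_p_ext = np.log(rng.dirichlet(np.ones(10)))
print(predictor_interpolation_scores(log_p_pred, log_p_ext))
```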
Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability
Because of its streaming nature, the recurrent neural network transducer (RNN-T)
is a very promising end-to-end (E2E) model that may replace the popular hybrid
model for automatic speech recognition. In this paper, we describe our recent
development of RNN-T models with reduced GPU memory consumption during
training, better initialization strategy, and advanced encoder modeling with
future lookahead. When trained with Microsoft's 65 thousand hours of anonymized
training data, the developed RNN-T model surpasses a very well trained hybrid
model with both better recognition accuracy and lower latency. We further study
how to customize RNN-T models to a new domain, which is important for deploying
E2E models to practical scenarios. By comparing several methods leveraging
text-only data in the new domain, we found that updating RNN-T's prediction and
joint networks with text-to-speech audio generated from domain-specific text is
the most effective.

Comment: Accepted by Interspeech 202
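As a rough sketch of the adaptation recipe described above (freezing the encoder and updating only the prediction and joint networks on synthesized audio), the toy PyTorch snippet below shows one way such a fine-tuning setup could be wired up. The model class, attribute names, and hyperparameters are illustrative stand-ins, not the authors' code.

```python
import torch
import torch.nn as nn

class TinyRNNT(nn.Module):
    """Toy stand-in for an RNN-T: acoustic encoder, prediction network, joint network."""
    def __init__(self, feat_dim=80, vocab_size=500, hidden=256):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.prediction = nn.LSTM(hidden, hidden, batch_first=True)
        self.joint = nn.Linear(2 * hidden, vocab_size)

def domain_adaptation_optimizer(model, lr=1e-5):
    # Freeze the acoustic encoder; only the prediction and joint networks are
    # updated, e.g. on TTS audio synthesized from domain-specific text.
    for p in model.encoder.parameters():
        p.requires_grad = False
    trainable = list(model.prediction.parameters()) + list(model.joint.parameters())
    return torch.optim.Adam(trainable, lr=lr)

optimizer = domain_adaptation_optimizer(TinyRNNT())
```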
Word-Phrase-Entity Language Models: Getting More Mileage out of N-grams
We present a modification of the traditional n-gram language modeling approach that departs from the word-level data representation and seeks to re-express the training text in terms of tokens that could be words, common phrases, or instances of one or several classes. Our iterative optimization algorithm considers alternative parses of the corpus in terms of these tokens, re-estimates token n-gram probabilities, and updates within-class distributions. In this paper, we focus on the cold-start approach that assumes only the availability of the word-level training corpus and a number of generic class definitions. Applied to the calendar scenario in the personal assistant domain, our approach reduces word error rates by more than 13% relative to the word-only n-gram language models. Only a small fraction of these improvements can be ascribed to a larger vocabulary.
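To make the token-level reparsing concrete, here is a toy greedy pass that rewrites a word sequence into word, phrase, and class tokens. The phrase and class inventories, token names, and the simple longest-match rule are illustrative assumptions; the paper's algorithm instead iteratively weighs alternative parses and re-estimates token n-gram and within-class probabilities.

```python
# Illustrative phrase and class inventories (not from the paper).
PHRASES = {("new", "york"): "new_york"}
CLASSES = {("monday",): "<DAY>", ("tuesday",): "<DAY>", ("john", "smith"): "<NAME>"}

def parse(words, max_span=3):
    """Greedy longest-match pass mapping words to word/phrase/class tokens."""
    tokens, i = [], 0
    while i < len(words):
        for span in range(min(max_span, len(words) - i), 0, -1):
            key = tuple(words[i:i + span])
            if key in CLASSES or key in PHRASES:
                tokens.append(CLASSES.get(key) or PHRASES[key])
                i += span
                break
        else:
            tokens.append(words[i])
            i += 1
    return tokens

print(parse("meet john smith in new york on monday".split()))
# ['meet', '<NAME>', 'in', 'new_york', 'on', '<DAY>']
```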