Large-scale random forest language models for speech recognition

Frederick Jelinek; Sanjeev Khudanpur; Yi Su

Abstract

The random forest language model (RFLM) has shown encouraging results in several automatic speech recognition (ASR) tasks but has been hindered by practical limitations, notably the space complexity of RFLM estimation from large amounts of data. This paper addresses large-scale training and testing of the RFLM via an efficient disk-swapping strategy that exploits the recursive structure of a binary decision tree and the local access property of the tree-growing algorithm. The strategy redeems the full potential of the RFLM and opens avenues of further research, including useful comparisons with n-gram models. Its benefits are demonstrated by perplexity reduction and lattice rescoring experiments using a state-of-the-art ASR system.

Index Terms: random forest language model, large-scale training, data scaling, speech recognition

Available Versions

CiteSeerX

oai:CiteSeerX.psu:10.1.1.433.8...

Last time updated on 22/10/2014