Investigation on N-gram Approximated RNNLMs for Recognition of Morphologically Rich Speech
Recognition of Hungarian conversational telephone speech is challenging due
to the informal style and morphological richness of the language. A Recurrent
Neural Network Language Model (RNNLM) can remedy the high perplexity of the
task; however, two-pass decoding introduces a considerable processing delay.
To eliminate this delay, we investigate approaches that reduce the complexity
of the RNNLM while preserving its accuracy. We
compare the performance of conventional back-off n-gram language models (BNLM),
BNLM approximation of RNNLMs (RNN-BNLM) and RNN n-grams in terms of perplexity
and word error rate (WER). Morphological richness is often addressed by using
statistically derived subwords - morphs - in the language models, hence our
investigations are extended to morph-based models as well. We found that with
RNN-BNLMs, 40% of the RNNLM perplexity reduction can be recovered, which is
roughly equal to the performance of an RNN 4-gram model. Combining morph-based
modeling and approximation of the RNNLM, we achieved an 8% relative WER
reduction while preserving real-time operation of our conversational telephone
speech recognition system.
Comment: 12 pages, 2 figures, accepted for publication at SLSP 201
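
The percentages quoted in the abstract are relative quantities, and a small sketch may help make the arithmetic explicit. The Python below is illustrative only: the function names are hypothetical, and the reading of "40% of the RNNLM perplexity reduction can be recovered" as the share of the BNLM-to-RNNLM perplexity gap closed by the approximated model is an assumption, not a definition taken from the paper. Any numbers passed to these functions are placeholders, not results from the paper.

```python
import math


def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))


def recovered_fraction(ppl_bnlm, ppl_approx, ppl_rnnlm):
    """Share of the BNLM -> RNNLM perplexity gap closed by the approximation.

    Assumed interpretation of "perplexity reduction recovered".
    """
    return (ppl_bnlm - ppl_approx) / (ppl_bnlm - ppl_rnnlm)


def relative_wer_reduction(wer_baseline, wer_new):
    """Relative WER reduction of a new system over a baseline system."""
    return (wer_baseline - wer_new) / wer_baseline
```

With these definitions, a model that assigns every token probability 0.25 has a perplexity of 4, and a drop from a hypothetical 25.0% WER to 23.0% WER corresponds to an 8% relative reduction.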