On the Effectiveness of Neural Text Generation based Data Augmentation for Recognition of Morphologically Rich Speech
Advanced neural network models have penetrated Automatic Speech Recognition
(ASR) in recent years; in language modeling, however, many systems still rely
partly or entirely on traditional Back-off N-gram Language Models (BNLM). The
reason for this is the high cost and complexity of training and deploying neural
language models, whose use is mostly feasible only through a second decoding
pass (rescoring).
In our recent work we significantly improved the online performance of a
conversational speech transcription system by transferring knowledge from a
Recurrent Neural Network Language Model (RNNLM) to the single-pass BNLM through
text generation based data augmentation. In the present paper we analyze the
amount of transferable knowledge and demonstrate that the neural augmented LM
(RNN-BNLM) can capture almost 50% of the knowledge of the RNNLM even though the
second decoding pass is dropped, making the system real-time capable. We also
systematically compare word and subword LMs and show that subword-based neural
text augmentation can be especially beneficial in under-resourced conditions.
In addition, we show that using the RNN-BNLM in the first pass, followed by a
neural second pass, significantly improves offline ASR results as well.
Comment: 8 pages, 2 figures, accepted for publication at TSD 2020
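The generation step of such augmentation can be sketched briefly. Below is a minimal, hypothetical PyTorch illustration, assuming a trained word-level RNNLM with an rnnlm(input, hidden) -> (logits, hidden) interface and <s>/</s> vocabulary entries (both assumptions for illustration, not the authors' code); the sampled sentences would then be pooled with the original training text before estimating the back-off n-gram LM.

    import torch

    @torch.no_grad()
    def generate_corpus(rnnlm, vocab, n_sentences=1000, max_len=40, temperature=1.0):
        """Sample sentences token by token from the RNNLM for data augmentation."""
        bos, eos = vocab["<s>"], vocab["</s>"]        # assumed sentence markers
        inv_vocab = {i: w for w, i in vocab.items()}
        sentences = []
        for _ in range(n_sentences):
            tokens, hidden = [bos], None
            for _ in range(max_len):
                inp = torch.tensor([[tokens[-1]]])        # shape (1, 1): last token
                logits, hidden = rnnlm(inp, hidden)       # assumed RNNLM interface
                probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
                nxt = torch.multinomial(probs, 1).item()  # stochastic sampling
                if nxt == eos:
                    break
                tokens.append(nxt)
            sentences.append(" ".join(inv_vocab[t] for t in tokens[1:]))
        return sentences

Sampling with a temperature near 1.0 keeps the generated text close to the distribution learned by the RNNLM, which is what lets the n-gram estimator absorb some of its knowledge.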
Investigation on N-gram Approximated RNNLMs for Recognition of Morphologically Rich Speech
Recognition of Hungarian conversational telephone speech is challenging due
to the informal style and morphological richness of the language. A Recurrent
Neural Network Language Model (RNNLM) can provide a remedy for the high
perplexity of the task; however, two-pass decoding introduces a considerable
processing delay. In order to eliminate this delay, we investigate approaches
that reduce the complexity of the RNNLM while preserving its accuracy. We
compare the performance of conventional back-off n-gram language models (BNLM),
BNLM approximations of RNNLMs (RNN-BNLM) and RNN n-grams in terms of perplexity
and word error rate (WER). Morphological richness is often addressed by using
statistically derived subwords (morphs) in the language models, hence we extend
our investigations to morph-based models as well. We found that with RNN-BNLMs,
40% of the RNNLM perplexity reduction can be recovered, which is roughly equal
to the performance of an RNN 4-gram model. By combining morph-based modeling
with the approximation of the RNNLM, we were able to achieve an 8% relative WER
reduction and preserve the real-time operation of our conversational telephone
speech recognition system.
Comment: 12 pages, 2 figures, accepted for publication at SLSP 2019
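An "RNN n-gram" of the kind compared above can be sketched as follows: the RNNLM is evaluated with its history truncated to the last n-1 tokens, so its predictions condition on the same context as a back-off n-gram model. The rnnlm(input, hidden) interface is again an assumption for illustration, not the paper's code.

    import math
    import torch

    @torch.no_grad()
    def rnn_ngram_logprob(rnnlm, context_ids, word_id, order=4):
        """log P(word | last order-1 tokens), re-running the RNN on the
        truncated history only."""
        history = context_ids[-(order - 1):]   # keep at most n-1 context tokens
        logits, _ = rnnlm(torch.tensor([history]), None)  # assumed interface
        log_probs = torch.log_softmax(logits[0, -1], dim=-1)
        return log_probs[word_id].item()

    @torch.no_grad()
    def perplexity(rnnlm, token_ids, order=4):
        """Corpus perplexity of the RNN n-gram approximation."""
        nll = sum(-rnn_ngram_logprob(rnnlm, token_ids[:i], token_ids[i], order)
                  for i in range(1, len(token_ids)))
        return math.exp(nll / (len(token_ids) - 1))

One way to build the approximating back-off model (the RNN-BNLM) offline would be to tabulate such truncated-context probabilities over the n-grams observed in a corpus.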
Modularity and Neural Integration in Large-Vocabulary Continuous Speech Recognition
This thesis tackles the problems of modularity in Large-Vocabulary Continuous Speech Recognition with the use of Neural Network