
    Comparing Self-Supervised Pre-Training and Semi-Supervised Training for Speech Recognition in Languages with Weak Language Models

    This paper investigates the potential of improving a hybrid automatic speech recognition model trained on 10 hours of transcribed data with 200 hours of untranscribed data in low-resource languages. First, we compare baseline methods of cross-lingual transfer with MFCC features and features extracted with the multilingual self-supervised model XLSR-53. Subsequently, we compare two approaches that can leverage the untranscribed data: semi-supervised training with LF-MMI and continued self-supervised pre-training of XLSR-53. Our results on well-resourced English broadcast data derived from MGB show that the two methods achieve 18% and 27% relative improvements over the baseline, respectively. On the low-resource South African Soap Opera dataset, the relative improvement with semi-supervised training is only 3% due to the inherently weak language model. However, continued pre-training achieves an 8.6% relative improvement because it does not rely on any external information.
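
    As a rough illustration of how XLSR-53 can serve as a feature extractor for a hybrid system, the sketch below pulls frame-level hidden states from a publicly released checkpoint via the Hugging Face transformers port; the model name and pipeline are assumptions for illustration, not necessarily the authors' exact setup.

    # Minimal sketch (assumption: the Hugging Face port "facebook/wav2vec2-large-xlsr-53"
    # stands in for the XLSR-53 model referenced in the abstract).
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-large-xlsr-53")
    model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-large-xlsr-53").eval()

    def xlsr_features(waveform, sample_rate=16000):
        """Return a (frames, 1024) matrix of XLSR-53 hidden states for one utterance."""
        inputs = extractor(waveform, sampling_rate=sample_rate, return_tensors="pt")
        with torch.no_grad():
            hidden = model(inputs.input_values).last_hidden_state  # (1, frames, 1024)
        return hidden.squeeze(0)

    Such frame-level vectors can then replace or complement MFCCs as inputs to the hybrid acoustic model.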

    GPU-Accelerated Forward-Backward Algorithm with Application to Lattice-Free MMI

    We propose to express the forward-backward algorithm in terms of operations between sparse matrices in a specific semiring. This new perspective naturally leads to a GPU-friendly algorithm which is easy to implement in Julia or any programming language with native support for semiring algebra. We use this new implementation to train a TDNN with the LF-MMI objective function and we compare the training time of our system with PyChain, a recently introduced C++/CUDA implementation of the LF-MMI loss. Our implementation is about two times faster while not having to use any approximation such as the "leaky-HMM".
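
    The core idea can be sketched in a few lines: the forward recursion becomes a matrix-vector product in the log semiring, where addition is logsumexp and multiplication is ordinary addition. The toy example below uses dense NumPy arrays for clarity; the paper's contribution, sparse GPU kernels, is not reproduced here.

    # Forward pass of forward-backward written as log-semiring matrix-vector products.
    # Dense arrays for clarity only; the paper operates on sparse matrices on the GPU.
    import numpy as np
    from scipy.special import logsumexp

    def forward_log(log_A, log_init, log_obs):
        """log_A: (S, S) log transition matrix, log_init: (S,) initial log-probabilities,
        log_obs: (T, S) per-frame log emission scores. Returns the total log-likelihood."""
        alpha = log_init + log_obs[0]
        for t in range(1, log_obs.shape[0]):
            # Semiring "mat-vec": logsumexp over predecessor states, then add emissions.
            alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_obs[t]
        return logsumexp(alpha)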

    Multilingual Models with Language Embeddings for Low-resource Speech Recognition

    Speech recognition for low-resource languages remains challenging and can be addressed with techniques such as multilingual modeling and transfer learning. In this work, we explore several solutions to the multilingual training problem: training monolingual models with multilingual features, adapting a multilingual model with transfer learning, and using language embeddings as additional features. To develop practical solutions we focus our work on medium-size hybrid ASR models. The multilingual models are trained on 270 hours of IARPA Babel data from 25 languages, and results are reported on 4 Babel languages for the Limited Language Pack (LLP) condition. The results show that adapting a multilingual acoustic model with language embeddings is an effective solution, outperforming the baseline monolingual models and providing comparable results to models based on state-of-the-art XLSR-53 features, while needing 15 times fewer parameters.
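
    A minimal sketch of the language-embedding idea, assuming a simple setup in which a learned per-language vector is concatenated to every acoustic frame before the acoustic model; the module name and dimensions are illustrative, not the paper's exact configuration.

    # Hypothetical frontend: append a learned language embedding to each acoustic frame.
    import torch
    import torch.nn as nn

    class LangEmbedFrontend(nn.Module):
        def __init__(self, num_langs=25, feat_dim=40, lang_dim=8):
            super().__init__()
            self.lang_embed = nn.Embedding(num_langs, lang_dim)

        def forward(self, feats, lang_id):
            # feats: (batch, frames, feat_dim); lang_id: (batch,) integer language indices
            lang_vec = self.lang_embed(lang_id).unsqueeze(1)      # (batch, 1, lang_dim)
            lang_vec = lang_vec.expand(-1, feats.size(1), -1)     # broadcast over frames
            return torch.cat([feats, lang_vec], dim=-1)           # (batch, frames, feat_dim + lang_dim)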