
    Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition

    Automatic speech recognition models are often adapted to improve their accuracy in a new domain. A potential drawback of model adaptation to new domains is catastrophic forgetting, where the Word Error Rate on the original domain is significantly degraded. This paper addresses the situation where we want to simultaneously adapt automatic speech recognition models to a new domain and limit the degradation of accuracy on the original domain, without access to the original training dataset. We propose several techniques, such as a limited training strategy and regularized adapter modules for the Transducer encoder, prediction, and joiner networks. We apply these methods to the Google Speech Commands and the UK and Ireland English Dialect speech datasets and obtain strong results on the new target domain while limiting the degradation on the original domain.
    Comment: To appear in Proc. SLT 2022, Jan 09-12, 2023, Doha, Qatar
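    The regularized-adapter idea above lends itself to a compact illustration. Below is a minimal PyTorch sketch, not the paper's implementation, of a residual bottleneck adapter whose weights are pulled toward zero (i.e. toward the identity mapping) so that adaptation to the new domain stays close to the original model's behaviour; all module names, dimensions, and the penalty weight are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Residual bottleneck adapter: down-project, nonlinearity, up-project, add."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, dim)
        # Start as (near) identity so the unadapted model is unchanged at step 0.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

def adapter_l2_penalty(adapter: nn.Module) -> torch.Tensor:
    # Pull the adapter weights toward zero, i.e. toward the identity residual
    # branch, limiting drift away from original-domain behaviour.
    return sum((p ** 2).sum() for p in adapter.parameters())

# Hypothetical usage: wrap the output of one encoder (or prediction/joiner) layer.
adapter = BottleneckAdapter(dim=512)
hidden = torch.randn(8, 100, 512)              # (batch, frames, features)
adapted = adapter(hidden)
reg_loss = 1e-4 * adapter_l2_penalty(adapter)  # added to the transducer loss
```

    Zero-initializing the up-projection and penalizing the adapter weights are two common ways to keep an adapted model close to the original one when the original training data is unavailable.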

    Using Adapters to Overcome Catastrophic Forgetting in End-to-End Automatic Speech Recognition

    Learning a set of tasks in sequence remains a challenge for artificial neural networks, which, in such scenarios, tend to suffer from Catastrophic Forgetting (CF). The same applies to End-to-End (E2E) Automatic Speech Recognition (ASR) models, even for monolingual tasks. In this paper, we aim to overcome CF for E2E ASR by inserting adapters, small modules with few parameters that allow a general model to be fine-tuned to a specific task, into our model. We make these adapters task-specific, while regularizing the parameters of the model shared by all tasks, thus encouraging the model to fully exploit the adapters while keeping the shared parameters working well for all tasks. Our method outperforms all baselines on two monolingual experiments while being more storage efficient and without requiring the storage of data from previous tasks.
    Comment: Submitted to ICASSP 2023. 5 pages
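    As a rough sketch of how task-specific adapters can be combined with regularization of the shared parameters, the toy example below (assumed names and architecture, not the authors' code) adds one small residual adapter per task and penalizes the L2 distance of the shared encoder weights from a snapshot taken before training on the new task.

```python
import torch
import torch.nn as nn

class TinyASRModel(nn.Module):
    # Illustrative stand-in: one shared encoder plus a small adapter per task.
    def __init__(self, dim: int = 32):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)      # shared by all tasks
        self.adapters = nn.ModuleDict()         # task-specific adapters

    def add_task(self, task_id: str, dim: int = 32, bottleneck: int = 8):
        self.adapters[task_id] = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim))

    def forward(self, x: torch.Tensor, task_id: str) -> torch.Tensor:
        h = self.encoder(x)
        return h + self.adapters[task_id](h)    # residual, task-specific adapter

def shared_param_penalty(model: nn.Module, anchor: dict) -> torch.Tensor:
    """L2 distance between current shared (encoder) weights and their snapshot."""
    return sum(((p - anchor[n]) ** 2).sum()
               for n, p in model.named_parameters() if n.startswith("encoder"))

model = TinyASRModel()
model.add_task("task_A")
# Snapshot shared weights before training on a new task, then add its adapter.
anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
model.add_task("task_B")
x = torch.randn(4, 32)
out = model(x, "task_B")
loss = out.pow(2).mean() + 0.1 * shared_param_penalty(model, anchor)  # dummy ASR loss + penalty
loss.backward()
```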

    Online Continual Learning of End-to-End Speech Recognition Models

    Continual Learning, also known as Lifelong Learning, aims to continually learn from new data as it becomes available. While prior research on continual learning in automatic speech recognition has focused on the adaptation of models across multiple different speech recognition tasks, in this paper we propose an experimental setting for online continual learning for automatic speech recognition of a single task. Focusing specifically on the case where additional training data for the same task becomes available incrementally over time, we demonstrate the effectiveness of performing incremental model updates to end-to-end speech recognition models with an online Gradient Episodic Memory (GEM) method. Moreover, we show that with online continual learning and a selective sampling strategy, we can maintain an accuracy similar to retraining a model from scratch while requiring significantly lower computation costs. We have also verified our method with self-supervised learning (SSL) features.
    Comment: Accepted at Interspeech 2022
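    The core of GEM-style updates is a gradient projection step. The sketch below is a simplified single-constraint variant in the spirit of GEM/A-GEM, not the paper's exact method: it computes the gradient on a batch drawn from episodic memory and, if the gradient on the incoming batch conflicts with it, projects the new gradient before the optimizer step. The tiny linear model and MSE loss are placeholders for an end-to-end ASR model and its training loss.

```python
import torch
import torch.nn.functional as F

def flat_grad(model: torch.nn.Module) -> torch.Tensor:
    # Concatenate all parameter gradients into one flat vector.
    return torch.cat([p.grad.reshape(-1) for p in model.parameters() if p.grad is not None])

def assign_grad(model: torch.nn.Module, flat: torch.Tensor) -> None:
    # Write a flat gradient vector back into the parameters' .grad fields.
    offset = 0
    for p in model.parameters():
        if p.grad is None:
            continue
        n = p.grad.numel()
        p.grad.copy_(flat[offset:offset + n].view_as(p.grad))
        offset += n

def project_gradient(grad_new: torch.Tensor, grad_mem: torch.Tensor) -> torch.Tensor:
    # If the new-data gradient would increase loss on the memory batch
    # (negative dot product), project it onto the memory gradient's half-space.
    dot = torch.dot(grad_new, grad_mem)
    if dot < 0:
        grad_new = grad_new - (dot / torch.dot(grad_mem, grad_mem)) * grad_mem
    return grad_new

model = torch.nn.Linear(16, 4)                          # stand-in for an ASR model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
mem_x, mem_y = torch.randn(8, 16), torch.randn(8, 4)    # episodic memory batch
new_x, new_y = torch.randn(8, 16), torch.randn(8, 4)    # incoming data batch

opt.zero_grad(); F.mse_loss(model(mem_x), mem_y).backward()
g_mem = flat_grad(model).clone()
opt.zero_grad(); F.mse_loss(model(new_x), new_y).backward()
g_new = flat_grad(model)
assign_grad(model, project_gradient(g_new, g_mem))
opt.step()
```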