The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections through Federated Learning

Author: Chen Mingqing
Ding Yuxin
Guliani Dhruv
Mathews Rajiv
Motta Giovanni
Prabhavalkar Rohit
Zhang Harry
Zhou Lillian
Publication date: 30/11/2023

Automatic speech recognition (ASR) models are typically trained on large datasets of transcribed speech. As language evolves and new terms come into use, these models can become outdated. For models trained on the server but deployed on edge devices, errors may result from the mismatch between server training data and actual on-device usage. In this work, we seek to continually learn from on-device user corrections through Federated Learning (FL) to address this issue. We explore techniques to target fresh terms that the model has not previously encountered, learn long-tail words, and mitigate catastrophic forgetting. In experimental evaluations, we find that the proposed techniques improve model recognition of fresh terms while preserving quality on the overall language distribution. Comment: Accepted to IEEE ASRU 202

arXiv.org e-Print Archive
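The abstract above describes learning from on-device corrections through Federated Learning but does not say which aggregation rule is used. As a rough illustration of the general idea only, the sketch below shows plain federated averaging, in which a server combines client model updates weighted by how many examples each client trained on. The function name `federated_average`, the flat list-of-floats weight layout, and the sample numbers are all assumptions made for illustration, not details from the paper.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Weighted average of client weight vectors (hypothetical helper).

    Each client contributes in proportion to its number of local examples.
    This is generic federated averaging, assumed here for illustration; the
    listed paper may use a different aggregation scheme.
    """
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same weight shape")
        for i, value in enumerate(weights):
            averaged[i] += value * size / total
    return averaged


# Two hypothetical clients: the second holds three times as many corrections.
new_global = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In this toy setup the raw corrections never leave the clients; only the locally computed weight vectors are sent to the server for aggregation.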

O-1: Self-training with Oracle and 1-best Hypothesis

Author: Audhkhasi Kartik
Baskar Murali Karthick
Ramabhadran Bhuvana
Rosenberg Andrew
Publication date: 14/08/2023

We introduce O-1, a new self-training objective to reduce training bias and unify training and evaluation metrics for speech recognition. O-1 is a faster variant of Expected Minimum Bayes Risk (EMBR) that boosts the oracle hypothesis and can accommodate both supervised and unsupervised data. We demonstrate the effectiveness of our approach in terms of recognition on the publicly available SpeechStew datasets and a large-scale in-house dataset. On SpeechStew, the O-1 objective closes the gap between the actual and oracle performance by 80% relative, compared to EMBR, which bridges the gap by 43% relative. O-1 achieves 13% to 25% relative improvement over EMBR on the various datasets that SpeechStew comprises, and a 12% relative gap reduction with respect to the oracle WER over EMBR training on the in-house dataset. Overall, O-1 results in a 9% relative improvement in WER over EMBR, thereby speaking to the scalability of the proposed objective for large-scale datasets.

arXiv.org e-Print Archive
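The O-1 abstract above reports results as "relative gap closed" and "relative improvement" in WER. The sketch below shows one common way such figures are computed, which may help when reading the percentages. The function names and all WER values are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the paper, and the paper's exact definitions may differ.

```python
def relative_gap_closed(baseline_wer: float, new_wer: float, oracle_wer: float) -> float:
    """Fraction of the baseline-to-oracle WER gap removed by the new system.

    Assumed definition: (baseline - new) / (baseline - oracle).
    """
    gap = baseline_wer - oracle_wer
    if gap <= 0:
        raise ValueError("baseline WER must be worse (higher) than oracle WER")
    return (baseline_wer - new_wer) / gap


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction of the new system over the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers: baseline WER 10.0, oracle WER 5.0, new system WER 6.0.
gap_closed = relative_gap_closed(10.0, 6.0, 5.0)   # 4.0 of a 5.0 gap
improvement = relative_improvement(10.0, 6.0)      # 4.0 of a 10.0 baseline
print(f"gap closed: {gap_closed:.0%}, relative improvement: {improvement:.0%}")
```

Under these definitions the same system change yields different percentages, because gap closure is measured against the distance to the oracle while relative improvement is measured against the baseline WER itself.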