Retrieve and Copy: Scaling ASR Personalization to Large Catalogs
Personalization of automatic speech recognition (ASR) models is a widely studied topic because of its many practical applications. Most recently, attention-based contextual biasing techniques have been used to improve the recognition of rare words and domain-specific entities. However, due to performance constraints, the biasing is often limited to a few thousand entities, restricting real-world usability. To address this, we first propose a "Retrieve and Copy" mechanism that improves latency while retaining accuracy even when scaled to a large catalog. We also propose a training strategy to overcome the degradation in recall at such scale caused by an increased number of confusing entities. Overall, our approach achieves up to 6% more Word Error Rate Reduction (WERR) and a 3.6% absolute improvement in F1 compared to a strong baseline. Our method also allows for catalog sizes of up to 20K entities without significantly affecting WER and F1 scores, while achieving at least a 20% inference speedup per acoustic frame.
Comment: EMNLP 202
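The abstract does not spell out the mechanism, but a retrieve-then-bias step could look like the minimal sketch below: first retrieve a small candidate set from the large catalog, then run biasing attention over only those candidates. The names (retrieve_top_k, biased_attention), the dot-product retrieval, and all dimensions are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a retrieve-then-bias step over a large entity catalog.
import torch
import torch.nn.functional as F

EMB_DIM, CATALOG_SIZE, TOP_K = 256, 20_000, 64

# Precomputed catalog of entity embeddings (e.g. contact names, media titles).
catalog = F.normalize(torch.randn(CATALOG_SIZE, EMB_DIM), dim=-1)

def retrieve_top_k(query: torch.Tensor, k: int = TOP_K) -> torch.Tensor:
    """Cheap dot-product retrieval over the full catalog; returns k candidate
    embeddings instead of attending over all 20K entities per frame."""
    scores = catalog @ F.normalize(query, dim=-1)           # (CATALOG_SIZE,)
    idx = scores.topk(k).indices
    return catalog[idx]                                     # (k, EMB_DIM)

def biased_attention(query: torch.Tensor, candidates: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention over the retrieved subset only, producing
    a bias vector to mix into the ASR decoder state."""
    logits = candidates @ query / EMB_DIM ** 0.5            # (k,)
    weights = torch.softmax(logits, dim=-1)
    return weights @ candidates                             # (EMB_DIM,)

# Per acoustic frame: retrieve once, then attend over k << CATALOG_SIZE,
# which is where a per-frame latency saving would come from.
decoder_state = torch.randn(EMB_DIM)
bias = biased_attention(decoder_state, retrieve_top_k(decoder_state))
```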
Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale
Language models have been shown to perform better with an increase in scale
on a wide variety of tasks via the in-context learning paradigm. In this paper,
we investigate the hypothesis that a large language model's ability to perform a task via in-context learning is not uniformly spread across all of its underlying components. Using a 66-billion-parameter language model (OPT-66B)
across a diverse set of 14 downstream tasks, we find this is indeed the case:
70% of attention heads and 20% of feed-forward networks can be
removed with minimal decline in task performance. We find substantial overlap
in the set of attention heads (un)important for in-context learning across
tasks and number of in-context examples. We also address our hypothesis through
a task-agnostic lens, finding that a small set of attention heads in OPT-66B
score highly on their ability to perform primitive induction operations
associated with in-context learning, namely, prefix matching and copying. These
induction heads overlap with task-specific important heads, reinforcing
arguments by Olsson et al. (arXiv:2209.11895) regarding induction head
generality to more sophisticated behaviors associated with in-context learning.
Overall, our study provides several insights that indicate large language
models may be under-trained for in-context learning and opens up questions on
how to pre-train language models to more effectively perform in-context
learning.
Comment: Accepted at the Annual Meeting of the Association for Computational Linguistics (ACL) 2023, Main Proceedings
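For readers unfamiliar with the prefix-matching operation the abstract mentions, the sketch below scores a single attention head for the induction pattern: attending from the current token back to whatever followed an earlier occurrence of that same token. The function name and scoring details are illustrative assumptions, not the paper's exact metric.

```python
import torch

def prefix_matching_score(attn: torch.Tensor, tokens: torch.Tensor) -> float:
    """Average attention mass a head places on tokens that immediately
    follow an earlier occurrence of the current token (induction pattern).
    attn: (T, T) causal attention weights for one head; tokens: (T,) ids.
    Illustrative metric, not the paper's exact implementation."""
    T = tokens.shape[0]
    total, count = 0.0, 0
    for i in range(1, T):
        # positions j < i holding the same token as position i ...
        prev = (tokens[:i] == tokens[i]).nonzero().squeeze(-1)
        # ... an induction head should attend from i to the token at j + 1
        targets = prev + 1
        targets = targets[targets <= i]
        if targets.numel() > 0:
            total += attn[i, targets].sum().item()
            count += 1
    return total / max(count, 1)

# Toy check: on "A B A B A", an induction head at the last "A" attends
# to the positions holding "B".
tokens = torch.tensor([0, 1, 0, 1, 0])
attn = torch.zeros(5, 5)
attn[4, 1] = attn[4, 3] = 0.5   # last token attends to both "B" positions
print(prefix_matching_score(attn, tokens))
```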
Prompt Tuning GPT-2 language model for parameter-efficient domain adaptation of ASR systems
Automatic Speech Recognition (ASR) systems have found use in numerous industrial applications across very diverse domains, creating a need to adapt to new domains with small memory and deployment overhead. In this work, we introduce domain-prompts, a methodology that involves training a small number of domain embedding parameters to prime a Transformer-based Language Model (LM) for a particular domain. Using this domain-adapted LM to rescore ASR hypotheses achieves a 7-13% WER reduction on a new domain with just 1000 unlabeled domain-specific text sentences. This improvement is comparable to, or even better than, that of fully fine-tuned models, even though just 0.02% of the parameters of the base LM are updated. Additionally, our method is deployment-friendly, as the learnt domain embeddings are prefixed to the model's input rather than changing the base model architecture. Our method is therefore an ideal choice for on-the-fly adaptation of the LMs used in ASR systems as they are progressively scaled to new domains.
Comment: Accepted at InterSpeech 202
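Since the abstract describes the mechanism concretely (trainable domain embeddings prefixed to the input of a frozen LM), here is a minimal prompt-tuning sketch in that spirit using Hugging Face's GPT-2. The prompt length, learning rate, and helper names are illustrative assumptions, not the authors' code; an actual run would loop over the ~1000 in-domain sentences rather than the single call shown.

```python
# Minimal prompt-tuning sketch: trainable domain embeddings prefixed to a
# frozen GPT-2. Names and hyperparameters are illustrative assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

N_PROMPT = 10                                    # number of domain-prompt vectors

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.requires_grad_(False)                      # the base LM stays frozen

emb_dim = model.config.n_embd
domain_prompt = torch.nn.Parameter(torch.randn(N_PROMPT, emb_dim) * 0.02)
opt = torch.optim.AdamW([domain_prompt], lr=1e-3)  # only N_PROMPT*emb_dim values train

def step(sentence: str) -> float:
    """One LM-loss update of the domain prompt on an in-domain sentence."""
    ids = tok(sentence, return_tensors="pt").input_ids        # (1, T)
    tok_emb = model.transformer.wte(ids)                      # (1, T, D)
    inputs = torch.cat([domain_prompt.unsqueeze(0), tok_emb], dim=1)
    # Mask prompt positions out of the loss; predict only the real tokens.
    labels = torch.cat([torch.full((1, N_PROMPT), -100), ids], dim=1)
    loss = model(inputs_embeds=inputs, labels=labels).loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Loop this over unlabeled in-domain text, then prefix the same learnt
# embeddings when rescoring ASR hypotheses; the base model never changes.
print(step("play the latest episode of the cardiology review podcast"))
```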