Retrieve and Copy: Scaling ASR Personalization to Large Catalogs
Personalization of automatic speech recognition (ASR) models is a widely studied topic because of its many practical applications. Most recently, attention-based contextual biasing techniques have been used to improve the recognition of rare words and domain-specific entities. However, due to performance constraints, the biasing is often limited to a few thousand entities, restricting real-world usability. To address this, we first propose a "Retrieve and Copy" mechanism that improves latency while retaining accuracy even when scaled to a large catalog. We also propose a training strategy to overcome the degradation in recall at such scale caused by an increased number of confusing entities. Overall, our approach achieves up to 6% more Word Error Rate Reduction (WERR) and a 3.6% absolute improvement in F1 compared to a strong baseline. Our method also allows for catalog sizes of up to 20K entities without significantly affecting WER and F1 scores, while achieving at least a 20% inference speedup per acoustic frame.
Comment: EMNLP 202
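The abstract does not spell out the mechanism, but a retrieve-then-bias step could look like the minimal sketch below: first retrieve a small candidate set from the large catalog, then run biasing attention over only those candidates. The names (retrieve_top_k, biased_attention), the dot-product retrieval, and all dimensions are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a retrieve-then-bias step over a large entity catalog.
import torch
import torch.nn.functional as F

EMB_DIM, CATALOG_SIZE, TOP_K = 256, 20_000, 64

# Precomputed catalog of entity embeddings (e.g. contact names, media titles).
catalog = F.normalize(torch.randn(CATALOG_SIZE, EMB_DIM), dim=-1)

def retrieve_top_k(query: torch.Tensor, k: int = TOP_K) -> torch.Tensor:
    """Cheap dot-product retrieval over the full catalog; returns k candidate
    embeddings instead of attending over all 20K entities per frame."""
    scores = catalog @ F.normalize(query, dim=-1)           # (CATALOG_SIZE,)
    idx = scores.topk(k).indices
    return catalog[idx]                                     # (k, EMB_DIM)

def biased_attention(query: torch.Tensor, candidates: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention over the retrieved subset only, producing
    a bias vector to mix into the ASR decoder state."""
    logits = candidates @ query / EMB_DIM ** 0.5            # (k,)
    weights = torch.softmax(logits, dim=-1)
    return weights @ candidates                             # (EMB_DIM,)

# Per acoustic frame: retrieve once, then attend over k << CATALOG_SIZE,
# which is where a per-frame latency saving would come from.
decoder_state = torch.randn(EMB_DIM)
bias = biased_attention(decoder_state, retrieve_top_k(decoder_state))
```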
Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale
Language models have been shown to perform better with an increase in scale
on a wide variety of tasks via the in-context learning paradigm. In this paper,
we investigate the hypothesis that a large language model's ability to perform a task via in-context learning is not uniformly spread across all of its underlying components. Using a 66-billion-parameter language model (OPT-66B)
across a diverse set of 14 downstream tasks, we find this is indeed the case:
70% of attention heads and 20% of feed-forward networks can be
removed with minimal decline in task performance. We find substantial overlap
in the set of attention heads (un)important for in-context learning across
tasks and number of in-context examples. We also address our hypothesis through
a task-agnostic lens, finding that a small set of attention heads in OPT-66B
score highly on their ability to perform primitive induction operations
associated with in-context learning, namely, prefix matching and copying. These
induction heads overlap with task-specific important heads, reinforcing
arguments by Olsson et al. (arXiv:2209.11895) regarding induction head
generality to more sophisticated behaviors associated with in-context learning.
Overall, our study provides several insights that indicate large language
models may be under-trained for in-context learning and opens up questions on
how to pre-train language models to more effectively perform in-context
learning.
Comment: Accepted at the Annual Meeting of the Association for Computational Linguistics (ACL) 2023, Main Proceedings
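For readers unfamiliar with the prefix-matching operation the abstract mentions, the sketch below scores a single attention head for the induction pattern: attending from the current token back to whatever followed an earlier occurrence of that same token. The function name and scoring details are illustrative assumptions, not the paper's exact metric.

```python
import torch

def prefix_matching_score(attn: torch.Tensor, tokens: torch.Tensor) -> float:
    """Average attention mass a head places on tokens that immediately
    follow an earlier occurrence of the current token (induction pattern).
    attn: (T, T) causal attention weights for one head; tokens: (T,) ids.
    Illustrative metric, not the paper's exact implementation."""
    T = tokens.shape[0]
    total, count = 0.0, 0
    for i in range(1, T):
        # positions j < i holding the same token as position i ...
        prev = (tokens[:i] == tokens[i]).nonzero().squeeze(-1)
        # ... an induction head should attend from i to the token at j + 1
        targets = prev + 1
        targets = targets[targets <= i]
        if targets.numel() > 0:
            total += attn[i, targets].sum().item()
            count += 1
    return total / max(count, 1)

# Toy check: on "A B A B A", an induction head at the last "A" attends
# to the positions holding "B".
tokens = torch.tensor([0, 1, 0, 1, 0])
attn = torch.zeros(5, 5)
attn[4, 1] = attn[4, 3] = 0.5   # last token attends to both "B" positions
print(prefix_matching_score(attn, tokens))
```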
Prompt Tuning GPT-2 language model for parameter-efficient domain adaptation of ASR systems
Automatic Speech Recognition (ASR) systems have found use in numerous industrial applications across very diverse domains, creating a need to adapt to new domains with small memory and deployment overhead. In this work, we introduce domain-prompts, a methodology that involves training a small number of domain embedding parameters to prime a Transformer-based Language Model (LM) for a particular domain. Using this domain-adapted LM to rescore ASR hypotheses achieves a 7-13% WER reduction on a new domain with just 1000 unlabeled domain-specific text sentences. This improvement is comparable to, or even better than, that of fully fine-tuned models, even though just 0.02% of the parameters of the base LM are updated. Additionally, our method is deployment-friendly, as the learnt domain embeddings are prefixed to the model's input rather than changing the base model architecture. Our method is therefore an ideal choice for on-the-fly adaptation of the LMs used in ASR systems as they are progressively scaled to new domains.
Comment: Accepted at InterSpeech 202
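Since the abstract describes the mechanism concretely (trainable domain embeddings prefixed to the input of a frozen LM), here is a minimal prompt-tuning sketch in that spirit using Hugging Face's GPT-2. The prompt length, learning rate, and helper names are illustrative assumptions, not the authors' code; an actual run would loop over the ~1000 in-domain sentences rather than the single call shown.

```python
# Minimal prompt-tuning sketch: trainable domain embeddings prefixed to a
# frozen GPT-2. Names and hyperparameters are illustrative assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

N_PROMPT = 10                                    # number of domain-prompt vectors

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.requires_grad_(False)                      # the base LM stays frozen

emb_dim = model.config.n_embd
domain_prompt = torch.nn.Parameter(torch.randn(N_PROMPT, emb_dim) * 0.02)
opt = torch.optim.AdamW([domain_prompt], lr=1e-3)  # only N_PROMPT*emb_dim values train

def step(sentence: str) -> float:
    """One LM-loss update of the domain prompt on an in-domain sentence."""
    ids = tok(sentence, return_tensors="pt").input_ids        # (1, T)
    tok_emb = model.transformer.wte(ids)                      # (1, T, D)
    inputs = torch.cat([domain_prompt.unsqueeze(0), tok_emb], dim=1)
    # Mask prompt positions out of the loss; predict only the real tokens.
    labels = torch.cat([torch.full((1, N_PROMPT), -100), ids], dim=1)
    loss = model(inputs_embeds=inputs, labels=labels).loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Loop this over unlabeled in-domain text, then prefix the same learnt
# embeddings when rescoring ASR hypotheses; the base model never changes.
print(step("play the latest episode of the cardiology review podcast"))
```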