3 research outputs found
Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models
While Automatic Speech Recognition (ASR) systems are widely used in many
real-world applications, they often do not generalize well to new domains and
need to be finetuned on data from these domains. However, target-domain data
usually are not readily available in many scenarios. In this paper, we propose
a new strategy for adapting ASR models to new target domains without any text
or speech from those domains. To accomplish this, we propose a novel data
synthesis pipeline that uses a Large Language Model (LLM) to generate a target
domain text corpus, and a state-of-the-art controllable speech synthesis model
to generate the corresponding speech. We propose a simple yet effective
in-context instruction finetuning strategy to increase the effectiveness of LLM
in generating text corpora for new domains. Experiments on the SLURP dataset
show that the proposed method achieves an average relative word error rate
improvement of on unseen target domains without any performance drop in
source domains