conference paper
Multilingual Domain Adaptation for Speech Recognition Using LLMs
Siemens Healthineers AG
Abstract
We present a practical pipeline for multilingual domain adaptation in automatic speech recognition (ASR) that combines the Whisper model with large language models (LLMs). We use Aya-23-8B to automatically classify Common Voice transcripts in 22 languages into the Law and Healthcare domains, producing high-quality domain labels at a fraction of the manual cost. These labels drive parameter-efficient (LoRA) fine-tuning of Whisper and deliver consistent relative Word Error Rate (WER) reductions of up to 14.3% for languages that contribute at least 800 in-domain utterances. A data-volume analysis reveals a clear breakpoint: gains become reliably large once that 800-utterance threshold is crossed, while monolingual tuning still rescues performance in truly low-resource settings. The workflow therefore shifts the key success factor from expensive hand labelling to scalable data acquisition, and can be replicated in new domains with minimal human intervention.
© 2025 Elsevier B.V. All rights reserved.
- Conference Object
- Language Model
- Tuning
- Speech Recognition
- Speech Communication
- Classifieds
- Labels
- High Quality
- Healthcare Domains
- Drive Parameters
- Digital Storage
- Data Acquisition
- Computational Linguistics
- Whisper
- Automatic Speech Recognition
- Multilingual Speech Recognition
- Large Language Models
- Domain Adaptation
- Large Language Model
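
The abstract reports gains as relative WER reductions and ties them to an 800-utterance threshold per language. The following is a minimal Python sketch of how those two quantities could be computed; it is not the authors' evaluation code. The function names, the whitespace tokenization, and the example language counts are assumptions made for illustration, and only the threshold of 800 comes from the abstract.

```python
MIN_IN_DOMAIN_UTTERANCES = 800  # threshold stated in the abstract


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are split on whitespace, with no text normalization.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative reduction in percent: 100 * (baseline - adapted) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


def languages_above_threshold(utterance_counts: dict[str, int]) -> list[str]:
    """Languages whose in-domain utterance count meets the threshold."""
    return sorted(
        lang
        for lang, count in utterance_counts.items()
        if count >= MIN_IN_DOMAIN_UTTERANCES
    )


if __name__ == "__main__":
    # Hypothetical inputs, not results from the paper.
    print(word_error_rate("the patient has a fever", "the patient had fever"))
    print(relative_wer_reduction(baseline_wer=0.20, adapted_wer=0.18))
    print(languages_above_threshold({"de": 1200, "cy": 300, "fr": 800}))
```

A relative reduction is measured against the baseline WER, so it differs from the absolute drop in percentage points. In the hypothetical call above, a baseline of 0.20 and an adapted WER of 0.18 is an absolute drop of 2 points but a relative reduction of about 10%.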