Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model

Authors: Kai-Wei Chang, Ming-Hsin Chen, Jing Neng Hsu, Chien-yu Huang, Paul Kuo-Ming Huang, Hung-yi Lee, Shang-Wen Li, Yun-Ping Lin
Publication date: 14 November 2023

Abstract

Prompting and adapter tuning have emerged as efficient alternatives to fine-tuning (FT) methods. However, existing studies on speech prompting have focused on classification tasks and have failed on more complex sequence generation tasks. In addition, adapter tuning has primarily been applied to encoder-only self-supervised models. Our experiments show that prompting on Wav2Seq, a self-supervised encoder-decoder model, surpasses previous works in sequence generation tasks. It achieves a 53% relative improvement in word error rate for ASR and a 27% relative improvement in F1 score for slot filling. Additionally, prompting is competitive with the FT method in the low-resource scenario. Moreover, we show the transferability of prompting and adapter tuning on Wav2Seq in cross-lingual ASR. When the number of trainable parameters is limited, prompting and adapter tuning consistently outperform conventional FT across 7 languages. Notably, in the low-resource scenario, prompting consistently outperforms adapter tuning.

Comment: Accepted to IEEE ASRU 202

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2310.02971

Last time updated on 10/02/2024