Investigating Zero-Shot Generalizability on Mandarin-English
Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models
with Self-Supervision and Weak Supervision
This work evaluated several cutting-edge large-scale foundation models trained with self-supervision or weak supervision, including SeamlessM4T, SeamlessM4T v2, and Whisper-large-v3, on three code-switched corpora. We found that the self-supervised models achieved performance close to that of supervised models, indicating the effectiveness of multilingual self-supervised pre-training. We also observed that these models still have room for improvement, as they repeatedly made similar mistakes and performed poorly at modeling intra-sentential code-switching. In addition, we explored the validity of several Whisper variants and concluded that they remained effective in code-switching scenarios; similar techniques for self-supervised models are worth studying to boost performance on code-switched tasks.

Comment: Submitted to the ICASSP 2024 Self-supervision in Audio, Speech and Beyond workshop
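
As a rough illustration of the zero-shot setting described in the abstract, the sketch below transcribes and translates a Mandarin-English code-switched utterance with Whisper-large-v3 via the openai-whisper package. The audio file name is a placeholder, and the decoding options shown are the library's documented defaults, not necessarily the exact configuration used in this work.

```python
import whisper

# Load the weakly supervised Whisper-large-v3 checkpoint without any
# fine-tuning, i.e. the zero-shot setting evaluated in this work.
model = whisper.load_model("large-v3")

# Transcribe a (hypothetical) code-switched utterance. Leaving `language`
# unset lets Whisper detect the language; forcing language="zh" is another
# option when the matrix language is known in advance.
result = model.transcribe("cs_utterance.wav", task="transcribe")
print(result["text"])

# The same checkpoint can perform speech-to-text translation into English
# simply by switching the decoding task.
translation = model.transcribe("cs_utterance.wav", task="translate")
print(translation["text"])
```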