Open-vocabulary keyword spotting in any language through multilingual
  contrastive speech-phoneme pretraining

Islam, Jahurul; Samir, Farhan; Yang, Changbing; Zhu, Jian

Authors: Jahurul Islam, Farhan Samir, Changbing Yang, Jian Zhu
Publication date: 14 November 2023

Abstract

In this paper, we introduce a massively multilingual speech corpus with fine-grained phonemic transcriptions, encompassing more than 115 languages from diverse language families. Based on this multilingual dataset, we propose CLAP-IPA, a multilingual phoneme-speech contrastive embedding model capable of open-vocabulary matching between speech signals and phonemically transcribed keywords or arbitrary phrases. The proposed model has been tested on two fieldwork speech corpora in 97 unseen languages, exhibiting strong generalizability across languages. Comparison with a text-based model shows that using phonemes as modeling units enables much better crosslinguistic generalization than orthographic texts.

Comment: Preprint; Work in Progress
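
The abstract does not spell out the training objective or the matching procedure. As a rough illustration of what a phoneme-speech contrastive embedding setup can look like, here is a minimal NumPy sketch, assuming a CLIP-style symmetric contrastive loss and cosine-similarity matching. The function names, the temperature value of 0.07, and the use of precomputed embedding arrays are assumptions made for illustration; they are not taken from the paper or its code.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarities."""
    norms = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(speech_emb, phoneme_emb, temperature=0.07):
    """CLIP-style loss (an assumption, not necessarily the paper's objective).

    Row i of speech_emb and row i of phoneme_emb form a matching pair;
    every other row in the batch serves as a negative.
    """
    s = l2_normalize(speech_emb)
    p = l2_normalize(phoneme_emb)
    logits = s @ p.T / temperature           # (batch, batch) similarity matrix
    idx = np.arange(len(s))
    loss_s2p = -log_softmax(logits, axis=1)[idx, idx].mean()  # speech -> phonemes
    loss_p2s = -log_softmax(logits, axis=0)[idx, idx].mean()  # phonemes -> speech
    return 0.5 * (loss_s2p + loss_p2s)


def match_keyword(speech_emb, keyword_embs):
    """Return the index of the keyword embedding closest to one speech embedding."""
    sims = l2_normalize(keyword_embs) @ l2_normalize(speech_emb)
    return int(np.argmax(sims)), sims


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    keywords = rng.normal(size=(8, 16))      # stand-ins for phoneme-side embeddings
    speech = keywords + 0.01 * rng.normal(size=(8, 16))  # stand-ins for speech side
    print("aligned loss:", symmetric_contrastive_loss(speech, keywords))
    print("best match for utterance 2:", match_keyword(speech[2], keywords)[0])
```

Because matching reduces to a nearest-neighbour lookup in the shared embedding space, any phonemically transcribed keyword can be scored against speech at query time without retraining, which is what makes the vocabulary open in this kind of design.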

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2311.08323

Last time updated on 10/02/2024