Acoustic word embeddings are typically created by training a pooling function
using pairs of word-like units. For unsupervised systems, these pairs are mined using
k-nearest neighbor (KNN) search, which is slow. Recently, mean-pooled
representations from a pre-trained self-supervised English model were suggested
as a promising alternative, but their performance on target languages was not
fully competitive. Here, we explore improvements to both approaches: we use
continued pre-training to adapt the self-supervised model to the target
language, and we use a multilingual phone recognizer (MPR) to mine phone n-gram
pairs for training the pooling function. Evaluating on four languages, we show
that both methods outperform a recent approach on word discrimination.
Moreover, the MPR method is orders of magnitude faster than KNN and is highly
data-efficient. We also show a small improvement from performing learned
pooling on top of the representations from continued pre-training.

Comment: Accepted to Interspeech 202
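To make the mean-pooling approach concrete, here is a minimal sketch (not from the paper): frame-level features from a self-supervised speech model are averaged over a word segment to give a fixed-dimensional acoustic word embedding, and two embeddings are compared with cosine similarity as in word discrimination. The feature dimension, frame counts, and segment boundaries below are illustrative assumptions, with random vectors standing in for real model outputs.

```python
import numpy as np

# Stand-in for frame-level features from a pre-trained self-supervised
# speech model (e.g. ~50 frames/sec, 768-dim vectors); random values are
# used here purely for illustration.
rng = np.random.default_rng(0)
frames_a = rng.standard_normal((120, 768)).astype(np.float32)
frames_b = rng.standard_normal((95, 768)).astype(np.float32)

def mean_pool(features: np.ndarray, start: int, end: int) -> np.ndarray:
    """Average the frame vectors in [start, end) into one fixed-size
    acoustic word embedding."""
    return features[start:end].mean(axis=0)

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Similarity score used to judge same-word vs. different-word pairs."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Embed two word-like segments (boundaries assumed to come from, e.g.,
# a phone recognizer or forced alignment) and compare them.
emb_a = mean_pool(frames_a, 20, 55)
emb_b = mean_pool(frames_b, 10, 48)
print(cosine_similarity(emb_a, emb_b))
```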