Analysis of Data Augmentation Methods for Low-Resource Maltese ASR

Borg, Claudia; DeMarco, Andrea; Gatt, Albert; Mena, Carlos; van der Plas, Lonneke; Williams, Aiden

Authors: Claudia Borg, Andrea DeMarco, Albert Gatt, Carlos Mena, Lonneke van der Plas, Aiden Williams
Publication date: 20 January 2023

Abstract

Recent years have seen increased interest in the computational speech processing of Maltese, but resources remain sparse. In this paper, we consider data augmentation techniques for improving speech recognition for low-resource languages, focusing on Maltese as a test case. We consider three types of data augmentation: unsupervised training, multilingual training, and the use of synthesized speech as training data. The goal is to determine which of these techniques, or which combination of them, most effectively improves speech recognition for languages where the starting point is a small corpus of approximately 7 hours of transcribed speech. Our results show that combining the data augmentation techniques studied here leads to an absolute word error rate (WER) improvement of 15% without the use of a language model.

Comment: 12 pages

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2111.07793

Last time updated on 06/02/2022