57 research outputs found
Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information
This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages, which lack resources for speech and language processing. We focus on finding approaches which allow using data from multiple languages to improve the performance for those languages on different levels, such as feature extraction, acoustic modeling and language modeling. Under application aspects, this thesis also includes research work on non-native and Code-Switching speech
Acoustic Modelling for Under-Resourced Languages
Automatic speech recognition systems have so far been developed only for very few languages out of the 4,000-7,000 existing ones.
In this thesis we examine methods to rapidly create acoustic models in new, possibly under-resourced languages, in a time and cost effective manner. For this we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages
Using Resources from a Closely-related Language to Develop ASR for a Very Under-resourced Language: A Case Study for Iban
International audienceThis paper presents our strategies for developing an automatic speech recognition system for Iban, an under-resourced language. We faced several challenges such as no pronunciation dictionary and lack of training material for building acoustic models. To overcome these problems, we proposed approaches which exploit resources from a closely-related language (Malay). We developed a semi-supervised method for building the pronunciation dictionary and applied cross-lingual strategies for improving acoustic models trained with very limited training data. Both approaches displayed very encouraging results, which show that data from a closely-related language, if available, can be exploited to build ASR for a new language. In the final part of the paper, we present a zero-shot ASR using Malay resources that can be used as an alternative method for transcribing Iban speech
Boosting under-resourced speech recognizers by exploiting out of language data - Case study on Afrikaans
Under-resourced speech recognizers may benefit from data in languages other than the target language. In this paper, we boost the performance of an Afrikaans speech recognizer by using already available data from other languages. To successfully exploit available multilingual resources, we use posterior features, estimated by multilayer perceptrons that are trained on similar languages. For two different acoustic modeling techniques, Tandem and Kullback-Leibler divergence based HMMs, the proposed multilingual system yields more than 10% relative improvement compared to the corresponding monolingual systems only trained on Afrikaans
Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes
This paper proposes Allophant, a multilingual phoneme recognizer. It requires
only a phoneme inventory for cross-lingual transfer to a target language,
allowing for low-resource recognition. The architecture combines a
compositional phone embedding approach with individually supervised phonetic
attribute classifiers in a multi-task architecture. We also introduce
Allophoible, an extension of the PHOIBLE database. When combined with a
distance based mapping approach for grapheme-to-phoneme outputs, it allows us
to train on PHOIBLE inventories directly. By training and evaluating on 34
languages, we found that the addition of multi-task learning improves the
model's capability of being applied to unseen phonemes and phoneme inventories.
On supervised languages we achieve phoneme error rate improvements of 11
percentage points (pp.) compared to a baseline without multi-task learning.
Evaluation of zero-shot transfer on 84 languages yielded a decrease in PER of
2.63 pp. over the baseline.Comment: 5 pages, 2 figures, 2 tables, accepted to INTERSPEECH 2023; published
versio
- …