Search CORE

57 research outputs found

Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information

Author: Vu Ngoc Thang
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2014
Field of study

This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages, which lack resources for speech and language processing. We focus on finding approaches which allow using data from multiple languages to improve the performance for those languages on different levels, such as feature extraction, acoustic modeling and language modeling. Under application aspects, this thesis also includes research work on non-native and Code-Switching speech

KITopen

Acoustic Modelling for Under-Resourced Languages

Author: Stüker Sebastian
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2009
Field of study

Automatic speech recognition systems have so far been developed only for very few languages out of the 4,000-7,000 existing ones. In this thesis we examine methods to rapidly create acoustic models in new, possibly under-resourced languages, in a time and cost effective manner. For this we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages

KITopen

Multilingual speech recognition:a posterior based approach

Author: Imseng David
Publication venue: Lausanne, EPFL
Publication date: 12/06/2013
Field of study

Infoscience - École polytechnique fédérale de Lausanne

Using Resources from a Closely-related Language to Develop ASR for a Very Under-resourced Language: A Case Study for Iban

Author: Besacier Laurent
Dyab Mohamed
Lecouteux Benjamin
Samson Juan Sarah
Publication venue: HAL CCSD
Publication date: 01/09/2015
Field of study

International audienceThis paper presents our strategies for developing an automatic speech recognition system for Iban, an under-resourced language. We faced several challenges such as no pronunciation dictionary and lack of training material for building acoustic models. To overcome these problems, we proposed approaches which exploit resources from a closely-related language (Malay). We developed a semi-supervised method for building the pronunciation dictionary and applied cross-lingual strategies for improving acoustic models trained with very limited training data. Both approaches displayed very encouraging results, which show that data from a closely-related language, if available, can be exploited to build ASR for a new language. In the final part of the paper, we present a zero-shot ASR using Malay resources that can be used as an alternative method for transcribing Iban speech

Hal - Université Grenoble Alpes

Unimas Institutional Repository

Boosting under-resourced speech recognizers by exploiting out of language data - Case study on Afrikaans

Author: Bourlard Hervé
Garner Philip N.
Imseng David
Publication venue: Idiap
Publication date: 19/12/2013
Field of study

Under-resourced speech recognizers may benefit from data in languages other than the target language. In this paper, we boost the performance of an Afrikaans speech recognizer by using already available data from other languages. To successfully exploit available multilingual resources, we use posterior features, estimated by multilayer perceptrons that are trained on similar languages. For two different acoustic modeling techniques, Tandem and Kullback-Leibler divergence based HMMs, the proposed multilingual system yields more than 10% relative improvement compared to the corresponding monolingual systems only trained on Afrikaans

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes

Author: Georges Munir
Glocker Kevin
Herygers Aaricia
Publication venue
Publication date: 16/08/2023
Field of study

This paper proposes Allophant, a multilingual phoneme recognizer. It requires only a phoneme inventory for cross-lingual transfer to a target language, allowing for low-resource recognition. The architecture combines a compositional phone embedding approach with individually supervised phonetic attribute classifiers in a multi-task architecture. We also introduce Allophoible, an extension of the PHOIBLE database. When combined with a distance based mapping approach for grapheme-to-phoneme outputs, it allows us to train on PHOIBLE inventories directly. By training and evaluating on 34 languages, we found that the addition of multi-task learning improves the model's capability of being applied to unseen phonemes and phoneme inventories. On supervised languages we achieve phoneme error rate improvements of 11 percentage points (pp.) compared to a baseline without multi-task learning. Evaluation of zero-shot transfer on 84 languages yielded a decrease in PER of 2.63 pp. over the baseline.Comment: 5 pages, 2 figures, 2 tables, accepted to INTERSPEECH 2023; published versio

arXiv.org e-Print Archive

Multilingual Articulatory Features

Author: Stüker Sebastian
Publication venue
Publication date: 04/08/2008
Field of study

KITopen