Multilingual bottleneck features for subword modeling in zero-resource languages

Author: Goldwater Sharon
Hermann Enno
Publication venue: 'International Speech Communication Association'
Publication date: 18/06/2018

How can we effectively develop speech technology for languages where no transcribed data is available? Many existing approaches use no annotated resources at all, yet it makes sense to leverage information from large annotated corpora in other languages, for example in the form of multilingual bottleneck features (BNFs) obtained from a supervised speech recognition system. In this work, we evaluate the benefits of BNFs for subword modeling (feature extraction) in six unseen languages on a word discrimination task. First we establish a strong unsupervised baseline by combining two existing methods: vocal tract length normalisation (VTLN) and the correspondence autoencoder (cAE). We then show that BNFs trained on a single language already beat this baseline; including up to 10 languages results in additional improvements which cannot be matched by just adding more data from a single language. Finally, we show that the cAE can improve further on the BNFs if high-quality same-word pairs are available. Comment: 5 pages, 2 figures, 4 tables; accepted at Interspeech 201

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

The Zero Resource Speech Challenge 2017

Author: Anguera Xavier
Benjumea Juan
Bernard Mathieu
Besacier Laurent
Cao Xuan Nga
Dunbar Ewan
Dupoux Emmanuel
Karadayi Julien
Publication date: 12/12/2017

We describe a new challenge aimed at discovering subword and word units from raw speech. This challenge is the followup to the Zero Resource Speech Challenge 2015. It aims at constructing systems that generalize across languages and adapt to new speakers. The design features and evaluation metrics of the challenge are presented and the results of seventeen models are discussed. Comment: IEEE ASRU (Automatic Speech Recognition and Understanding) 2017. Okinawa, Japa

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Multilingual and Unsupervised Subword Modeling for Zero-Resource Languages

Author: Alumäe
Badino
Badino
Carlin
Chen
Chen
Cui
De Vries
Dunbar
Dunbar
Gales
Grézl
Heck
Heck
Heck
Heck
Hermann
Huijbregts
Jansen
Jansen
Kamper
Kamper
Kamper
Lang
Lee
Levin
Menon
Paul
Peddinti
Pitt
Povey
Renshaw
Riad
Saon
Schatz
Schatz
Schultz
Shibata
Swietojanski
Synnaeve
Synnaeve
Thomas
Trmal
Tsuchiya
Versteegh
Veselý
Vu
Waibel
Walter
Yuan
Yuan
Yuan
Zeghidour
Zeiler
Zhang
Publication venue: 'Elsevier BV'
Publication date: 07/04/2020

Subword modeling for zero-resource languages aims to learn low-level representations of speech audio without using transcriptions or other resources from the target language (such as text corpora or pronunciation dictionaries). A good representation should capture phonetic content and abstract away from other types of variability, such as speaker differences and channel noise. Previous work in this area has primarily focused on unsupervised learning from target language data only, and has been evaluated only intrinsically. Here we directly compare multiple methods, including some that use only target language speech data and some that use transcribed speech from other (non-target) languages, and we evaluate using two intrinsic measures as well as on a downstream unsupervised word segmentation and clustering task. We find that combining two existing target-language-only methods yields better features than either method alone. Nevertheless, even better results are obtained by extracting target language bottleneck features using a model trained on other languages. Cross-lingual training using just one other language is enough to provide this benefit, but multilingual training helps even more. In addition to these results, which hold across both intrinsic measures and the extrinsic task, we discuss the qualitative differences between the different types of learned features. Comment: 17 pages, 6 figures, 7 tables. Accepted for publication in Computer Speech and Language. arXiv admin note: text overlap with arXiv:1803.0886

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Edinburgh Research Explorer

An Empirical Evaluation of Zero Resource Acoustic Unit Discovery

Author: Burget Lukas
Dehak Najim
Ghahremani Pegah
Kesiraju Santosh
Khudanpur Sanjeev
Liu Chunxi
Ondel Lucas
Rott Alena
Sun Ming
Yang Jinyi
Publication date: 04/02/2017

Acoustic unit discovery (AUD) is the process of automatically identifying a categorical acoustic unit inventory from speech and producing corresponding acoustic unit tokenizations. AUD provides an important avenue for unsupervised acoustic model training in a zero-resource setting where expert-provided linguistic knowledge and transcribed speech are unavailable. Therefore, to further facilitate the zero-resource AUD process, in this paper we demonstrate that acoustic feature representations can be significantly improved by (i) performing linear discriminant analysis (LDA) in an unsupervised self-trained fashion, and (ii) leveraging resources of other languages through building a multilingual bottleneck (BN) feature extractor to give effective cross-lingual generalization. Moreover, we perform comprehensive evaluations of AUD efficacy on multiple downstream speech applications, and their correlated performance suggests that AUD evaluations are feasible using different alternative language resources when only a subset of these evaluation resources is available in typical zero-resource applications. Comment: 5 pages, 1 figure; Accepted for publication at ICASSP 201

arXiv.org e-Print Archive

Crossref