Multilingual bottleneck features for subword modeling in zero-resource languages
How can we effectively develop speech technology for languages where no
transcribed data is available? Many existing approaches use no annotated
resources at all, yet it makes sense to leverage information from large
annotated corpora in other languages, for example in the form of multilingual
bottleneck features (BNFs) obtained from a supervised speech recognition
system. In this work, we evaluate the benefits of BNFs for subword modeling
(feature extraction) in six unseen languages on a word discrimination task.
First we establish a strong unsupervised baseline by combining two existing
methods: vocal tract length normalisation (VTLN) and the correspondence
autoencoder (cAE). We then show that BNFs trained on a single language already
beat this baseline; including up to 10 languages results in additional
improvements which cannot be matched by just adding more data from a single
language. Finally, we show that the cAE can improve further on the BNFs if
high-quality same-word pairs are available.
Comment: 5 pages, 2 figures, 4 tables; accepted at Interspeech 201
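The correspondence autoencoder named in this abstract can be illustrated with a minimal sketch. It assumes the usual formulation, which the abstract itself does not spell out: frames from one instance of a word are fed in, and the network is scored against the aligned frames of a second instance of the same word rather than against its own input. The function name `cae_forward`, the layer sizes, and the random stand-in data are all hypothetical.

```python
import numpy as np


def cae_forward(x, y, w_enc, b_enc, w_dec, b_dec):
    """Return (loss, features) for one batch of aligned frame pairs.

    x, y: arrays of shape (n_frames, dim); row i of x is paired with row i of y.
    """
    features = np.tanh(x @ w_enc + b_enc)        # hidden layer, used as the learned features
    predicted = features @ w_dec + b_dec         # linear output layer
    loss = float(np.mean((predicted - y) ** 2))  # target is the paired frame y, not x
    return loss, features


rng = np.random.default_rng(0)
n_frames, dim, hidden = 8, 13, 5                 # hypothetical sizes
x = rng.normal(size=(n_frames, dim))             # frames from one instance of a word
y = x + 0.1 * rng.normal(size=(n_frames, dim))   # stand-in for aligned frames of a second instance
w_enc = 0.1 * rng.normal(size=(dim, hidden))
b_enc = np.zeros(hidden)
w_dec = 0.1 * rng.normal(size=(hidden, dim))
b_dec = np.zeros(dim)

loss, features = cae_forward(x, y, w_enc, b_enc, w_dec, b_dec)
```

Only the forward pass and loss are shown; training would minimise `loss` over many same-word pairs, and the quality of those pairs is what the abstract flags as the limiting factor.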
The Zero Resource Speech Challenge 2017
We describe a new challenge aimed at discovering subword and word units from
raw speech. This challenge is the follow-up to the Zero Resource Speech
Challenge 2015. It aims at constructing systems that generalize across
languages and adapt to new speakers. The design features and evaluation metrics
of the challenge are presented and the results of seventeen models are
discussed.
Comment: IEEE ASRU (Automatic Speech Recognition and Understanding) 2017. Okinawa, Japan
Multilingual and Unsupervised Subword Modeling for Zero-Resource Languages
Subword modeling for zero-resource languages aims to learn low-level
representations of speech audio without using transcriptions or other resources
from the target language (such as text corpora or pronunciation dictionaries).
A good representation should capture phonetic content and abstract away from
other types of variability, such as speaker differences and channel noise.
Previous work in this area has primarily focused on unsupervised learning from
target language data only, and has been evaluated only intrinsically. Here we
directly compare multiple methods, including some that use only target language
speech data and some that use transcribed speech from other (non-target)
languages, and we evaluate using two intrinsic measures as well as on a
downstream unsupervised word segmentation and clustering task. We find that
combining two existing target-language-only methods yields better features than
either method alone. Nevertheless, even better results are obtained by
extracting target language bottleneck features using a model trained on other
languages. Cross-lingual training using just one other language is enough to
provide this benefit, but multilingual training helps even more. In addition to
these results, which hold across both intrinsic measures and the extrinsic
task, we discuss the qualitative differences between the different types of
learned features.
Comment: 17 pages, 6 figures, 7 tables. Accepted for publication in Computer
Speech and Language. arXiv admin note: text overlap with arXiv:1803.0886
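Bottleneck feature extraction, as mentioned in this abstract, can be sketched as follows. This is an illustration under stated assumptions, not the authors' system: a feed-forward network with one deliberately narrow layer is run on target-language frames, and the activations of that narrow layer are kept as features. The function `extract_bnf`, the layer sizes, and the random (untrained) weights are hypothetical; in practice the weights would come from a recogniser trained on transcribed speech in other languages.

```python
import numpy as np


def extract_bnf(frames, layers, bottleneck_index):
    """Run frames through the network and return the bottleneck activations.

    frames: array of shape (n_frames, input_dim).
    layers: list of (weights, bias) pairs, applied in order.
    bottleneck_index: position in `layers` of the narrow layer.
    """
    hidden = frames
    for i, (weights, bias) in enumerate(layers):
        hidden = hidden @ weights + bias
        if i == bottleneck_index:
            return hidden                  # linear bottleneck output, used as features
        hidden = np.maximum(hidden, 0.0)   # ReLU on the layers before the bottleneck
    raise ValueError("bottleneck_index is beyond the last layer")


rng = np.random.default_rng(0)
sizes = [39, 64, 8, 64, 40]                # input, hidden, bottleneck, hidden, output (hypothetical)
layers = [
    (0.1 * rng.normal(size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]
frames = rng.normal(size=(20, 39))         # stand-in for 20 frames of acoustic features
bnf = extract_bnf(frames, layers, bottleneck_index=1)
```

The layers after the bottleneck are needed only while training the recogniser; at extraction time the forward pass stops at the narrow layer, so no target-language transcriptions are involved.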
An Empirical Evaluation of Zero Resource Acoustic Unit Discovery
Acoustic unit discovery (AUD) is a process of automatically identifying a
categorical acoustic unit inventory from speech and producing corresponding
acoustic unit tokenizations. AUD provides an important avenue for unsupervised
acoustic model training in a zero resource setting where expert-provided
linguistic knowledge and transcribed speech are unavailable. Therefore, to
further facilitate the zero-resource AUD process, we demonstrate in this paper
that acoustic feature representations can be significantly improved by (i)
performing linear discriminant analysis (LDA) in an unsupervised self-trained
fashion, and (ii) leveraging resources of other languages through building a
multilingual bottleneck (BN) feature extractor to give effective cross-lingual
generalization. Moreover, we perform comprehensive evaluations of AUD efficacy
on multiple downstream speech applications, and their correlated performance
suggests that AUD evaluations are feasible using alternative language
resources when only a subset of these evaluation resources is available in
typical zero resource applications.
Comment: 5 pages, 1 figure; Accepted for publication at ICASSP 201