The Zero Resource Speech Challenge 2017
We describe a new challenge aimed at discovering subword and word units from
raw speech. This challenge is the follow-up to the Zero Resource Speech
Challenge 2015. It aims at constructing systems that generalize across
languages and adapt to new speakers. The design features and evaluation metrics
of the challenge are presented and the results of seventeen models are
discussed.
Comment: IEEE ASRU (Automatic Speech Recognition and Understanding) 2017,
Okinawa, Japan.
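The subword-level evaluation in this family of challenges is based on ABX discriminability: a representation scores well if tokens of the same phonetic category end up closer to one another than to tokens of a different category. As a rough, hypothetical illustration (not the challenge's actual frame-level, DTW-based implementation), the sketch below computes an ABX error rate over fixed-length embeddings with cosine distance; all names and data are made up.

```python
# Minimal ABX discriminability sketch (hypothetical, simplified): given
# embeddings of speech tokens labelled by category, estimate how often a
# token X of category "a" is closer to another a-token (A) than to a
# b-token (B). The real challenge metric operates on frame-level
# features with DTW alignment; here we assume one vector per token.
import itertools
import numpy as np

def cosine_dist(u, v):
    """Cosine distance between two embedding vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def abx_error(tokens_a, tokens_b):
    """ABX error rate for two categories, each a list of 1-D embeddings."""
    errors, total = 0, 0
    for A, X in itertools.permutations(range(len(tokens_a)), 2):
        for B in range(len(tokens_b)):
            d_ax = cosine_dist(tokens_a[A], tokens_a[X])
            d_bx = cosine_dist(tokens_b[B], tokens_a[X])
            errors += d_ax >= d_bx  # ties counted as errors for simplicity
            total += 1
    return errors / total

rng = np.random.default_rng(0)
a = [rng.normal(0.0, 1.0, 16) for _ in range(5)]  # category "a" tokens
b = [rng.normal(3.0, 1.0, 16) for _ in range(5)]  # category "b" tokens
print(f"ABX error: {abx_error(a, b):.2f}")        # ~0.0 for separated clusters
```

An error of 0.5 corresponds to chance-level discrimination; 0 means the two categories are perfectly separated in the representation space.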
The Zero Resource Speech Challenge 2020: Discovering discrete subword and word units
We present the Zero Resource Speech Challenge 2020, which aims at learning speech representations from raw audio signals without any labels. It combines the data sets and metrics from two previous benchmarks (2017 and 2019) and features two tasks which tap into two levels of speech representation. The first task is to discover low bit-rate subword representations that optimize the quality of speech synthesis; the second one is to discover word-like units from unsegmented raw speech. We present the results of the twenty submitted models and discuss the implications of the main findings for unsupervised speech learning.
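For the first task, "low bit-rate" is measured, in the 2019 benchmark this challenge inherits from, as the number of discovered symbols per second times the empirical entropy per symbol. The sketch below is a minimal, assumed version of that computation; the function name and toy data are hypothetical.

```python
# Hedged sketch of a bit-rate estimate for a discrete unit sequence,
# following the (assumed) ZRC 2019/2020 definition:
#   bit-rate = (number of symbols / duration in seconds) * entropy,
# with entropy estimated from the empirical symbol distribution.
import math
from collections import Counter

def bitrate(units, duration_s):
    """units: list of discrete unit labels; duration_s: audio length in seconds."""
    counts = Counter(units)
    n = len(units)
    # Empirical entropy of the unit distribution, in bits per symbol.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return (n / duration_s) * entropy

units = ["a", "b", "a", "c", "a", "b"]  # toy transcription of 2 s of speech
print(f"{bitrate(units, duration_s=2.0):.1f} bits/s")
```

Lower values indicate a compact, phoneme-like inventory; a verbose frame-level code drives the rate up, which is the pressure the task places on submitted systems.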
An embedded segmental K-means model for unsupervised segmentation and clustering of speech
Unsupervised segmentation and clustering of unlabelled speech are core
problems in zero-resource speech processing. Most approaches lie at
methodological extremes: some use probabilistic Bayesian models with
convergence guarantees, while others opt for more efficient heuristic
techniques. Despite competitive performance in previous work, the full Bayesian
approach is difficult to scale to large speech corpora. We introduce an
approximation to a recent Bayesian model that still has a clear objective
function but improves efficiency by using hard clustering and segmentation
rather than full Bayesian inference. Like its Bayesian counterpart, this
embedded segmental K-means model (ES-KMeans) represents arbitrary-length word
segments as fixed-dimensional acoustic word embeddings. We first compare
ES-KMeans to previous approaches on common English and Xitsonga data sets (5
and 2.5 hours of speech): ES-KMeans outperforms a leading heuristic method in
word segmentation, giving similar scores to the Bayesian model while being 5
times faster with fewer hyperparameters. However, its clusters are less pure
than those of the other models. We then show that ES-KMeans scales to larger
corpora by applying it to the 5 languages of the Zero Resource Speech Challenge
2017 (up to 45 hours), where it performs competitively compared to the
challenge baseline.
Comment: 8 pages, 3 figures, 3 tables; accepted to ASRU 2017.
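From the abstract alone, the core loop can be sketched: embed every candidate segment as a fixed-dimensional vector (here by naive uniform downsampling, standing in for a real acoustic word embedding), then alternate hard K-means updates with a dynamic-programming re-segmentation that picks, per utterance, the segmentation minimizing the summed distance to the nearest cluster means. The code below is a toy reconstruction under those assumptions, not the authors' implementation; every name in it is hypothetical.

```python
# Toy sketch of the embedded segmental K-means idea: alternate
# (1) hard K-means updates over embeddings of the current segments and
# (2) a dynamic-programming re-segmentation of each utterance that
# minimizes the summed distance to the nearest cluster mean.
import numpy as np

def embed(frames, n=4):
    """Downsample a (T, d) segment to a fixed (n*d,) vector."""
    idx = np.linspace(0, len(frames) - 1, n).astype(int)
    return frames[idx].ravel()

def segment(frames, means, min_len=2, max_len=8):
    """DP over segment end positions: cheapest segmentation of one utterance."""
    T = len(frames)
    cost = np.full(T + 1, np.inf)
    cost[0], back = 0.0, [0] * (T + 1)
    for t in range(1, T + 1):
        for l in range(min_len, min(max_len, t) + 1):
            e = embed(frames[t - l:t])
            d = np.min(np.linalg.norm(means - e, axis=1))  # nearest mean
            if cost[t - l] + d < cost[t]:
                cost[t], back[t] = cost[t - l] + d, t - l
    bounds, t = [], T
    while t > 0:
        bounds.append((back[t], t))
        t = back[t]
    return bounds[::-1]

def es_kmeans(utterances, k=3, iters=5, seed=0):
    rng = np.random.default_rng(seed)
    means = rng.normal(size=(k, 4 * utterances[0].shape[1]))
    for _ in range(iters):
        embs = [embed(u[s:e]) for u in utterances for s, e in segment(u, means)]
        X = np.stack(embs)
        assign = np.argmin(
            np.linalg.norm(X[:, None, :] - means[None], axis=2), axis=1)
        for j in range(k):  # hard K-means mean update
            if np.any(assign == j):
                means[j] = X[assign == j].mean(axis=0)
    return means

utts = [np.random.default_rng(i).normal(size=(20, 3)) for i in range(4)]
means = es_kmeans(utts)
print(segment(utts[0], means))  # discovered (start, end) frame boundaries
```

The hard assignments in both steps are what replace the Bayesian model's full inference, which is where the reported speed-up and the clear objective function come from.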
A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments
Most speech and language technologies are trained with massive amounts of
speech and text data. However, most of the world's languages do not have
such resources or stable orthography. Systems constructed under these almost
zero resource conditions are not only promising for speech technology but also
for computational language documentation. The goal of computational language
documentation is to help field linguists to (semi-)automatically analyze and
annotate audio recordings of endangered and unwritten languages. Example tasks
are automatic phoneme discovery or lexicon discovery from the speech signal.
This paper presents a speech corpus collected during a realistic language
documentation process. It is made up of 5k speech utterances in Mboshi (Bantu
C25) aligned to French text translations. Speech transcriptions are also made
available: they correspond to a non-standard graphemic form close to the
language's phonology. We describe how the data was collected, cleaned, and
processed, and we illustrate its use through a zero-resource task: spoken term
discovery. The dataset is made available to the community for reproducible
computational language documentation experiments and their evaluation.
Comment: accepted to LREC 2018.
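Spoken term discovery, the task used to illustrate the corpus, means finding recurring acoustic patterns across untranscribed utterances; classic systems do this by matching segment pairs under dynamic time warping. The sketch below is a bare-bones, hypothetical rendering of that idea on toy features, not the paper's pipeline.

```python
# Bare-bones spoken term discovery sketch (hypothetical): slide a window
# over two feature sequences and report window pairs whose DTW alignment
# cost is low, i.e. candidate recurring "terms". Real systems (e.g.
# segmental DTW) are far more refined; this only shows the core idea.
import numpy as np

def dtw_cost(a, b):
    """Average-per-step DTW cost between (T1, d) and (T2, d) sequences."""
    T1, T2 = len(a), len(b)
    D = np.full((T1 + 1, T2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[T1, T2] / (T1 + T2)

def discover(x, y, win=10, hop=5, thresh=0.3):
    """Return (start_x, start_y, cost) for matching windows across x and y."""
    matches = []
    for i in range(0, len(x) - win + 1, hop):
        for j in range(0, len(y) - win + 1, hop):
            c = dtw_cost(x[i:i + win], y[j:j + win])
            if c < thresh:
                matches.append((i, j, c))
    return matches

rng = np.random.default_rng(0)
term = rng.normal(size=(10, 3))                  # a shared acoustic pattern
x = np.vstack([rng.normal(size=(15, 3)), term])  # utterance 1 features
y = np.vstack([term, rng.normal(size=(15, 3))])  # utterance 2 features
print(discover(x, y))                            # should include (15, 0, ~0.0)
```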