Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings
In settings where only unlabelled speech data is available, speech technology
needs to be developed without transcriptions, pronunciation dictionaries, or
language modelling text. A similar problem is faced when modelling infant
language acquisition. In these cases, categorical linguistic structure needs to
be discovered directly from speech audio. We present a novel unsupervised
Bayesian model that segments unlabelled speech and clusters the segments into
hypothesized word groupings. The result is a complete unsupervised tokenization
of the input speech in terms of discovered word types. In our approach, a
potential word segment (of arbitrary length) is embedded in a fixed-dimensional
acoustic vector space. The model, implemented as a Gibbs sampler, then builds a
whole-word acoustic model in this space while jointly performing segmentation.
We report word error rates in a small-vocabulary connected digit recognition
task by mapping the unsupervised decoded output to ground truth transcriptions.
The model achieves around 20% error rate, outperforming a previous HMM-based
system by about 10% absolute. Moreover, in contrast to the baseline, our model
does not require a pre-specified vocabulary size.
Comment: 11 pages, 8 figures; Accepted to the IEEE/ACM Transactions on Audio, Speech, and Language Processing
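The abstract above says that a segment of arbitrary length is embedded in a fixed-dimensional space, but it does not say how. As a rough illustration only, the sketch below uses uniform downsampling: it keeps a fixed number of equally spaced frames and flattens them into one vector. The function name `downsample_embed`, the choice of 10 samples, and the 13-dimensional toy frames are assumptions made for this example and are not taken from the abstract.

```python
from typing import List, Sequence


def downsample_embed(frames: Sequence[Sequence[float]], n_samples: int = 10) -> List[float]:
    """Flatten n_samples equally spaced frames of a segment into one vector.

    Hypothetical helper for illustration; the embedding method is an assumption.
    """
    if not frames:
        raise ValueError("segment must contain at least one frame")
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    last = len(frames) - 1
    # Equally spaced frame indices from the first frame to the last one.
    picks = [round(i * last / (n_samples - 1)) for i in range(n_samples)]
    return [value for p in picks for value in frames[p]]


# Toy segments of different durations (7 and 53 frames, 13 features per frame).
short_segment = [[float(t)] * 13 for t in range(7)]
long_segment = [[float(t)] * 13 for t in range(53)]

short_embedding = downsample_embed(short_segment)
long_embedding = downsample_embed(long_segment)
```

Both embeddings have the same length (10 frames times 13 features) regardless of segment duration, which is the property that lets segments of different lengths be compared and clustered in one space.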
Revisiting speech segmentation and lexicon learning with better features
We revisit a self-supervised method that segments unlabelled speech into
word-like segments. We start from the two-stage duration-penalised dynamic
programming method that performs zero-resource segmentation without learning an
explicit lexicon. In the first acoustic unit discovery stage, we replace
contrastive predictive coding features with HuBERT. After word segmentation in
the second stage, we get an acoustic word embedding for each segment by
averaging HuBERT features. These embeddings are clustered using K-means to get
a lexicon. The result is good full-coverage segmentation with a lexicon that
achieves state-of-the-art performance on the ZeroSpeech benchmarks.
Comment: 2 pages
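The lexicon-building step described above (average the frame-level features of each segment, then cluster the averages with K-means) can be sketched as follows. This is a minimal illustration under stated assumptions: the two-dimensional toy vectors stand in for real HuBERT features, the helper names `mean_pool` and `kmeans` are hypothetical, and the clustering is a plain Lloyd's algorithm with random initialisation instead of any particular library implementation.

```python
import random
from typing import List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Average a segment's frame-level features into one embedding."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, n_iter: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean_pool(members)
    return labels


# Toy word-like segments with different numbers of frames (stand-in features).
segments = [
    [[0.0, 0.1], [0.2, 0.1]],
    [[0.1, 0.0], [0.1, 0.2], [0.0, 0.1]],
    [[5.0, 5.1], [5.2, 4.9]],
    [[4.9, 5.0], [5.1, 5.1], [5.0, 5.2]],
]
embeddings = [mean_pool(segment) for segment in segments]
labels = kmeans(embeddings, k=2)  # each cluster index acts as a lexicon entry
```

Each cluster index plays the role of a discovered word type, so segments that share a label are treated as tokens of the same lexicon entry.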
The Zero Resource Speech Challenge 2017
We describe a new challenge aimed at discovering subword and word units from
raw speech. This challenge is the follow-up to the Zero Resource Speech
Challenge 2015. It aims at constructing systems that generalize across
languages and adapt to new speakers. The design features and evaluation metrics
of the challenge are presented and the results of seventeen models are
discussed.
Comment: IEEE ASRU (Automatic Speech Recognition and Understanding) 2017. Okinawa, Japan