The Zero Resource Speech Challenge 2017
We describe a new challenge aimed at discovering subword and word units from
raw speech. This challenge is the follow-up to the Zero Resource Speech
Challenge 2015. It aims at constructing systems that generalize across
languages and adapt to new speakers. The design features and evaluation metrics
of the challenge are presented and the results of seventeen models are
discussed.
Comment: IEEE ASRU (Automatic Speech Recognition and Understanding) 2017,
Okinawa, Japan
Phoneme Segmentation Using Self-Supervised Speech Models
We apply transfer learning to the task of phoneme segmentation and
demonstrate the utility of representations learned in self-supervised
pre-training for the task. Our model extends transformer-style encoders with
strategically placed convolutions that manipulate features learned in
pre-training. Using the TIMIT and Buckeye corpora we train and test the model
in the supervised and unsupervised settings. The latter case is accomplished by
furnishing a noisy label set with the predictions of a separate model that was
itself trained in an unsupervised fashion. Results indicate our model
eclipses previous state-of-the-art performance in both settings and on both
datasets. Finally, following observations during published code review and
attempts to reproduce past segmentation results, we find a need to disambiguate
the definition and implementation of widely-used evaluation metrics. We resolve
this ambiguity by delineating two distinct evaluation schemes and describing
their nuances.
Comment: Accepted to SLT 202
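The ambiguity the abstract points to can be made concrete with a small sketch. Below are two hypothetical ways of counting boundary "hits" within a tolerance window (the paper's actual evaluation schemes are not specified here; this only illustrates how seemingly equivalent definitions of a segmentation metric can diverge):

```python
# Hedged sketch: two plausible hit-counting schemes for predicted phoneme
# boundaries (in seconds) against reference boundaries, given a tolerance.
# The concrete schemes delineated in the paper may differ from these.

def hits_lenient(pred, ref, tol=0.02):
    # Scheme A: a prediction counts as a hit if ANY reference boundary
    # lies within the tolerance; several predictions may claim the same
    # reference boundary.
    return sum(1 for p in pred if any(abs(p - r) <= tol for r in ref))

def hits_one_to_one(pred, ref, tol=0.02):
    # Scheme B: greedy one-to-one matching; each reference boundary can
    # be matched at most once, so duplicate predictions are not rewarded.
    unmatched = list(ref)
    hits = 0
    for p in pred:
        for r in unmatched:
            if abs(p - r) <= tol:
                unmatched.remove(r)
                hits += 1
                break
    return hits

pred = [0.10, 0.11, 0.50]   # two predictions crowd the same true boundary
ref = [0.10, 0.50, 0.90]

print(hits_lenient(pred, ref))     # 3: both 0.10 and 0.11 match ref 0.10
print(hits_one_to_one(pred, ref))  # 2: ref 0.10 is matched only once
```

Precision and recall computed from these two hit counts differ on the same output, which is exactly why pinning down the matching scheme matters when comparing reported segmentation results.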