17,805 research outputs found
Improved training for online end-to-end speech recognition systems
Achieving high accuracy with end-to-end speech recognizers requires careful
parameter initialization prior to training. Otherwise, the networks may fail to
find a good local optimum. This is particularly true for online networks, such
as unidirectional LSTMs. Currently, the best strategy to train such systems is
to bootstrap the training from a tied-triphone system. However, this is time
consuming, and more importantly, is impossible for languages without a
high-quality pronunciation lexicon. In this work, we propose an initialization
strategy that uses teacher-student learning to transfer knowledge from a large,
well-trained, offline end-to-end speech recognition model to an online
end-to-end model, eliminating the need for a lexicon or any other linguistic
resources. We also explore curriculum learning and label smoothing and show how
they can be combined with the proposed teacher-student learning for further
improvements. We evaluate our methods on a Microsoft Cortana personal assistant
task and show that the proposed method results in a 19 % relative improvement
in word error rate compared to a randomly-initialized baseline system.Comment: Interspeech 201
Conditional Teacher-Student Learning
The teacher-student (T/S) learning has been shown to be effective for a
variety of problems such as domain adaptation and model compression. One
shortcoming of the T/S learning is that a teacher model, not always perfect,
sporadically produces wrong guidance in form of posterior probabilities that
misleads the student model towards a suboptimal performance. To overcome this
problem, we propose a conditional T/S learning scheme, in which a "smart"
student model selectively chooses to learn from either the teacher model or the
ground truth labels conditioned on whether the teacher can correctly predict
the ground truth. Unlike a naive linear combination of the two knowledge
sources, the conditional learning is exclusively engaged with the teacher model
when the teacher model's prediction is correct, and otherwise backs off to the
ground truth. Thus, the student model is able to learn effectively from the
teacher and even potentially surpass the teacher. We examine the proposed
learning scheme on two tasks: domain adaptation on CHiME-3 dataset and speaker
adaptation on Microsoft short message dictation dataset. The proposed method
achieves 9.8% and 12.8% relative word error rate reductions, respectively, over
T/S learning for environment adaptation and speaker-independent model for
speaker adaptation.Comment: 5 pages, 1 figure, ICASSP 201
Lessons from Building Acoustic Models with a Million Hours of Speech
This is a report of our lessons learned building acoustic models from 1
Million hours of unlabeled speech, while labeled speech is restricted to 7,000
hours. We employ student/teacher training on unlabeled data, helping scale out
target generation in comparison to confidence model based methods, which
require a decoder and a confidence model. To optimize storage and to
parallelize target generation, we store high valued logits from the teacher
model. Introducing the notion of scheduled learning, we interleave learning on
unlabeled and labeled data. To scale distributed training across a large number
of GPUs, we use BMUF with 64 GPUs, while performing sequence training only on
labeled data with gradient threshold compression SGD using 16 GPUs. Our
experiments show that extremely large amounts of data are indeed useful; with
little hyper-parameter tuning, we obtain relative WER improvements in the 10 to
20% range, with higher gains in noisier conditions.Comment: "Copyright 2019 IEEE. Personal use of this material is permitted.
Permission from IEEE must be obtained for all other uses, in any current or
future media, including reprinting/republishing this material for advertising
or promotional purposes, creating new collective works, for resale or
redistribution to servers or lists, or reuse of any copyrighted component of
this work in other works.
- …