English Broadcast News Speech Recognition by Humans and Machines
With recent advances in deep learning, considerable attention has been given
to achieving automatic speech recognition performance close to human
performance on tasks like conversational telephone speech (CTS) recognition. In
this paper we evaluate the usefulness of these proposed techniques on broadcast
news (BN), a similarly challenging task. We also perform a set of recognition
measurements to understand how close the achieved automatic speech recognition
results are to human performance on this task. On two publicly available BN
test sets, DEV04F and RT04, our speech recognition system using LSTM and
residual network based acoustic models with a combination of n-gram and neural
network language models achieves word error rates of 6.5% and 5.9%, respectively. By achieving
new performance milestones on these test sets, our experiments show that
techniques developed on other related tasks, like CTS, can be transferred to
achieve similar performance. In contrast, the best measured human word error
rates on these test sets are much lower, at 3.6% and 2.8% respectively,
indicating that there is still room for new techniques and improvements in this
space, to reach human performance levels.
Comment: © 2019 IEEE.
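As a rough illustration of the metric quoted in the abstract above, the following sketch computes word error rate as word-level edit distance divided by reference length, and then the relative gap between the reported machine and human error rates. The function names and the example sentence are illustrative assumptions, not taken from the paper, and real scoring pipelines typically apply text normalization that is omitted here.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def relative_gap(machine_wer: float, human_wer: float) -> float:
    """Fraction of the machine error rate that would have to be removed
    to match the human error rate."""
    return (machine_wer - human_wer) / machine_wer


if __name__ == "__main__":
    # Made-up sentence pair, only to exercise the function.
    print(wer("the cat sat on the mat", "the cat sat on mat"))
    # Error rates quoted in the abstract (DEV04F, then RT04).
    for machine, human in [(6.5, 3.6), (5.9, 2.8)]:
        print(f"{relative_gap(machine, human):.1%}")
```

With the figures quoted in the abstract, the relative gap works out to roughly 45% on DEV04F and 53% on RT04, which is one way to read the statement that there is still room for improvement.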
Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition
We investigate the use of generative adversarial networks (GANs) in speech
dereverberation for robust speech recognition. GANs have recently been studied
for speech enhancement to remove additive noise, but no work has yet examined
their ability in speech dereverberation, and the advantages of
using GANs have not been fully established. In this paper, we investigate in
depth the use of a GAN-based dereverberation front-end in ASR. First,
we study the effectiveness of different dereverberation networks (the generator
in GAN) and find that LSTM leads to a significant improvement compared with
feed-forward DNN and CNN on our dataset. Second, further adding residual
connections in the deep LSTMs can boost the performance as well. Finally, we
find that, for the success of GAN, it is important to update the generator and
the discriminator using the same mini-batch data during training. Moreover,
using the reverberant spectrogram as a condition to the discriminator, as suggested in
previous studies, may degrade the performance. In summary, our GAN-based
dereverberation front-end achieves 14%-19% relative CER reduction as compared
to the baseline DNN dereverberation network when tested on a strong
multi-condition training acoustic model.
Comment: Interspeech 201
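The remark above about updating the generator and the discriminator on the same mini-batch can be sketched as a training schedule. This is a framework-free illustration under stated assumptions: `d_step` and `g_step` are hypothetical callbacks standing in for one optimizer step of the discriminator and the generator, and the contrasting schedule, in which the two networks see different consecutive mini-batches, is an assumed baseline rather than a procedure described in the abstract.

```python
from typing import Callable, Iterable, Sequence

Batch = Sequence[int]  # stand-in for a mini-batch of (reverberant, clean) feature pairs
Step = Callable[[Batch], None]


def train_same_batch(batches: Iterable[Batch], d_step: Step, g_step: Step) -> None:
    """Each iteration updates the discriminator and then the generator
    on the same mini-batch."""
    for batch in batches:
        d_step(batch)
        g_step(batch)


def train_alternating_batches(batches: Iterable[Batch], d_step: Step, g_step: Step) -> None:
    """Assumed contrast: the discriminator and the generator are updated
    on different consecutive mini-batches."""
    it = iter(batches)
    for d_batch in it:
        d_step(d_batch)
        g_batch = next(it, None)
        if g_batch is not None:
            g_step(g_batch)


if __name__ == "__main__":
    log = []
    # Hypothetical steps that only record which batch each network saw.
    train_same_batch(
        [[1, 2], [3, 4]],
        d_step=lambda b: log.append(("D", tuple(b))),
        g_step=lambda b: log.append(("G", tuple(b))),
    )
    print(log)
```

In a real system each callback would compute its adversarial loss and apply a gradient step; the sketch only makes explicit which data each network sees per iteration.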