19,102 research outputs found
Robust Speech Recognition Using Generative Adversarial Networks
This paper describes a general, scalable, end-to-end framework that uses the
generative adversarial network (GAN) objective to enable robust speech
recognition. Encoders trained with the proposed approach enjoy improved
invariance by learning to map noisy audio to the same embedding space as that
of clean audio. Unlike previous methods, the new framework does not rely on
domain expertise or simplifying assumptions as are often needed in signal
processing, and directly encourages robustness in a data-driven way. We show
the new approach improves simulated far-field speech recognition of vanilla
sequence-to-sequence models without specialized front-ends or preprocessing
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Eliminating the negative effect of non-stationary environmental noise is a
long-standing research topic for automatic speech recognition that stills
remains an important challenge. Data-driven supervised approaches, including
ones based on deep neural networks, have recently emerged as potential
alternatives to traditional unsupervised approaches and with sufficient
training, can alleviate the shortcomings of the unsupervised methods in various
real-life acoustic environments. In this light, we review recently developed,
representative deep learning approaches for tackling non-stationary additive
and convolutional degradation of speech with the aim of providing guidelines
for those involved in the development of environmentally robust speech
recognition systems. We separately discuss single- and multi-channel techniques
developed for the front-end and back-end of speech recognition systems, as well
as joint front-end and back-end training frameworks
- …