14,698 research outputs found
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Eliminating the negative effect of non-stationary environmental noise is a
long-standing research topic for automatic speech recognition that stills
remains an important challenge. Data-driven supervised approaches, including
ones based on deep neural networks, have recently emerged as potential
alternatives to traditional unsupervised approaches and with sufficient
training, can alleviate the shortcomings of the unsupervised methods in various
real-life acoustic environments. In this light, we review recently developed,
representative deep learning approaches for tackling non-stationary additive
and convolutional degradation of speech with the aim of providing guidelines
for those involved in the development of environmentally robust speech
recognition systems. We separately discuss single- and multi-channel techniques
developed for the front-end and back-end of speech recognition systems, as well
as joint front-end and back-end training frameworks
Recommended from our members
Monaural speech separation with deep learning using phase modelling and capsule networks
The removal of background noise from speech audio is a problem with high practical relevance. A variety of deep learning approaches have been applied to it in recent years, most of which operate on a magnitude spectrogram representation of a noisy recording to estimate the isolated speaking voice. This work investigates ways to include phase information, which is commonly discarded, firstly within a convolutional neural network (CNN) architecture, and secondly by applying capsule networks, to our knowledge the first time capsules have been used in source separation. We present a Circular Loss function, which takes into account the periodic nature of phase. Our results show that the inclusion of phase information leads to an improvement in the quality of speech separation. We also find that in our experiments convolutional neural networks outperform capsule networks at speech separation
- …