4 research outputs found
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Eliminating the negative effect of non-stationary environmental noise is a
long-standing research topic for automatic speech recognition that still
remains an important challenge. Data-driven supervised approaches, including
ones based on deep neural networks, have recently emerged as potential
alternatives to traditional unsupervised approaches and, with sufficient
training, can alleviate the shortcomings of the unsupervised methods in various
real-life acoustic environments. In this light, we review recently developed,
representative deep learning approaches for tackling non-stationary additive
and convolutional degradation of speech with the aim of providing guidelines
for those involved in the development of environmentally robust speech
recognition systems. We separately discuss single- and multi-channel techniques
developed for the front-end and back-end of speech recognition systems, as well
as joint front-end and back-end training frameworks.
Joint NN-Supported Multichannel Reduction of Acoustic Echo, Reverberation and Noise
We consider the problem of simultaneous reduction of acoustic echo,
reverberation and noise. In real scenarios, these distortion sources may occur
simultaneously, and reducing them implies combining the corresponding
distortion-specific filters. As these filters interact with each other, they
must be jointly optimized. We propose to model the target and residual signals
after linear echo cancellation and dereverberation using a multichannel
Gaussian modeling framework and to jointly represent their spectra by means of
a neural network. We develop an iterative block-coordinate ascent algorithm to
update all the filters. We evaluate our system on real recordings of acoustic
echo, reverberation and noise acquired with a smart speaker in various
situations. In terms of overall distortion, the proposed approach outperforms
both a cascade of the individual approaches and a joint reduction approach that
does not rely on a spectral model of the target and residual signals.