Deep Long Short-Term Memory Adaptive Beamforming Networks For Multichannel Robust Speech Recognition
Far-field speech recognition in noisy and reverberant conditions remains a
challenging problem despite recent deep learning breakthroughs. This problem is
commonly addressed by acquiring a speech signal from multiple microphones and
performing beamforming over them. In this paper, we propose to use a recurrent
neural network with long short-term memory (LSTM) architecture to adaptively
estimate real-time beamforming filter coefficients to cope with non-stationary
environmental noise and the dynamic nature of source and microphone positions,
which results in a set of time-varying room impulse responses. The LSTM adaptive
beamformer is jointly trained with a deep LSTM acoustic model to predict senone
labels. Further, we use hidden units in the deep LSTM acoustic model to assist
in predicting the beamforming filter coefficients. The proposed system achieves
a 7.97% absolute gain over baseline systems with no beamforming on the CHiME-3 real
evaluation set.
Comment: in 2017 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP)
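The abstract above describes estimating beamforming filter coefficients that change over time. As a rough illustration only, the sketch below shows how per-frame, per-frequency coefficients could be applied to a multichannel STFT in a filter-and-sum fashion. The function name `filter_and_sum`, the nested-list layout (frames x frequency bins x microphones), and the conjugate convention y = sum_m conj(w_m) * x_m are assumptions made for this example and are not taken from the paper; in the described system the weights would be predicted by the LSTM, whereas here they are fixed placeholders.

```python
from typing import List

# Assumed layout: [frame][frequency bin][microphone], complex STFT values.
Tensor3 = List[List[List[complex]]]


def filter_and_sum(stft: Tensor3, weights: Tensor3) -> List[List[complex]]:
    """Apply time-varying weights: y[t][f] = sum_m conj(w[t][f][m]) * x[t][f][m].

    The conjugate (w^H x) convention is an assumption of this sketch.
    """
    if len(stft) != len(weights):
        raise ValueError("stft and weights must have the same number of frames")
    output: List[List[complex]] = []
    for frame_x, frame_w in zip(stft, weights):
        out_frame: List[complex] = []
        for mics_x, mics_w in zip(frame_x, frame_w):
            # Weighted sum over microphones for one time-frequency bin.
            acc = 0j
            for w, x in zip(mics_w, mics_x):
                acc += w.conjugate() * x
            out_frame.append(acc)
        output.append(out_frame)
    return output


# Two frames, one frequency bin, two microphones (placeholder values).
example_stft = [[[1 + 1j, 1 - 1j]], [[2 + 0j, 0 + 2j]]]
# Weights differ per frame, standing in for coefficients a network would predict.
example_weights = [[[0.5 + 0j, 0.5 + 0j]], [[1 + 0j, 0 + 0j]]]
enhanced = filter_and_sum(example_stft, example_weights)
```

Because the weights are indexed by frame, the same routine covers both a fixed beamformer (identical weights in every frame) and an adaptive one whose coefficients follow changes in the noise or in source and microphone positions.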
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Eliminating the negative effect of non-stationary environmental noise is a
long-standing research topic for automatic speech recognition that still
remains an important challenge. Data-driven supervised approaches, including
ones based on deep neural networks, have recently emerged as potential
alternatives to traditional unsupervised approaches and, with sufficient
training, can alleviate the shortcomings of the unsupervised methods in various
real-life acoustic environments. In this light, we review recently developed,
representative deep learning approaches for tackling non-stationary additive
and convolutional degradation of speech with the aim of providing guidelines
for those involved in the development of environmentally robust speech
recognition systems. We separately discuss single- and multi-channel techniques
developed for the front-end and back-end of speech recognition systems, as well
as joint front-end and back-end training frameworks.
- …