Despite the remarkable progress achieved on automatic speech recognition,
recognizing far-field speeches mixed with various noise sources is still a
challenging task. In this paper, we introduce novel student-teacher transfer
learning, BridgeNet which can provide a solution to improve distant speech
recognition. There are two key features in BridgeNet. First, BridgeNet extends
traditional student-teacher frameworks by providing multiple hints from a
teacher network. Hints are not limited to the soft labels from a teacher
network. Teacher's intermediate feature representations can better guide a
student network to learn how to denoise or dereverberate noisy input. Second,
the proposed recursive architecture in the BridgeNet can iteratively improve
denoising and recognition performance. The experimental results of BridgeNet
showed significant improvements in tackling the distant speech recognition
problem, where it achieved up to 13.24% relative WER reductions on AMI corpus
compared to a baseline neural network without teacher's hints.Comment: Accepted to 2018 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP 2018