Self-Attention Transducers for End-to-End Speech Recognition
Recurrent neural network transducers (RNN-T) have been successfully applied
in end-to-end speech recognition. However, the recurrent structure makes
parallelization difficult. In this paper, we propose a self-attention
transducer (SA-T) for speech recognition. RNNs are replaced with self-attention
blocks, which are powerful at modeling long-term dependencies within sequences
and can be efficiently parallelized. Furthermore, a path-aware regularization
is proposed to help the SA-T learn alignments and improve performance.
Additionally, a chunk-flow mechanism is utilized to achieve online decoding.
All experiments are conducted on a Mandarin Chinese dataset AISHELL-1. The
results demonstrate that our proposed approach achieves a 21.3% relative
reduction in character error rate compared with the baseline RNN-T. In
addition, the SA-T with the chunk-flow mechanism can perform online decoding
with only a slight degradation in performance.
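The key computational claim above is that self-attention, unlike an RNN, involves no recurrence over time: all frames are processed with a few matrix products. A minimal numpy sketch of scaled dot-product self-attention illustrates this (the projection matrices `w_q`, `w_k`, `w_v` and shapes here are illustrative, not the paper's actual SA-T block, which also includes multi-head attention and feed-forward layers):

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a whole sequence at once.

    x: (T, d) sequence of frame features; w_q/w_k/w_v: (d, d) projections.
    Unlike an RNN, every output frame is computed in parallel: there is
    no loop over T, only matrix products and a row-wise softmax.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v            # (T, d) each
    scores = q @ k.T / np.sqrt(x.shape[1])         # (T, T) pairwise similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ v                             # (T, d) context-mixed output
```

A chunk-flow mechanism for online decoding would additionally mask the `(T, T)` score matrix so each frame attends only to a bounded window of chunks, rather than the full sequence.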
Deep Contextualized Acoustic Representations For Semi-Supervised Speech Recognition
We propose a novel approach to semi-supervised automatic speech recognition
(ASR). We first exploit a large amount of unlabeled audio data via
representation learning, where we reconstruct a temporal slice of filterbank
features from past and future context frames. The resulting deep contextualized
acoustic representations (DeCoAR) are then used to train a CTC-based end-to-end
ASR system using a smaller amount of labeled audio data. In our experiments, we
show that systems trained on DeCoAR consistently outperform ones trained on
conventional filterbank features, giving 42% and 19% relative improvement over
the baseline on WSJ eval92 and LibriSpeech test-clean, respectively. Our
approach can drastically reduce the amount of labeled data required;
unsupervised pre-training on LibriSpeech followed by supervised training with
100 hours of labeled data achieves performance on par with training on all
960 hours directly.
Pre-trained models and code will be released online. Comment: Accepted to ICASSP 2020 (oral).
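The pre-training objective above reconstructs a temporal slice of filterbank features from past and future context frames. A toy numpy sketch of that slice-reconstruction setup follows; the sizes and the linear-interpolation predictor are placeholders (DeCoAR itself uses learned forward and backward recurrent encoders), shown only to make the masked-slice objective concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "filterbank" features: T frames of dimension d (hypothetical sizes).
T, d, slice_len = 20, 8, 4
feats = rng.normal(size=(T, d))

# Mask out a temporal slice; it must be reconstructed from its context.
start = 8
target = feats[start:start + slice_len]
past, future = feats[start - 1], feats[start + slice_len]

# Stand-in predictor: reconstruct each masked frame by linearly
# interpolating between the last past frame and the first future frame.
alphas = np.linspace(0.0, 1.0, slice_len + 2)[1:-1][:, None]   # (slice_len, 1)
pred = (1 - alphas) * past + alphas * future                   # (slice_len, d)

# L1 reconstruction loss on the masked slice; minimizing this is what
# forces the representation to encode bidirectional context.
loss = float(np.abs(pred - target).mean())
```

In the real system the reconstruction network is discarded after pre-training, and the contextual representations feed the downstream CTC-based ASR model.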