A Study of the Recurrent Neural Network Encoder-Decoder for Large Vocabulary Speech Recognition

Authors: Cho Kyunghyun, Lu Liang, Renals Stephen, Zhang Xingxing
Publication date: 01/09/2015

Deep neural networks have advanced the state of the art in automatic speech recognition when combined with hidden Markov models (HMMs). Recently there has been interest in using systems based on recurrent neural networks (RNNs) to perform sequence modelling directly, without the requirement of an HMM superstructure. In this paper, we study the RNN encoder-decoder approach for large vocabulary end-to-end speech recognition, whereby an encoder transforms a sequence of acoustic vectors into a sequence of feature representations, from which a decoder recovers a sequence of words. We investigated this approach on the Switchboard corpus using a training set of around 300 hours of transcribed audio data. Without the use of an explicit language model or pronunciation lexicon, we achieved promising recognition accuracy, demonstrating that this approach warrants further investigation.

Index Terms: end-to-end speech recognition, deep neural networks, recurrent neural networks, encoder-decoder.

CiteSeerX