On Training Recurrent Networks with Truncated Backpropagation Through Time in Speech Recognition
Recurrent neural networks have been the dominant models for many speech and
language processing tasks. However, we understand little about the behavior and
the class of functions recurrent networks can realize. Moreover, the heuristics
used during training complicate the analyses. In this paper, we study recurrent
networks' ability to learn long-term dependency in the context of speech
recognition. We consider two decoding approaches, online and batch decoding,
and show the classes of functions to which the decoding approaches correspond.
We then draw a connection between batch decoding and a popular training
approach for recurrent networks, truncated backpropagation through time.
Changing the decoding approach restricts the amount of past history recurrent
networks can use for prediction, allowing us to analyze their ability to
remember. Empirically, we study the long-term dependency in subphonetic states,
phonemes, and words, and show how design decisions, such as the decoding
approach, lookahead, context frames, and consecutive prediction, characterize
the behavior of recurrent networks. Finally, we draw a connection between
Markov processes and vanishing gradients. These results have implications for
studying the long-term dependency in speech data and how it is learned by
recurrent networks.
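
The abstract's central training heuristic is truncated backpropagation through time (TBPTT), in which gradients are only propagated over a fixed window of past time steps. The following is a minimal sketch of that idea, not the paper's actual setup: the toy sine-wave task, the window size, the model dimensions, and the use of PyTorch are all illustrative assumptions.

```python
# Minimal sketch of truncated backpropagation through time (TBPTT).
# All names and hyperparameters here are illustrative assumptions,
# not the configuration used in the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy sequence task: predict the next value of a sine wave.
T, input_dim, hidden_dim = 200, 1, 32
xs = torch.sin(torch.linspace(0, 20, T + 1)).unsqueeze(-1)
inputs, targets = xs[:-1], xs[1:]

rnn = nn.RNN(input_dim, hidden_dim, batch_first=False)
readout = nn.Linear(hidden_dim, input_dim)
opt = torch.optim.SGD(list(rnn.parameters()) + list(readout.parameters()), lr=0.05)
loss_fn = nn.MSELoss()

window = 20  # truncation length: gradients never flow past this many steps
h = torch.zeros(1, 1, hidden_dim)

for start in range(0, T, window):
    x = inputs[start:start + window].unsqueeze(1)   # (window, batch=1, input_dim)
    y = targets[start:start + window].unsqueeze(1)

    # Detach the carried hidden state: the forward pass still conditions on
    # the full history, but backpropagation stops at the window boundary.
    h = h.detach()
    out, h = rnn(x, h)
    loss = loss_fn(readout(out), y)

    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final window loss: {loss.item():.4f}")
```

The `detach` call is what makes the procedure "truncated": the hidden state is still carried forward across windows, so predictions can depend on history older than the window, but the gradient signal used for learning cannot reach it, which is the tension between memorization and training that the abstract examines.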