Sequence Transduction with Recurrent Neural Networks
Many machine learning tasks can be expressed as the transformation---or
\emph{transduction}---of input sequences into output sequences: speech
recognition, machine translation, protein secondary structure prediction and
text-to-speech to name but a few. One of the key challenges in sequence
transduction is learning to represent both the input and output sequences in a
way that is invariant to sequential distortions such as shrinking, stretching
and translating. Recurrent neural networks (RNNs) are a powerful sequence
learning architecture that has proven capable of learning such representations.
However, RNNs traditionally require a pre-defined alignment between the input
and output sequences to perform transduction. This is a severe limitation since
\emph{finding} the alignment is the most difficult aspect of many sequence
transduction problems. Indeed, even determining the length of the output
sequence is often challenging. This paper introduces an end-to-end,
probabilistic sequence transduction system, based entirely on RNNs, that is in
principle able to transform any input sequence into any finite, discrete output
sequence. Experimental results for phoneme recognition are provided on the
TIMIT speech corpus.
Comment: First published in the International Conference on Machine Learning
(ICML) 2012 Workshop on Representation Learning
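The transducer described in this abstract factors the output distribution into a transcription (encoder) network that processes the input sequence and a prediction network that processes the output prefix, with a joint network scoring every pairing of input and output positions so that all alignments can be marginalised during training. The following is a minimal, hedged sketch of that factorisation in PyTorch; the LSTM layers, hidden sizes, and concatenate-and-project joint network are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class RNNTransducer(nn.Module):
    # Transcription (encoder) RNN over the input, prediction RNN over the
    # output prefix, and a joint network that scores every (input step,
    # output step) pair over the vocabulary plus a blank symbol.
    def __init__(self, input_dim, vocab_size, hidden_dim=256):
        super().__init__()
        self.blank = vocab_size                      # reserve last index for blank
        self.encoder = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.embed = nn.Embedding(vocab_size + 1, hidden_dim)
        self.prediction = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.joint = nn.Linear(2 * hidden_dim, vocab_size + 1)

    def forward(self, inputs, targets):
        # inputs:  (batch, T, input_dim) input feature frames
        # targets: (batch, U) output label indices, used as prediction-network
        #          context (a leading blank is typically prepended in practice)
        enc, _ = self.encoder(inputs)                    # (batch, T, H)
        pred, _ = self.prediction(self.embed(targets))   # (batch, U, H)
        # Pair every encoder step with every prediction step.
        t = enc.unsqueeze(2).expand(-1, -1, pred.size(1), -1)   # (batch, T, U, H)
        u = pred.unsqueeze(1).expand(-1, enc.size(1), -1, -1)   # (batch, T, U, H)
        logits = self.joint(torch.cat([t, u], dim=-1))          # (batch, T, U, V+1)
        # A transducer loss then marginalises over all alignments, so no
        # pre-defined input/output alignment is needed.
        return logits
```

The returned lattice of logits is what a transducer-style loss consumes; the key design point the abstract emphasises is that the output length and the alignment are not fixed in advance but summed out by the loss.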
Recurrent Models of Visual Attention
Applying convolutional neural networks to large images is computationally
expensive because the amount of computation scales linearly with the number of
image pixels. We present a novel recurrent neural network model that is capable
of extracting information from an image or video by adaptively selecting a
sequence of regions or locations and only processing the selected regions at
high resolution. Like convolutional neural networks, the proposed model has a
degree of translation invariance built-in, but the amount of computation it
performs can be controlled independently of the input image size. While the
model is non-differentiable, it can be trained using reinforcement learning
methods to learn task-specific policies. We evaluate our model on several image
classification tasks, where it significantly outperforms a convolutional neural
network baseline on cluttered images, and on a dynamic visual control problem,
where it learns to track a simple object without an explicit training signal
for doing so.
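The model sketched in this abstract processes only a small glimpse of the image at each step, updates a recurrent state, and stochastically chooses where to look next; because the location choice is non-differentiable, it is trained with a policy-gradient (REINFORCE) estimator. Below is a minimal sketch of that loop in PyTorch for single-channel images; the glimpse size, hidden dimension, Gaussian location policy, and the `extract_glimpse` helper are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def extract_glimpse(images, locations, size=8):
    # images: (batch, 1, H, W); locations: (batch, 2) in [-1, 1] coordinates.
    # Build a size x size sampling grid centred on each location and crop the
    # patch with bilinear sampling, so only this patch is seen at high resolution.
    batch, _, _, width = images.shape
    span = torch.linspace(-1.0, 1.0, size, device=images.device) * (size / width)
    gy, gx = torch.meshgrid(span, span, indexing="ij")
    grid = torch.stack([gx, gy], dim=-1).unsqueeze(0).expand(batch, -1, -1, -1)
    grid = grid + locations.view(batch, 1, 1, 2)
    return F.grid_sample(images, grid, align_corners=False)

class RecurrentAttention(nn.Module):
    # At each step the core RNN receives a glimpse plus the location it came
    # from, updates its state, and proposes the next location; the sampled
    # location is non-differentiable, so its log-probabilities are returned
    # for a REINFORCE-style update.
    def __init__(self, glimpse_size=8, hidden=256, n_classes=10):
        super().__init__()
        self.glimpse_size = glimpse_size
        self.hidden = hidden
        self.glimpse_net = nn.Linear(glimpse_size * glimpse_size + 2, hidden)
        self.core = nn.GRUCell(hidden, hidden)
        self.locator = nn.Linear(hidden, 2)        # mean of the next location
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, images, n_glimpses=6, loc_std=0.1):
        batch = images.size(0)
        h = images.new_zeros(batch, self.hidden)
        loc = images.new_zeros(batch, 2)           # start looking at the centre
        log_probs = []
        for _ in range(n_glimpses):
            patch = extract_glimpse(images, loc, self.glimpse_size).flatten(1)
            g = torch.relu(self.glimpse_net(torch.cat([patch, loc], dim=1)))
            h = self.core(g, h)
            mean = torch.tanh(self.locator(h))
            dist = torch.distributions.Normal(mean, loc_std)
            loc = dist.sample()                    # stochastic, non-differentiable
            log_probs.append(dist.log_prob(loc).sum(dim=1))
        # Classification logits plus per-step log-probs for the policy gradient.
        return self.classifier(h), torch.stack(log_probs, dim=1)
```

Because the per-step computation depends only on the glimpse size and the number of glimpses, the cost is controlled independently of the input image size, which is the property the abstract highlights.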