Single stream parallelization of generalized LSTM-like RNNs on a GPU
Recurrent neural networks (RNNs) have shown outstanding performance on
processing sequence data. However, they suffer from long training time, which
demands parallel implementations of the training procedure. Parallelization of
the training algorithms for RNNs is very challenging because internal
recurrent paths form dependencies between two different time frames. In this
paper, we first propose a generalized graph-based RNN structure that covers the
most popular long short-term memory (LSTM) network. Then, we present a
parallelization approach that automatically explores parallelisms of arbitrary
RNNs by analyzing the graph structure. The experimental results show that the
proposed approach achieves great speed-up even with a single training stream, and
further accelerates the training when combined with multiple parallel training
streams.
Comment: Accepted by the 40th IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP) 201
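To illustrate what "analyzing the graph structure" could mean in practice, here is a minimal sketch. It is not the paper's algorithm. It assumes the network is described as nodes plus edges tagged with a time delay, and uses plain topological levelling to group nodes that could be evaluated concurrently within one time frame. The function name `parallel_levels`, the `(src, dst, delay)` edge format, and the node names are all hypothetical.

```python
from collections import defaultdict

def parallel_levels(nodes, edges):
    """Group nodes into levels that can be evaluated concurrently.

    edges is a list of (src, dst, delay) tuples. Edges with delay > 0 are
    recurrent: they read a value from an earlier time frame, so they impose
    no ordering inside the current frame and are ignored here (assumption).
    """
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for src, dst, delay in edges:
        if delay == 0:
            successors[src].append(dst)
            indegree[dst] += 1

    level = [n for n in nodes if indegree[n] == 0]
    levels = []
    while level:
        levels.append(level)
        next_level = []
        for n in level:
            for s in successors[n]:
                indegree[s] -= 1
                if indegree[s] == 0:
                    next_level.append(s)
        level = next_level

    if sum(len(group) for group in levels) != len(nodes):
        raise ValueError("zero-delay edges contain a cycle")
    return levels

# Hypothetical LSTM-like graph: gates read the input and the previous hidden state.
nodes = ["x", "input_gate", "forget_gate", "output_gate", "cell_input", "cell", "hidden"]
edges = [
    ("x", "input_gate", 0), ("x", "forget_gate", 0),
    ("x", "output_gate", 0), ("x", "cell_input", 0),
    ("hidden", "input_gate", 1), ("hidden", "forget_gate", 1),
    ("hidden", "output_gate", 1), ("hidden", "cell_input", 1),
    ("input_gate", "cell", 0), ("forget_gate", "cell", 0),
    ("cell_input", "cell", 0), ("cell", "cell", 1),
    ("cell", "hidden", 0), ("output_gate", "hidden", 0),
]
levels = parallel_levels(nodes, edges)
```

In this toy graph the three gates and the cell input land in the same level, which is the kind of intra-frame parallelism available even with a single training stream.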
Incremental construction of LSTM recurrent neural network
Long Short-Term Memory (LSTM) is a recurrent neural network that
uses structures called memory blocks to allow the net to remember
significant events distant in the past input sequence in order to
solve long time lag tasks, where other RNN approaches fail.
Throughout this work we have performed experiments using LSTM
networks extended with growing abilities, which we call GLSTM.
Four methods of training growing LSTM have been compared. These
methods include cascade and fully connected hidden layers as well
as two different levels of freezing previous weights in the
cascade case. GLSTM has been applied to a forecasting problem in a
biomedical domain, where the input/output behavior of five
controllers of the Central Nervous System control has to be
modelled. We have compared growing LSTM results against other
neural network approaches, and against our work applying conventional
LSTM to the task at hand.
Postprint (published version
Linear Memory Networks
Recurrent neural networks can learn complex transduction problems that
require maintaining and actively exploiting a memory of their inputs. Such
models traditionally consider memory and input-output functionalities
indissolubly entangled. We introduce a novel recurrent architecture based on
the conceptual separation between the functional input-output transformation
and the memory mechanism, showing how they can be implemented through different
neural components. By building on such conceptualization, we introduce the
Linear Memory Network, a recurrent model comprising a feedforward neural
network, realizing the non-linear functional transformation, and a linear
autoencoder for sequences, implementing the memory component. The resulting
architecture can be efficiently trained by building on closed-form solutions to
linear optimization problems. Further, by exploiting equivalence results
between feedforward and recurrent neural networks we devise a pretraining
schema for the proposed architecture. Experiments on polyphonic music datasets
show competitive results against gated recurrent networks and other
state-of-the-art models.
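The separation described in this abstract can be sketched as a forward pass in which a non-linear feedforward component reads the input and the previous memory, while the memory itself is updated by a purely linear map. This is a minimal sketch under assumptions, not the authors' implementation. The class name, the weight names (`W_xh`, `W_mh`, `W_hm`, `W_mm`, `W_my`), the tanh activation, the random initialisation, and the linear readout are all hypothetical. The closed-form training and pretraining schema mentioned in the abstract are not covered.

```python
import math
import random

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(a, b):
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]

def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; the initialisation scheme is an assumption."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class LinearMemoryNetworkSketch:
    def __init__(self, n_in, n_hidden, n_mem, n_out, seed=0):
        rng = random.Random(seed)
        self.W_xh = rand_matrix(n_hidden, n_in, rng)   # input  -> functional
        self.W_mh = rand_matrix(n_hidden, n_mem, rng)  # memory -> functional
        self.W_hm = rand_matrix(n_mem, n_hidden, rng)  # functional -> memory
        self.W_mm = rand_matrix(n_mem, n_mem, rng)     # memory -> memory
        self.W_my = rand_matrix(n_out, n_mem, rng)     # memory -> output
        self.n_mem = n_mem

    def forward(self, xs):
        m = [0.0] * self.n_mem  # memory state starts at zero
        ys = []
        for x in xs:
            # Functional component: non-linear transformation of input and memory.
            pre = add(matvec(self.W_xh, x), matvec(self.W_mh, m))
            h = [math.tanh(a) for a in pre]
            # Memory component: linear update, no activation function.
            m = add(matvec(self.W_hm, h), matvec(self.W_mm, m))
            ys.append(matvec(self.W_my, m))
        return ys

model = LinearMemoryNetworkSketch(n_in=3, n_hidden=4, n_mem=5, n_out=2)
sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
outputs = model.forward(sequence)
```

Because the memory update contains no non-linearity, the recurrent part is a linear dynamical system, which is what makes closed-form solutions of the kind the abstract mentions plausible for that component.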