Single stream parallelization of generalized LSTM-like RNNs on a GPU
Recurrent neural networks (RNNs) have shown outstanding performance on processing sequence data. However, they suffer from long training times, which demands parallel implementations of the training procedure. Parallelizing the training algorithms for RNNs is very challenging because internal recurrent paths form dependencies between two different time frames. In this paper, we first propose a generalized graph-based RNN structure that covers the most popular long short-term memory (LSTM) network. Then, we present a parallelization approach that automatically explores the parallelism of arbitrary RNNs by analyzing the graph structure. The experimental results show that the proposed approach achieves a large speed-up even with a single training stream, and further accelerates training when combined with multiple parallel training streams.
Comment: Accepted by the 40th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 201
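The paper's graph-analysis algorithm is not reproduced here, but a minimal NumPy sketch can illustrate the kind of parallelism it targets: in a standard LSTM, the input-to-gate projections carry no recurrent dependency, so they can be computed for all time steps at once, leaving only the hidden-to-hidden part sequential. All dimensions and initializations below are hypothetical.

```python
import numpy as np

# Hypothetical sizes: time steps, input size, hidden size
T, D, H = 20, 16, 32
rng = np.random.default_rng(0)
x  = rng.standard_normal((T, D))
Wx = rng.standard_normal((D, 4 * H)) * 0.1   # input-to-gate weights (i, f, g, o)
Wh = rng.standard_normal((H, 4 * H)) * 0.1   # recurrent (hidden-to-gate) weights
b  = np.zeros(4 * H)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: the input-to-gate pre-activations have no dependency across time,
# so all T of them can be evaluated in parallel as one matrix product.
pre_x = x @ Wx + b                # shape (T, 4H)

# Step 2: only the recurrent part must be evaluated sequentially.
h = np.zeros(H)
c = np.zeros(H)
outputs = []
for t in range(T):
    z = pre_x[t] + h @ Wh
    i, f, g, o = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    outputs.append(h)
outputs = np.stack(outputs)       # (T, H)
print(outputs.shape)
```

This only exposes one well-known source of parallelism in a plain LSTM; the proposed method generalizes the idea to arbitrary graph-structured RNNs, which the sketch does not attempt.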
Echo State Networks with Self-Normalizing Activations on the Hyper-Sphere
Among the various architectures of Recurrent Neural Networks, Echo State Networks (ESNs) have emerged due to their simple and inexpensive training procedure. These networks are known to be sensitive to the setting of hyper-parameters, which critically affect their behaviour. Results show that their performance is usually maximized in a narrow region of hyper-parameter space called the edge of chaos. Finding such a region requires searching hyper-parameter space in a sensible way: hyper-parameter configurations marginally outside such a region might yield networks exhibiting fully developed chaos, hence producing unreliable computations. The performance gain due to optimizing hyper-parameters can be studied by considering the memory--nonlinearity trade-off, i.e., the fact that increasing the nonlinear behaviour of the network degrades its ability to remember past inputs, and vice versa. In this paper, we propose a model of ESNs that eliminates critical dependence on hyper-parameters, resulting in networks that provably cannot enter a chaotic regime and, at the same time, exhibit nonlinear behaviour in phase space characterised by a large memory of past inputs, comparable to that of linear networks. Our contribution is supported by experiments corroborating our theoretical findings, showing that the proposed model displays dynamics that are rich enough to approximate many common nonlinear systems used for benchmarking.
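The abstract does not spell out the model's equations; one plausible reading of "self-normalizing activations on the hyper-sphere" is a reservoir whose state is projected back onto the unit sphere after every update, which by construction keeps the dynamics bounded. The sketch below implements that reading with a toy input and a standard ridge-regression read-out; the reservoir size, scalings, and delay task are hypothetical choices, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, T = 200, 1, 500                                 # reservoir size, input dim, sequence length
W_in = rng.uniform(-0.5, 0.5, (N, D))                 # input weights
W    = rng.standard_normal((N, N)) / np.sqrt(N)       # reservoir weights
u    = np.sin(np.linspace(0, 20, T)).reshape(T, D)    # toy input signal

x = rng.standard_normal(N)
x /= np.linalg.norm(x)            # start on the unit hyper-sphere
states = []
for t in range(T):
    pre = W @ x + W_in @ u[t]
    x = pre / np.linalg.norm(pre)  # project the state back onto the sphere
    states.append(x)
states = np.asarray(states)

# Standard ESN read-out: ridge regression from states to a target,
# here a delayed copy of the input (a simple memory task).
delay = 10
X, y = states[delay:], u[:-delay, 0]
reg = 1e-6
w_out = np.linalg.solve(X.T @ X + reg * np.eye(N), X.T @ y)
print("train MSE:", np.mean((X @ w_out - y) ** 2))
```

Because the state norm is fixed at 1, the usual spectral-radius tuning that places a standard ESN near the edge of chaos is no longer the critical knob, which is the behaviour the abstract describes.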
Effect of Architectures and Training Methods on the Performance of Learned Video Frame Prediction
We analyze the performance of feedforward vs. recurrent neural network (RNN) architectures and associated training methods for learned frame prediction. To this end, we trained a residual fully convolutional neural network (FCNN), a convolutional RNN (CRNN), and a convolutional long short-term memory (CLSTM) network for next-frame prediction using the mean squared error loss. We performed both stateless and stateful training for the recurrent networks. Experimental results show that the residual FCNN architecture performs best in terms of peak signal-to-noise ratio (PSNR) at the expense of higher training and test (inference) computational complexity. The CRNN can be trained stably and very efficiently using the stateful truncated backpropagation through time procedure, and it requires an order of magnitude less inference runtime to achieve near real-time frame prediction with acceptable performance.
Comment: Accepted for publication at IEEE ICIP 201
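For readers unfamiliar with the training regime mentioned above, the following PyTorch sketch shows stateful truncated backpropagation through time: the hidden state is carried across truncation windows but detached so that gradients stop at the window boundary. The toy "video" tensor, layer sizes, and window length are hypothetical stand-ins, not the paper's actual frame-prediction setup.

```python
import torch
import torch.nn as nn

# Hypothetical toy setting: predict the next frame of a flattened 8x8 "video"
T, B, F = 64, 4, 64               # total frames, batch size, flattened frame size
frames = torch.randn(T, B, F)

model = nn.LSTM(input_size=F, hidden_size=128)
head = nn.Linear(128, F)
opt = torch.optim.Adam(list(model.parameters()) + list(head.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

k = 16                            # truncation window length
state = None
for start in range(0, T - 1, k):
    target = frames[start + 1:start + k + 1]       # next-frame targets
    chunk = frames[start:start + k][:target.shape[0]]  # matching inputs
    out, state = model(chunk, state)
    loss = loss_fn(head(out), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Stateful training: carry the hidden state across windows, but detach it
    # so gradients do not flow back beyond the current truncation window.
    state = tuple(s.detach() for s in state)
```

Detaching (rather than resetting) the state is what makes the training "stateful": the network still sees long-range temporal context at inference-like cost, while backpropagation remains limited to k steps.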