Slim LSTM networks: LSTM_6 and LSTM_C6
We have shown previously that our parameter-reduced variants of the Long
Short-Term Memory (LSTM) Recurrent Neural Network (RNN) perform comparably
to the standard LSTM RNN on the MNIST dataset. In this study, we show that
this also holds for two diverse benchmark datasets, namely the IMDB
review-sentiment dataset and the 20 Newsgroups dataset. Specifically, we focus on
two of the simplest variants: LSTM_6 (i.e., the standard LSTM with its three
gates fixed to constants) and LSTM_C6 (i.e., LSTM_6 with a further reduced
cell-body input block). We demonstrate that these two aggressively
parameter-reduced variants are competitive with the standard LSTM when the
hyper-parameters (e.g., the learning rate, the number of hidden units, and the
gate constants) are set properly.
These architectures speed up training computations, and hence these networks
are more suitable for online training and for inference on portable devices
with relatively limited computational resources.

Comment: 6 pages, 12 figures, 5 tables
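For illustration, here is a minimal sketch (not from the paper) of one LSTM_6 time step in NumPy, following the abstract's description of a standard LSTM whose three gates are replaced by fixed constants. The weight names (Wc, Uc, bc) and the default gate constants (i_const, f_const, o_const) are illustrative assumptions, not the paper's notation or tuned values; LSTM_C6 would further reduce the cell-body input block (c_tilde below) in the manner specified in the paper.

```python
import numpy as np

def lstm6_step(x, h_prev, c_prev, Wc, Uc, bc,
               i_const=1.0, f_const=0.5, o_const=1.0):
    """One LSTM_6 time step: only the cell-body input block is learned;
    the input, forget, and output gates are fixed scalar constants
    (constants here are assumed defaults, not the paper's values)."""
    # Cell-body input block: the sole learned component in LSTM_6.
    c_tilde = np.tanh(Wc @ x + Uc @ h_prev + bc)
    # Gates replaced by constants instead of learned sigmoid layers.
    c = f_const * c_prev + i_const * c_tilde
    h = o_const * np.tanh(c)
    return h, c

# Illustrative usage with assumed dimensions (hidden = 4, input = 3).
rng = np.random.default_rng(0)
Wc = rng.normal(size=(4, 3))
Uc = rng.normal(size=(4, 4))
bc = np.zeros(4)
h, c = np.zeros(4), np.zeros(4)
for x in rng.normal(size=(5, 3)):  # a toy sequence of 5 steps
    h, c = lstm6_step(x, h, c, Wc, Uc, bc)
```

Because no gate weight matrices are trained, the per-step parameter count drops from four weight blocks to one, which is the source of the training speed-up the abstract describes.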