Character-Level Language Modeling with Deeper Self-Attention
LSTMs and other RNN variants have shown strong performance on character-level
language modeling. These models are typically trained using truncated
backpropagation through time, and it is common to assume that their success
stems from their ability to remember long-term contexts. In this paper, we show
that a deep (64-layer) transformer model with fixed context outperforms RNN
variants by a large margin, achieving state of the art on two popular
benchmarks: 1.13 bits per character on text8 and 1.06 on enwik8. To get good
results at this depth, we show that it is important to add auxiliary losses,
both at intermediate network layers and intermediate sequence positions.
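To make the auxiliary-loss idea concrete, here is a minimal PyTorch sketch of a character-level transformer that adds next-character prediction losses at intermediate layers (every position in the sequence contributes, which also covers losses at intermediate sequence positions). All names and weights here (CharTransformer, aux_every, the 0.5 decay) are illustrative assumptions, not the paper's actual schedule.

```python
import torch
import torch.nn as nn

class CharTransformer(nn.Module):
    def __init__(self, vocab_size=256, d_model=128, n_layers=8, aux_every=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_layers)
        ])
        self.head = nn.Linear(d_model, vocab_size)  # shared prediction head
        self.aux_every = aux_every

    def forward(self, x, targets):
        # Causal mask so each position attends only to earlier characters.
        seq_len = x.size(1)
        mask = torch.triu(
            torch.full((seq_len, seq_len), float('-inf')), diagonal=1)
        h = self.embed(x)
        loss_fn = nn.CrossEntropyLoss()
        total_loss = 0.0
        for i, layer in enumerate(self.layers, start=1):
            h = layer(h, src_mask=mask)
            # Auxiliary loss: intermediate layers also predict the next
            # character, not just the final layer.
            if i % self.aux_every == 0 or i == len(self.layers):
                logits = self.head(h)
                weight = 1.0 if i == len(self.layers) else 0.5  # down-weight aux terms
                total_loss = total_loss + weight * loss_fn(
                    logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        return total_loss

model = CharTransformer()
x = torch.randint(0, 256, (2, 32))        # batch of character ids
targets = torch.randint(0, 256, (2, 32))  # next-character targets
loss = model(x, targets)
loss.backward()
```

The extra loss terms give the lower layers a direct training signal, which is what makes a 64-layer stack trainable in practice.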
Long Short-Term Memory with Dynamic Skip Connections
In recent years, long short-term memory (LSTM) has been successfully used to
model sequential data of variable length. However, LSTM can still experience
difficulty in capturing long-term dependencies. In this work, we alleviate this problem by introducing a dynamic skip connection, which can
learn to directly connect two dependent words. Since there is no dependency
information in the training data, we propose a novel reinforcement
learning-based method to model the dependency relationship and connect
dependent words. The proposed model computes the recurrent transition functions
based on the skip connections, giving it a dynamic skipping advantage over RNNs, which must always process entire sentences sequentially. Our experimental results
on three natural language processing tasks demonstrate that the proposed method
can achieve better performance than existing methods. In the number prediction
experiment, the proposed model outperformed LSTM in accuracy by nearly 20%.
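A rough PyTorch sketch of the core mechanism follows: at each step a small policy network samples a skip distance, and the LSTM transition consumes the hidden state from that many steps back, with the sampled log-probabilities saved for a REINFORCE-style update. This follows the abstract only loosely; all names (SkipLSTM, max_skip) and the input-conditioned policy are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SkipLSTM(nn.Module):
    def __init__(self, input_size=32, hidden_size=64, max_skip=5):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        # Policy over skip distances 1..max_skip, conditioned on the input.
        self.policy = nn.Linear(input_size, max_skip)
        self.max_skip = max_skip

    def forward(self, x):
        batch, seq_len, _ = x.shape
        h = x.new_zeros(batch, self.cell.hidden_size)
        c = x.new_zeros(batch, self.cell.hidden_size)
        history = [h]                 # past hidden states available for skipping
        log_probs = []                # saved for a REINFORCE update
        for t in range(seq_len):
            dist = torch.distributions.Categorical(logits=self.policy(x[:, t]))
            sample = dist.sample()
            log_probs.append(dist.log_prob(sample))
            skip = sample + 1         # skip distance in 1..max_skip
            # Gather h_{t-skip} per batch element (clamped at sequence start).
            idx = (len(history) - skip).clamp(min=0)
            stacked = torch.stack(history)          # (t+1, batch, hidden)
            h_skip = stacked[idx, torch.arange(batch)]
            h, c = self.cell(x[:, t], (h_skip, c))
            history.append(h)
        return h, torch.stack(log_probs)

model = SkipLSTM()
x = torch.randn(2, 10, 32)
h_final, log_probs = model(x)
# In training, a task reward would weight -log_probs for the policy update,
# since the discrete skip choice is not differentiable.
```

The sampling step is why reinforcement learning is needed: the choice of which past state to connect to is discrete, so the skip policy cannot be trained by backpropagation alone.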