Character-Aware Neural Language Models
We describe a simple neural language model that relies only on
character-level inputs. Predictions are still made at the word-level. Our model
employs a convolutional neural network (CNN) and a highway network over
characters, whose output is given to a long short-term memory (LSTM) recurrent
neural network language model (RNN-LM). On the English Penn Treebank the model
is on par with the existing state-of-the-art despite having 60% fewer
parameters. On languages with rich morphology (Arabic, Czech, French, German,
Spanish, Russian), the model outperforms word-level/morpheme-level LSTM
baselines, again with fewer parameters. The results suggest that on many
languages, character inputs are sufficient for language modeling. Analysis of
word representations obtained from the character composition part of the model
reveals that the model is able to encode, from characters only, both semantic
and orthographic information.
Comment: AAAI 201
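The abstract names a highway network but does not give its equations. As a rough illustration, a highway layer is commonly written as y = t * g(W_h x + b_h) + (1 - t) * x, with a transform gate t = sigmoid(W_t x + b_t). The sketch below follows that common form in plain Python. The function names, the ReLU nonlinearity, and the toy weights are assumptions made for illustration, not details taken from the paper.

```python
import math


def sigmoid(z):
    """Logistic function, used here for the transform gate."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, b):
    """Return W @ x + b for a matrix W given as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def highway_layer(x, W_h, b_h, W_t, b_t):
    """Assumed form: y = t * relu(W_h x + b_h) + (1 - t) * x."""
    h = [max(0.0, v) for v in affine(W_h, x, b_h)]   # transformed input
    t = [sigmoid(v) for v in affine(W_t, x, b_t)]    # transform gate in (0, 1)
    return [t_i * h_i + (1.0 - t_i) * x_i for t_i, h_i, x_i in zip(t, h, x)]


# Toy inputs (hypothetical values, chosen only to show the gate's effect).
x = [1.0, -2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_matrix = [[0.0, 0.0], [0.0, 0.0]]
zero_bias = [0.0, 0.0]

# A strongly negative gate bias closes the gate: the layer carries x through.
carried = highway_layer(x, identity, zero_bias, zero_matrix, [-20.0, -20.0])

# A strongly positive gate bias opens the gate: the layer outputs relu(x).
transformed = highway_layer(x, identity, zero_bias, zero_matrix, [20.0, 20.0])
```

With the gate closed the output stays close to the input, and with the gate open it stays close to the transformed input, which is the behaviour that lets such layers be stacked without blocking information flow.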
Recurrent Highway Networks
Many sequential processing tasks require complex nonlinear transition
functions from one step to the next. However, recurrent neural networks with
'deep' transition functions remain difficult to train, even when using Long
Short-Term Memory (LSTM) networks. We introduce a novel theoretical analysis of
recurrent networks based on Gersgorin's circle theorem that illuminates several
modeling and optimization issues and improves our understanding of the LSTM
cell. Based on this analysis we propose Recurrent Highway Networks, which
extend the LSTM architecture to allow step-to-step transition depths larger
than one. Several language modeling experiments demonstrate that the proposed
architecture results in powerful and efficient models. On the Penn Treebank
corpus, solely increasing the transition depth from 1 to 10 improves word-level
perplexity from 90.6 to 65.4 using the same number of parameters. On the larger
Wikipedia datasets for character prediction (text8 and enwik8), RHNs outperform
all previous results and achieve an entropy of 1.27 bits per character.
Comment: 12 pages, 6 figures, 3 tables
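The two metrics quoted in this abstract are both transformations of an average cross-entropy: perplexity is the exponential of the average negative log-likelihood per token, and bits per character is the same kind of average taken in base 2 per character. The sketch below applies these standard definitions. The helper names and the toy probabilities are illustrative assumptions; only the figures 90.6, 65.4, and 1.27 come from the abstract.

```python
import math


def perplexity(token_probs):
    """exp of the mean negative natural-log probability per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


def bits_per_char(char_probs):
    """Mean negative base-2 log probability per character."""
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)


# Toy check: a model that is uniform over 4 outcomes has perplexity 4
# and needs 2 bits per symbol (up to floating-point rounding).
uniform = [0.25] * 8
ppl_uniform = perplexity(uniform)
bpc_uniform = bits_per_char(uniform)

# Re-expressing the abstract's word-level perplexities as bits per word.
bits_per_word_depth_1 = math.log2(90.6)
bits_per_word_depth_10 = math.log2(65.4)

# Re-expressing 1.27 bits per character as a per-character perplexity.
char_perplexity = 2 ** 1.27
```

Read this way, the drop from 90.6 to 65.4 corresponds to a saving of a little under half a bit per word, and 1.27 bits per character corresponds to a per-character perplexity of roughly 2.4.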
Simple Recurrent Units for Highly Parallelizable Recurrence
Common recurrent neural architectures scale poorly due to the intrinsic
difficulty in parallelizing their state computations. In this work, we propose
the Simple Recurrent Unit (SRU), a light recurrent unit that balances model
capacity and scalability. SRU is designed to provide expressive recurrence and
enable highly parallelized implementation, and it comes with careful
initialization to facilitate training of deep models. We demonstrate the
effectiveness of SRU on multiple NLP tasks. SRU achieves a 5-9x speed-up over
cuDNN-optimized LSTM on classification and question answering datasets, and
delivers stronger results than LSTM and convolutional models. We also obtain an
average of 0.7 BLEU improvement over the Transformer model on translation by
incorporating SRU into the architecture.
Comment: EMNL
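The abstract does not spell out the recurrence. One way a recurrent unit can be made easy to parallelize is to let every matrix multiplication depend only on the input x_t, so that only cheap elementwise operations remain in the sequential loop. The sketch below illustrates that idea under assumed equations: a single forget gate f_t = sigmoid(W_f x_t + b_f) and a state c_t = f_t * c_{t-1} + (1 - f_t) * (W x_t). It is a simplification for illustration and should not be read as the exact SRU definition; all names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, x):
    """Return W @ x for a matrix W given as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def light_recurrence(xs, W, W_f, b_f):
    """Run the assumed recurrence over a sequence of input vectors xs."""
    # Step 1: projections that depend only on the inputs. Each time step is
    # independent here, so in practice these could be one batched multiply.
    candidates = [matvec(W, x) for x in xs]
    forget_gates = [[sigmoid(v + b) for v, b in zip(matvec(W_f, x), b_f)]
                    for x in xs]

    # Step 2: the only sequential part, and it is purely elementwise.
    c = [0.0] * len(b_f)
    states = []
    for x_tilde, f in zip(candidates, forget_gates):
        c = [f_i * c_i + (1.0 - f_i) * x_i
             for f_i, c_i, x_i in zip(f, c, x_tilde)]
        states.append(c)
    return states


# Toy run (hypothetical weights): zero gate weights and bias give f = 0.5,
# so each state is the average of the previous state and the new candidate.
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_matrix = [[0.0, 0.0], [0.0, 0.0]]
states = light_recurrence([[2.0, 4.0], [0.0, 0.0]],
                          identity, zero_matrix, [0.0, 0.0])
```

The design point is the split between the two steps: the expensive work in step 1 has no dependence on previous states, while the loop in step 2 touches each dimension independently.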
Syllable-aware Neural Language Models: A Failure to Beat Character-aware Ones
Syllabification does not seem to improve word-level RNN language modeling
quality when compared to character-based segmentation. However, our best
syllable-aware language model, achieving performance comparable to the
competitive character-aware model, has 18%-33% fewer parameters and is trained
1.2-2.2 times faster.
Comment: EMNLP 201