Persistence Pays off: Paying Attention to What the LSTM Gating Mechanism Persists
Language Models (LMs) are important components in several Natural Language Processing systems. Recurrent Neural Network LMs composed of LSTM units, especially those augmented with an external memory, have achieved state-of-the-art results. However, these models still struggle to process long sequences, which are more likely to contain long-distance dependencies, because of information fading and a bias towards more recent information. In this paper we demonstrate an effective mechanism for retrieving information in a memory-augmented LSTM LM: the model attends to information in memory in proportion to the number of timesteps for which the LSTM gating mechanism persisted that information.
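The retrieval mechanism sketched in this abstract can be read as attention whose relevance scores are scaled by per-slot persistence counts. The function name, dot-product scoring, and the exact form of the scaling below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def persistence_attention(query, memory, persistence):
    """Attend over memory slots, weighting each slot in proportion to how
    many timesteps the LSTM gates persisted its content (illustrative sketch)."""
    # Dot-product relevance between the query and each memory slot.
    scores = memory @ query                 # shape: (num_slots,)
    # Scale relevance by the persistence count of each slot.
    scores = scores * persistence
    # Softmax (numerically stabilized) to obtain attention weights.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Read from memory as a weighted sum of the slots.
    return weights @ memory

rng = np.random.default_rng(0)
memory = rng.normal(size=(5, 8))            # 5 memory slots, hidden size 8
query = rng.normal(size=8)
persistence = np.array([1.0, 4.0, 2.0, 1.0, 3.0])  # timesteps each slot survived
read = persistence_attention(query, memory, persistence)
```

Slots that survived the gates longer thus contribute more to the memory read, all else being equal.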
Character n-gram Embeddings to Improve RNN Language Models
This paper proposes a novel Recurrent Neural Network (RNN) language model
that takes advantage of character information. We focus on character n-grams
based on research in the field of word embedding construction (Wieting et al.
2016). Our proposed method constructs word embeddings from character n-gram
embeddings and combines them with ordinary word embeddings. We demonstrate that
the proposed method achieves the best perplexities on the language modeling
datasets: Penn Treebank, WikiText-2, and WikiText-103. Moreover, we conduct
experiments on application tasks: machine translation and headline generation.
The experimental results indicate that our proposed method also positively
affects these tasks. Comment: AAAI 2019 paper
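A minimal sketch of the construction this abstract describes: decompose a word into character n-grams and fold their embeddings into the ordinary word embedding. The boundary markers, the additive combination, and all names here are assumptions; the paper's exact scheme may differ:

```python
import numpy as np

def char_ngrams(word, n=3):
    """All character n-grams of a word, with boundary markers < and >."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def word_vector(word, ngram_emb, word_emb, dim=4):
    """Combine a word embedding with the sum of its character n-gram
    embeddings (one plausible combination operator)."""
    vec = word_emb.get(word, np.zeros(dim)).copy()
    for g in char_ngrams(word):
        vec += ngram_emb.get(g, np.zeros(dim))
    return vec

# Toy lookup tables for illustration.
ngram_emb = {"<ca": np.ones(4), "cat": np.ones(4), "at>": np.ones(4)}
word_emb = {"cat": np.zeros(4)}
vec = word_vector("cat", ngram_emb, word_emb)
```

Because n-grams are shared across the vocabulary, rare and unseen words still receive informative vectors from their subword pieces.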
Calibration, Entropy Rates, and Memory in Language Models
Building accurate language models that capture meaningful long-term
dependencies is a core challenge in natural language processing. Towards this
end, we present a calibration-based approach to measure long-term discrepancies
between a generative sequence model and the true distribution, and use these
discrepancies to improve the model. Empirically, we show that state-of-the-art
language models, including LSTMs and Transformers, are \emph{miscalibrated}:
the entropy rates of their generations drift dramatically upward over time. We
then provide provable methods to mitigate this phenomenon. Furthermore, we show
how this calibration-based approach can also be used to measure the amount of
memory that language models use for prediction.
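The drift the abstract reports can be observed by tracking the entropy of the model's predictive distribution at each generation step. The helper below is a generic sketch of that measurement, not the paper's method:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution, in nats."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 * log(0) is taken as 0
    return float(-(p * np.log(p)).sum())

def entropy_rate_drift(step_dists):
    """Per-step predictive entropies over a generated sequence; a steady
    upward trend is the miscalibration signature described above."""
    return [entropy(p) for p in step_dists]

# Toy sequence of predictive distributions that flattens over time:
dists = [[0.9, 0.1], [0.7, 0.3], [0.5, 0.5]]
rates = entropy_rate_drift(dists)
```

For a well-calibrated model, these per-step entropies should stay close to the entropy rate of the true distribution rather than drifting upward.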
Study of a Dual Recurrent Neural Network applied to sequence generation
Máster Universitario en Investigación e Innovación en Inteligencia Computacional y Sistemas Interactivos. A study of the equivalence between recurrent neural networks and deterministic finite automata was carried out in order to provide an interpretation mechanism for this type of deep network. First, an empirical analysis is presented of the stability and generalization capacity of a recurrent network when Gaussian noise is injected into the hidden-layer neurons just before the activation function is applied. It is further shown that networks trained under these conditions on regular languages behave like the equivalent finite automata. Second, a new recurrent network architecture is developed, the Dual network, which improves generalization and interpretability by using two different paths for processing information. On one hand, a recurrent layer processes the temporal dependencies in the input sequences. On the other hand, a dense layer combines the output of the recurrent layer with the information present in the input to produce the network's final output. Taken together, the new Dual architecture admits an interpretation as a Mealy machine. The results obtained on both synthetic and real problems show, on one hand, that sharing the information-processing load simplifies the complexity of the recurrent layer and, on the other, that noise injection considerably improves the network's generalization capacity and its potential interpretability, slightly outperforming a standard LSTM trained on the same problems.
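One step of the Dual architecture described in this abstract can be sketched as follows. All parameter names, shapes, and the tanh activation are illustrative assumptions; only the two-path structure and the pre-activation noise injection come from the abstract:

```python
import numpy as np

def dual_step(x, h, params, noise_std=0.0, rng=None):
    """One timestep of a Dual network: a recurrent layer updates the hidden
    state, and a dense layer combines that state with the raw input to emit
    the output, so the output depends on (state, input) like a Mealy machine."""
    Wx, Wh, Wy = params["Wx"], params["Wh"], params["Wy"]
    # Recurrent path: processes temporal dependencies in the input sequence.
    pre = Wx @ x + Wh @ h
    if noise_std > 0.0:
        # Gaussian noise injected just before the activation, as in the study.
        pre = pre + rng.normal(scale=noise_std, size=pre.shape)
    h_new = np.tanh(pre)
    # Dense path: combines the recurrent output with the current input.
    y = Wy @ np.concatenate([h_new, x])
    return h_new, y

rng = np.random.default_rng(1)
params = {"Wx": rng.normal(size=(4, 3)),   # input dim 3 -> hidden dim 4
          "Wh": rng.normal(size=(4, 4)),
          "Wy": rng.normal(size=(2, 7))}   # (hidden + input) -> output dim 2
h, y = dual_step(rng.normal(size=3), np.zeros(4), params,
                 noise_std=0.1, rng=rng)
```

Because the dense layer sees both the state and the current input, each step maps (state, input) to an output, which is exactly the transition/output structure of a Mealy machine.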