11 research outputs found

    Persistence Pays off: Paying Attention to What the LSTM Gating Mechanism Persists

    Language Models (LMs) are important components in several Natural Language Processing systems. Recurrent Neural Network LMs composed of LSTM units, especially those augmented with an external memory, have achieved state-of-the-art results. However, these models still struggle to process long sequences, which are more likely to contain long-distance dependencies, because of information fading and a bias towards more recent information. In this paper we demonstrate an effective mechanism for retrieving information in a memory-augmented LSTM LM, based on attending to information in memory in proportion to the number of timesteps for which the LSTM gating mechanism persisted it.
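The retrieval idea in this abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the log-persistence bias, the function name, and the plain dot-product scoring are all assumptions made for the sketch.

```python
import math

def persistence_attention(query, memory, persistence):
    """Attend over memory slots, biasing each slot's score by how many
    timesteps the gating mechanism persisted it (hypothetical sketch)."""
    # Similarity score per slot, plus a log-persistence bonus so that
    # long-persisted information is retrieved preferentially.
    scores = [
        sum(q * m for q, m in zip(query, slot)) + math.log(1 + p)
        for slot, p in zip(memory, persistence)
    ]
    # Numerically stable softmax over the biased scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Read vector: attention-weighted sum of the memory slots.
    read = [
        sum(w * slot[i] for w, slot in zip(weights, memory))
        for i in range(len(memory[0]))
    ]
    return weights, read
```

With two equally similar slots, the one the gates persisted longer receives the larger attention weight, which is the retrieval behaviour the abstract describes.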

    Character n-gram Embeddings to Improve RNN Language Models

    This paper proposes a novel Recurrent Neural Network (RNN) language model that takes advantage of character information. We focus on character n-grams, building on research in the field of word embedding construction (Wieting et al. 2016). Our proposed method constructs word embeddings from character n-gram embeddings and combines them with ordinary word embeddings. We demonstrate that the proposed method achieves the best perplexities on the language modeling datasets: Penn Treebank, WikiText-2, and WikiText-103. Moreover, we conduct experiments on application tasks: machine translation and headline generation. The experimental results indicate that our proposed method also positively affects these tasks. Comment: AAAI 2019 paper.
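The construction described here can be sketched in a few lines. The hash-based n-gram embedding below is a deterministic stand-in for the learned lookup table, and averaging is one plausible composition; both are assumptions for illustration, not the paper's method.

```python
import hashlib

def char_ngrams(word, n_min=2, n_max=3):
    """Character n-grams of a word padded with boundary markers,
    following the common <word> convention (cf. Wieting et al. 2016)."""
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def ngram_embedding(ngram, dim=8):
    """Toy deterministic embedding for one n-gram: a hash-based
    stand-in for a learned embedding table."""
    digest = hashlib.md5(ngram.encode()).digest()
    return [(b - 128) / 128.0 for b in digest[:dim]]

def word_embedding(word, dim=8):
    """Word embedding built by averaging character n-gram embeddings;
    in the paper this is combined with an ordinary word embedding."""
    grams = char_ngrams(word)
    return [sum(ngram_embedding(g, dim)[i] for g in grams) / len(grams)
            for i in range(dim)]
```

Because the embedding is composed from sub-word units, unseen words still receive a meaningful vector as long as their character n-grams were observed during training.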

    Calibration, Entropy Rates, and Memory in Language Models

    Building accurate language models that capture meaningful long-term dependencies is a core challenge in natural language processing. Towards this end, we present a calibration-based approach to measure long-term discrepancies between a generative sequence model and the true distribution, and use these discrepancies to improve the model. Empirically, we show that state-of-the-art language models, including LSTMs and Transformers, are miscalibrated: the entropy rates of their generations drift dramatically upward over time. We then provide provable methods to mitigate this phenomenon. Furthermore, we show how this calibration-based approach can also be used to measure the amount of memory that language models use for prediction.
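The upward entropy drift the abstract reports can be quantified in a simple way. The half-versus-half comparison below is an illustrative simplification assumed for this sketch, not the authors' calibration procedure.

```python
import math

def entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def entropy_rate_drift(step_dists):
    """Compare the mean per-step entropy of the second half of a
    generation against the first half; a positive value indicates
    the upward drift described in the abstract."""
    ents = [entropy(d) for d in step_dists]
    half = len(ents) // 2
    early = sum(ents[:half]) / half
    late = sum(ents[half:]) / (len(ents) - half)
    return late - early
```

A well-calibrated generator whose per-step entropy matches the data's entropy rate would show a drift near zero; the paper's finding is that strong LMs instead become steadily more uncertain as generation proceeds.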

    A Study of a Dual Recurrent Neural Network Applied to Sequence Generation

    Master's thesis (Máster Universitario en Investigación e Innovación en Inteligencia Computacional y Sistemas Interactivos). A study was conducted of the equivalence between recurrent neural networks and deterministic finite automata, in order to provide an interpretation mechanism for this type of deep network. First, an empirical analysis is presented of the stability and generalization capacity of a recurrent network when Gaussian noise is injected into the hidden-layer neurons just before the activation function is applied. It is further shown that networks trained under these conditions on regular languages behave like the equivalent finite automata. Second, a new recurrent architecture is developed, the Dual network, which improves generalization and interpretability by using two different paths for processing information. On one hand, a recurrent layer processes the temporal dependencies in the input sequences. On the other hand, a dense layer combines the output of the recurrent layer with the information present in the input to produce the network's final output. Taken together, the new Dual architecture admits an interpretation as a Mealy machine. The results obtained on both synthetic and real problems show, first, that sharing the information-processing load simplifies the complexity of the recurrent layer and, second, that noise injection considerably improves the generalization capacity of the network and its potential interpretability, slightly improving on the result of a standard LSTM trained on the same problems.
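The two-path design in this abstract can be sketched as a single step function. All names, weight shapes, and the choice of tanh are assumptions for illustration; the thesis's exact parameterization may differ.

```python
import math
import random

def dual_step(x, h, W_rec, W_in, W_out_h, W_out_x, noise_std=0.0, rng=None):
    """One step of a Dual-style network (illustrative sketch): a
    recurrent path updates the hidden state, and a dense path combines
    that state with the raw input to produce the output, so the step
    reads as a Mealy machine (output depends on state AND input)."""
    rng = rng or random.Random(0)
    # Recurrent path: pre-activation from previous state and current
    # input, with optional Gaussian noise injected before the
    # activation function, as in the thesis's stability analysis.
    pre = [
        sum(W_rec[i][j] * h[j] for j in range(len(h))) +
        sum(W_in[i][j] * x[j] for j in range(len(x))) +
        (rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0)
        for i in range(len(h))
    ]
    h_new = [math.tanh(v) for v in pre]
    # Dense path: the output combines the new recurrent state with the
    # raw input, relieving the recurrent layer of part of the load.
    y = [
        sum(W_out_h[i][j] * h_new[j] for j in range(len(h_new))) +
        sum(W_out_x[i][j] * x[j] for j in range(len(x)))
        for i in range(len(W_out_h))
    ]
    return h_new, y
```

Because the output path sees the current input directly, the recurrent layer only has to encode what genuinely requires memory, which is the load-sharing effect the abstract credits for the simpler recurrent dynamics.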