11 research outputs found

    Persistence Pays off: Paying Attention to What the LSTM Gating Mechanism Persists

    Language Models (LMs) are important components in several Natural Language Processing systems. Recurrent Neural Network LMs composed of LSTM units, especially those augmented with an external memory, have achieved state-of-the-art results. However, these models still struggle to process long sequences, which are more likely to contain long-distance dependencies, because of information fading and a bias towards more recent information. In this paper we demonstrate an effective mechanism for retrieving information in a memory-augmented LSTM LM, based on attending to information in memory in proportion to the number of timesteps for which the LSTM gating mechanism persisted it.
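The retrieval idea in this abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the log-persistence bias, the function name, and the plain dot-product scoring are all assumptions made for the sketch.

```python
import math

def persistence_attention(query, memory, persistence):
    """Attend over memory slots, biasing each slot's score by how many
    timesteps the gating mechanism persisted it (hypothetical sketch)."""
    # Similarity score per slot, plus a log-persistence bonus so that
    # long-persisted information is retrieved preferentially.
    scores = [
        sum(q * m for q, m in zip(query, slot)) + math.log(1 + p)
        for slot, p in zip(memory, persistence)
    ]
    # Numerically stable softmax over the biased scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Read vector: attention-weighted sum of the memory slots.
    read = [
        sum(w * slot[i] for w, slot in zip(weights, memory))
        for i in range(len(memory[0]))
    ]
    return weights, read
```

With two equally similar slots, the one the gates persisted longer receives the larger attention weight, which is the retrieval behaviour the abstract describes.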

    Character n-gram Embeddings to Improve RNN Language Models

    This paper proposes a novel Recurrent Neural Network (RNN) language model that takes advantage of character information. We focus on character n-grams, building on research in the field of word embedding construction (Wieting et al. 2016). Our proposed method constructs word embeddings from character n-gram embeddings and combines them with ordinary word embeddings. We demonstrate that the proposed method achieves the best perplexities on the language modeling datasets: Penn Treebank, WikiText-2, and WikiText-103. Moreover, we conduct experiments on application tasks: machine translation and headline generation. The experimental results indicate that our proposed method also positively affects these tasks. Comment: AAAI 2019 paper.
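The construction described here can be sketched in a few lines. The hash-based n-gram embedding below is a deterministic stand-in for the learned lookup table, and averaging is one plausible composition; both are assumptions for illustration, not the paper's method.

```python
import hashlib

def char_ngrams(word, n_min=2, n_max=3):
    """Character n-grams of a word padded with boundary markers,
    following the common <word> convention (cf. Wieting et al. 2016)."""
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def ngram_embedding(ngram, dim=8):
    """Toy deterministic embedding for one n-gram: a hash-based
    stand-in for a learned embedding table."""
    digest = hashlib.md5(ngram.encode()).digest()
    return [(b - 128) / 128.0 for b in digest[:dim]]

def word_embedding(word, dim=8):
    """Word embedding built by averaging character n-gram embeddings;
    in the paper this is combined with an ordinary word embedding."""
    grams = char_ngrams(word)
    return [sum(ngram_embedding(g, dim)[i] for g in grams) / len(grams)
            for i in range(dim)]
```

Because the embedding is composed from sub-word units, unseen words still receive a meaningful vector as long as their character n-grams were observed during training.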

    Calibration, Entropy Rates, and Memory in Language Models

    Building accurate language models that capture meaningful long-term dependencies is a core challenge in natural language processing. Towards this end, we present a calibration-based approach to measure long-term discrepancies between a generative sequence model and the true distribution, and use these discrepancies to improve the model. Empirically, we show that state-of-the-art language models, including LSTMs and Transformers, are miscalibrated: the entropy rates of their generations drift dramatically upward over time. We then provide provable methods to mitigate this phenomenon. Furthermore, we show how this calibration-based approach can also be used to measure the amount of memory that language models use for prediction.
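The upward entropy drift the abstract reports can be quantified in a simple way. The half-versus-half comparison below is an illustrative simplification assumed for this sketch, not the authors' calibration procedure.

```python
import math

def entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def entropy_rate_drift(step_dists):
    """Compare the mean per-step entropy of the second half of a
    generation against the first half; a positive value indicates
    the upward drift described in the abstract."""
    ents = [entropy(d) for d in step_dists]
    half = len(ents) // 2
    early = sum(ents[:half]) / half
    late = sum(ents[half:]) / (len(ents) - half)
    return late - early
```

A well-calibrated generator whose per-step entropy matches the data's entropy rate would show a drift near zero; the paper's finding is that strong LMs instead become steadily more uncertain as generation proceeds.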

    A Study of a Dual Recurrent Neural Network Applied to Sequence Generation

    Master's thesis (Máster Universitario en Investigación e Innovación en Inteligencia Computacional y Sistemas Interactivos). A study was conducted of the equivalence between recurrent neural networks and deterministic finite automata, in order to provide an interpretation mechanism for this type of deep network. First, an empirical analysis is presented of the stability and generalization capacity of a recurrent network when Gaussian noise is injected into the hidden-layer neurons just before the activation function is applied. It is further shown that networks trained under these conditions on regular languages behave like the equivalent finite automata. Second, a new recurrent architecture is developed, the Dual network, which improves generalization and interpretability by using two different paths for processing information. On one hand, a recurrent layer processes the temporal dependencies in the input sequences. On the other hand, a dense layer combines the output of the recurrent layer with the information present in the input to produce the network's final output. Taken together, the new Dual architecture admits an interpretation as a Mealy machine. The results obtained on both synthetic and real problems show, first, that sharing the information-processing load simplifies the complexity of the recurrent layer and, second, that noise injection considerably improves the generalization capacity of the network and its potential interpretability, slightly improving on the result of a standard LSTM trained on the same problems.
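The two-path design in this abstract can be sketched as a single step function. All names, weight shapes, and the choice of tanh are assumptions for illustration; the thesis's exact parameterization may differ.

```python
import math
import random

def dual_step(x, h, W_rec, W_in, W_out_h, W_out_x, noise_std=0.0, rng=None):
    """One step of a Dual-style network (illustrative sketch): a
    recurrent path updates the hidden state, and a dense path combines
    that state with the raw input to produce the output, so the step
    reads as a Mealy machine (output depends on state AND input)."""
    rng = rng or random.Random(0)
    # Recurrent path: pre-activation from previous state and current
    # input, with optional Gaussian noise injected before the
    # activation function, as in the thesis's stability analysis.
    pre = [
        sum(W_rec[i][j] * h[j] for j in range(len(h))) +
        sum(W_in[i][j] * x[j] for j in range(len(x))) +
        (rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0)
        for i in range(len(h))
    ]
    h_new = [math.tanh(v) for v in pre]
    # Dense path: the output combines the new recurrent state with the
    # raw input, relieving the recurrent layer of part of the load.
    y = [
        sum(W_out_h[i][j] * h_new[j] for j in range(len(h_new))) +
        sum(W_out_x[i][j] * x[j] for j in range(len(x)))
        for i in range(len(W_out_h))
    ]
    return h_new, y
```

Because the output path sees the current input directly, the recurrent layer only has to encode what genuinely requires memory, which is the load-sharing effect the abstract credits for the simpler recurrent dynamics.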