
    Unbiasing Truncated Backpropagation Through Time

    Truncated Backpropagation Through Time (truncated BPTT) is a widespread method for learning recurrent computational graphs. Truncated BPTT keeps the computational benefits of Backpropagation Through Time (BPTT) while removing the need for a complete backtrack through the whole data sequence at every step. However, truncation favors short-term dependencies: the gradient estimate of truncated BPTT is biased, so it does not enjoy the convergence guarantees of stochastic gradient theory. We introduce Anticipated Reweighted Truncated Backpropagation (ARTBP), an algorithm that keeps the computational benefits of truncated BPTT while providing unbiasedness. ARTBP works by using variable truncation lengths together with carefully chosen compensation factors in the backpropagation equation. We check the viability of ARTBP on two tasks. First, a simple synthetic task where careful balancing of temporal dependencies at different scales is needed: truncated BPTT displays unreliable performance and, in worst-case scenarios, divergence, while ARTBP converges reliably. Second, on Penn Treebank character-level language modelling, ARTBP slightly outperforms truncated BPTT.
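    To make the mechanism concrete, here is a minimal sketch of ARTBP-style training with a constant per-step truncation probability p, a simple special case of the paper's variable truncation lengths. All names (scale_grad, cell, readout) and hyperparameters are illustrative assumptions, not the authors' code: with probability p the gradient is cut at a step, and otherwise the backpropagated gradient is rescaled by 1/(1 - p), which keeps the expected gradient unbiased.

```python
# Hypothetical sketch of ARTBP with a constant per-step truncation
# probability p (a simple special case of the paper's variable truncation
# lengths); all names and hyperparameters here are illustrative.
import torch
import torch.nn as nn

def scale_grad(x, w):
    """Keep the forward value of x but scale its backward gradient by w."""
    return w * x + (1.0 - w) * x.detach()

input_size, hidden_size, T, p = 8, 32, 100, 0.1
cell = nn.RNNCell(input_size, hidden_size)
readout = nn.Linear(hidden_size, 1)
criterion = nn.MSELoss()
opt = torch.optim.SGD(list(cell.parameters()) + list(readout.parameters()), lr=1e-2)

x = torch.randn(T, 1, input_size)  # toy input sequence
y = torch.randn(T, 1, 1)           # toy targets

h = torch.zeros(1, hidden_size)
loss = 0.0
for t in range(T):
    h = cell(x[t], h)
    loss = loss + criterion(readout(h), y[t])
    if torch.rand(()).item() < p:
        h = h.detach()                      # truncate: cut the gradient here
    else:
        h = scale_grad(h, 1.0 / (1.0 - p))  # compensation factor restores unbiasedness in expectation
opt.zero_grad()
loss.backward()
opt.step()
```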

    Can recurrent neural networks warp time?

    Successful recurrent models such as long short-term memories (LSTMs) and gated recurrent units (GRUs) use ad hoc gating mechanisms. Empirically, these models have been found to improve the learning of medium- to long-term temporal dependencies and to help with vanishing gradient issues. We prove that learnable gates in a recurrent model formally provide quasi-invariance to general time transformations in the input data. We recover part of the LSTM architecture from a simple axiomatic approach. This result leads to a new way of initializing gate biases in LSTMs and GRUs. Experimentally, this new chrono initialization is shown to greatly improve learning of long-term dependencies, with minimal implementation effort. Recurrent neural networks (e.g. Jaeger, 2002) are a standard machine learning tool to model and represent temporal data; mathematically, they amount to learning the parameters of a parameterized dynamical system so that its behavior optimizes some criterion, such as the prediction of the next data in a sequence.
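    The abstract does not spell out the initialization, but the paper's chrono initialization draws the forget-gate bias as log(u) with u uniform on [1, T_max - 1] and sets the input-gate bias to its opposite, where T_max is the assumed longest dependency range in the data. Below is a minimal sketch for a PyTorch nn.LSTM; the helper name chrono_init and all sizes are illustrative assumptions.

```python
# A minimal sketch of chrono initialization for a PyTorch nn.LSTM;
# the helper name chrono_init and all sizes are illustrative assumptions.
import torch
import torch.nn as nn

def chrono_init(lstm: nn.LSTM, t_max: float) -> None:
    h = lstm.hidden_size
    with torch.no_grad():
        for name, bias in lstm.named_parameters():
            if "bias" not in name:
                continue
            bias.zero_()
            if "bias_hh" in name:
                # PyTorch gate order is (input, forget, cell, output).
                u = torch.rand(h) * (t_max - 2.0) + 1.0  # u ~ Uniform(1, T_max - 1)
                bias[h:2 * h] = torch.log(u)             # forget-gate bias b_f = log(u)
                bias[0:h] = -bias[h:2 * h]               # input-gate bias b_i = -b_f

lstm = nn.LSTM(input_size=16, hidden_size=64)
chrono_init(lstm, t_max=100.0)  # T_max ~ longest dependency range expected in the data
```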

    Self-conditioned Embedding Diffusion for Text Generation

    Can continuous diffusion models bring to natural language the same performance breakthrough they brought to image generation? To circumvent the discrete nature of text data, we can simply project tokens into a continuous space of embeddings, as is standard in language modeling. We propose Self-conditioned Embedding Diffusion, a continuous diffusion mechanism that operates on token embeddings and allows learning flexible and scalable diffusion models for both conditional and unconditional text generation. Through qualitative and quantitative evaluation, we show that our text diffusion models generate samples comparable with those produced by standard autoregressive language models, while being in theory more efficient on accelerator hardware at inference time. Our work paves the way for scaling up diffusion models for text, similarly to autoregressive models, and for improving performance with recent refinements to continuous diffusion.
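    As a rough illustration of the self-conditioning idea, the sketch below runs one training step of embedding-space diffusion where, half the time, the denoiser is conditioned on its own detached first estimate of the clean embeddings. The architecture, noise schedule, and names are assumptions for illustration, not the paper's implementation; a real model would also condition on the noise level.

```python
# A hypothetical single training step of embedding-space diffusion with
# self-conditioning; architecture, noise schedule, and names are
# illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn

vocab, dim = 1000, 64
embed = nn.Embedding(vocab, dim)
# The denoiser sees [noisy embedding ; previous clean estimate] and predicts
# the clean embedding; a real model would also condition on the noise level.
denoiser = nn.Sequential(nn.Linear(2 * dim, 256), nn.ReLU(), nn.Linear(256, dim))

tokens = torch.randint(0, vocab, (8, 32))  # (batch, sequence length)
x0 = embed(tokens)                         # project discrete tokens to continuous embeddings
alpha = torch.rand(8, 1, 1)                # crude stand-in for a proper noise schedule
noise = torch.randn_like(x0)
xt = alpha.sqrt() * x0 + (1 - alpha).sqrt() * noise  # noisy embeddings

# Self-conditioning: half the time, first compute a detached estimate of the
# clean embeddings and feed it back as an extra input to the denoiser.
x0_hat = torch.zeros_like(x0)
if torch.rand(()).item() < 0.5:
    with torch.no_grad():
        x0_hat = denoiser(torch.cat([xt, x0_hat], dim=-1))
pred = denoiser(torch.cat([xt, x0_hat], dim=-1))
loss = ((pred - x0) ** 2).mean()
loss.backward()
```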

    Emergent Communication: Generalization and Overfitting in Lewis Games

    Lewis signaling games are a class of simple communication games for simulating the emergence of language. In these games, two agents must agree on a communication protocol in order to solve a cooperative task. Previous work has shown that agents trained to play this game with reinforcement learning tend to develop languages that display undesirable properties from a linguistic point of view (lack of generalization, lack of compositionality, etc.). In this paper, we aim to provide a better understanding of this phenomenon by analytically studying the learning problem in Lewis games. As a core contribution, we demonstrate that the standard objective in Lewis games can be decomposed into two components: a co-adaptation loss and an information loss. This decomposition enables us to surface two potential sources of overfitting, which we show may undermine the emergence of a structured communication protocol. In particular, when we control for overfitting on the co-adaptation loss, we recover desired properties in the emergent languages: they are more compositional and generalize better.
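    For readers unfamiliar with the setting, the sketch below implements a generic Lewis signaling game trained with reinforcement learning (REINFORCE for the speaker, supervised learning for the listener). It shows the kind of setup the paper analyzes, not its loss decomposition, and all sizes, architectures, and names are illustrative assumptions.

```python
# A generic Lewis signaling game trained with reinforcement learning
# (REINFORCE speaker, supervised listener); sizes, architectures, and names
# are illustrative, and this is the setting the paper analyzes, not its
# loss decomposition.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_objects, n_messages, batch = 10, 10, 32
speaker = nn.Linear(n_objects, n_messages)   # maps an object to message logits
listener = nn.Linear(n_messages, n_objects)  # maps a message to a guess over objects
opt = torch.optim.Adam(list(speaker.parameters()) + list(listener.parameters()), lr=1e-2)

for step in range(1000):
    target = torch.randint(0, n_objects, (batch,))
    obj = F.one_hot(target, n_objects).float()
    msg_logits = speaker(obj)
    msg = torch.distributions.Categorical(logits=msg_logits).sample()
    guess_logits = listener(F.one_hot(msg, n_messages).float())
    reward = (guess_logits.argmax(-1) == target).float()  # 1 if the listener recovered the object

    listener_loss = F.cross_entropy(guess_logits, target)
    log_p = F.log_softmax(msg_logits, dim=-1).gather(1, msg.unsqueeze(1)).squeeze(1)
    speaker_loss = -((reward - reward.mean()) * log_p).mean()  # REINFORCE with a mean baseline

    opt.zero_grad()
    (listener_loss + speaker_loss).backward()
    opt.step()
```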

    RĂ©seaux RĂ©currents et Apprentissage par Renforcement: Approches Dynamiques

    An intelligent agent immersed in its environment must be able to both understand and interact with the world. Understanding the environment requires processing sequences of sensorial inputs. Interacting with the environment typically involves issuing actions, and adapting those actions to strive towards a given goal, or to maximize a notion of reward. This view of a two-part agent-environment interaction motivates the two parts of this thesis: recurrent neural networks are powerful tools to make sense of complex and diverse sequences of inputs, such as those resulting from an agent-environment interaction, and reinforcement learning is the field of choice to direct the behavior of an agent towards a goal. This thesis aims to provide theoretical and practical insights in those two domains. In the field of recurrent networks, the contribution of this thesis is twofold: we introduce two new, theoretically grounded and scalable learning algorithms that can be used online, and we advance the understanding of gated recurrent networks by examining their invariance properties. In the field of reinforcement learning, our main contribution is to provide guidelines for designing algorithms that are robust to time discretization. All these contributions are theoretically grounded and backed up by experimental results.