Unbiasing Truncated Backpropagation Through Time
Truncated Backpropagation Through Time (truncated BPTT) is a widespread
method for learning recurrent computational graphs. Truncated BPTT keeps the
computational benefits of Backpropagation Through Time (BPTT) while removing
the need for a complete backward pass through the whole data sequence at every
step. However, truncation favors short-term dependencies: the gradient estimate
of truncated BPTT is biased, and therefore does not enjoy the convergence
guarantees of stochastic gradient theory. We introduce Anticipated Reweighted
Truncated Backpropagation (ARTBP), an algorithm that keeps the computational
benefits of truncated BPTT, while providing unbiasedness. ARTBP works by using
variable truncation lengths together with carefully chosen compensation factors
in the backpropagation equation. We check the viability of ARTBP on two tasks.
First, a simple synthetic task where careful balancing of temporal dependencies
at different scales is needed: truncated BPTT displays unreliable performance,
and, in worst-case scenarios, divergence, while ARTBP converges reliably.
Second, on Penn Treebank character-level language modelling, ARTBP slightly
outperforms truncated BPTT.
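As a toy illustration of the reweighting idea (not the paper's full algorithm, which applies compensation factors inside the backpropagation equation itself), the sketch below shows how random truncation lengths combined with inverse-survival-probability weights give an unbiased estimate of a sum of temporal gradient contributions. The stopping probability `p_stop` and the contribution values are illustrative assumptions:

```python
import random

def full_gradient(contribs):
    """Exact target: the sum of all temporal gradient contributions."""
    return sum(contribs)

def artbp_estimate(contribs, p_stop, rng):
    """Randomly truncated, compensated sum (illustrative ARTBP-style estimator).

    Going further into the past, truncation happens at each step with
    probability p_stop.  A term that survives t truncation chances is
    reweighted by 1 / (1 - p_stop)**t, the inverse of its survival
    probability, which makes the estimate unbiased in expectation.
    """
    estimate, weight = 0.0, 1.0
    for g in contribs:
        estimate += weight * g
        if rng.random() < p_stop:       # truncate the backward pass here
            break
        weight /= 1.0 - p_stop          # compensation factor
    return estimate

# Averaging many randomized truncated estimates recovers the full sum,
# while each individual estimate only looks at a short window of the past.
rng = random.Random(0)
contribs = [1.0, 0.5, 0.25, 0.125, 0.0625]   # decaying long-term terms
mean = sum(artbp_estimate(contribs, 0.2, rng) for _ in range(100000)) / 100000
```

With `p_stop = 0` nothing is ever truncated and the estimator reduces exactly to the full sum; with larger `p_stop`, each estimate is cheaper but noisier, which mirrors the compute/variance trade-off of the method.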
Can recurrent neural networks warp time?
Successful recurrent models such as long short-term memories (LSTMs) and gated recurrent units (GRUs) use ad hoc gating mechanisms. Empirically these models have been found to improve the learning of medium to long term temporal dependencies and to help with vanishing gradient issues. We prove that learnable gates in a recurrent model formally provide quasi-invariance to general time transformations in the input data. We recover part of the LSTM architecture from a simple axiomatic approach. This result leads to a new way of initializing gate biases in LSTMs and GRUs. Experimentally, this new chrono initialization is shown to greatly improve learning of long term dependencies, with minimal implementation effort. Recurrent neural networks (e.g. (Jaeger, 2002)) are a standard machine learning tool to model and represent temporal data; mathematically they amount to learning the parameters of a parameterized dynamical system so that its behavior optimizes some criterion, such as the prediction of the next data in a sequence.
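To the best of my understanding, the chrono initialization sets each forget-gate bias to log(u) with u drawn uniformly from [1, T_max - 1], and each input-gate bias to its negation, where T_max is the longest temporal dependency expected in the data. A minimal sketch under that reading (the function name and plain-list representation are illustrative, not a library API):

```python
import math
import random

def chrono_init(hidden_size, t_max, rng=None):
    """Chrono initialization of LSTM gate biases (sketch).

    Each forget-gate bias is log(u) with u ~ Uniform(1, t_max - 1), so
    sigmoid(b_f) = u / (1 + u) starts close to 1 and the corresponding
    unit retains information over a characteristic time scale of roughly
    u steps.  Input-gate biases are the negation, balancing how fast the
    cell writes new content against how fast it forgets old content.
    """
    rng = rng or random.Random()
    b_f = [math.log(rng.uniform(1.0, t_max - 1.0)) for _ in range(hidden_size)]
    b_i = [-b for b in b_f]
    return b_f, b_i

# Spread memory time scales over dependencies up to ~100 steps.
biases_f, biases_i = chrono_init(hidden_size=8, t_max=100, rng=random.Random(0))
```

Because u is uniform over [1, t_max - 1], the hidden units start with memory time scales spread across the whole expected dependency range rather than all clustered at short horizons.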
Self-conditioned Embedding Diffusion for Text Generation
Can continuous diffusion models bring the same performance breakthrough on
natural language they did for image generation? To circumvent the discrete
nature of text data, we can simply project tokens in a continuous space of
embeddings, as is standard in language modeling. We propose Self-conditioned
Embedding Diffusion, a continuous diffusion mechanism that operates on token
embeddings and allows learning flexible and scalable diffusion models for both
conditional and unconditional text generation. Through qualitative and
quantitative evaluation, we show that our text diffusion models generate
samples comparable with those produced by standard autoregressive language
models - while being in theory more efficient on accelerator hardware at
inference time. Our work paves the way for scaling up diffusion models for
text, similarly to autoregressive models, and for improving performance with
recent refinements to continuous diffusion.
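The projection trick mentioned above can be sketched in a few lines: tokens are mapped to continuous embeddings, Gaussian noise is added as in a standard forward diffusion step, and vectors are rounded back to the nearest token embedding. This is only a minimal sketch of the continuous relaxation; the embedding table, noise level, and function names are invented for illustration, and the learned denoiser and self-conditioning are omitted:

```python
import math
import random

# Toy embedding table (illustrative; real models learn these vectors).
EMBED = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "sat": [-1.0, 0.0]}

def noise_embeddings(vectors, alpha_bar, rng):
    """One forward diffusion step on continuous token embeddings:
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [[a * v + s * rng.gauss(0.0, 1.0) for v in vec] for vec in vectors]

def round_to_tokens(vectors, table):
    """Map each continuous vector back to the token with the nearest
    embedding (squared Euclidean distance)."""
    def nearest(vec):
        return min(table, key=lambda t: sum((a - b) ** 2
                                            for a, b in zip(vec, table[t])))
    return [nearest(vec) for vec in vectors]

tokens = ["the", "cat", "sat"]
rng = random.Random(0)
# At low noise (alpha_bar close to 1), rounding recovers the sequence.
noisy = noise_embeddings([EMBED[t] for t in tokens], alpha_bar=0.99, rng=rng)
recovered = round_to_tokens(noisy, EMBED)
```

The rounding step is what lets a continuous diffusion process produce discrete text at the end of sampling; at high noise levels the vectors become ambiguous between tokens, which is exactly the regime the denoising model is trained to resolve.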
Emergent Communication: Generalization and Overfitting in Lewis Games
Lewis signaling games are a class of simple communication games for simulating the emergence of language. In these games, two agents must agree on a communication protocol in order to solve a cooperative task. Previous work has shown that agents trained to play this game with reinforcement learning tend to develop languages that display undesirable properties from a linguistic point of view (lack of generalization, lack of compositionality, etc.). In this paper, we aim to provide better understanding of this phenomenon by analytically studying the learning problem in Lewis games. As a core contribution, we demonstrate that the standard objective in Lewis games can be decomposed into two components: a co-adaptation loss and an information loss. This decomposition enables us to surface two potential sources of overfitting, which we show may undermine the emergence of a structured communication protocol. In particular, when we control for overfitting on the co-adaptation loss, we recover desired properties in the emergent languages: they are more compositional and generalize better.
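For reference, the cooperative task in a Lewis signaling game can be sketched in its simplest tabular form: the sender observes a state and emits a message, the receiver sees only the message and must reconstruct the state. The state and message spaces below are illustrative, and the fixed lookup-table agents stand in for the reinforcement-learning agents studied in the paper:

```python
def play_round(state, sender, receiver):
    """One round of a Lewis signaling game.

    sender:   dict mapping each state to a message.
    receiver: dict mapping each message to a guessed state.
    The cooperative reward is 1.0 when the receiver reconstructs the
    sender's state from the message alone, and 0.0 otherwise.
    """
    message = sender[state]
    guess = receiver[message]
    return 1.0 if guess == state else 0.0

# One protocol the two agents might converge to (one message per state).
sender = {"circle": "m0", "square": "m1"}
receiver = {"m0": "circle", "m1": "square"}

rewards = [play_round(s, sender, receiver) for s in ("circle", "square")]
```

Any bijection between states and messages earns full reward here, which is precisely why the reward alone does not constrain the protocol to be compositional or generalizable, the gap the paper's loss decomposition analyzes.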
Recurrent Networks and Reinforcement Learning: Dynamical Approaches (Réseaux Récurrents et Apprentissage par Renforcement : Approches Dynamiques)
An intelligent agent immersed in its environment must be able to both understand and interact with the world. Understanding the environment requires processing sequences of sensorial inputs. Interacting with the environment typically involves issuing actions, and adapting those actions to strive towards a given goal, or to maximize a notion of reward. This view of a two-part agent-environment interaction motivates the two parts of this thesis: recurrent neural networks are powerful tools to make sense of complex and diverse sequences of inputs, such as those resulting from an agent-environment interaction; reinforcement learning is the field of choice to direct the behavior of an agent towards a goal. This thesis aims to provide theoretical and practical insights into those two domains. In the field of recurrent networks, the thesis's contribution is twofold: we introduce two new, theoretically grounded and scalable learning algorithms that can be used online. Besides, we advance understanding of gated recurrent networks by examining their invariance properties. In the field of reinforcement learning, our main contribution is to provide guidelines for designing algorithms that are robust to the choice of time discretization. All these contributions are theoretically grounded, and backed up by experimental results.