Unbiasing Truncated Backpropagation Through Time
Truncated Backpropagation Through Time (truncated BPTT) is a widespread
method for learning recurrent computational graphs. Truncated BPTT keeps the
computational benefits of Backpropagation Through Time (BPTT) while removing
the need for a complete backward pass through the whole data sequence at every
step. However, truncation favors short-term dependencies: the gradient estimate
of truncated BPTT is biased, and therefore does not enjoy the convergence
guarantees of stochastic gradient theory. We introduce Anticipated Reweighted
Truncated Backpropagation (ARTBP), an algorithm that keeps the computational
benefits of truncated BPTT, while providing unbiasedness. ARTBP works by using
variable truncation lengths together with carefully chosen compensation factors
in the backpropagation equation. We check the viability of ARTBP on two tasks.
First, a simple synthetic task where careful balancing of temporal dependencies
at different scales is needed: truncated BPTT displays unreliable performance,
and, in worst-case scenarios, divergence, while ARTBP converges reliably.
Second, on Penn Treebank character-level language modelling, ARTBP slightly
outperforms truncated BPTT.
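As a toy illustration of the reweighting idea (not the paper's full algorithm, which applies compensation factors inside the backpropagation equation itself), the sketch below shows how random truncation lengths combined with inverse-survival-probability weights give an unbiased estimate of a sum of temporal gradient contributions. The stopping probability `p_stop` and the contribution values are illustrative assumptions:

```python
import random

def full_gradient(contribs):
    """Exact target: the sum of all temporal gradient contributions."""
    return sum(contribs)

def artbp_estimate(contribs, p_stop, rng):
    """Randomly truncated, compensated sum (illustrative ARTBP-style estimator).

    Going further into the past, truncation happens at each step with
    probability p_stop.  A term that survives t truncation chances is
    reweighted by 1 / (1 - p_stop)**t, the inverse of its survival
    probability, which makes the estimate unbiased in expectation.
    """
    estimate, weight = 0.0, 1.0
    for g in contribs:
        estimate += weight * g
        if rng.random() < p_stop:       # truncate the backward pass here
            break
        weight /= 1.0 - p_stop          # compensation factor
    return estimate

# Averaging many randomized truncated estimates recovers the full sum,
# while each individual estimate only looks at a short window of the past.
rng = random.Random(0)
contribs = [1.0, 0.5, 0.25, 0.125, 0.0625]   # decaying long-term terms
mean = sum(artbp_estimate(contribs, 0.2, rng) for _ in range(100000)) / 100000
```

With `p_stop = 0` nothing is ever truncated and the estimator reduces exactly to the full sum; with larger `p_stop`, each estimate is cheaper but noisier, which mirrors the compute/variance trade-off of the method.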
Can recurrent neural networks warp time?
Successful recurrent models such as long short-term memories (LSTMs) and gated recurrent units (GRUs) use ad hoc gating mechanisms. Empirically these models have been found to improve the learning of medium to long term temporal dependencies and to help with vanishing gradient issues. We prove that learnable gates in a recurrent model formally provide quasi-invariance to general time transformations in the input data. We recover part of the LSTM architecture from a simple axiomatic approach. This result leads to a new way of initializing gate biases in LSTMs and GRUs. Experimentally, this new chrono initialization is shown to greatly improve learning of long term dependencies, with minimal implementation effort. Recurrent neural networks (e.g. (Jaeger, 2002)) are a standard machine learning tool to model and represent temporal data; mathematically they amount to learning the parameters of a parameterized dynamical system so that its behavior optimizes some criterion, such as the prediction of the next data in a sequence.
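To the best of my understanding, the chrono initialization sets each forget-gate bias to log(u) with u drawn uniformly from [1, T_max - 1], and each input-gate bias to its negation, where T_max is the longest temporal dependency expected in the data. A minimal sketch under that reading (the function name and plain-list representation are illustrative, not a library API):

```python
import math
import random

def chrono_init(hidden_size, t_max, rng=None):
    """Chrono initialization of LSTM gate biases (sketch).

    Each forget-gate bias is log(u) with u ~ Uniform(1, t_max - 1), so
    sigmoid(b_f) = u / (1 + u) starts close to 1 and the corresponding
    unit retains information over a characteristic time scale of roughly
    u steps.  Input-gate biases are the negation, balancing how fast the
    cell writes new content against how fast it forgets old content.
    """
    rng = rng or random.Random()
    b_f = [math.log(rng.uniform(1.0, t_max - 1.0)) for _ in range(hidden_size)]
    b_i = [-b for b in b_f]
    return b_f, b_i

# Spread memory time scales over dependencies up to ~100 steps.
biases_f, biases_i = chrono_init(hidden_size=8, t_max=100, rng=random.Random(0))
```

Because u is uniform over [1, t_max - 1], the hidden units start with memory time scales spread across the whole expected dependency range rather than all clustered at short horizons.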
Self-conditioned Embedding Diffusion for Text Generation
Can continuous diffusion models bring the same performance breakthrough on
natural language they did for image generation? To circumvent the discrete
nature of text data, we can simply project tokens in a continuous space of
embeddings, as is standard in language modeling. We propose Self-conditioned
Embedding Diffusion, a continuous diffusion mechanism that operates on token
embeddings and allows learning flexible and scalable diffusion models for both
conditional and unconditional text generation. Through qualitative and
quantitative evaluation, we show that our text diffusion models generate
samples comparable with those produced by standard autoregressive language
models - while being in theory more efficient on accelerator hardware at
inference time. Our work paves the way for scaling up diffusion models for
text, similarly to autoregressive models, and for improving performance with
recent refinements to continuous diffusion.
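The projection trick mentioned above can be sketched in a few lines: tokens are mapped to continuous embeddings, Gaussian noise is added as in a standard forward diffusion step, and vectors are rounded back to the nearest token embedding. This is only a minimal sketch of the continuous relaxation; the embedding table, noise level, and function names are invented for illustration, and the learned denoiser and self-conditioning are omitted:

```python
import math
import random

# Toy embedding table (illustrative; real models learn these vectors).
EMBED = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "sat": [-1.0, 0.0]}

def noise_embeddings(vectors, alpha_bar, rng):
    """One forward diffusion step on continuous token embeddings:
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [[a * v + s * rng.gauss(0.0, 1.0) for v in vec] for vec in vectors]

def round_to_tokens(vectors, table):
    """Map each continuous vector back to the token with the nearest
    embedding (squared Euclidean distance)."""
    def nearest(vec):
        return min(table, key=lambda t: sum((a - b) ** 2
                                            for a, b in zip(vec, table[t])))
    return [nearest(vec) for vec in vectors]

tokens = ["the", "cat", "sat"]
rng = random.Random(0)
# At low noise (alpha_bar close to 1), rounding recovers the sequence.
noisy = noise_embeddings([EMBED[t] for t in tokens], alpha_bar=0.99, rng=rng)
recovered = round_to_tokens(noisy, EMBED)
```

The rounding step is what lets a continuous diffusion process produce discrete text at the end of sampling; at high noise levels the vectors become ambiguous between tokens, which is exactly the regime the denoising model is trained to resolve.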
Emergent Communication: Generalization and Overfitting in Lewis Games
Lewis signaling games are a class of simple communication games for simulating the emergence of language. In these games, two agents must agree on a communication protocol in order to solve a cooperative task. Previous work has shown that agents trained to play this game with reinforcement learning tend to develop languages that display undesirable properties from a linguistic point of view (lack of generalization, lack of compositionality, etc.). In this paper, we aim to provide better understanding of this phenomenon by analytically studying the learning problem in Lewis games. As a core contribution, we demonstrate that the standard objective in Lewis games can be decomposed into two components: a co-adaptation loss and an information loss. This decomposition enables us to surface two potential sources of overfitting, which we show may undermine the emergence of a structured communication protocol. In particular, when we control for overfitting on the co-adaptation loss, we recover desired properties in the emergent languages: they are more compositional and generalize better.
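For reference, the cooperative task in a Lewis signaling game can be sketched in its simplest tabular form: the sender observes a state and emits a message, the receiver sees only the message and must reconstruct the state. The state and message spaces below are illustrative, and the fixed lookup-table agents stand in for the reinforcement-learning agents studied in the paper:

```python
def play_round(state, sender, receiver):
    """One round of a Lewis signaling game.

    sender:   dict mapping each state to a message.
    receiver: dict mapping each message to a guessed state.
    The cooperative reward is 1.0 when the receiver reconstructs the
    sender's state from the message alone, and 0.0 otherwise.
    """
    message = sender[state]
    guess = receiver[message]
    return 1.0 if guess == state else 0.0

# One protocol the two agents might converge to (one message per state).
sender = {"circle": "m0", "square": "m1"}
receiver = {"m0": "circle", "m1": "square"}

rewards = [play_round(s, sender, receiver) for s in ("circle", "square")]
```

Any bijection between states and messages earns full reward here, which is precisely why the reward alone does not constrain the protocol to be compositional or generalizable, the gap the paper's loss decomposition analyzes.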
Recurrent Networks and Reinforcement Learning: Dynamical Approaches (Réseaux Récurrents et Apprentissage par Renforcement : Approches Dynamiques)
An intelligent agent immersed in its environment must be able to both understand and interact with the world. Understanding the environment requires processing sequences of sensorial inputs. Interacting with the environment typically involves issuing actions, and adapting those actions to strive towards a given goal, or to maximize a notion of reward. This view of a two-part agent-environment interaction motivates the two parts of this thesis: recurrent neural networks are powerful tools to make sense of complex and diverse sequences of inputs, such as those resulting from an agent-environment interaction; reinforcement learning is the field of choice to direct the behavior of an agent towards a goal. This thesis aims to provide theoretical and practical insights into those two domains. In the field of recurrent networks, the thesis's contribution is twofold: we introduce two new, theoretically grounded and scalable learning algorithms that can be used online. Besides, we advance understanding of gated recurrent networks by examining their invariance properties. In the field of reinforcement learning, our main contribution is to provide guidelines for designing algorithms that are robust to the choice of time discretization. All these contributions are theoretically grounded, and backed up by experimental results.