Learning Multi-Object Positional Relationships via Emergent Communication
The study of emergent communication aims to advance interactive
artificial intelligence. While existing work focuses on communication about
single objects or complex image scenes, we argue that communicating
relationships between multiple objects is important in more realistic tasks
yet remains understudied. In this paper, we fill this gap and focus on emergent
communication about positional relationships between two objects. We train
agents in the referential game where observations contain two objects, and find
that generalization is the major problem when the positional relationship is
involved. The key factor affecting the generalization ability of the emergent
language is the input variation between Speaker and Listener, which is realized
by a random image generator in our work. Further, we find that the learned
language can generalize well in a new multi-step MDP task where the positional
relationship describes the goal, and performs better than raw-pixel images as
well as pre-trained image features, verifying the strong generalization ability
of discrete sequences. We also show that language transfer from the referential
game performs better in the new task than learning language directly in this
task, implying the potential benefits of pre-training in referential games. All
in all, our experiments demonstrate the viability and merit of having agents
learn to communicate positional relationships between multiple objects through
emergent communication.
Comment: 15 pages
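The referential-game setup described above can be sketched in toy form: a Speaker encodes the positional relationship between two objects into a discrete message, and a Listener must pick the matching scene among distractors. The symbolic scenes, the fixed RELATIONS vocabulary, and the lookup-style speaker and listener below are illustrative stand-ins only; in the paper both agents are learned networks operating on images produced by a random image generator.

```python
# Toy sketch of a two-object referential game with positional relationships.
# Scenes are symbolic stand-ins for the paper's image inputs, and the
# "language" here is a fixed code rather than an emergent one.
RELATIONS = ["left_of", "right_of", "above", "below"]

def speaker(scene):
    """Map an (obj_a, relation, obj_b) scene to a discrete message."""
    obj_a, relation, obj_b = scene
    return (obj_a, RELATIONS.index(relation), obj_b)

def listener(message, candidates):
    """Pick the candidate scene that matches the decoded message."""
    obj_a, rel_idx, obj_b = message
    target = (obj_a, RELATIONS[rel_idx], obj_b)
    return candidates.index(target)

scene = ("circle", "left_of", "square")
distractor = ("circle", "above", "square")
candidates = [distractor, scene]
msg = speaker(scene)
assert listener(msg, candidates) == 1  # Listener identifies the target scene
```

In the paper's setting, generalization hinges on the Speaker and Listener observing different renderings of the same scene, which this symbolic sketch does not model.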
ETHER: Aligning Emergent Communication for Hindsight Experience Replay
Natural language instruction following is paramount to enable collaboration
between artificial agents and human beings. Natural language-conditioned
reinforcement learning (RL) agents have shown how natural languages'
properties, such as compositionality, can provide a strong inductive bias to
learn complex policies. Previous architectures like HIGhER combine the benefit
of language-conditioning with Hindsight Experience Replay (HER) to deal with
sparse rewards environments. Yet, like HER, HIGhER relies on an oracle
predicate function to provide a feedback signal highlighting which linguistic
description is valid for which state. This reliance on an oracle limits its
application. Additionally, HIGhER only leverages the linguistic information
contained in successful RL trajectories, thus hurting its final performance and
data-efficiency. Without early successful trajectories, HIGhER is no better
than DQN upon which it is built. In this paper, we propose the Emergent Textual
Hindsight Experience Replay (ETHER) agent, which builds on HIGhER and addresses
both of its limitations by means of (i) a discriminative visual referential
game, commonly studied in the subfield of Emergent Communication (EC), used
here as an unsupervised auxiliary task and (ii) a semantic grounding scheme to
align the emergent language with the natural language of the
instruction-following benchmark. We show that the referential game's agents
make an artificial language emerge that is aligned with the natural-like
language used to describe goals in the BabyAI benchmark, and that it is
expressive enough to also describe unsuccessful RL trajectories, thus
providing feedback that lets the RL agent leverage the linguistic, structured
information contained in all trajectories. Our work shows that EC is a viable
unsupervised auxiliary task for RL and provides missing pieces to make HER more
widely applicable.
Comment: work in progress
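The hindsight-relabeling mechanism that ETHER extends can be illustrated with a minimal sketch: a failed trajectory is rewritten as a success for the goal the agent actually achieved, using a state-description function. Here describe() is a hypothetical placeholder; in HIGhER it is an oracle predicate, while ETHER's contribution is to learn such a descriptor via the referential game instead.

```python
# Minimal sketch of Hindsight Experience Replay (HER) relabeling for
# language-conditioned RL. describe() is a hypothetical stand-in for the
# state-description model (oracle in HIGhER, learned in ETHER).

def describe(state):
    # Placeholder mapping from an achieved state to a linguistic goal.
    return f"reach {state}"

def relabel(trajectory, achieved_state):
    """Rewrite a failed trajectory as a success for the goal actually achieved."""
    hindsight_goal = describe(achieved_state)
    return [(s, a, 1.0 if s == achieved_state else 0.0, hindsight_goal)
            for (s, a, _, _) in trajectory]

# A failed episode: the agent aimed for "B" but ended in "C".
traj = [("A", "go", 0.0, "reach B"), ("C", "stop", 0.0, "reach B")]
relabeled = relabel(traj, "C")
assert relabeled[-1] == ("C", "stop", 1.0, "reach C")
```

Because every trajectory reaches *some* state, this trick turns sparse-reward failures into positive learning signal, which is why a descriptor expressive enough to cover unsuccessful trajectories matters.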
On iterated learning for task-oriented dialogue
In task-oriented dialogue, pretraining on a human corpus followed by finetuning in a
simulator using self-play suffers from a phenomenon called language drift: the syntactic
and semantic properties of the learned language deteriorate as the agents focus solely
on solving the task. Inspired by the iterated learning framework from cognitive science
(Kirby and Griffiths, 2014), we propose a generic approach to counter language drift called
Seeded Iterated Learning (SIL). This work was published as (Lu et al., 2020b) and is
presented in Chapter 2. In an attempt to emulate transmission of language between generations,
a pretrained student agent is iteratively refined by imitating data sampled from
a newly trained teacher agent. At each generation, the teacher is created by copying the
student agent, before being finetuned to maximize task completion. We further introduce
Supervised Seeded iterated learning (SSIL) in Chapter 3, work which was published as
(Lu et al., 2020a). SSIL builds upon SIL by combining it with the other popular method
called Supervised SelfPlay (S2P) (Gupta et al., 2019). SSIL is able to mitigate the
problems of both S2P and SIL, namely late-stage training collapse and low language diversity.
We evaluate our methods in a toy setting of the Lewis Game, and then scale them up to
the translation game with natural language. In both settings, we highlight the efficacy of
our methods compared to the baselines.
In Chapter 1, we introduce the core concepts required for understanding the papers presented
in Chapters 2 and 3. We describe the specific problem of task-oriented dialogue,
including current approaches and the challenges they face, particularly the challenge
of language drift. We also give an overview of the iterated learning framework. Some
sections in Chapter 1 are borrowed from the papers for coherence and ease of understanding.
Chapter 2 comprises the work published as (Lu et al., 2020b) and Chapter 3
comprises the work published as (Lu et al., 2020a). Chapter 4 concludes the work.
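The SIL generational loop described above can be sketched schematically: at each generation the teacher is created by copying the student, finetuned to maximize task completion, and the student then imitates data sampled from the new teacher. Agent, finetune_on_task, and imitate below are hypothetical placeholders standing in for the actual models and training procedures.

```python
import copy

# Schematic Seeded Iterated Learning (SIL) loop. The Agent class and the
# finetune/imitate steps are illustrative placeholders, not the thesis's
# actual training code.

class Agent:
    def __init__(self, params):
        self.params = params

def finetune_on_task(agent):
    agent.params = agent.params + ["task_finetuned"]  # stand-in for RL finetuning
    return agent

def imitate(student, teacher_samples):
    # Stand-in for supervised imitation of teacher-generated data.
    student.params = student.params + [("imitated", len(teacher_samples))]
    return student

def sil(pretrained, generations=3, n_samples=4):
    student = copy.deepcopy(pretrained)
    for _ in range(generations):
        teacher = copy.deepcopy(student)     # teacher = copy of the student
        teacher = finetune_on_task(teacher)  # maximize task completion
        samples = ["message"] * n_samples    # sample data from the teacher
        student = imitate(student, samples)  # student imitates the teacher
    return student

agent = sil(Agent(["pretrained"]))
assert agent.params.count(("imitated", 4)) == 3
```

The key design choice is that the student never trains on the task directly; it only ever imitates a freshly finetuned teacher, which is what bounds the drift accumulated across generations.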
Meta-Referential Games to Learn Compositional Learning Behaviours
Human beings use compositionality to generalise from past experiences to
novel experiences. We assume a separation of our experiences into fundamental
atomic components that can be recombined in novel ways to support our ability
to engage with novel experiences. We frame this as the ability to learn to
generalise compositionally, and we will refer to behaviours making use of this
ability as compositional learning behaviours (CLBs). A central problem to
learning CLBs is the resolution of a binding problem (BP). While it is another
feat of intelligence that human beings perform with ease, it is not the case
for state-of-the-art artificial agents. Thus, in order to build artificial
agents able to collaborate with human beings, we propose to develop a novel
benchmark to investigate agents' abilities to exhibit CLBs by solving a
domain-agnostic version of the BP. We take inspiration from the language
emergence and grounding framework of referential games and propose a
meta-learning extension of referential games, entitled Meta-Referential Games,
and use this framework to build our benchmark, that we name Symbolic Behaviour
Benchmark (S2B). We provide baseline results showing that our benchmark is a
compelling challenge that we hope will spur the research community towards
developing more capable artificial agents.
Comment: work in progress
On (Emergent) Systematic Generalisation and Compositionality in Visual Referential Games with Straight-Through Gumbel-Softmax Estimator
The drivers of compositionality in artificial languages that emerge when two
(or more) agents play a non-visual referential game have previously been
investigated using approaches based on the REINFORCE algorithm and the (Neural)
Iterated Learning Model. Following the more recent introduction of the
\textit{Straight-Through Gumbel-Softmax} (ST-GS) approach, this paper
investigates to what extent the drivers of compositionality identified so far
in the field apply in the ST-GS context, and to what extent they translate
into (emergent) systematic generalisation abilities, when playing a visual
referential game. Compositionality and the generalisation abilities of the
emergent languages are assessed using topographic similarity and zero-shot
compositional tests. Firstly, we provide evidence that the test-train split
strategy significantly impacts the zero-shot compositional tests when dealing
with visual stimuli, whilst it does not when dealing with symbolic ones.
Secondly, empirical evidence shows that using the ST-GS approach with small
batch sizes and an overcomplete communication channel improves compositionality
in the emerging languages. Nevertheless, while the batch-size effect is robust
with symbolic stimuli, it is not so clear-cut when dealing with
visual stimuli. Our results also show that not all overcomplete communication
channels are created equal. Indeed, while increasing the maximum sentence
length is found to be beneficial to further both compositionality and
generalisation abilities, increasing the vocabulary size is found detrimental.
Finally, a lack of correlation between the language compositionality at
training-time and the agents' generalisation abilities is observed in the
context of discriminative referential games with visual stimuli. This is
similar to previous observations in the field using the generative variant with
symbolic stimuli.
Comment: Accepted at the 4th NeurIPS Workshop on Emergent Communication (EmeCom @ NeurIPS 2020).
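The forward pass of the Straight-Through Gumbel-Softmax estimator discussed above can be sketched with NumPy. This is a simplified illustration: the straight-through backward trick (differentiating through the soft sample while emitting the hard one-hot symbol) is only indicated in a comment, since NumPy has no autograd.

```python
import numpy as np

# Forward pass of the Straight-Through Gumbel-Softmax (ST-GS) estimator,
# sketched with NumPy. In an autograd framework, gradients flow through the
# soft sample: forward emits hard + (soft - soft.detach()).
rng = np.random.default_rng(0)

def gumbel_softmax_st(logits, tau=1.0):
    # Sample Gumbel(0, 1) noise and form a temperature-annealed softmax.
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    scores = np.exp((logits + gumbel) / tau)
    soft = scores / scores.sum()
    # Discretize to a one-hot "symbol" for the forward pass.
    hard = np.eye(len(logits))[soft.argmax()]
    return hard, soft

hard, soft = gumbel_softmax_st(np.array([2.0, 0.5, 0.1]))
assert hard.sum() == 1.0 and set(hard) <= {0.0, 1.0}  # one-hot symbol
assert np.isclose(soft.sum(), 1.0)                    # valid distribution
```

The temperature tau and the vocabulary size (the length of the logits vector) correspond to the communication-channel parameters whose effect on compositionality the paper studies.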