On iterated learning for task-oriented dialogue
In task-oriented dialogue, pretraining on human corpus followed by finetuning in a
simulator using self-play suffers from a phenomenon called language drift. The syntactic
and semantic properties of the learned language deteriorate as the agents focus only
on solving the task. Inspired by the iterated learning framework in cognitive science
(Kirby and Griffiths, 2014), we propose a generic approach to counter language drift called
Seeded iterated learning (SIL). This work was published as (Lu et al., 2020b) and is
presented in Chapter 2. In an attempt to emulate transmission of language between generations,
a pretrained student agent is iteratively refined by imitating data sampled from
a newly trained teacher agent. At each generation, the teacher is created by copying the
student agent, before being finetuned to maximize task completion. We further introduce
Supervised Seeded iterated learning (SSIL) in Chapter 3, work which was published as
(Lu et al., 2020a). SSIL builds upon SIL by combining it with another popular method
called Supervised SelfPlay (S2P) (Gupta et al., 2019). SSIL is able to mitigate the
problems of both S2P and SIL, namely late-stage training collapse and low language diversity.
We evaluate our methods in a toy setting, the Lewis game, and then scale them up to
the translation game with natural language. In both settings, we highlight the efficacy of
our methods compared to the baselines.
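The generational loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The tabular speaker, the fixed listener (a Lewis-game-like decoder) and the hyperparameter names (`k1`, `k2`, `lr`, `n_samples`) are all hypothetical. REINFORCE stands in as a generic interactive objective.

```python
import numpy as np

# Hypothetical toy setting: the "speaker" is a table of logits with one row per
# object and one column per word; a fixed listener decodes word w as LISTENER[w].
N_OBJECTS, N_WORDS = 5, 8
LISTENER = np.random.default_rng(0).integers(0, N_OBJECTS, size=N_WORDS)


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def task_success(speaker):
    """Expected probability that the listener recovers the intended object."""
    probs = softmax(speaker)
    correct = LISTENER[None, :] == np.arange(N_OBJECTS)[:, None]
    return float((probs * correct).sum(axis=1).mean())


def finetune_on_task(speaker, steps, lr, rng):
    """Interactive phase: REINFORCE on task reward only (the source of drift)."""
    for _ in range(steps):
        obj = rng.integers(N_OBJECTS)
        p = softmax(speaker[obj])
        word = rng.choice(N_WORDS, p=p)
        reward = float(LISTENER[word] == obj)
        grad = -p
        grad[word] += 1.0  # gradient of log p(word | obj) w.r.t. the logits
        speaker[obj] += lr * reward * grad
    return speaker


def imitate(student, data, steps, lr, rng):
    """Imitation phase: SGD on cross-entropy over (object, word) pairs."""
    for _ in range(steps):
        obj, word = data[rng.integers(len(data))]
        p = softmax(student[obj])
        grad = -p
        grad[word] += 1.0
        student[obj] += lr * grad
    return student


def seeded_iterated_learning(pretrained, generations=10, k1=200, k2=200,
                             n_samples=500, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    student = np.array(pretrained, dtype=float)  # seed the student (copies input)
    history = []
    for _ in range(generations):
        # Teacher = copy of the current student, finetuned for task success.
        teacher = finetune_on_task(student.copy(), k1, lr, rng)
        # Sample a fresh dataset from the newly trained teacher.
        probs = softmax(teacher)
        objs = rng.integers(N_OBJECTS, size=n_samples)
        data = [(o, rng.choice(N_WORDS, p=probs[o])) for o in objs]
        # The student is refined only by imitating the teacher's samples.
        student = imitate(student, data, k2, lr, rng)
        history.append(task_success(student))
    return student, history


# Usage: start from a uniform (untrained) speaker table.
student, history = seeded_iterated_learning(np.zeros((N_OBJECTS, N_WORDS)))
print([round(h, 2) for h in history])
```

In this sketch the student never receives the task reward directly. It changes only by imitating samples from a freshly finetuned teacher, which is the transmission step the abstract describes. The toy has no human-language prior, so it shows the control flow of SIL rather than language drift itself.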
In Chapter 1, we discuss the core concepts required for understanding the papers presented
in Chapters 2 and 3. We describe the specific problem of task-oriented dialogue,
including current approaches and the challenges they face, particularly the challenge
of language drift. We also give an overview of the iterated learning framework. Some
sections in Chapter 1 are borrowed from the papers for coherence and ease of understanding.
Chapter 2 comprises the work published as (Lu et al., 2020b), and Chapter 3
comprises the work published as (Lu et al., 2020a). Chapter 4 concludes.
Communication Drives the Emergence of Language Universals in Neural Agents: Evidence from the Word-order/Case-marking Trade-off
Artificial learners often behave differently from human learners in the context of neural agent-based simulations of language emergence and change. A common explanation is the lack of appropriate cognitive biases in these learners. However, it has also been proposed that more naturalistic settings of language learning and use could lead to more humanlike results. We investigate this latter account, focusing on the word-order/case-marking trade-off, a widely attested language universal that has proven particularly hard to simulate. We propose a new Neural-agent Language Learning and Communication framework (NeLLCom) where pairs of speaking and listening agents first learn a miniature language via supervised learning, and then optimize it for communication via reinforcement learning. Following closely the setup of earlier human experiments, we succeed in replicating the trade-off with the new framework without hard-coding specific biases in the agents. We see this as an essential step towards the investigation of language universals with neural learners.
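The two-phase regime described above can be illustrated with a toy tabular example: supervised learning of a miniature language, then reinforcement learning for communication. Everything here is a hypothetical simplification, not NeLLCom itself. The three "forms", their counts, the smoothing constant, and freezing the listener during phase 2 are all assumptions. Form 0 stands for an ambiguous utterance used for both meanings. Forms 1 and 2 carry a disambiguating marker.

```python
import numpy as np

# Hypothetical miniature language: 2 meanings (rows) x 3 forms (columns).
# Form 0 is ambiguous (used for both meanings); forms 1 and 2 carry a
# disambiguating marker. The counts are illustrative only.
COUNTS = np.array([[7.0, 3.0, 0.0],
                   [7.0, 0.0, 3.0]])
SMOOTH = 0.1  # additive smoothing so no form has zero probability


def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def supervised_phase(counts):
    """Phase 1: maximum-likelihood fit of a tabular speaker and listener."""
    c = counts + SMOOTH
    speaker_logits = np.log(c / c.sum(axis=1, keepdims=True))  # log P(form | meaning)
    listener = (c / c.sum(axis=0, keepdims=True)).T            # P(meaning | form)
    return speaker_logits, listener


def communication_phase(speaker_logits, listener, steps=2000, lr=0.1, seed=0):
    """Phase 2: REINFORCE on the speaker; the listener is kept frozen here."""
    rng = np.random.default_rng(seed)
    logits = speaker_logits.copy()
    n_meanings, n_forms = logits.shape
    for _ in range(steps):
        m = rng.integers(n_meanings)
        p = softmax(logits[m])
        f = rng.choice(n_forms, p=p)
        reward = listener[f, m]  # listener's probability of the intended meaning
        grad = -p
        grad[f] += 1.0           # gradient of log p(f | m) w.r.t. the logits
        logits[m] += lr * reward * grad
    return logits


def marked_rate(logits):
    """Average probability of choosing a marked (non-ambiguous) form."""
    return float(1.0 - softmax(logits)[:, 0].mean())


# Usage
speaker, listener = supervised_phase(COUNTS)
before = marked_rate(speaker)  # 3.2 / 10.3, about 0.31
after = marked_rate(communication_phase(speaker, listener))
print(f"marked-form rate: {before:.2f} -> {after:.2f}")
```

The reward is the listener's probability of the intended meaning. The ambiguous form therefore earns 0.5, while a matching marked form earns about 0.97, so the policy gradient shifts the speaker toward marked forms. This is only a miniature analogue of the communicative pressure the abstract refers to, not a replication of the trade-off experiment.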
Generate Anything Anywhere in Any Scene
Text-to-image diffusion models have attracted considerable interest due to
their wide applicability across diverse fields. However, challenges persist in
creating controllable models for personalized object generation. In this paper,
we first identify the entanglement issues in existing personalized generative
models, and then propose a straightforward and efficient data augmentation
training strategy that guides the diffusion model to focus solely on object
identity. By inserting the plug-and-play adapter layers from a pre-trained
controllable diffusion model, our model obtains the ability to control the
location and size of each generated personalized object. During inference, we
propose a regionally-guided sampling technique to maintain the quality and
fidelity of the generated images. Our method achieves comparable or superior
fidelity for personalized objects, yielding a robust, versatile, and
controllable text-to-image diffusion model that is capable of generating
realistic and personalized images. Our approach demonstrates significant
potential for various applications, such as those in art, entertainment, and
advertising design.
Learning to translate by learning to communicate
We formulate and test a technique to use Emergent Communication (EC) with a
pretrained multilingual model to improve on modern Unsupervised NMT systems,
especially for low-resource languages. It has been argued that the currently
dominant paradigm in NLP of pretraining on text-only corpora will not yield
robust natural language understanding systems, and the need for grounded,
goal-oriented, and interactive language learning has been highlighted. In our
approach, we embed a modern multilingual model (mBART, Liu et al., 2020) into
an EC image-reference game, in which the model is incentivized to use
multilingual generations to accomplish a vision-grounded task, with the
hypothesis that this will align multiple languages to a shared task space. We
present two variants of EC Fine-Tuning (Steinert-Threlkeld et al., 2022), one
of which outperforms a backtranslation-based baseline in 6/8 translation
settings, and proves especially beneficial for the very low-resource languages
of Nepali and Sinhala.
AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars
Capturing and editing full head performances enables the creation of virtual
characters with various applications such as extended reality and media
production. The past few years have witnessed a steep rise in the photorealism of
human head avatars. Such avatars can be controlled through different input data
modalities, including RGB, audio, depth, IMUs and others. While these data
modalities provide effective means of control, they mostly focus on editing the
head movements such as the facial expressions, head pose and/or camera
viewpoint. In this paper, we propose AvatarStudio, a text-based method for
editing the appearance of a dynamic full head avatar. Our approach builds on
existing work to capture dynamic performances of human heads using neural
radiance field (NeRF) and edits this representation with a text-to-image
diffusion model. Specifically, we introduce an optimization strategy for
incorporating multiple keyframes representing different camera viewpoints and
time stamps of a video performance into a single diffusion model. Using this
personalized diffusion model, we edit the dynamic NeRF by introducing
view-and-time-aware Score Distillation Sampling (VT-SDS) following a
model-based guidance approach. Our method edits the full head in a canonical
space, and then propagates these edits to remaining time steps via a pretrained
deformation network. We evaluate our method visually and numerically via a user
study, and results show that our method outperforms existing approaches. Our
experiments validate the design choices of our method and highlight that our
edits are genuine, personalized, as well as 3D- and time-consistent.
Comment: 17 pages, 17 figures. Project page:
https://vcai.mpi-inf.mpg.de/projects/AvatarStudio
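For reference, VT-SDS builds on Score Distillation Sampling (SDS), introduced in DreamFusion (Poole et al., 2022). The standard SDS gradient is shown below. In this setting, θ would be the parameters of the dynamic NeRF and x = g(θ) a rendered image. x_t is that image noised at diffusion step t, y is the text prompt, ε̂_φ is the noise prediction of the personalized diffusion model, and w(t) is a weighting function. The abstract does not specify how VT-SDS adds view and time awareness, so that part is not shown.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\big(\phi, x = g(\theta)\big)
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t; y, t) - \epsilon\big)\,
    \frac{\partial x}{\partial \theta} \right],
\qquad x_t = \alpha_t x + \sigma_t \epsilon,\quad \epsilon \sim \mathcal{N}(0, I)
```

SDS omits the Jacobian of the noise predictor, so gradients flow only through the renderer g and the diffusion model is never backpropagated through.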