HIGhER: Improving instruction following with Hindsight Generation for Experience Replay
Language creates a compact representation of the world and allows the
description of unlimited situations and objectives through compositionality.
While these characterizations may help in instructing, conditioning, or
structuring interactive agent behavior, it remains an open problem to correctly
relate language understanding and reinforcement learning in even simple
instruction following scenarios. This joint learning problem is alleviated
through expert demonstrations, auxiliary losses, or neural inductive biases. In
this paper, we propose an orthogonal approach called Hindsight Generation for
Experience Replay (HIGhER) that extends the Hindsight Experience Replay (HER)
approach to the language-conditioned policy setting. Whenever the agent does
not fulfill its instruction, HIGhER learns to output a new directive that
matches the agent trajectory, and it relabels the episode with a positive
reward. To do so, HIGhER learns to map a state into an instruction by using
past successful trajectories, which removes the need for external expert
interventions to relabel episodes as in vanilla HER. We show the efficiency of
our approach in the BabyAI environment, and demonstrate how it complements
other instruction following methods.
Comment: Accepted at ADPRL'2
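To make the relabeling step concrete, here is a minimal Python sketch of a HIGhER-style loop. The episode and buffer structures and the `instruction_generator` stub are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of HIGhER-style hindsight relabeling; data structures and
# the instruction generator are assumed for illustration.
from dataclasses import dataclass, field

@dataclass
class Episode:
    states: list
    actions: list
    instruction: str
    success: bool

@dataclass
class ReplayBuffer:
    episodes: list = field(default_factory=list)

    def add(self, episode: Episode, reward: float) -> None:
        self.episodes.append((episode, reward))

def instruction_generator(final_state) -> str:
    """Hypothetical state-to-instruction mapping, trained on past
    successful (state, instruction) pairs."""
    return f"go to {final_state}"

def store_with_hindsight(buffer: ReplayBuffer, ep: Episode) -> None:
    if ep.success:
        # Successful episodes are stored as-is and can also serve as
        # training data for the instruction generator.
        buffer.add(ep, reward=1.0)
    else:
        # On failure, generate a directive that matches what the agent
        # actually did and relabel the episode with a positive reward.
        relabeled = Episode(ep.states, ep.actions,
                            instruction_generator(ep.states[-1]),
                            success=True)
        buffer.add(relabeled, reward=1.0)
```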
Grounding Language to Autonomously-Acquired Skills via Goal Generation
We are interested in the autonomous acquisition of repertoires of skills.
Language-conditioned reinforcement learning (LC-RL) approaches are great tools
in this quest, as they make it possible to express abstract goals as sets of constraints
on the states. However, most LC-RL agents are not autonomous and cannot learn
without external instructions and feedback. Moreover, their direct conditioning
on language cannot account for the goal-directed behavior of pre-verbal infants
and strongly limits the expression of behavioral diversity for a given language
input. To resolve these issues, we propose a new conceptual approach to
language-conditioned RL: the Language-Goal-Behavior architecture (LGB). LGB
decouples skill learning and language grounding via an intermediate semantic
representation of the world. To showcase the properties of LGB, we present a
specific implementation called DECSTR. DECSTR is an intrinsically motivated
learning agent endowed with an innate semantic representation describing
spatial relations between physical objects. In the first stage (G -> B), it
freely explores its environment and targets self-generated semantic
configurations. In the second stage (L -> G), it trains a language-conditioned
goal generator to generate semantic goals that match the constraints expressed
in language-based inputs. We showcase the additional properties of LGB w.r.t.
both an end-to-end LC-RL approach and a similar approach leveraging
non-semantic, continuous intermediate representations. Intermediate semantic
representations help satisfy language commands in a diversity of ways, enable
strategy switching after a failure and facilitate language grounding.
Comment: Published at ICLR 202
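As a concrete illustration of the L -> G stage, the sketch below shows a toy language-conditioned goal generator over binary spatial predicates. The predicate set and constraint table are invented for illustration and do not reproduce DECSTR's actual semantic representation.

```python
# Toy L -> G stage: sample semantic goal configurations (binary spatial
# predicates) consistent with an instruction. Predicates and the constraint
# table are illustrative assumptions, not DECSTR's representation.
import random

PREDICATES = ["close(red,green)", "close(red,blue)", "close(green,blue)",
              "above(red,green)", "above(green,blue)"]

# An instruction pins some predicates, leaving the others free: one command
# therefore maps to many goal configurations (behavioral diversity).
CONSTRAINTS = {
    "put red close to green": {"close(red,green)": 1},
    "stack green on blue":    {"above(green,blue)": 1, "close(green,blue)": 1},
}

def sample_goal(instruction: str) -> dict:
    goal = dict(CONSTRAINTS[instruction])
    for p in PREDICATES:
        goal.setdefault(p, random.randint(0, 1))  # sample free predicates
    return goal

print(sample_goal("put red close to green"))
```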
Explainability in Deep Reinforcement Learning
A large part of the explainable Artificial Intelligence (XAI) literature
focuses on feature-relevance techniques that explain a deep neural network
(DNN) output, or on explaining models that ingest image data. However, how XAI
techniques can help understand models beyond classification tasks, e.g. in
reinforcement learning (RL), has not been extensively studied. We review
recent works aimed at Explainable Reinforcement Learning
(XRL), a relatively new subfield of Explainable Artificial Intelligence,
intended to be used in general public applications, with diverse audiences,
requiring ethical, responsible, and trustworthy algorithms. In critical situations
where it is essential to justify and explain the agent's behaviour, better
explainability and interpretability of RL models could help gain scientific
insight on the inner workings of what is still considered a black box. We
mainly evaluate studies directly linking explainability to RL, and split these
into two categories according to the way the explanations are generated:
transparent algorithms and post-hoc explainability. We also review the most
prominent XAI works through the lens of how they could potentially inform the
further deployment of the latest advances in RL to the demanding everyday
problems of the present and future.
Comment: Article accepted at Knowledge-Based System
Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models
In recent years, much progress has been made in learning robotic manipulation
policies that follow natural language instructions. Such methods typically
learn from corpora of robot-language data that was either collected with
specific tasks in mind or expensively re-labelled by humans with rich language
descriptions in hindsight. Recently, large-scale pretrained vision-language
models (VLMs) like CLIP or ViLD have been applied to robotics for learning
representations and scene descriptors. Can these pretrained models serve as
automatic labelers for robot data, effectively importing Internet-scale
knowledge into existing datasets to make them useful even for tasks that are
not reflected in their ground truth annotations? To accomplish this, we
introduce Data-driven Instruction Augmentation for Language-conditioned control
(DIAL): we utilize semi-supervised language labels leveraging the semantic
understanding of CLIP to propagate knowledge onto large datasets of unlabelled
demonstration data and then train language-conditioned policies on the
augmented datasets. This method enables cheaper acquisition of useful language
descriptions compared to expensive human labels, allowing for more efficient
label coverage of large-scale datasets. We apply DIAL to a challenging
real-world robotic manipulation domain where 96.5% of the 80,000 demonstrations
do not contain crowd-sourced language annotations. DIAL enables imitation
learning policies to acquire new capabilities and generalize to 60 novel
instructions unseen in the original dataset.
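A minimal sketch of the underlying relabeling idea, using the open-source CLIP package to score candidate instructions against a demonstration frame, is shown below. The candidate list and frame path are placeholders; DIAL's actual pipeline (a finetuned CLIP and semi-supervised label propagation over 80,000 demonstrations) is substantially more involved.

```python
# Sketch: score candidate instructions against a demonstration frame with
# CLIP and keep the best match. Candidates and the frame path are placeholders.
import torch
import clip
from PIL import Image

model, preprocess = clip.load("ViT-B/32", device="cpu")  # CPU for simplicity
candidates = ["pick up the sponge", "open the drawer", "push the can"]

def best_instruction(frame_path: str) -> str:
    image = preprocess(Image.open(frame_path)).unsqueeze(0)
    tokens = clip.tokenize(candidates)
    with torch.no_grad():
        img_f = model.encode_image(image)
        txt_f = model.encode_text(tokens)
        img_f = img_f / img_f.norm(dim=-1, keepdim=True)
        txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
        scores = (img_f @ txt_f.T).squeeze(0)    # cosine similarities
    return candidates[scores.argmax().item()]
```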
Learning Rewards from Linguistic Feedback
We explore unconstrained natural language feedback as a learning signal for
artificial agents. Humans use rich and varied language to teach, yet most prior
work on interactive learning from language assumes a particular form of input
(e.g., commands). We propose a general framework which does not make this
assumption, using aspect-based sentiment analysis to decompose feedback into
sentiment about the features of a Markov decision process. We then perform an
analogue of inverse reinforcement learning, regressing the sentiment on the
features to infer the teacher's latent reward function. To evaluate our
approach, we first collect a corpus of teaching behavior in a cooperative task
where both teacher and learner are human. We implement three artificial
learners: sentiment-based "literal" and "pragmatic" models, and an inference
network trained end-to-end to predict latent rewards. We then repeat our
initial experiment and pair them with human teachers. All three successfully
learn from interactive human feedback. The sentiment models outperform the
inference network, with the "pragmatic" model approaching human performance.
Our work thus provides insight into the information structure of naturalistic
linguistic feedback as well as methods to leverage it for reinforcement
learning.
Comment: 9 pages, 4 figures. AAAI '2
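The regression step admits a compact sketch: treat each utterance as a sentiment score over the MDP features it mentions, and solve a least-squares problem for the latent per-feature reward. Feature names and scores below are invented for illustration.

```python
# Sketch of the "regress sentiment on features" step: each utterance yields
# a sentiment score over the features it mentions, and a least-squares fit
# recovers latent per-feature reward weights. Data are illustrative.
import numpy as np

features = ["near_goal", "holding_key", "in_lava"]
# Rows: which features each utterance talked about; y: utterance sentiment.
X = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [1, 1, 0]], dtype=float)
y = np.array([0.8, 0.5, -0.9, 0.9])

w, *_ = np.linalg.lstsq(X, y, rcond=None)
reward = dict(zip(features, w.round(2)))
print(reward)   # inferred per-feature reward weights
```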
Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey
Building autonomous machines that can explore open-ended environments,
discover possible interactions and build repertoires of skills is a general
objective of artificial intelligence. Developmental approaches argue that this
can only be achieved by autotelic agents: intrinsically motivated learning
agents that can learn to represent, generate, select and solve their own
problems. In recent years, the convergence of developmental approaches with
deep reinforcement learning (RL) methods has been leading to the emergence of a
new field: developmental reinforcement learning. Developmental RL is
concerned with the use of deep RL algorithms to tackle a developmental problem
-- the intrinsically motivated acquisition of open-ended repertoires of
skills. The self-generation of goals requires the learning
of compact goal encodings as well as their associated goal-achievement
functions. This raises new challenges compared to standard RL algorithms
originally designed to tackle pre-defined sets of goals using external reward
signals. The present paper introduces developmental RL and proposes a
computational framework based on goal-conditioned RL to tackle the
intrinsically motivated skills acquisition problem. It proceeds to present a
typology of the various goal representations used in the literature, before
reviewing existing methods to learn to represent and prioritize goals in
autonomous systems. We finally close the paper by discussing some open
challenges in the quest for intrinsically motivated skills acquisition.
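A minimal sketch of the goal-conditioned loop this framework formalizes is shown below: the agent samples its own goals and rewards itself through a goal-achievement function, with no external reward signal. The grid world and achievement test are toy stand-ins.

```python
# Toy intrinsically motivated, goal-conditioned Q-learning loop: goals are
# self-generated and reward comes from a goal-achievement function.
import random
from collections import defaultdict

N = 5                                   # 1-D grid of N cells
Q = defaultdict(float)                  # Q[(state, goal, action)]
alpha, gamma, eps = 0.5, 0.9, 0.2

def achieved(state, goal):              # goal-achievement function
    return state == goal

for _ in range(500):
    goal = random.randrange(N)          # self-generated goal
    s = random.randrange(N)
    for _ in range(20):
        if random.random() < eps:
            a = random.choice([-1, 1])
        else:
            a = max([-1, 1], key=lambda act: Q[(s, goal, act)])
        s2 = min(max(s + a, 0), N - 1)
        r = 1.0 if achieved(s2, goal) else 0.0   # intrinsic reward
        best_next = max(Q[(s2, goal, -1)], Q[(s2, goal, 1)])
        Q[(s, goal, a)] += alpha * (r + gamma * best_next - Q[(s, goal, a)])
        s = s2
        if r > 0:
            break
```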
Deep reinforcement learning for multi-modal embodied navigation
This work focuses on an Outdoor Micro-Navigation (OMN) task in which the goal is to
navigate to a specified street address using multiple modalities including images, scene-text,
and GPS. This task is a significant challenge to many Blind and Visually Impaired (BVI)
people, which we demonstrate through interviews and market research. To investigate the
feasibility of solving this task with Deep Reinforcement Learning (DRL), we first introduce
two partially observable grid-worlds, Grid-Street and Grid City, containing houses, street
numbers, and navigable regions. In these environments, we train an agent to find specific
houses using local observations under a variety of training procedures. We parameterize
our agent with a neural network and train using reinforcement learning methods. Next, we
introduce the Sidewalk Environment for Visual Navigation (SEVN), which contains panoramic
images with labels for house numbers, doors, and street name signs, and formulations for
several navigation tasks. In SEVN, we train another neural network model using Proximal
Policy Optimization (PPO) to fuse multi-modal observations in the form of variable resolution
images, visible text, and simulated GPS data, and to use this representation to navigate to
goal doors. Our best model used all available modalities and was able to navigate to over 100
goals with an 85% success rate. We found that models with access to only a subset of these
modalities performed significantly worse, supporting the need for a multi-modal approach to
the OMN task. We hope that this thesis provides a foundation for further research into the
creation of agents to assist members of the BVI community to safely navigate
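As a sketch of the kind of multi-modal fusion policy described above, the module below encodes image, scene-text, and GPS observations separately and fuses them into action logits. All sizes and encoders are illustrative assumptions; the thesis trains such a model with PPO rather than this bare forward pass.

```python
# Illustrative multi-modal fusion policy: separate encoders for image,
# scene-text, and GPS observations feed a shared trunk producing action
# logits. Dimensions are assumptions, not the thesis architecture.
import torch
import torch.nn as nn

class FusionPolicy(nn.Module):
    def __init__(self, n_actions: int = 4):
        super().__init__()
        self.image_enc = nn.Sequential(          # small conv encoder
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 16, 64))
        self.text_enc = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
        self.gps_enc = nn.Sequential(nn.Linear(2, 16), nn.ReLU())
        self.trunk = nn.Sequential(
            nn.Linear(64 + 64 + 16, 128), nn.ReLU(),
            nn.Linear(128, n_actions))

    def forward(self, image, text, gps):
        z = torch.cat([self.image_enc(image),
                       self.text_enc(text),
                       self.gps_enc(gps)], dim=-1)
        return self.trunk(z)                     # action logits

policy = FusionPolicy()
logits = policy(torch.randn(1, 3, 64, 64),       # RGB frame
                torch.randn(1, 32),              # embedded visible text
                torch.randn(1, 2))               # GPS offset to goal
```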
- âŠ