Return-Based Contrastive Representation Learning for Reinforcement Learning
Recently, various auxiliary tasks have been proposed to accelerate
representation learning and improve sample efficiency in deep reinforcement
learning (RL). However, existing auxiliary tasks do not take the
characteristics of RL problems into consideration and are unsupervised. By
leveraging returns, the most important feedback signals in RL, we propose a
novel auxiliary task that forces the learnt representations to discriminate
state-action pairs with different returns. Our auxiliary loss is theoretically
justified to learn representations that capture the structure of a new form of
state-action abstraction, under which state-action pairs with similar return
distributions are aggregated together. In the low-data regime, our algorithm
outperforms strong baselines on complex tasks in Atari games and the DeepMind
Control suite, and achieves even better performance when combined with existing
auxiliary tasks.
Comment: ICLR 202
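The idea of discriminating state-action pairs by their returns can be illustrated with a small sketch. It labels pairs of samples as similar when their returns fall into the same bin and as dissimilar otherwise. The fixed-width binning and the names `return_bin` and `make_pairs` are assumptions made for illustration, not the construction used in the paper.

```python
from itertools import combinations


def return_bin(ret, bin_width):
    """Map a return to the index of a fixed-width bin (hypothetical scheme)."""
    return int(ret // bin_width)


def make_pairs(returns, bin_width=1.0):
    """Label each pair of sample indices 1 if their returns share a bin, else 0."""
    pairs = []
    for i, j in combinations(range(len(returns)), 2):
        same = return_bin(returns[i], bin_width) == return_bin(returns[j], bin_width)
        pairs.append((i, j, 1 if same else 0))
    return pairs


# Three state-action samples with their observed returns.
pairs = make_pairs([0.2, 0.7, 3.1])
```

A contrastive auxiliary loss would then push the representations of pairs labeled 1 together and those labeled 0 apart.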
Dealing with Sparse Rewards in Reinforcement Learning
Successfully navigating a complex environment to obtain a desired outcome is
a difficult task that, until recently, was believed to be achievable only by
humans. This perception has been broken down over time, especially with the
introduction of deep reinforcement learning, which has greatly increased the
difficulty of tasks that can be automated. However, for traditional
reinforcement learning agents this requires an environment to be able to
provide frequent extrinsic rewards, which are not known or accessible for many
real-world environments. This project aims to explore and contrast existing
reinforcement learning solutions that circumvent the difficulties of an
environment that provides sparse rewards. Different reinforcement learning
solutions will be implemented across several video game environments with
varying difficulty and varying frequency of rewards, so as to properly
investigate the applicability of these solutions. This project introduces a
novel reinforcement learning solution by combining aspects of two existing
state-of-the-art sparse-reward solutions: curiosity-driven exploration and
unsupervised auxiliary tasks.
Unsupervised Video Object Segmentation for Deep Reinforcement Learning
We present a new technique for deep reinforcement learning that automatically
detects moving objects and uses the relevant information for action selection.
The detection of moving objects is done in an unsupervised way by exploiting
structure from motion. Instead of directly learning a policy from raw images,
the agent first learns to detect and segment moving objects by exploiting flow
information in video sequences. The learned representation is then used to
focus the policy of the agent on the moving objects. Over time, the agent
identifies which objects are critical for decision making and gradually builds
a policy based on relevant moving objects. This approach, which we call
Motion-Oriented REinforcement Learning (MOREL), is demonstrated on a suite of
Atari games where the ability to detect moving objects reduces the amount of
interaction needed with the environment to obtain a good policy. Furthermore,
the resulting policy is more interpretable than policies that directly map
images to actions or values with a black box neural network. We can gain
insight into the policy by inspecting the segmentation and motion of each
object detected by the agent. This allows practitioners to confirm whether a
policy is making decisions based on sensible information.
Learning Good Representation via Continuous Attention
In this paper we present our scientific discovery that good representation
can be learned via continuous attention during the interaction between
Unsupervised Learning (UL) and Reinforcement Learning (RL) modules driven by
intrinsic motivation. Specifically, we designed intrinsic rewards generated
from UL modules to drive the RL agent to focus on objects for a period of
time and to learn good representations of objects for a later object
recognition task. We evaluate our proposed algorithm in settings both with and
without extrinsic rewards. Experiments with end-to-end training in simulated
environments with applications to few-shot object recognition demonstrated the
effectiveness of the proposed algorithm.
Jointly Pre-training with Supervised, Autoencoder, and Value Losses for Deep Reinforcement Learning
Deep Reinforcement Learning (DRL) algorithms are known to be data
inefficient. One reason is that a DRL agent learns both the feature and the
policy tabula rasa. Integrating prior knowledge into DRL algorithms is one way
to improve learning efficiency since it helps to build helpful representations.
In this work, we consider incorporating human knowledge to accelerate the
asynchronous advantage actor-critic (A3C) algorithm by pre-training on a small
amount of non-expert human demonstrations. We leverage the supervised
autoencoder framework and propose a novel pre-training strategy that jointly
trains a weighted supervised classification loss, an unsupervised
reconstruction loss, and an expected return loss. The resulting pre-trained
model learns more useful features compared to independently training in
supervised or unsupervised fashion. Our pre-training method drastically
improved the learning performance of the A3C agent in the Atari games Pong and
MsPacman, exceeding the performance of the state-of-the-art algorithms at a
much smaller number of game interactions. Our method is lightweight and easy
to implement on a single machine. For reproducibility, our code is available at
github.com/gabrieledcjr/DeepRL/tree/A3C-ALA2019
Comment: Accepted in Adaptive and Learning Agents (ALA) Workshop at AAMA
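The joint objective described in this abstract (a supervised classification loss, an unsupervised reconstruction loss, and an expected return loss) can be sketched as a weighted sum. This is a minimal illustration only: the specific loss forms (cross-entropy, mean squared error), the weights `w_cls`, `w_rec`, `w_val`, and all function names are assumptions and are not taken from the authors' code.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the demonstrated action."""
    return -math.log(probs[label])


def mean_squared_error(xs, ys):
    """Average squared difference between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def joint_pretraining_loss(action_probs, human_action, reconstruction, observation,
                           predicted_value, observed_return,
                           w_cls=1.0, w_rec=1.0, w_val=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    classification = cross_entropy(action_probs, human_action)
    reconstruction_err = mean_squared_error(reconstruction, observation)
    value_err = (predicted_value - observed_return) ** 2
    return w_cls * classification + w_rec * reconstruction_err + w_val * value_err


# One toy sample: three actions, a two-pixel "observation", a scalar return.
loss = joint_pretraining_loss(
    action_probs=[0.5, 0.25, 0.25], human_action=0,
    reconstruction=[0.0, 1.0], observation=[0.0, 0.0],
    predicted_value=1.0, observed_return=3.0,
)
```

In a real network all three terms would share one encoder, so that gradients from each loss shape the same features.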
SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation
We propose SplitNet, a method for decoupling visual perception and policy
learning. By incorporating auxiliary tasks and selective learning of portions
of the model, we explicitly decompose the learning objectives for visual
navigation into perceiving the world and acting on that perception. We show
dramatic improvements over baseline models on transferring between simulators,
an encouraging step towards Sim2Real. Additionally, SplitNet generalizes better
to unseen environments from the same simulator and transfers faster and more
effectively to novel embodied navigation tasks. Further, given only a small
sample from a target domain, SplitNet can match the performance of traditional
end-to-end pipelines which receive the entire dataset. Code is available at
https://github.com/facebookresearch/splitne
DARLA: Improving Zero-Shot Transfer in Reinforcement Learning
Domain adaptation is an important open problem in deep reinforcement learning
(RL). In many scenarios of interest data is hard to obtain, so agents may learn
a source policy in a setting where data is readily available, with the hope
that it generalises well to the target domain. We propose a new multi-stage RL
agent, DARLA (DisentAngled Representation Learning Agent), which learns to see
before learning to act. DARLA's vision is based on learning a disentangled
representation of the observed environment. Once DARLA can see, it is able to
acquire source policies that are robust to many domain shifts - even with no
access to the target domain. DARLA significantly outperforms conventional
baselines in zero-shot domain adaptation scenarios, an effect that holds across
a variety of RL environments (Jaco arm, DeepMind Lab) and base RL algorithms
(DQN, A3C and EC).
Comment: ICML 201
Meta reinforcement learning as task inference
Humans achieve efficient learning by relying on prior knowledge about the
structure of naturally occurring tasks. There is considerable interest in
designing reinforcement learning (RL) algorithms with similar properties. This
includes proposals to learn the learning algorithm itself, an idea also known
as meta learning. One formal interpretation of this idea is as a partially
observable multi-task RL problem in which task information is hidden from the
agent. Such unknown task problems can be reduced to Markov decision processes
(MDPs) by augmenting an agent's observations with an estimate of the belief
about the task based on past experience. However, estimating the belief state is
intractable in most partially-observed MDPs. We propose a method that
separately learns the policy and the task belief by taking advantage of various
kinds of privileged information. Our approach can be very effective at solving
standard meta-RL environments, as well as a complex continuous control
environment with sparse rewards that requires long-term memory.
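To make the notion of augmenting observations with a task belief concrete, the sketch below performs an exact Bayes update over a small finite set of candidate tasks and appends the resulting belief to the observation. This is only tractable in toy cases like this one; the abstract's method instead learns the belief using privileged information. The names `update_belief` and `augment` and the two-task setup are hypothetical.

```python
def update_belief(belief, likelihoods):
    """Bayes rule: the posterior is proportional to prior times likelihood."""
    unnormalized = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def augment(observation, belief):
    """Concatenate the raw observation with the current task belief."""
    return list(observation) + list(belief)


# Uniform prior over two tasks; the latest experience is 9x likelier under task 0.
belief = update_belief([0.5, 0.5], [0.9, 0.1])
augmented = augment([0.3, -1.2], belief)
```

A policy that takes `augmented` as input can then treat the problem as an ordinary MDP over observation-belief pairs.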
Representation Learning with Contrastive Predictive Coding
While supervised learning has enabled great progress in many applications,
unsupervised learning has not seen such widespread adoption, and remains an
important and challenging endeavor for artificial intelligence. In this work,
we propose a universal unsupervised learning approach to extract useful
representations from high-dimensional data, which we call Contrastive
Predictive Coding. The key insight of our model is to learn such
representations by predicting the future in latent space by using powerful
autoregressive models. We use a probabilistic contrastive loss which induces
the latent space to capture information that is maximally useful to predict
future samples. It also makes the model tractable by using negative sampling.
While most prior work has focused on evaluating representations for a
particular modality, we demonstrate that our approach is able to learn useful
representations achieving strong performance on four distinct domains: speech,
images, text and reinforcement learning in 3D environments.
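The probabilistic contrastive loss with negative sampling mentioned in this abstract can be sketched as a softmax cross-entropy over one positive and several negative candidates. This is a minimal illustration under stated assumptions: scores are plain dot products, and the names `info_nce_loss`, `prediction`, `positive`, and `negatives` are hypothetical. The learned encoder and autoregressive model are omitted.

```python
import math


def info_nce_loss(prediction, positive, negatives):
    """Cross-entropy of picking the positive sample among all candidates.

    Scores are plain dot products (an assumption made for this sketch).
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Candidate 0 is the true future sample; the rest are negatives.
    scores = [dot(prediction, positive)] + [dot(prediction, n) for n in negatives]
    # Numerically stable log-sum-exp over all candidate scores.
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    # Negative log-probability assigned to the positive sample.
    return log_norm - scores[0]


pred = [1.0, 0.0]
aligned = info_nce_loss(pred, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce_loss(pred, [-1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is lower when the prediction scores the true future sample above the negatives, which is what drives the latent space to retain predictive information.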
InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations
The goal of imitation learning is to mimic expert behavior without access to
an explicit reward signal. Expert demonstrations provided by humans, however,
often show significant variability due to latent factors that are typically not
explicitly modeled. In this paper, we propose a new algorithm that can infer
the latent structure of expert demonstrations in an unsupervised way. Our
method, built on top of Generative Adversarial Imitation Learning, can not only
imitate complex behaviors, but also learn interpretable and meaningful
representations of complex behavioral data, including visual demonstrations. In
the driving domain, we show that a model learned from human demonstrations is
able to both accurately reproduce a variety of behaviors and accurately
anticipate human actions using raw visual inputs. Compared with various
baselines, our method can better capture the latent structure underlying expert
demonstrations, often recovering semantically meaningful factors of variation
in the data.
Comment: 14 pages, NIPS 201