Meta-learning of Sequential Strategies
In this report we review memory-based meta-learning as a tool for building
sample-efficient strategies that learn from past experience to adapt to any
task within a target class. Our goal is to equip the reader with the conceptual
foundations of this tool for building new, scalable agents that operate on
broad domains. To do so, we present basic algorithmic templates for building
near-optimal predictors and reinforcement learners which behave as if they had
a probabilistic model that allowed them to efficiently exploit task structure.
Furthermore, we recast memory-based meta-learning within a Bayesian framework,
showing that the meta-learned strategies are near-optimal because they amortize
Bayes-filtered data, where the adaptation is implemented in the memory dynamics
as a state-machine of sufficient statistics. Essentially, memory-based
meta-learning translates the hard problem of probabilistic sequential inference
into a regression problem.
Comment: DeepMind Technical Report (15 pages, 6 figures). Version V1.
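To make the amortization idea concrete, here is a minimal sketch (ours, not from the report) of memory-based meta-learning as regression: an RNN is trained by plain log-loss to predict the next observation of a task drawn from a known task distribution, and its hidden state ends up tracking the sufficient statistics of a Bayes filter.

```python
# Minimal sketch: meta-learning a near-Bayes-optimal predictor by regression.
# Tasks are coins with unknown bias theta ~ Uniform(0, 1); the RNN only ever
# sees sequential log-loss, yet its hidden state learns to track the
# sufficient statistics (head/tail counts) of the Bayes posterior.
import torch
import torch.nn as nn

class MemoryPredictor(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # logit of P(next flip = 1)

    def forward(self, x):
        h, _ = self.rnn(x)
        return self.head(h).squeeze(-1)

model = MemoryPredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):
    theta = torch.rand(64, 1, 1)                       # one task per batch row
    flips = torch.bernoulli(theta.expand(64, 20, 1))   # episode of 20 flips
    logits = model(flips[:, :-1])                      # predict flip t+1 from 1..t
    loss = loss_fn(logits, flips[:, 1:, 0])            # sequential log-loss
    opt.zero_grad(); loss.backward(); opt.step()

# The trained predictions approach the Bayes posterior mean
# (heads + 1) / (t + 2), i.e. Laplace's rule, with no explicit inference.
```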
Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning
Learning is an inherently continuous phenomenon. When humans learn a new task
there is no explicit distinction between training and inference. As we learn a
task, we keep learning about it while performing the task. What we learn and
how we learn it varies during different stages of learning. Learning how to
learn and adapt is a key property that enables us to generalize effortlessly to
new settings. This is in contrast with conventional settings in machine
learning where a trained model is frozen during inference. In this paper we
study the problem of learning to learn at both training and test time in the
context of visual navigation. A fundamental challenge in navigation is
generalization to unseen scenes. In this paper we propose a self-adaptive
visual navigation method (SAVN) which learns to adapt to new environments
without any explicit supervision. Our solution is a meta-reinforcement learning
approach where an agent learns a self-supervised interaction loss that
encourages effective navigation. Our experiments, performed in the AI2-THOR
framework, show major improvements in both success rate and SPL for visual
navigation in novel scenes. Our code and data are available at:
https://github.com/allenai/savn
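The core SAVN mechanism can be sketched as a MAML-style inner update driven by a learned loss. The sketch below is our illustration, not the authors' code, and the network sizes are arbitrary: during an episode the agent takes a gradient step on a self-supervised interaction loss, so it keeps adapting at test time when no reward is available; in meta-training, the navigation objective is backpropagated through this step into both the policy initialization and the loss parameters.

```python
# Hedged sketch of the SAVN idea (illustrative modules, not the released code).
# The agent adapts mid-episode via one gradient step on a *learned*
# self-supervised interaction loss; create_graph=True lets the outer
# navigation loss train both the policy init and the loss network.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 4))
interaction_loss = nn.Sequential(nn.Linear(128 + 4, 32), nn.ReLU(), nn.Linear(32, 1))

def adapt(states, inner_lr=0.1):
    """One self-supervised inner-loop step on an episode of state features."""
    logits = policy(states)                       # (T, 4) action logits
    feats = torch.cat([states, logits], dim=-1)   # interaction features
    loss = interaction_loss(feats).mean()         # learned loss scores the episode
    grads = torch.autograd.grad(loss, list(policy.parameters()), create_graph=True)
    # MAML-style functional update; later actions would use these fast weights
    return [p - inner_lr * g for p, g in zip(policy.parameters(), grads)]

fast_weights = adapt(torch.randn(10, 128))        # adapted parameters for a scene
```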
ContraBAR: Contrastive Bayes-Adaptive Deep RL
In meta reinforcement learning (meta RL), an agent seeks a Bayes-optimal
policy -- the optimal policy when facing an unknown task that is sampled from
some known task distribution. Previous approaches tackled this problem by
inferring a belief over task parameters, using variational inference methods.
Motivated by recent successes of contrastive learning approaches in RL, such as
contrastive predictive coding (CPC), we investigate whether contrastive methods
can be used for learning Bayes-optimal behavior. We begin by proving that
representations learned by CPC are indeed sufficient for Bayes optimality.
Based on this observation, we propose a simple meta RL algorithm that uses CPC
in lieu of variational belief inference. Our method, ContraBAR, achieves
comparable performance to state-of-the-art in domains with state-based
observation and circumvents the computational toll of future observation
reconstruction, enabling learning in domains with image-based observations. It
can also be combined with image augmentations for domain randomization and used
seamlessly in both online and offline meta RL settings.
Comment: ICML 2023. PyTorch code available at https://github.com/ec2604/ContraBA
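The CPC objective that ContraBAR builds on is the standard InfoNCE loss. A minimal sketch (shapes and the bilinear score are illustrative) is below: the belief produced by encoding the history must score its own task's future observation above negatives drawn from other tasks in the batch, which is what lets the method avoid reconstructing future observations.

```python
# Minimal InfoNCE sketch of the CPC objective (illustrative shapes).
# The history encoding ("belief") must rank the true encoded future above
# in-batch negatives; no future-observation reconstruction is required.
import torch
import torch.nn.functional as F

def info_nce(belief, future_obs, proj):
    """belief: (B, D) history encodings; future_obs: (B, D) encoded futures."""
    scores = belief @ proj @ future_obs.T      # (B, B) bilinear similarities
    labels = torch.arange(belief.shape[0])     # positives sit on the diagonal
    return F.cross_entropy(scores, labels)

B, D = 32, 64
proj = torch.randn(D, D, requires_grad=True)   # bilinear score matrix, as in CPC
loss = info_nce(torch.randn(B, D), torch.randn(B, D), proj)
loss.backward()
```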
FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm
which enables reinforcement learning (RL) algorithms to quickly adapt to unseen
tasks without any interactions with the environments, making RL truly practical
in many real-world applications. This problem is still not fully understood,
for which two major challenges need to be addressed. First, offline RL usually
suffers from bootstrapping errors of out-of-distribution state-actions which
leads to divergence of value functions. Second, meta-RL requires efficient and
robust task inference learned jointly with the control policy. In this work, we
enforce behavior regularization on the learned policy as a general approach to
offline RL, combined with a deterministic context encoder for efficient task
inference. We propose a novel negative-power distance metric on bounded context
embedding space, whose gradient propagation is detached from the Bellman
backup. We provide analysis and insight showing that some simple design choices
can yield substantial improvements over recent approaches involving meta-RL and
distance metric learning. To the best of our knowledge, our method is the first
model-free and end-to-end OMRL algorithm, which is computationally efficient
and demonstrated to outperform prior algorithms on several meta-RL benchmarks.
Comment: 22 pages, 11 figures
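Our reading of the abstract's metric-learning objective, as a hedged sketch (the constants and exact functional form are illustrative): embeddings of transitions from the same task are pulled together, pairs from different tasks are pushed apart by a negative-power inverse-distance term, and the context encoder is trained on this loss alone, detached from the Bellman backup.

```python
# Hedged sketch of a negative-power distance-metric loss on deterministic
# context embeddings (illustrative constants). The encoder trains on this
# loss only, so its gradients are detached from value learning.
import torch

def focal_metric_loss(z, task_ids, n=1, eps=1e-3):
    """z: (B, D) deterministic context embeddings; task_ids: (B,) labels."""
    d2 = torch.cdist(z, z).pow(2)                     # pairwise squared distances
    same = task_ids[:, None] == task_ids[None, :]
    off_diag = ~torch.eye(len(z), dtype=torch.bool)
    pull = d2[same & off_diag].mean()                 # contract same-task pairs
    push = (1.0 / (d2[~same].pow(n) + eps)).mean()    # negative-power repulsion
    return pull + push

z = torch.randn(16, 8, requires_grad=True)            # stand-in for encoder outputs
loss = focal_metric_loss(z, torch.randint(0, 4, (16,)))
loss.backward()
```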
MELD: Meta-Reinforcement Learning from Images via Latent State Models
Meta-reinforcement learning algorithms can enable autonomous agents, such as
robots, to quickly acquire new behaviors by leveraging prior experience in a
set of related training tasks. However, the onerous data requirements of
meta-training compounded with the challenge of learning from sensory inputs
such as images have made meta-RL challenging to apply to real robotic systems.
Latent state models, which learn compact state representations from a sequence
of observations, can accelerate representation learning from visual inputs. In
this paper, we leverage the perspective of meta-learning as task inference to
show that latent state models can \emph{also} perform meta-learning given an
appropriately defined observation space. Building on this insight, we develop
meta-RL with latent dynamics (MELD), an algorithm for meta-RL from images that
performs inference in a latent state model to quickly acquire new skills given
observations and rewards. MELD outperforms prior meta-RL methods on several
simulated image-based robotic control problems, and enables a real WidowX
robotic arm to insert an Ethernet cable into new locations given a sparse task
completion signal after only hours of real world meta-training. To our
knowledge, MELD is the first meta-RL algorithm trained in a real-world robotic
control setting from images.
Comment: Accepted to CoRL 2020. Supplementary material at https://sites.google.com/view/meld-lsm/home . 16 pages, 19 figures. V2: add funding acknowledgements, reduce file size.
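A schematic sketch of the MELD recipe (module names and sizes are ours): a latent state model filters a belief from observation features and rewards jointly, so task inference reduces to ordinary state estimation, and the policy acts on the inferred latent rather than the raw image.

```python
# Schematic sketch (illustrative modules): conditioning the filter on rewards
# as well as observations makes task inference fall out of state estimation.
import torch
import torch.nn as nn

enc = nn.GRUCell(input_size=32 + 1, hidden_size=64)   # obs features + reward
to_belief = nn.Linear(64, 2 * 16)                     # mean and log-var of z_t
policy = nn.Linear(16, 4)

def step(h, obs_feat, reward):
    h = enc(torch.cat([obs_feat, reward], dim=-1), h)      # filtering update
    mu, logvar = to_belief(h).chunk(2, dim=-1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # sample latent state
    return h, policy(z)                                    # act on the belief

h = torch.zeros(1, 64)
for t in range(5):                                         # one rollout step at a time
    h, action_logits = step(h, torch.randn(1, 32), torch.randn(1, 1))
```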
Importance Weighted Policy Learning and Adaptation
The ability to exploit prior experience to solve novel problems rapidly is a
hallmark of biological learning systems and of great practical importance for
artificial ones. In the meta reinforcement learning literature much recent work
has focused on the problem of optimizing the learning process itself. In this
paper we study a complementary approach which is conceptually simple, general,
modular and built on top of recent improvements in off-policy learning. The
framework is inspired by ideas from the probabilistic inference literature and
combines robust off-policy learning with a behavior prior, or default behavior
that constrains the space of solutions and serves as a bias for exploration; as
well as a representation for the value function, both of which are easily
learned from a number of training tasks in a multi-task scenario. Our approach
achieves competitive adaptation performance on hold-out tasks compared to meta
reinforcement learning baselines and can scale to complex sparse-reward
scenarios.
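A minimal sketch of the behavior-prior idea (the discrete-action setup is illustrative): the task policy is optimized while a KL term toward a shared, multi-task default policy constrains the solution space and serves as an exploration bias.

```python
# Minimal sketch of KL-regularization toward a learned behavior prior
# (illustrative discrete-action setup, not the paper's implementation).
import torch
import torch.distributions as D

def prior_regularized_objective(q_values, policy_logits, prior_logits, alpha=0.1):
    """E_pi[Q] - alpha * KL(pi || prior), averaged over a batch of states."""
    pi = D.Categorical(logits=policy_logits)
    prior = D.Categorical(logits=prior_logits)
    expected_q = (pi.probs * q_values).sum(-1)
    return (expected_q - alpha * D.kl_divergence(pi, prior)).mean()

obj = prior_regularized_objective(
    torch.randn(8, 4),                            # Q-values: 8 states, 4 actions
    torch.randn(8, 4, requires_grad=True),        # task policy logits
    torch.randn(8, 4))                            # behavior-prior logits
(-obj).backward()                                 # ascend the objective
```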
RL³: Boosting Meta Reinforcement Learning via RL inside RL²
Meta reinforcement learning (meta-RL) methods such as RL² have emerged as
promising approaches for learning data-efficient RL algorithms tailored to a
given task distribution. However, these RL² algorithms struggle with
long-horizon tasks and out-of-distribution tasks since they rely on recurrent
neural networks to process the sequence of experiences instead of summarizing
them into general RL components such as value functions. Moreover, even
transformers have a practical limit to the length of histories they can
efficiently reason about before training and inference costs become
prohibitive. In contrast, traditional RL algorithms are data-inefficient since
they do not leverage domain knowledge, but they do converge to an optimal
policy as more data becomes available. In this paper, we propose RL³, a
principled hybrid approach that combines traditional RL and meta-RL by
incorporating task-specific action-values learned through traditional RL as an
input to the meta-RL neural network. We show that RL³ earns greater
cumulative reward on long-horizon and out-of-distribution tasks compared to
RL², while maintaining the efficiency of the latter in the short term.
Experiments are conducted on both custom and benchmark discrete domains from
the meta-RL literature that exhibit a range of short-term, long-term, and
complex dependencies.
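A schematic rendering of the hybrid interface (our sketch of the abstract, not the authors' code): a task-specific Q-function is estimated online by ordinary tabular Q-learning, and its action-values are appended to the input of the recurrent meta-RL policy so the network need not rediscover them from raw history.

```python
# Schematic sketch: tabular Q-learning feeding action-values into a recurrent
# meta-RL policy (illustrative sizes; not the paper's implementation).
import torch
import torch.nn as nn

n_states, n_actions, obs_dim = 10, 4, 16
Q = torch.zeros(n_states, n_actions)          # traditional, task-specific RL

def q_update(s, a, r, s2, lr=0.5, gamma=0.99):
    Q[s, a] += lr * (r + gamma * Q[s2].max() - Q[s, a])   # plain Q-learning

meta_rnn = nn.GRU(obs_dim + n_actions, 64, batch_first=True)
head = nn.Linear(64, n_actions)

def meta_step(obs, s, h):
    x = torch.cat([obs, Q[s]], dim=-1).view(1, 1, -1)     # obs + current Q(s, .)
    out, h = meta_rnn(x, h)
    return head(out[0, -1]), h                            # action logits

h = None
logits, h = meta_step(torch.randn(obs_dim), 3, h)         # act with Q as input
q_update(3, int(logits.argmax()), 1.0, 7)                 # keep refining Q online
```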
Task-Agnostic Online Reinforcement Learning with an Infinite Mixture of Gaussian Processes
Continuously learning to solve unseen tasks with limited experience has been
extensively pursued in meta-learning and continual learning, but with
restricted assumptions such as accessible task distributions, independently and
identically distributed tasks, and clear task delineations. However, real-world
physical tasks frequently violate these assumptions, resulting in performance
degradation. This paper proposes a continual online model-based reinforcement
learning approach that does not require pre-training to solve task-agnostic
problems with unknown task boundaries. We maintain a mixture of experts to
handle nonstationarity, and represent each different type of dynamics with a
Gaussian Process to efficiently leverage collected data and expressively model
uncertainty. We propose a transition prior to account for the temporal
dependencies in streaming data and update the mixture online via sequential
variational inference. Our approach reliably handles the task distribution
shift by generating new models for never-before-seen dynamics and reusing old
models for previously seen dynamics. In experiments, our approach outperforms
alternative methods in non-stationary tasks, including classic control with
changing dynamics and decision making in different driving scenarios.
Comment: 16 pages, 6 figures
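A deliberately simplified sketch of the expert-assignment logic (scalar Gaussians stand in for the paper's GP dynamics models; the stickiness and spawn threshold are illustrative): each incoming point is credited to the expert that best predicts it, with a sticky transition prior over the previous assignment, and a new expert is spawned when no existing one explains the data.

```python
# Simplified sketch: sticky online assignment over a growing mixture of
# experts (scalar Gaussians stand in for GP dynamics models).
import math

experts = [{"mu": 0.0, "var": 1.0, "n": 1}]   # stand-ins for GP experts

def assign(y, prev_k, stickiness=2.0, spawn_ll=math.log(0.05)):
    scores = []
    for k, e in enumerate(experts):
        ll = -0.5 * ((y - e["mu"]) ** 2 / e["var"] + math.log(2 * math.pi * e["var"]))
        scores.append(ll + (math.log(stickiness) if k == prev_k else 0.0))
    if max(scores) < spawn_ll:                 # never-before-seen dynamics
        experts.append({"mu": y, "var": 1.0, "n": 1})
        return len(experts) - 1
    k = max(range(len(scores)), key=scores.__getitem__)
    e = experts[k]
    e["n"] += 1
    e["mu"] += (y - e["mu"]) / e["n"]          # online update of the winner
    return k

k = 0
for y in [0.1, -0.2, 5.0, 5.1, 0.0]:           # dynamics shift mid-stream
    k = assign(y, k)                           # the old expert is reused at the end
```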
Towards intervention-centric causal reasoning in learning agents
Interventions are central to causal learning and reasoning. Yet ultimately an
intervention is an abstraction: an agent embedded in a physical environment
(perhaps modeled as a Markov decision process) does not typically come equipped
with the notion of an intervention -- its action space is typically
ego-centric, without actions of the form 'intervene on X'. Such a
correspondence between ego-centric actions and interventions would be
challenging to hard-code. It would instead be better if an agent learnt which
sequence of actions allow it to make targeted manipulations of the environment,
and learnt corresponding representations that permitted learning from
observation. Here we show how a meta-learning approach can be used to perform
causal learning in this challenging setting, where the action-space is not a
set of interventions and the observation space is a high-dimensional space with
a latent causal structure. A meta-reinforcement learning algorithm is used to
learn relationships that transfer on observational causal learning tasks. This
work shows how advances in deep reinforcement learning and meta-learning can
provide intervention-centric causal learning in high-dimensional environments
with a latent causal structure.
Comment: 11 pages, 4 figures. Presented at ICLR 2020 workshop 'Causal learning for decision making'.