NoRML: No-Reward Meta Learning
Efficiently adapting to new environments and changes in dynamics is critical
for agents to successfully operate in the real world. Reinforcement learning
(RL) based approaches typically rely on external reward feedback for
adaptation. However, in many scenarios this reward signal might not be readily
available for the target task, or the difference between the environments can
be implicit and only observable from the dynamics. To this end, we introduce a
method that allows for self-adaptation of learned policies: No-Reward Meta
Learning (NoRML). NoRML extends Model Agnostic Meta Learning (MAML) for RL and
uses observable dynamics of the environment instead of an explicit reward
function in MAML's finetune step. Our method has a more expressive update step
than MAML, while maintaining MAML's gradient-based foundation. Additionally, in
order to allow more targeted exploration, we implement an extension to MAML
that effectively disconnects the meta-policy parameters from the fine-tuned
policies' parameters. We first study our method on a number of synthetic
control problems and then validate our method on common benchmark environments,
showing that NoRML outperforms MAML when the dynamics change between tasks.
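The inner loop is the novel part, and a short sketch clarifies it. Below is a
minimal, hypothetical rendering of a NoRML-style adaptation step: the reward
term of MAML's policy-gradient fine-tuning is replaced by a meta-learned
advantage over observed transitions, and the fine-tuned policy starts from a
separate meta-learned initialization, decoupling it from the meta-policy. All
names (`advantage_fn`, `grad_log_pi`, `theta_0`) are illustrative, not the
authors' code.

```python
import numpy as np

def norml_inner_update(theta, theta_0, transitions, advantage_fn,
                       grad_log_pi, alpha=0.1):
    """One no-reward adaptation step (illustrative sketch).

    theta        -- meta-policy parameters used to collect experience
    theta_0      -- separately meta-learned initialization for the
                    fine-tuned policy (decoupled from theta)
    transitions  -- list of (state, action, next_state) tuples; note
                    there is no reward entry
    advantage_fn -- meta-learned function scoring transitions from
                    observed dynamics alone (stands in for the reward)
    grad_log_pi  -- callable returning grad of log pi(a|s) at theta
    """
    grad = np.zeros_like(theta_0)
    for s, a, s_next in transitions:
        # The learned advantage replaces the external reward signal.
        grad += advantage_fn(s, a, s_next) * grad_log_pi(theta, s, a)
    # Fine-tuned parameters start from theta_0, not theta itself, which
    # disconnects exploration (theta) from the adapted policy (theta_0).
    return theta_0 + alpha * grad / max(len(transitions), 1)
```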
A Simple Neural Attentive Meta-Learner
Deep neural networks excel in regimes with large amounts of data, but tend to
struggle when data is scarce or when they need to adapt quickly to changes in
the task. In response, recent work in meta-learning proposes training a
meta-learner on a distribution of similar tasks, in the hopes of generalization
to novel but related tasks by learning a high-level strategy that captures the
essence of the problem it is asked to solve. However, many recent meta-learning
approaches are extensively hand-designed, either using architectures
specialized to a particular application, or hard-coding algorithmic components
that constrain how the meta-learner solves the task. We propose a class of
simple and generic meta-learner architectures that use a novel combination of
temporal convolutions and soft attention; the former to aggregate information
from past experience and the latter to pinpoint specific pieces of information.
In the most extensive set of meta-learning experiments to date, we evaluate the
resulting Simple Neural AttentIve Learner (or SNAIL) on several
heavily-benchmarked tasks. On all tasks, in both supervised and reinforcement
learning, SNAIL attains state-of-the-art performance by significant margins.
Comment: ICLR 2018 version
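The two ingredients the abstract names are easy to sketch. Below is a minimal,
hypothetical PyTorch rendering of a dilated causal convolution block (to
aggregate information from past experience) and a causal soft-attention block
(to pinpoint specific past time steps); SNAIL interleaves blocks like these,
but the shapes and hyperparameters here are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvBlock(nn.Module):
    """Dilated causal 1-D convolution: aggregates past experience."""
    def __init__(self, channels, dilation):
        super().__init__()
        self.pad = dilation                    # left-pad: no future leakage
        self.conv = nn.Conv1d(channels, channels, kernel_size=2,
                              dilation=dilation)

    def forward(self, x):                      # x: (batch, channels, time)
        out = self.conv(F.pad(x, (self.pad, 0)))
        return x + torch.tanh(out)             # residual connection

class SoftAttentionBlock(nn.Module):
    """Causal soft attention: pinpoints specific past time steps."""
    def __init__(self, channels, key_dim=16):
        super().__init__()
        self.scale = key_dim ** 0.5
        self.q = nn.Linear(channels, key_dim)
        self.k = nn.Linear(channels, key_dim)
        self.v = nn.Linear(channels, channels)

    def forward(self, x):                      # x: (batch, channels, time)
        h = x.transpose(1, 2)                  # (batch, time, channels)
        scores = self.q(h) @ self.k(h).transpose(1, 2) / self.scale
        t = h.size(1)                          # mask out future positions
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool,
                                     device=x.device), 1)
        scores = scores.masked_fill(mask, float('-inf'))
        out = torch.softmax(scores, dim=-1) @ self.v(h)
        return x + out.transpose(1, 2)         # residual connection
```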
Learning latent state representation for speeding up exploration
Exploration is an extremely challenging problem in reinforcement learning,
especially in high-dimensional state and action spaces and when only sparse
rewards are available. Effective representations can indicate which components
of the state are task relevant and thus reduce the dimensionality of the space
to explore. In this work, we take a representation learning viewpoint on
exploration, utilizing prior experience to learn effective latent
representations, which can subsequently indicate which regions to explore.
Prior experience on separate but related tasks helps learn representations of
the state that are effective at predicting instantaneous rewards. These
learned representations can then be used with an entropy-based exploration
method to explore high-dimensional spaces efficiently, by effectively
lowering the dimensionality of the search space. We show the
benefits of this representation for meta-exploration in a simulated object
pushing environment.
Comment: 7 pages, 8 figures, workshop
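As a concrete (and hypothetical) illustration, the sketch below computes an
entropy-style exploration bonus in a learned latent space from k-nearest-
neighbor distances; the paper's exact estimator may differ, and `encoder` is
assumed to be pre-trained on related tasks to predict instantaneous rewards,
so its codes keep only what is task-relevant.

```python
import numpy as np

def knn_entropy_bonus(past_latents, z, k=5):
    """Entropy-style exploration bonus in a learned latent space.

    Particle-based estimate: a state whose latent code is far from its
    k-th nearest neighbor among past codes gets a larger bonus, pushing
    the agent toward under-visited regions of the reduced search space.
    """
    dists = np.linalg.norm(past_latents - z, axis=1)
    idx = min(k, len(dists) - 1)
    return np.log(1.0 + np.partition(dists, idx)[idx])

# Usage sketch (encoder and beta are assumptions):
#   z = encoder(state)
#   r_total = r_env + beta * knn_entropy_bonus(past_latents, z)
```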
The Natural Language Decathlon: Multitask Learning as Question Answering
Deep learning has improved performance on many natural language processing
(NLP) tasks individually. However, general NLP models cannot emerge within a
paradigm that focuses on the particularities of a single metric, dataset, and
task. We introduce the Natural Language Decathlon (decaNLP), a challenge that
spans ten tasks: question answering, machine translation, summarization,
natural language inference, sentiment analysis, semantic role labeling,
zero-shot relation extraction, goal-oriented dialogue, semantic parsing, and
commonsense pronoun resolution. We cast all tasks as question answering over a
context. Furthermore, we present a new Multitask Question Answering Network
(MQAN) that jointly learns all tasks in decaNLP without any task-specific
modules or
parameters in the multitask setting. MQAN shows improvements in transfer
learning for machine translation and named entity recognition, domain
adaptation for sentiment analysis and natural language inference, and zero-shot
capabilities for text classification. We demonstrate that MQAN's
multi-pointer-generator decoder is key to this success and that performance
further
improves with an anti-curriculum training strategy. Though designed for
decaNLP, MQAN also achieves state-of-the-art results on the WikiSQL semantic
parsing task in the single-task setting. We also release code for procuring and
processing data, training and evaluating models, and reproducing all
experiments for decaNLP.
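The reduction is easy to picture: every task becomes a (question, context,
answer) triple. The sketch below shows the format with three tasks; the
question phrasings are illustrative paraphrases, not decaNLP's exact
templates.

```python
from dataclasses import dataclass

@dataclass
class QAExample:
    question: str  # the task itself, phrased as a question
    context: str   # the input the answer must come from
    answer: str

# Three different tasks cast as question answering over a context:
examples = [
    QAExample("What is the translation from English to German?",
              "The house is small.", "Das Haus ist klein."),
    QAExample("What is the summary?",
              "A long news article ...", "A one-sentence summary ..."),
    QAExample("Is this review positive or negative?",
              "A thoroughly enjoyable film.", "positive"),
]
```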
Meta-Reinforcement Learning for Robotic Industrial Insertion Tasks
Robotic insertion tasks are characterized by contact and friction mechanics,
making them challenging for conventional feedback control methods due to
unmodeled physical effects. Reinforcement learning (RL) is a promising approach
for learning control policies in such settings. However, RL can be unsafe
during exploration and might require a large amount of real-world training
data, which is expensive to collect. In this paper, we study how to use
meta-reinforcement learning to solve the bulk of the problem in simulation by
solving a family of simulated industrial insertion tasks and then adapt
policies quickly in the real world. We demonstrate our approach by training an
agent to successfully perform challenging real-world insertion tasks using less
than 20 trials of real-world experience. Videos and other material are
available at https://pearl-insertion.github.io/
Comment: 9 pages, 8 figures
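A hypothetical skeleton of the recipe: meta-train across a family of
randomized simulated insertion tasks, then spend a small trial budget
adapting on the real robot. The object methods (`collect`, `meta_update`,
`adapt`) are placeholders; the project page suggests a PEARL-style
meta-learner, but any fast-adapting meta-RL algorithm fits this shape.

```python
def meta_train(meta_policy, sample_sim_task, n_iterations=1000):
    """Phase 1: solve the bulk of the problem in simulation."""
    for _ in range(n_iterations):
        task = sample_sim_task()           # randomized pose, friction, ...
        rollouts = task.collect(meta_policy)
        meta_policy.meta_update(rollouts)  # e.g. a PEARL-style update

def adapt_on_robot(meta_policy, real_task, trial_budget=20):
    """Phase 2: fast adaptation with a small real-world budget.

    The abstract reports success with fewer than 20 real-world trials.
    """
    for _ in range(trial_budget):
        rollout = real_task.collect(meta_policy)
        meta_policy.adapt(rollout)         # fast adaptation step only
        if rollout.success:
            break
    return meta_policy
```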
Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning
Learning is an inherently continuous phenomenon. When humans learn a new task
there is no explicit distinction between training and inference. As we learn a
task, we keep learning about it while performing the task. What we learn and
how we learn it varies during different stages of learning. Learning how to
learn and adapt is a key property that enables us to generalize effortlessly to
new settings. This is in contrast with conventional settings in machine
learning where a trained model is frozen during inference. In this paper we
study the problem of learning to learn at both training and test time in the
context of visual navigation. A fundamental challenge in navigation is
generalization to unseen scenes. We propose a self-adaptive
visual navigation method (SAVN) which learns to adapt to new environments
without any explicit supervision. Our solution is a meta-reinforcement learning
approach where an agent learns a self-supervised interaction loss that
encourages effective navigation. Our experiments, performed in the AI2-THOR
framework, show major improvements in both success rate and SPL (Success
weighted by Path Length) for visual
navigation in novel scenes. Our code and data are available at:
https://github.com/allenai/savn
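The adaptation mechanism can be sketched in a few lines of PyTorch
(illustrative names, first-order updates, not the released code): during
navigation the agent repeatedly descends a meta-learned self-supervised
interaction loss, so it keeps learning at test time without rewards or
labels.

```python
import torch

def navigate_and_adapt(params, ssl_loss, act, env, alpha=1e-3,
                       max_steps=200):
    """SAVN-style test-time adaptation sketch (names are placeholders)."""
    params = [p.detach().clone().requires_grad_(True) for p in params]
    obs, trajectory = env.reset(), []
    for _ in range(max_steps):
        action, obs, done = act(params, obs, env)  # take one step
        trajectory.append((obs, action))
        # Inner update: gradient of the meta-learned interaction loss,
        # not of any environment reward.
        loss = ssl_loss(params, trajectory)
        grads = torch.autograd.grad(loss, params)
        params = [(p - alpha * g).detach().requires_grad_(True)
                  for p, g in zip(params, grads)]
        if done:
            break
    return params
```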
Learning Representations in Model-Free Hierarchical Reinforcement Learning
Common approaches to Reinforcement Learning (RL) are seriously challenged by
large-scale applications involving huge state spaces and sparse delayed reward
feedback. Hierarchical Reinforcement Learning (HRL) methods attempt to address
this scalability issue by learning action selection policies at multiple levels
of temporal abstraction. Abstraction can be achieved by identifying a
relatively small set of states that are likely to be useful as subgoals, in
concert with learning corresponding skill policies to achieve those
subgoals. Many
approaches to subgoal discovery in HRL depend on the analysis of a model of the
environment, but the need to learn such a model introduces its own problems of
scale. Once subgoals are identified, skills may be learned through intrinsic
motivation, introducing an internal reward signal marking subgoal attainment.
In this paper, we present a novel model-free method for subgoal discovery using
incremental unsupervised learning over a small memory of the most recent
experiences (trajectories) of the agent. When combined with an intrinsic
motivation learning mechanism, this method learns both subgoals and skills,
based on experiences in the environment. Thus, we offer an original approach to
HRL that does not require the acquisition of a model of the environment,
suitable for large-scale applications. We demonstrate the efficiency of our
method on two RL problems with sparse delayed feedback: a variant of the rooms
environment and the first screen of the Atari 2600 game Montezuma's Revenge.
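A minimal sketch of the model-free ingredient, under the assumption that
incremental k-means over a small experience memory stands in for the paper's
unsupervised learner: cluster centroids act as candidate subgoals, and an
intrinsic reward marks their attainment.

```python
import numpy as np

class SubgoalDiscovery:
    """Model-free subgoal discovery sketch: incremental unsupervised
    clustering over a small memory of recent experiences. Illustrative,
    not the paper's exact algorithm."""

    def __init__(self, n_subgoals, state_dim, lr=0.05, memory_size=500):
        self.centroids = np.random.randn(n_subgoals, state_dim)
        self.memory, self.memory_size, self.lr = [], memory_size, lr

    def observe(self, state):
        self.memory.append(state)
        self.memory = self.memory[-self.memory_size:]  # keep memory small
        # Incremental k-means step: nudge nearest centroid toward state.
        i = np.argmin(np.linalg.norm(self.centroids - state, axis=1))
        self.centroids[i] += self.lr * (state - self.centroids[i])

    def intrinsic_reward(self, state, subgoal_idx, radius=0.5):
        # Internal reward signal marking subgoal attainment.
        d = np.linalg.norm(state - self.centroids[subgoal_idx])
        return 1.0 if d < radius else 0.0
```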
Scaling Configuration of Energy Harvesting Sensors with Reinforcement Learning
With the advent of the Internet of Things (IoT), an increasing number of
energy harvesting methods are being used to supplement or supplant battery
based sensors. Energy harvesting sensors need to be configured according to the
application, hardware, and environmental conditions to maximize their
usefulness. Today, sensor configuration is either manual or
heuristics-based, requiring valuable domain expertise. Reinforcement learning
(RL) is a promising approach to automate configuration and efficiently scale
IoT deployments, but it is not yet adopted in practice. We propose solutions to
bridge this gap: reduce the training phase of RL so that nodes are operational
within a short time after deployment and reduce the computational requirements
to scale to large deployments. We focus on configuration of the sampling rate
of indoor solar panel based energy harvesting sensors. We created a simulator
based on 3 months of data collected from 5 sensor nodes subject to different
lighting conditions. Our simulation results show that RL can effectively learn
energy availability patterns and configure the sampling rate of the sensor
nodes to maximize the sensing data while ensuring that energy storage is not
depleted. The nodes can be operational within the first day by using our
methods. We show that it is possible to reduce the number of RL policies by
using a single policy for nodes that share similar lighting conditions.
Comment: 7 pages, 5 figures
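A tabular Q-learning sketch shows the shape of the configuration problem; the
state, action set, and reward below are assumptions for illustration, not the
paper's exact setup. The action is the sampling rate, and the reward trades
sensed data against a heavy penalty for depleting energy storage.

```python
import random
from collections import defaultdict

SAMPLING_RATES = [1, 5, 15, 60]        # minutes between samples (assumed)

def reward(samples_taken, energy_left):
    if energy_left <= 0.0:
        return -10.0                   # heavy penalty: storage depleted
    return samples_taken               # otherwise, more sensing is better

def q_update(Q, state, action, r, next_state, alpha=0.1, gamma=0.95):
    # Standard Q-learning backup over (state, sampling-rate) pairs.
    best_next = max(Q[(next_state, a)] for a in SAMPLING_RATES)
    Q[(state, action)] += alpha * (r + gamma * best_next
                                   - Q[(state, action)])

def choose_rate(Q, state, epsilon=0.1):
    if random.random() < epsilon:      # explore energy patterns
        return random.choice(SAMPLING_RATES)
    return max(SAMPLING_RATES, key=lambda a: Q[(state, a)])

Q = defaultdict(float)                 # (state, action) -> value
```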
Routing Networks and the Challenges of Modular and Compositional Computation
Compositionality is a key strategy for addressing combinatorial complexity
and the curse of dimensionality. Recent work has shown that compositional
solutions can be learned and offer substantial gains across a variety of
domains, including multi-task learning, language modeling, visual question
answering, machine comprehension, and others. However, such models present
unique challenges during training when both the module parameters and their
composition must be learned jointly. In this paper, we identify several of
these issues and analyze their underlying causes. Our discussion focuses on
routing networks, a general approach to this problem, and examines empirically
the interplay of these challenges and a variety of design decisions. In
particular, we consider the effect of how the algorithm decides on module
composition, how the algorithm updates the modules, and whether the algorithm
uses regularization.
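A minimal routing layer makes the joint-learning difficulty concrete: a
router picks which module processes each input, so module parameters and
their composition are trained together. The sketch below uses hard argmax
routing; how that decision is made and trained (e.g., sampled and updated
with RL) is exactly one of the design choices the paper examines. Shapes and
module types are illustrative.

```python
import torch
import torch.nn as nn

class RoutingLayer(nn.Module):
    """Routing-network layer sketch: one module chosen per input."""

    def __init__(self, dim, n_modules):
        super().__init__()
        self.candidates = nn.ModuleList(
            nn.Linear(dim, dim) for _ in range(n_modules))
        self.router = nn.Linear(dim, n_modules)  # scores each module

    def forward(self, x):              # x: (batch, dim)
        # Hard routing: gradients do not flow through argmax, which is
        # why routers are often trained with RL or relaxations instead.
        choice = self.router(x).argmax(dim=-1)
        out = torch.stack([self.candidates[int(c)](xi)
                           for xi, c in zip(x, choice)])
        return torch.relu(out)
```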
One-Shot Learning of Multi-Step Tasks from Observation via Activity Localization in Auxiliary Video
Due to burdensome data requirements, learning from demonstration often falls
short of its promise to allow users to quickly and naturally program robots.
Demonstrations are inherently ambiguous and incomplete, making correct
generalization to unseen situations difficult without a large number of
demonstrations in varying conditions. By contrast, humans are often able to
learn complex tasks from a single demonstration (typically observations without
action labels) by leveraging context learned over a lifetime. Inspired by this
capability, our goal is to enable robots to perform one-shot learning of
multi-step tasks from observation by leveraging auxiliary video data as
context. Our primary contribution is a novel system that achieves this goal by:
(1) using a single user-segmented demonstration to define the primitive actions
that comprise a task, (2) localizing additional examples of these actions in
unsegmented auxiliary videos via a metalearning-based approach, (3) using these
additional examples to learn a reward function for each action, and (4)
performing reinforcement learning on top of the inferred reward functions to
learn action policies that can be combined to accomplish the task. We
empirically demonstrate that a robot can learn multi-step tasks more
effectively when provided auxiliary video, and that performance greatly
improves when localizing individual actions, compared to learning from
unsegmented videos.
Comment: ICRA 2019
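The four-step system reads naturally as a pipeline; the skeleton below is a
hypothetical rendering in which the four components are passed in as
callables, since the paper's actual interfaces are not shown here.

```python
def one_shot_learn(demo, aux_videos, env, segment, localize,
                   learn_reward, train_policy):
    """Skeleton of the four-step pipeline (all callables hypothetical)."""
    policies = []
    for action in segment(demo):             # (1) primitives from one demo
        clips = localize(action, aux_videos)  # (2) more examples, found by
                                              #     meta-learned localization
        reward_fn = learn_reward(action, clips)  # (3) per-action reward
        policies.append(train_policy(env, reward_fn))  # (4) RL per action
    return policies  # chained in sequence to accomplish the full task
```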