Neural Task Programming: Learning to Generalize Across Hierarchical Tasks
In this work, we propose a novel robot learning framework called Neural Task
Programming (NTP), which bridges the idea of few-shot learning from
demonstration and neural program induction. NTP takes as input a task
specification (e.g., video demonstration of a task) and recursively decomposes
it into finer sub-task specifications. These specifications are fed to a
hierarchical neural program, where bottom-level programs are callable
subroutines that interact with the environment. We validate our method in three
robot manipulation tasks. NTP achieves strong generalization across sequential
tasks that exhibit hierarchical and compositional structures. The experimental
results show that NTP learns to generalize well towards unseen tasks with
increasing lengths, variable topologies, and changing objectives. Comment: ICRA 201
Mapping Instructions and Visual Observations to Actions with Reinforcement Learning
We propose to directly map raw visual observations and text input to actions
for instruction execution. While existing approaches assume access to
structured environment representations or use a pipeline of separately trained
models, we learn a single model to jointly reason about linguistic and visual
input. We use reinforcement learning in a contextual bandit setting to train a
neural network agent. To guide the agent's exploration, we use reward shaping
with different forms of supervision. Our approach does not require intermediate
representations, planning procedures, or training different models. We evaluate
in a simulated environment, and show significant improvements over supervised
learning and common reinforcement learning variants. Comment: In Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP), 201
Efficiently Combining Human Demonstrations and Interventions for Safe Training of Autonomous Systems in Real-Time
This paper investigates how to utilize different forms of human interaction
to safely train autonomous systems in real-time by learning from both human
demonstrations and interventions. We implement two components of the
Cycle-of-Learning for Autonomous Systems, which is our framework for combining
multiple modalities of human interaction. The current effort employs human
demonstrations to teach a desired behavior via imitation learning, then
leverages intervention data to correct for undesired behaviors produced by the
imitation learner to teach novel tasks to an autonomous agent safely, after
only minutes of training. We demonstrate this method in an autonomous perching
task using a quadrotor with continuous roll, pitch, yaw, and throttle commands
and imagery captured from a downward-facing camera in a high-fidelity simulated
environment. Our method improves task completion performance for the same
amount of human interaction when compared to learning from demonstrations
alone, while also requiring on average 32% less data to achieve that
performance. This provides evidence that combining multiple modes of human
interaction can increase both the training speed and overall performance of
policies for autonomous systems. Comment: 9 pages, 6 figures
Deep reinforcement learning from human preferences
For sophisticated reinforcement learning (RL) systems to interact usefully
with real-world environments, we need to communicate complex goals to these
systems. In this work, we explore goals defined in terms of (non-expert) human
preferences between pairs of trajectory segments. We show that this approach
can effectively solve complex RL tasks without access to the reward function,
including Atari games and simulated robot locomotion, while providing feedback
on less than one percent of our agent's interactions with the environment. This
reduces the cost of human oversight far enough that it can be practically
applied to state-of-the-art RL systems. To demonstrate the flexibility of our
approach, we show that we can successfully train complex novel behaviors with
about an hour of human time. These behaviors and environments are considerably
more complex than any that have been previously learned from human feedback
- …