Adversarial Imitation Learning from Incomplete Demonstrations
Imitation learning aims to derive a mapping from states to actions, i.e., a
policy, from expert demonstrations. Existing methods for imitation learning
typically require the actions in the demonstrations to be fully available,
which is hard to ensure in real applications. Although algorithms for learning
with unobservable actions have been proposed, they focus solely on state
information and overlook the fact that the action sequence may still be
partially available and provide useful information for deriving a policy. In this
paper, we propose a novel algorithm called Action-Guided Adversarial Imitation
Learning (AGAIL) that learns a policy from demonstrations with incomplete
action sequences, i.e., incomplete demonstrations. The core idea of AGAIL is to
separate demonstrations into state and action trajectories, and train a policy
with state trajectories while using actions as auxiliary information to guide
the training whenever applicable. Built upon Generative Adversarial
Imitation Learning (GAIL), AGAIL has three components: a generator, a discriminator,
and a guide. The generator learns a policy with rewards provided by the
discriminator, which tries to distinguish state distributions between
demonstrations and samples generated by the policy. The guide provides
additional rewards to the generator when demonstrated actions for specific
states are available. We compare AGAIL to other methods on benchmark tasks and
show that AGAIL consistently delivers comparable performance to the
state-of-the-art methods even when the action sequence in demonstrations is
only partially available.
Comment: Accepted to the International Joint Conference on Artificial Intelligence (IJCAI-19).
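To make the reward structure concrete, here is a minimal sketch of how the adversarial and guide signals described above could be combined into a per-step reward. All names (agail_reward, disc, guide) are illustrative rather than the paper's API, the negative squared error is a stand-in for the paper's actual guide term, and the discriminator convention is one of several common in GAIL-style methods:

import torch

def agail_reward(disc, guide, state, action, action_available):
    """Hypothetical per-step reward combining AGAIL's two signals.

    disc(state)  -> probability (0..1) that `state` came from the expert
                    demonstrations (adversarial signal on states only).
    guide(state) -> predicted expert action for `state` (usable only
                    when the demonstrated action is available).
    """
    # Adversarial reward: high when the policy visits expert-like states.
    r = torch.log(disc(state) + 1e-8)

    if action_available:
        # Guide reward: bonus for matching the demonstrated action where
        # it is known; a simple negative squared error as a stand-in.
        r = r - ((action - guide(state)) ** 2).sum()
    return r

The key design point is that the guide term applies only on states whose demonstrated action survived in the incomplete demonstration; everywhere else the policy is trained from the state-only adversarial signal alone.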
Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets
Imitation learning has traditionally been applied to learn a single task from
demonstrations thereof. The requirement of structured and isolated
demonstrations limits the scalability of imitation learning approaches as they
are difficult to apply to real-world scenarios, where robots have to be able to
execute a multitude of tasks. In this paper, we propose a multi-modal imitation
learning framework that is able to segment and imitate skills from unlabelled
and unstructured demonstrations by jointly learning skill segmentation and
imitation. Extensive simulation results indicate that our method can
efficiently separate the demonstrations into individual skills and learn to
imitate them using a single multi-modal policy. The video of our experiments is
available at http://sites.google.com/view/nips17intentiongan
Comment: Paper accepted to NIPS 2017.
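As a rough illustration of the "single multi-modal policy" idea, the sketch below conditions the reward on a discrete latent skill code and adds a mutual-information-style term, in the spirit of latent-intention GAIL variants. All names (multimodal_reward, disc, intent_clf, skill_id) are assumptions, not the paper's API:

import torch
import torch.nn.functional as F

def multimodal_reward(disc, intent_clf, state, action, skill_id):
    """Hypothetical per-step reward for a latent-skill GAIL variant.

    disc(state, action)       -> probability the pair looks expert-like.
    intent_clf(state, action) -> logits over discrete skill codes.
    """
    # Adversarial imitation term (one common GAIL reward convention).
    r_imit = torch.log(disc(state, action) + 1e-8)
    # Mutual-information-style term: reward the policy when the skill
    # code that generated (state, action) can be recovered from it.
    logits = intent_clf(state, action)
    r_info = F.log_softmax(logits, dim=-1)[skill_id]
    return r_imit + r_info

Maximizing the second term encourages trajectories generated under different skill codes to remain distinguishable, which is what lets a single policy segment and imitate multiple skills.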
Deep Q-learning from Demonstrations
Deep reinforcement learning (RL) has achieved several high profile successes
in difficult decision-making problems. However, these algorithms typically
require a huge amount of data before they reach reasonable performance. In
fact, their performance during learning can be extremely poor. This may be
acceptable for a simulator, but it severely limits the applicability of deep RL
to many real-world tasks, where the agent must learn in the real environment.
In this paper we study a setting where the agent may access data from previous
control of the system. We present an algorithm, Deep Q-learning from
Demonstrations (DQfD), that leverages even relatively small amounts of
demonstration data to massively accelerate the learning process, and that
automatically assesses the necessary ratio of demonstration data while
learning, thanks to a prioritized replay mechanism.
DQfD works by combining temporal difference updates with supervised
classification of the demonstrator's actions. We show that DQfD has better
initial performance than Prioritized Dueling Double Deep Q-Networks (PDD DQN)
as it starts with better scores over the first million steps on 41 of 42 games,
and on average it takes PDD DQN 83 million steps to catch up to DQfD's
performance. DQfD learns to outperform the best demonstration given in 14 of
42 games. In addition, DQfD leverages human demonstrations to achieve
state-of-the-art results for 11 games. Finally, we show that DQfD performs
better than three related algorithms for incorporating demonstration data into
DQN.
Comment: Published at AAAI 2018. Previously on arXiv as "Learning from Demonstrations for Real World Reinforcement Learning".
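The "supervised classification of the demonstrator's actions" mentioned above refers to a large-margin loss on demonstration transitions, which the paper combines with 1-step and n-step TD losses and L2 regularization. A minimal sketch of that margin term, with an illustrative margin value:

import torch

def large_margin_loss(q_values, expert_action, margin=0.8):
    """Supervised large-margin term on a demonstration transition
    (the J_E term described in the DQfD paper); the margin value
    here is an illustrative choice.

    q_values      -- Q(s, a) for all actions, shape [num_actions]
    expert_action -- index of the demonstrator's action in state s
    """
    # l(a_E, a): margin that is 0 for the expert action, positive otherwise.
    penalty = torch.full_like(q_values, margin)
    penalty[expert_action] = 0.0
    # max_a [Q(s,a) + l(a_E,a)] - Q(s, a_E): zero only when the expert
    # action's value exceeds every other action's by at least the margin.
    return (q_values + penalty).max() - q_values[expert_action]

This term is zero only when the expert action's Q-value beats every other action's by at least the margin, which grounds the values of actions never taken by the demonstrator instead of leaving them unconstrained during pre-training.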