Adversarial Imitation Learning from Incomplete Demonstrations
Imitation learning aims to derive a mapping from states to actions, i.e., a
policy, from expert demonstrations. Existing methods for imitation learning
typically require all actions in the demonstrations to be fully available,
which is hard to ensure in real applications. Though algorithms for learning
with unobservable actions have been proposed, they focus solely on state
information and overlook the fact that the action sequence may still be
partially available and provide useful information for policy learning. In this
paper, we propose a novel algorithm called Action-Guided Adversarial Imitation
Learning (AGAIL) that learns a policy from demonstrations with incomplete
action sequences, i.e., incomplete demonstrations. The core idea of AGAIL is to
separate demonstrations into state and action trajectories, and train a policy
with state trajectories while using actions as auxiliary information to guide
the training whenever applicable. Built upon the Generative Adversarial
Imitation Learning, AGAIL has three components: a generator, a discriminator,
and a guide. The generator learns a policy with rewards provided by the
discriminator, which tries to distinguish state distributions between
demonstrations and samples generated by the policy. The guide provides
additional rewards to the generator when demonstrated actions for specific
states are available. We compare AGAIL to other methods on benchmark tasks and
show that AGAIL consistently delivers comparable performance to the
state-of-the-art methods even when the action sequence in demonstrations is
only partially available.
Comment: Accepted to the International Joint Conference on Artificial Intelligence (IJCAI-19).
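The AGAIL reward structure described above can be illustrated with a minimal sketch: the generator receives an adversarial reward from the state-only discriminator, plus an extra guide term only at states where the demonstrated action happens to be observed. The function name, the squared-error guide term, and the `guide_weight` coefficient are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def agail_reward(disc_logit, policy_action, demo_action=None, guide_weight=0.5):
    """Illustrative AGAIL-style reward (hypothetical sketch, not the paper's
    exact guide). disc_logit is the state discriminator's output for the
    policy-visited state (higher = more expert-like); demo_action is None
    when the demonstrated action for this state is unobserved."""
    # Adversarial reward from the state discriminator, GAIL-style: -log(1 - D(s)).
    d = 1.0 / (1.0 + np.exp(-disc_logit))  # sigmoid
    reward = -np.log(1.0 - d + 1e-8)
    # Guide reward: applied only when the demonstrated action is available,
    # penalizing deviation of the policy's action from the demonstrated one.
    if demo_action is not None:
        reward += guide_weight * -np.sum((policy_action - demo_action) ** 2)
    return reward
```

When the action is missing, the learner falls back to pure state-matching; when it is present, the guide term pulls the policy toward the demonstrated action, which is how partial action sequences remain useful.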
Robust learning from observation with model misspecification
Imitation learning (IL) is a popular paradigm for training policies in
robotic systems when specifying the reward function is difficult. However,
despite the success of IL algorithms, they impose the somewhat unrealistic
requirement that the expert demonstrations must come from the same domain in
which a new imitator policy is to be learned. We consider a practical setting,
where (i) state-only expert demonstrations from the real (deployment)
environment are given to the learner, (ii) the imitation learner policy is
trained in a simulation (training) environment whose transition dynamics are
slightly different from the real environment, and (iii) the learner does not
have any access to the real environment during the training phase beyond the
batch of demonstrations given. Most of the current IL methods, such as
generative adversarial imitation learning and its state-only variants, fail to
imitate the optimal expert behavior under the above setting. By leveraging
insights from the robust reinforcement learning (RL) literature and building on
recent adversarial imitation approaches, we propose a robust IL algorithm to
learn policies that can effectively transfer to the real environment without
fine-tuning. Furthermore, we empirically demonstrate on continuous-control
benchmarks that our method outperforms the state-of-the-art state-only IL
method in terms of the zero-shot transfer performance in the real environment
and robust performance under different testing conditions.
Comment: Accepted to AAMAS 2022 (camera-ready version).
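A common ingredient in robust-RL-style training of the kind this abstract builds on is to train against an uncertainty set of transition dynamics around the nominal simulator, so the learned policy transfers despite model misspecification. The helper below is a hedged sketch of that idea only; the function name, parameter dictionary, and multiplicative perturbation model are assumptions, not the paper's algorithm.

```python
import random

def sample_perturbed_dynamics(nominal_params, radius=0.1, rng=None):
    """Sample simulator transition parameters from a multiplicative
    uncertainty set around the nominal values (illustrative of robust-RL
    training under dynamics misspecification; hypothetical helper)."""
    rng = rng or random.Random()
    # Each parameter is scaled by a factor in [1 - radius, 1 + radius],
    # modeling a training simulator that is slightly wrong about the
    # real environment's dynamics.
    return {k: v * (1.0 + rng.uniform(-radius, radius))
            for k, v in nominal_params.items()}
```

Training the imitator across many such sampled dynamics, rather than a single fixed simulator, is one way to target the zero-shot transfer setting the abstract describes.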
Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation
Imitation learning is an effective approach for autonomous systems to acquire
control policies when an explicit reward function is unavailable, using
supervision provided as demonstrations from an expert, typically a human
operator. However, standard imitation learning methods assume that the agent
receives examples of observation-action tuples that could be provided, for
instance, to a supervised learning algorithm. This stands in contrast to how
humans and animals imitate: we observe another person performing some behavior
and then figure out which actions will realize that behavior, compensating for
changes in viewpoint, surroundings, object positions and types, and other
factors. We term this kind of imitation learning "imitation-from-observation,"
and propose an imitation learning method based on video prediction with context
translation and deep reinforcement learning. This lifts the assumption in
imitation learning that the demonstration should consist of observations in the
same environment configuration, and enables a variety of interesting
applications, including learning robotic skills that involve tool use simply by
observing videos of human tool use. Our experimental results show the
effectiveness of our approach in learning a wide range of real-world robotic
tasks modeled after common household chores from videos of a human
demonstrator, including sweeping, ladling almonds, pushing objects as well as a
number of tasks in simulation.
Comment: Accepted at ICRA 2018, Brisbane. YuXuan Liu and Abhishek Gupta had equal contribution.
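In imitation-from-observation setups like the one above, a typical reward is the agent's per-timestep distance to features of the demonstration after it has been translated into the agent's context. The sketch below shows only that tracking-reward shape; the feature extractor and context-translation model are assumed and not shown, and the function name is hypothetical.

```python
import numpy as np

def tracking_reward(translated_demo_feat, agent_feat):
    """Reward for matching the translated demonstration's features at the
    current timestep (sketch of the imitation-from-observation idea; the
    features would come from a learned encoder and translation model)."""
    # Negative Euclidean distance: zero when the agent's observation
    # features exactly match the translated demonstration, negative otherwise.
    return -float(np.linalg.norm(translated_demo_feat - agent_feat))
```

Summing this reward over an episode and optimizing it with deep RL yields a policy that reproduces the demonstrated behavior without ever observing the expert's actions.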