Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations
Model-free deep reinforcement learning (RL) has demonstrated its superiority
on many complex sequential decision-making problems. However, heavy dependence
on dense rewards and high sample-complexity impedes the wide adoption of these
methods in real-world scenarios. On the other hand, imitation learning (IL)
learns effectively in sparse-rewarded tasks by leveraging the existing expert
demonstrations. In practice, collecting a sufficient amount of expert
demonstrations can be prohibitively expensive, and the quality of
demonstrations typically limits the performance of the learning policy. In this
work, we propose Self-Adaptive Imitation Learning (SAIL) that can achieve
(near) optimal performance given only a limited number of sub-optimal
demonstrations for highly challenging sparse reward tasks. SAIL bridges the
advantages of IL and RL to substantially reduce sample complexity, by
effectively exploiting sub-optimal demonstrations and efficiently exploring the
environment to surpass the demonstrated performance. Extensive empirical
results show that SAIL not only significantly improves sample efficiency but
also achieves much better final performance across different continuous
control tasks, compared to the state of the art.
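The self-adaptive idea described in the abstract — exploiting sub-optimal demonstrations while letting the agent's own exploration surpass them — can be sketched as a demonstration buffer that replaces its worst trajectory whenever the learner produces a better one, so the imitation target improves over training. This is a minimal illustrative sketch, not the paper's actual SAIL algorithm; the class and method names are hypothetical.

```python
# Hypothetical sketch: a demonstration buffer that self-adapts by
# replacing its worst trajectory whenever the learning agent's own
# rollout exceeds it. Illustrative only; not the paper's algorithm.
import heapq


class SelfAdaptiveDemoBuffer:
    def __init__(self, demos):
        # Min-heap keyed by return, so the worst demo sits at the root.
        self._heap = [(ret, traj) for ret, traj in demos]
        heapq.heapify(self._heap)

    def maybe_add(self, ret, traj):
        """Replace the worst stored demo if the new trajectory beats it."""
        worst_ret, _ = self._heap[0]
        if ret > worst_ret:
            heapq.heapreplace(self._heap, (ret, traj))
            return True
        return False

    def returns(self):
        """Stored returns in ascending order."""
        return sorted(r for r, _ in self._heap)


buf = SelfAdaptiveDemoBuffer([(40.0, "demo_a"), (55.0, "demo_b")])
buf.maybe_add(70.0, "agent_rollout_1")  # beats the worst demo -> stored
buf.maybe_add(30.0, "agent_rollout_2")  # worse than all demos -> discarded
print(buf.returns())  # [55.0, 70.0]
```

Under this scheme the imitation objective chases an ever-improving set of trajectories, which is one way the demonstrated performance can be surpassed rather than merely matched.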
Learning from Imperfect Demonstrations from Agents with Varying Dynamics
Imitation learning enables robots to learn from demonstrations. Previous
imitation learning algorithms usually assume access to optimal expert
demonstrations. However, in many real-world applications, this assumption is
limiting. Most collected demonstrations are not optimal or are produced by an
agent with slightly different dynamics. We therefore address the problem of
imitation learning when the demonstrations can be sub-optimal or be drawn from
agents with varying dynamics. We develop a metric composed of a feasibility
score and an optimality score to measure how useful a demonstration is for
imitation learning. The proposed score enables learning from the more
informative demonstrations while disregarding the less relevant ones. Our
experiments on four environments in simulation and on a real robot show
improved learned policies with higher expected return.
Comment: Accepted by ICRA 202
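The metric described above, combining a feasibility score (does the demonstrating agent's dynamics match the learner's?) with an optimality score (how good is the demonstrated return?), can be sketched as a per-demonstration weighting. The concrete score formulas below are assumptions chosen for illustration; the paper's exact definitions may differ.

```python
# Illustrative sketch: weight demonstrations by a combined
# feasibility x optimality score before imitation learning.
# The score formulas are assumed for illustration, not taken
# from the paper.
from dataclasses import dataclass


@dataclass
class Demonstration:
    ret: float           # observed return of the trajectory
    dynamics_gap: float  # mismatch between demo agent and learner dynamics (>= 0)


def feasibility_score(demo, scale=1.0):
    """Higher when the demonstrating agent's dynamics match the learner's."""
    return 1.0 / (1.0 + scale * demo.dynamics_gap)


def optimality_score(demo, best_ret):
    """Higher for demonstrations whose return is close to the best observed."""
    return max(0.0, demo.ret / best_ret) if best_ret > 0 else 0.0


def demo_weights(demos):
    """Normalized weights; informative demos dominate, irrelevant ones fade."""
    best = max(d.ret for d in demos)
    raw = [feasibility_score(d) * optimality_score(d, best) for d in demos]
    total = sum(raw)
    if total == 0:
        return [1.0 / len(demos)] * len(demos)
    return [w / total for w in raw]


demos = [
    Demonstration(ret=90.0, dynamics_gap=0.1),  # near-optimal, well-matched
    Demonstration(ret=40.0, dynamics_gap=2.0),  # sub-optimal, mismatched
]
weights = demo_weights(demos)
print(weights)  # the well-matched, near-optimal demo receives most of the weight
```

A weighting of this form realizes the abstract's goal of learning from the more informative demonstrations while disregarding the less relevant ones, without discarding any demonstration outright.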