Learning Complicated Manipulation Skills via Deterministic Policy with Limited Demonstrations
Combined with demonstrations, deep reinforcement learning can efficiently
develop policies for manipulators. In practice, however, collecting sufficient
high-quality demonstrations takes time, and demonstrations recorded from humans
may be unsuitable for robots. The non-Markovian nature of the process and
over-reliance on demonstrations pose further challenges: we found that RL
agents are sensitive to demonstration quality in manipulation tasks and
struggle to adapt to demonstrations taken directly from humans. It is therefore
challenging to leverage low-quality and insufficient demonstrations to help
reinforcement learning train better policies; sometimes, limited demonstrations
even lead to worse performance.
We propose a new algorithm, TD3fG (TD3 learning from a generator), to
address these problems. It forms a smooth transition from learning from
experts to learning from experience, helping agents extract prior knowledge
while reducing the detrimental effects of the demonstrations. Our algorithm
performs well on Adroit manipulator and MuJoCo tasks with limited
demonstrations.
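The "smooth transition from learning from experts to learning from experience" can be pictured as an annealed imitation term in the actor objective: early on, a behavior-cloning penalty toward the demonstration generator dominates; as training proceeds its weight decays to zero and only the RL term remains. The linear schedule and function names below are illustrative assumptions, not the paper's exact formulation.

```python
def demo_weight(step, total_steps, w0=1.0):
    """Annealed weight on the demonstration (behavior-cloning) term.
    Starts at w0 and decays linearly to 0, shifting the agent from
    imitating the generator to pure RL (hypothetical schedule)."""
    return w0 * max(0.0, 1.0 - step / total_steps)

def actor_loss(q_value, bc_error, step, total_steps):
    """TD3-style actor objective plus an annealed imitation term.
    q_value: critic's value of the policy action (to be maximized).
    bc_error: squared distance between policy and demo actions."""
    return -q_value + demo_weight(step, total_steps) * bc_error

# Early in training the imitation term dominates...
early = actor_loss(q_value=0.5, bc_error=2.0, step=0, total_steps=1000)
# ...late in training only the RL term remains.
late = actor_loss(q_value=0.5, bc_error=2.0, step=1000, total_steps=1000)
print(early, late)  # 1.5 -0.5
```

Because the imitation weight reaches zero, a noisy or low-quality generator can only bias the early phase of training, which is one way to read the claimed robustness to limited demonstrations.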
Meta Inverse Reinforcement Learning via Maximum Reward Sharing for Human Motion Analysis
This work handles the inverse reinforcement learning (IRL) problem where only
a small number of demonstrations are available from a demonstrator for each
high-dimensional task, insufficient to estimate an accurate reward function.
Observing that each demonstrator has an inherent reward for each state and the
task-specific behaviors mainly depend on a small number of key states, we
propose a meta IRL algorithm that first models the reward function for each
task as a distribution conditioned on a baseline reward function shared by all
tasks and dependent only on the demonstrator, and then finds the most likely
reward function in the distribution that explains the task-specific behaviors.
We test the method in a simulated environment on path planning tasks with
limited demonstrations, and show that the accuracy of the learned reward
function is significantly improved. We also apply the method to analyze the
motion of a patient under rehabilitation.
Comment: arXiv admin note: text overlap with arXiv:1707.0939
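The modeling idea, a task reward distributed around a shared, demonstrator-specific baseline, with task-specific evidence concentrated on a few key states, can be sketched as a Gaussian MAP estimate: states without evidence fall back to the baseline, and key states are pulled toward the observations. The function, prior strength, and numbers below are illustrative stand-ins, not the paper's conditional reward model.

```python
import numpy as np

# Hypothetical setup: 5 states and a demonstrator-specific baseline
# reward shared by all tasks.
baseline = np.array([0.0, 0.1, 0.0, -0.2, 0.05])

def map_task_reward(baseline, observed, observed_idx, prior_strength=4.0):
    """MAP estimate of a task reward under a Gaussian prior centered on
    the shared baseline. Non-key states keep the baseline value; key
    states combine the prior with unit-precision observations."""
    reward = baseline.copy()
    for i, obs in zip(observed_idx, observed):
        # Posterior mean: precision-weighted average of prior and evidence.
        reward[i] = (prior_strength * baseline[i] + obs) / (prior_strength + 1.0)
    return reward

task_obs = np.array([1.0, -1.0])  # evidence on two key states
reward = map_task_reward(baseline, task_obs, observed_idx=[1, 3])
```

With only a handful of demonstrations per task, this kind of sharing lets every task's data refine the common baseline while each task estimates only its few key-state deviations.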
Learning Generalizable Dexterous Manipulation from Human Grasp Affordance
Dexterous manipulation with a multi-finger hand is one of the most
challenging problems in robotics. While recent progress in imitation learning
has greatly improved sample efficiency compared to reinforcement learning,
the learned policy can hardly generalize to manipulate novel objects, given
limited expert demonstrations. In this paper, we propose to learn dexterous
manipulation using large-scale demonstrations with diverse 3D objects in a
category, which are generated from a human grasp affordance model. This
generalizes the policy to novel object instances within the same category. To
train the policy, we propose a novel imitation learning objective jointly with
a geometric representation learning objective using our demonstrations. By
experimenting with relocating diverse objects in simulation, we show that our
approach outperforms baselines by a large margin when manipulating novel
objects. We also ablate the importance of 3D object representation learning
for manipulation. Videos, code, and additional information are available on
the project website: https://kristery.github.io/ILAD/ .
Comment: project page: https://kristery.github.io/ILAD
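A "novel imitation learning objective jointly with a geometric representation learning objective" can be sketched as a weighted sum of a behavior-cloning loss and a point-set distance on the object representation. The Chamfer distance and the weighting below are assumptions chosen for illustration; the abstract does not specify the exact form of either term.

```python
import numpy as np

def bc_loss(pred_actions, demo_actions):
    """Mean squared error between policy actions and demonstration actions."""
    return float(np.mean((pred_actions - demo_actions) ** 2))

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a (N,3) and b (M,3),
    a common geometric representation-learning objective (a stand-in for
    the paper's unspecified geometric term)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

def joint_loss(pred_actions, demo_actions, pred_pts, target_pts, lam=0.1):
    """Imitation objective trained jointly with a geometric term,
    weighted by a hypothetical coefficient lam."""
    return bc_loss(pred_actions, demo_actions) + lam * chamfer(pred_pts, target_pts)
```

Training the policy and the 3D representation against one joint loss is what lets the geometric term shape features that transfer to novel object instances within a category.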