Multi-goal reinforcement learning is widely applied in planning and robot
manipulation. Two main challenges in multi-goal reinforcement learning are
sparse rewards and sample inefficiency. Hindsight Experience Replay (HER) aims
to tackle both challenges via goal relabeling. However, HER and its variants
still require millions of samples and substantial computation. In this paper,
we propose Multi-step Hindsight Experience Replay (MHER), which incorporates
multi-step relabeled returns based on n-step relabeling to improve sample
efficiency.
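As a concrete illustration of n-step relabeling (a minimal sketch under our own assumptions, not the paper's implementation): rewards are recomputed against a relabeled goal g' and combined with a bootstrapped Q-value n steps ahead. The sparse reward convention, success threshold, buffer format, and function names below are illustrative assumptions.

```python
import numpy as np

def sparse_reward(achieved_goal, goal, eps=0.05):
    # Common multi-goal convention (assumed here): 0 on success, -1 otherwise.
    return 0.0 if np.linalg.norm(achieved_goal - goal) < eps else -1.0

def n_step_relabeled_target(transitions, new_goal, q_bootstrap, gamma=0.98, n=3):
    """n-step target after goal relabeling:
    sum_{i<n} gamma^i * r_i + gamma^n * Q(s_n, pi(s_n, g'), g'),
    with each reward r_i recomputed against the relabeled goal g'."""
    target = q_bootstrap  # critic's estimate at the state n steps ahead
    for tr in reversed(transitions[:n]):
        target = sparse_reward(tr["achieved_goal"], new_goal) + gamma * target
    return target
```

Because the intermediate actions in the stored transitions were produced by an older policy, bootstrapping through them introduces the off-policy n-step bias discussed next.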
Despite the advantages of n-step relabeling, we show both theoretically and
experimentally that the off-policy n-step bias introduced by n-step relabeling
can lead to poor performance in many environments. To address this issue, we
present two bias-reduced MHER algorithms, MHER(λ) and Model-based MHER
(MMHER). MHER(λ) exploits the λ return, while
MMHER benefits from model-based value expansions. Experimental results on
numerous multi-goal robotic tasks show that our solutions can successfully
alleviate off-policy n-step bias and achieve significantly higher sample
efficiency than HER and Curriculum-guided HER with little additional
computation beyond HER.
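To illustrate the λ-return idea behind MHER(λ) (again a sketch under our own assumptions, not the paper's exact formulation): n-step targets of increasing horizon are blended with exponentially decaying weights, so shorter, less biased targets dominate as λ decreases.

```python
def lambda_return(n_step_returns, lam=0.7):
    """Blend the 1..N-step targets with weights (1 - lam) * lam**(n - 1),
    putting the remaining tail mass lam**(N - 1) on the N-step target.
    The weights sum to 1, so the result is a convex combination."""
    N = len(n_step_returns)
    weights = [(1.0 - lam) * lam ** (n - 1) for n in range(1, N)]
    weights.append(lam ** (N - 1))
    return sum(w * r for w, r in zip(weights, n_step_returns))
```

Smaller λ leans on short-horizon targets and thus suppresses off-policy n-step bias at the cost of slower reward propagation. MMHER instead computes value expansions from rollouts of a learned dynamics model, trading off-policy bias for model error.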