Multi-goal reinforcement learning is widely applied in planning and robot
manipulation. Two main challenges in multi-goal reinforcement learning are
sparse rewards and sample inefficiency. Hindsight Experience Replay (HER) aims
to tackle both challenges via goal relabeling. However, HER and its variants
still require millions of samples and substantial computation. In this paper,
we propose Multi-step Hindsight Experience Replay (MHER), which incorporates
multi-step relabeled returns based on n-step relabeling to improve sample
efficiency.
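As a concrete illustration of n-step relabeling (a minimal sketch under our own assumptions, not the paper's implementation): rewards are recomputed against a relabeled goal g' and combined with a bootstrapped Q-value n steps ahead. The sparse reward convention, success threshold, buffer format, and function names below are illustrative assumptions.

```python
import numpy as np

def sparse_reward(achieved_goal, goal, eps=0.05):
    # Common multi-goal convention (assumed here): 0 on success, -1 otherwise.
    return 0.0 if np.linalg.norm(achieved_goal - goal) < eps else -1.0

def n_step_relabeled_target(transitions, new_goal, q_bootstrap, gamma=0.98, n=3):
    """n-step target after goal relabeling:
    sum_{i<n} gamma^i * r_i + gamma^n * Q(s_n, pi(s_n, g'), g'),
    with each reward r_i recomputed against the relabeled goal g'."""
    target = q_bootstrap  # critic's estimate at the state n steps ahead
    for tr in reversed(transitions[:n]):
        target = sparse_reward(tr["achieved_goal"], new_goal) + gamma * target
    return target
```

Because the intermediate actions in the stored transitions were produced by an older policy, bootstrapping through them introduces the off-policy n-step bias discussed next.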
Despite the advantages of n-step relabeling, we show both theoretically and
experimentally that the off-policy n-step bias introduced by n-step relabeling
can lead to poor performance in many environments. To address this issue, we
present two bias-reduced MHER algorithms, MHER(λ) and Model-based MHER
(MMHER). MHER(λ) exploits the λ return, while
MMHER benefits from model-based value expansions. Experimental results on
numerous multi-goal robotic tasks show that our solutions can successfully
alleviate off-policy n-step bias and achieve significantly higher sample
efficiency than HER and Curriculum-guided HER with little additional
computation beyond HER.
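To illustrate the λ-return idea behind MHER(λ) (again a sketch under our own assumptions, not the paper's exact formulation): n-step targets of increasing horizon are blended with exponentially decaying weights, so shorter, less biased targets dominate as λ decreases.

```python
def lambda_return(n_step_returns, lam=0.7):
    """Blend the 1..N-step targets with weights (1 - lam) * lam**(n - 1),
    putting the remaining tail mass lam**(N - 1) on the N-step target.
    The weights sum to 1, so the result is a convex combination."""
    N = len(n_step_returns)
    weights = [(1.0 - lam) * lam ** (n - 1) for n in range(1, N)]
    weights.append(lam ** (N - 1))
    return sum(w * r for w, r in zip(weights, n_step_returns))
```

Smaller λ leans on short-horizon targets and thus suppresses off-policy n-step bias at the cost of slower reward propagation. MMHER instead computes value expansions from rollouts of a learned dynamics model, trading off-policy bias for model error.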