Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards
While using shaped rewards can be beneficial when solving sparse reward
tasks, their successful application often requires careful engineering and is
problem-specific. For instance, in tasks where the agent must achieve some goal
state, simple distance-to-goal reward shaping often fails, as it renders
learning vulnerable to local optima. We introduce a simple and effective
model-free method to learn from shaped distance-to-goal rewards on tasks where
success depends on reaching a goal state. Our method introduces an auxiliary
distance-based reward computed from pairs of rollouts to encourage diverse
exploration. This approach effectively prevents learning dynamics from
stabilizing around local optima induced by the naive distance-to-goal reward
shaping and enables policies to efficiently solve sparse reward tasks. Our
augmented objective does not require any additional reward engineering or
domain expertise to implement and converges to the original sparse objective as
the agent learns to solve the task. We demonstrate that our method successfully
solves a variety of hard-exploration tasks (including maze navigation and 3D
construction in a Minecraft environment), where naive distance-based reward
shaping otherwise fails, and intrinsic curiosity and reward relabeling
strategies exhibit poor performance.
Comment: NeurIPS 201
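
The idea of combining a distance-to-goal reward with a pairwise term between rollouts can be illustrated with a minimal sketch. It is not the authors' exact formulation, which the abstract does not spell out. It assumes Euclidean distance, terminal-state rewards only, and a hypothetical `bonus_weight` parameter; all function names are illustrative.

```python
import math


def distance(a, b):
    """Euclidean distance between two points (an assumed metric)."""
    return math.dist(a, b)


def naive_shaped_reward(terminal, goal):
    """Naive distance-to-goal shaping: closer to the goal is better."""
    return -distance(terminal, goal)


def paired_shaped_rewards(terminal_a, terminal_b, goal, bonus_weight=1.0):
    """Rewards for a pair of rollouts that share the same goal.

    Each rollout receives the naive shaped reward plus a bonus proportional
    to its distance from the other rollout's terminal state. The bonus is
    hypothetical: it rewards the pair for ending in different places, which
    discourages both rollouts from settling into the same local optimum.
    """
    spread = distance(terminal_a, terminal_b)
    reward_a = naive_shaped_reward(terminal_a, goal) + bonus_weight * spread
    reward_b = naive_shaped_reward(terminal_b, goal) + bonus_weight * spread
    return reward_a, reward_b


if __name__ == "__main__":
    goal = (0.0, 0.0)
    # Both rollouts stuck at the same point: no diversity bonus.
    print(paired_shaped_rewards((3.0, 4.0), (3.0, 4.0), goal))
    # One rollout explores elsewhere: both receive a positive bonus.
    print(paired_shaped_rewards((3.0, 4.0), (0.0, 0.0), goal))
```

In this sketch the pairwise bonus vanishes whenever both rollouts end in the same state, including when both reach the goal, so the extra term only matters while the pair's outcomes still differ.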