Deep reinforcement learning (RL) has achieved remarkable success in solving
complex tasks through its integration with deep neural networks (DNNs) as
function approximators. However, the reliance on DNNs introduces a challenge known
as primacy bias, whereby these function approximators tend to overfit to early
experiences, degrading subsequent learning. To mitigate this primacy
bias, a reset method has been proposed, which periodically re-initializes part or
all of a deep RL agent's parameters while preserving the replay buffer.
However, applying the reset method can cause a performance collapse immediately
after each reset, which is detrimental from the perspectives of safe RL
and regret minimization. In this paper, we propose a new reset-based method
that leverages deep ensemble learning to address the limitations of the vanilla
reset method and enhance sample efficiency. The proposed method is evaluated
through various experiments, including those in the domain of safe RL. Numerical
results demonstrate its effectiveness in terms of both high sample efficiency and
safety.