Exploring Unknown States with Action Balance
Exploration is a key problem in reinforcement learning. Recently, bonus-based
methods, which assign additional bonuses (e.g., intrinsic rewards) to guide the
agent toward rarely visited states, have achieved considerable success in
environments where exploration is difficult, such as Montezuma's Revenge. Since
the bonus is calculated from the novelty of the next state reached after
performing an action, we call such methods next-state bonus methods. However,
next-state bonus methods force the agent to pay too much attention to exploring
known states and to neglect finding unknown states, since exploration is driven
by next states that have already been visited; this may slow the pace of
finding rewards in some environments. In this paper, we focus on improving the
effectiveness of finding unknown states and propose action balance exploration,
which balances the frequency with which each action is selected at a given
state and can be treated as an extension of the upper confidence bound (UCB) to
deep reinforcement learning. Moreover, we propose action balance RND, which
combines a next-state bonus method (random network distillation exploration,
RND) with our action balance exploration to take advantage of both.
Experiments on a grid world and Atari games demonstrate that action balance
exploration has a better capability of finding unknown states and can improve
the performance of RND in some hard-exploration environments.
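To make the UCB connection concrete, below is a minimal tabular sketch of an action-balance bonus: each action's bonus shrinks as it is selected more often in a state, so rarely chosen actions are preferred. This is illustrative only; the paper extends the idea to deep RL (e.g., with neural estimators rather than count tables), and all class and parameter names here are assumptions, not the authors' implementation.

```python
import math
from collections import defaultdict


class ActionBalanceBonus:
    """Tabular sketch of a UCB-style action-balance bonus.

    bonus(s, a) = c * sqrt(ln(N(s) + 1) / (n(s, a) + 1)),
    where N(s) counts visits to state s and n(s, a) counts how often
    action a was selected in s. Actions selected rarely in a state get
    a larger bonus, balancing action-selection frequencies.
    """

    def __init__(self, num_actions, c=1.0):
        self.num_actions = num_actions
        self.c = c
        self.state_counts = defaultdict(int)   # N(s)
        self.action_counts = defaultdict(int)  # n(s, a)

    def bonus(self, state, action):
        # +1 terms keep the bonus finite for unvisited states/actions.
        n_s = self.state_counts[state]
        n_sa = self.action_counts[(state, action)]
        return self.c * math.sqrt(math.log(n_s + 1.0) / (n_sa + 1.0))

    def update(self, state, action):
        self.state_counts[state] += 1
        self.action_counts[(state, action)] += 1


# Usage: after selecting action 0 repeatedly in state "s", the untried
# action 1 receives a larger bonus and would be preferred next.
b = ActionBalanceBonus(num_actions=2, c=1.0)
for _ in range(10):
    b.update("s", 0)
print(b.bonus("s", 0) < b.bonus("s", 1))
```

In a combined scheme like the paper's action balance RND, a bonus of this kind would be added alongside a next-state novelty bonus (such as RND's prediction error) rather than replacing it.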