Diverse Exploration via InfoMax Options
In this paper, we study the problem of autonomously discovering temporally
abstracted actions, or options, for exploration in reinforcement learning. For
learning diverse options suitable for exploration, we introduce the infomax
termination objective defined as the mutual information between options and
their corresponding state transitions. We derive a scalable optimization scheme
for maximizing this objective via the termination condition of options,
yielding the InfoMax Option Critic (IMOC) algorithm. Through illustrative
experiments, we empirically show that IMOC learns diverse options and utilizes
them for exploration. Moreover, we show that IMOC scales well to continuous
control tasks.
Comment: Preprint. Under revie
- …