Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) is a promising
paradigm to address the exploration-exploitation dilemma in reinforcement
learning. It decomposes the source task into subgoal-conditioned subtasks and
conducts exploration and exploitation in the subgoal space. The effectiveness
of GCHRL relies heavily on the subgoal representation function and the subgoal
selection strategy. However, existing works often overlook the temporal
coherence in GCHRL when learning latent subgoal representations and lack an
efficient subgoal selection strategy that balances exploration and
exploitation. This paper proposes HIerarchical reinforcement learning via
dynamically building Latent Landmark graphs (HILL) to overcome these
limitations. HILL learns latent subgoal representations that satisfy temporal
coherence using a contrastive representation learning objective. Based on these
representations, HILL dynamically builds latent landmark graphs and employs a
novelty measure on nodes and a utility measure on edges. Finally, HILL develops
a subgoal selection strategy that balances exploration and exploitation by
jointly considering both measures. Experimental results demonstrate that HILL
outperforms state-of-the-art baselines on continuous control tasks with sparse
rewards in sample efficiency and asymptotic performance. Our code is available
at https://github.com/papercode2022/HILL.

Comment: Accepted by the International Joint Conference on Neural Networks
(IJCNN) 202
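
The subgoal selection idea summarized in the abstract, jointly weighing a
novelty measure on nodes and a utility measure on edges, can be illustrated
with a minimal sketch. This is not the HILL implementation: the function name
`select_subgoal`, the dictionary-based graph, the additive score, and the
`weight` parameter are all assumptions made for illustration, and the actual
measures and their combination may differ.

```python
from typing import Dict, Hashable, Optional, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def select_subgoal(
    current: Node,
    novelty: Dict[Node, float],
    utility: Dict[Edge, float],
    weight: float = 1.0,
) -> Optional[Node]:
    """Pick the neighbor of `current` with the highest combined score.

    Hypothetical scoring rule (an assumption, not taken from the paper):
        score(dst) = utility[(current, dst)] + weight * novelty[dst]
    A larger `weight` favors exploration (novel nodes); a smaller one
    favors exploitation (high-utility edges).
    """
    best_node, best_score = None, float("-inf")
    for (src, dst), edge_utility in utility.items():
        if src != current:
            continue  # only consider edges leaving the current landmark
        score = edge_utility + weight * novelty.get(dst, 0.0)
        if score > best_score:
            best_node, best_score = dst, score
    return best_node  # None if `current` has no outgoing edges


# Toy landmark graph with three candidate subgoals reachable from "s".
novelty = {"a": 0.1, "b": 0.9, "c": 0.2}
utility = {("s", "a"): 0.8, ("s", "b"): 0.3, ("s", "c"): 0.5}

exploit_choice = select_subgoal("s", novelty, utility, weight=0.0)
explore_choice = select_subgoal("s", novelty, utility, weight=1.0)
```

With `weight=0.0` the choice depends on edge utility alone, while `weight=1.0`
lets a highly novel node win despite a lower-utility edge, which is the
exploration-exploitation trade-off the strategy is meant to balance.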