Learning Representations in Model-Free Hierarchical Reinforcement Learning
Common approaches to Reinforcement Learning (RL) are seriously challenged by
large-scale applications involving huge state spaces and sparse delayed reward
feedback. Hierarchical Reinforcement Learning (HRL) methods attempt to address
this scalability issue by learning action selection policies at multiple levels
of temporal abstraction. Abstraction can be achieved by identifying a relatively
small set of states that are likely to be useful as subgoals, together with
learning the corresponding skill policies for reaching those subgoals. Many
approaches to subgoal discovery in HRL depend on the analysis of a model of the
environment, but the need to learn such a model introduces its own problems of
scale. Once subgoals are identified, skills may be learned through intrinsic
motivation, introducing an internal reward signal marking subgoal attainment.
In this paper, we present a novel model-free method for subgoal discovery using
incremental unsupervised learning over a small memory of the most recent
experiences (trajectories) of the agent. When combined with an intrinsic
motivation learning mechanism, this method learns both subgoals and skills,
based on experiences in the environment. Thus, we offer an original approach to
HRL that does not require a model of the environment, making it suitable for
large-scale applications. We demonstrate the efficiency of our
method on two RL problems with sparse delayed feedback: a variant of the rooms
environment and the first screen of the Atari 2600 game Montezuma's Revenge.
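The abstract describes subgoal discovery via incremental unsupervised learning over a small memory of recent experiences, paired with an intrinsic reward for subgoal attainment. A minimal sketch of that flavor of approach, using online K-means as the incremental clustering step (all class, method, and parameter names here are hypothetical, not the paper's):

```python
import numpy as np

class SubgoalDiscovery:
    """Hypothetical sketch: online K-means over a small memory of
    recent states; the centroids act as candidate subgoals."""

    def __init__(self, n_subgoals, lr=0.05, memory_size=1000):
        self.n_subgoals = n_subgoals
        self.lr = lr                 # step size for the incremental update
        self.memory = []             # small memory of recent experiences
        self.memory_size = memory_size
        self.centroids = None        # candidate subgoals

    def observe(self, state):
        # Keep only the most recent states.
        self.memory.append(np.asarray(state, dtype=float))
        if len(self.memory) > self.memory_size:
            self.memory.pop(0)
        if self.centroids is None:
            if len(self.memory) >= self.n_subgoals:
                # Initialise centroids from distinct stored states.
                idx = np.random.choice(len(self.memory), self.n_subgoals,
                                       replace=False)
                self.centroids = np.stack([self.memory[i] for i in idx])
        else:
            # Incremental K-means: move the nearest centroid toward the state.
            s = self.memory[-1]
            k = np.argmin(np.linalg.norm(self.centroids - s, axis=1))
            self.centroids[k] += self.lr * (s - self.centroids[k])

    def intrinsic_reward(self, state, subgoal_idx, tol=0.5):
        # Internal reward signal marking (approximate) subgoal attainment.
        if self.centroids is None:
            return 0.0
        s = np.asarray(state, dtype=float)
        return 1.0 if np.linalg.norm(s - self.centroids[subgoal_idx]) < tol else 0.0
```

A higher-level controller would pick a subgoal index and hand the corresponding intrinsic reward to a lower-level skill learner, which is the division of labor the abstract describes.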
Searching for rewards in graph-structured spaces
How do people generalize and explore structured spaces? We study human behavior in a multi-armed bandit task where rewards are influenced by the connectivity structure of a graph. A detailed predictive model comparison shows that a Gaussian Process regression model with a diffusion kernel best describes participants' choices and also predicts their judgments about expected reward and confidence. This model unifies psychological models of function learning with the Successor Representation used in reinforcement learning, thereby building a bridge between different models of generalization.
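Concretely, the diffusion kernel is K = exp(-βL), where L = D - A is the graph Laplacian, and GP regression with this kernel yields a predicted reward (posterior mean) and a confidence (posterior variance) for every node. A minimal sketch under the assumptions of a binary adjacency matrix and noisy scalar reward observations (function and parameter names are illustrative):

```python
import numpy as np
from scipy.linalg import expm

def diffusion_kernel(adjacency, beta=1.0):
    """Diffusion kernel K = expm(-beta * L), with L = D - A
    the combinatorial graph Laplacian."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return expm(-beta * L)

def gp_posterior(K, observed_nodes, observed_rewards, noise=0.1):
    """GP regression on graph nodes: posterior mean (expected reward)
    and variance (confidence) at every node, given noisy observations."""
    y = np.asarray(observed_rewards, dtype=float)
    m = len(observed_nodes)
    K_oo = K[np.ix_(observed_nodes, observed_nodes)]   # observed x observed
    K_ao = K[:, observed_nodes]                        # all x observed
    G = K_oo + noise**2 * np.eye(m)
    mean = K_ao @ np.linalg.solve(G, y)
    var = np.diag(K) - np.sum(K_ao * np.linalg.solve(G, K_ao.T).T, axis=1)
    return mean, var

# Example: 4-node path graph with rewards observed at the two ends.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
K = diffusion_kernel(A, beta=0.5)
mean, var = gp_posterior(K, [0, 3], [1.0, 0.2])
```

Because the kernel is built from graph connectivity, predictions at unobserved nodes are interpolated along the graph, which is what lets the model generalize the way participants appear to.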
Deep Laplacian-based Options for Temporally-Extended Exploration
Selecting exploratory actions that generate a rich stream of experience for
better learning is a fundamental challenge in reinforcement learning (RL). One
way to tackle this problem is to select actions according to specific policies
for an extended period of time, known as options. A
recent line of work to derive such exploratory options builds upon the
eigenfunctions of the graph Laplacian. Importantly, until now these methods
have been mostly limited to tabular domains where (1) the graph Laplacian
matrix was either given or could be fully estimated, (2) performing
eigendecomposition on this matrix was computationally tractable, and (3) value
functions could be learned exactly. Additionally, these methods required a
separate option discovery phase. These assumptions fundamentally limit
scalability. In this paper we address these limitations and show how recent
results for directly approximating the eigenfunctions of the Laplacian can be
leveraged to truly scale up options-based exploration. To do so, we introduce a
fully online deep RL algorithm for discovering Laplacian-based options and
evaluate our approach on a variety of pixel-based tasks. We compare to several
state-of-the-art exploration methods and show that our approach is effective,
general, and especially promising in non-stationary settings.
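The paper's contribution is a deep, fully online approximation; the tabular idea it scales up can be sketched in a few lines. This is a toy illustration in the spirit of Laplacian-based options (eigenoptions), not the paper's algorithm: each non-constant eigenvector e of the graph Laplacian defines an intrinsic reward e(s') - e(s), and the greedy policy on that reward is an exploratory option.

```python
import numpy as np

def laplacian_eigenoptions(adjacency, n_options=2):
    """Toy tabular sketch: derive exploratory options from the smallest
    non-trivial eigenvectors of the graph Laplacian. Each eigenvector e
    induces an intrinsic reward r(s, s') = e[s'] - e[s]; maximizing it
    drives the agent toward one extreme of the eigenfunction."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A            # combinatorial Laplacian
    _, eigvecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    rewards = []
    for i in range(1, 1 + n_options):         # skip the constant eigenvector
        e = eigvecs[:, i]
        rewards.append(lambda s, s_next, e=e: e[s_next] - e[s])
    return rewards

# Example: 4-state chain; the first option's reward pushes the agent
# toward one end of the chain.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
opts = laplacian_eigenoptions(A, n_options=2)
print(opts[0](0, 1))   # intrinsic reward for the transition 0 -> 1
```

In the tabular setting this requires the full Laplacian and an eigendecomposition, which is exactly the non-scalable assumption the paper removes by approximating the eigenfunctions directly with deep networks.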