Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning
To rapidly learn a new task, it is often essential for agents to explore
efficiently -- especially when performance matters from the first timestep. One
way to learn such behaviour is via meta-learning. Many existing methods, however,
rely on dense rewards for meta-training, and can fail catastrophically if the
rewards are sparse. Without a suitable reward signal, the need for exploration
during meta-training is exacerbated. To address this, we propose HyperX, which
uses novel reward bonuses for meta-training to explore in approximate
hyper-state space (where hyper-states represent the environment state and the
agent's task belief). We show empirically that HyperX meta-learns better
task-exploration and adapts more successfully to new tasks than existing
methods.
Comment: Published at the International Conference on Machine Learning (ICML)
202
Learning Representations in Model-Free Hierarchical Reinforcement Learning
Common approaches to Reinforcement Learning (RL) are seriously challenged by
large-scale applications involving huge state spaces and sparse delayed reward
feedback. Hierarchical Reinforcement Learning (HRL) methods attempt to address
this scalability issue by learning action selection policies at multiple levels
of temporal abstraction. Abstraction can be achieved by identifying a relatively
small set of states that are likely to be useful as subgoals, in concert with
the learning of corresponding skill policies to achieve those subgoals. Many
approaches to subgoal discovery in HRL depend on the analysis of a model of the
environment, but the need to learn such a model introduces its own problems of
scale. Once subgoals are identified, skills may be learned through intrinsic
motivation, introducing an internal reward signal marking subgoal attainment.
In this paper, we present a novel model-free method for subgoal discovery using
incremental unsupervised learning over a small memory of the most recent
experiences (trajectories) of the agent. When combined with an intrinsic
motivation learning mechanism, this method learns both subgoals and skills,
based on experiences in the environment. Thus, we offer an original approach to
HRL that does not require the acquisition of a model of the environment,
suitable for large-scale applications. We demonstrate the efficiency of our
method on two RL problems with sparse delayed feedback: a variant of the rooms
environment and the first screen of the ATARI 2600 Montezuma's Revenge game
- …