Flow-based Intrinsic Curiosity Module
In this paper, we focus on a prediction-based novelty estimation strategy built upon the deep reinforcement learning (DRL) framework, and present a flow-based
intrinsic curiosity module (FICM) to exploit the prediction errors from optical
flow estimation as exploration bonuses. We propose the concept of leveraging
motion features captured between consecutive observations to evaluate the
novelty of observations in an environment. FICM encourages a DRL agent to
explore observations with unfamiliar motion features, and requires only two
consecutive frames to obtain sufficient information when estimating the
novelty. We evaluate our method and compare it with a number of existing
methods on multiple benchmark environments, including Atari games, Super Mario
Bros., and ViZDoom. We demonstrate that FICM is well suited to tasks or environments featuring moving objects, which allow it to utilize the motion features between consecutive observations. We further present an ablative analysis of the encoding efficiency of FICM, and discuss its applicable domains
comprehensively.
Comment: The SOLE copyright holder is IJCAI (International Joint Conferences on Artificial Intelligence), all rights reserved. The link is provided as follows: https://www.ijcai.org/Proceedings/2020/28
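To make the mechanism concrete, the following is a minimal PyTorch sketch of the idea, not the paper's actual architecture (FICM builds on FlowNet-style flow predictors and a composite flow loss); the names FlowPredictor, warp, and flow_bonus are ours, and frames are assumed to be single-channel tensors of shape (N, 1, H, W).

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FlowPredictor(nn.Module):
        """Tiny conv net predicting a dense (dx, dy) flow field from two stacked frames."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 2, 3, padding=1),   # 2 output channels: per-pixel (dx, dy)
            )

        def forward(self, frame_t, frame_tp1):
            return self.net(torch.cat([frame_t, frame_tp1], dim=1))

    def warp(frame, flow):
        """Backward-warp `frame` by `flow` with a normalized sampling grid."""
        n, _, h, w = frame.shape
        ys, xs = torch.meshgrid(
            torch.linspace(-1.0, 1.0, h, device=frame.device),
            torch.linspace(-1.0, 1.0, w, device=frame.device),
            indexing="ij",
        )
        base = torch.stack((xs, ys), dim=-1).expand(n, h, w, 2)
        # scale pixel offsets onto the normalized [-1, 1] grid expected by grid_sample
        scale = torch.tensor([2.0 / max(w - 1, 1), 2.0 / max(h - 1, 1)],
                             device=frame.device)
        return F.grid_sample(frame, base + flow.permute(0, 2, 3, 1) * scale,
                             align_corners=True)

    def flow_bonus(predictor, frame_t, frame_tp1):
        """Per-sample flow-warp reconstruction error, used as the exploration bonus."""
        flow = predictor(frame_t, frame_tp1)
        recon = warp(frame_t, flow)
        return F.mse_loss(recon, frame_tp1, reduction="none").mean(dim=(1, 2, 3))

Training the predictor on the same reconstruction error makes familiar motion yield a shrinking bonus, so only observations with unfamiliar motion features keep attracting the agent.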
Combining Experience Replay with Exploration by Random Network Distillation
Our work is a simple extension of the paper "Exploration by Random Network Distillation". In more detail, we show how to efficiently combine intrinsic rewards with experience replay in order to achieve more efficient and robust exploration (with respect to PPO/RND) and, consequently, better results in terms of agent performance and sample efficiency. We achieve this with a new technique named Prioritized Oversampled Experience Replay (POER), built upon a definition of which experiences are most useful to replay. Finally, we evaluate our technique on the famous Atari game Montezuma's Revenge and some other hard-exploration Atari games.
Comment: 8 pages, 6 figures, accepted as a full paper at the IEEE Conference on
Games (CoG) 201
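The abstract does not spell out POER's exact importance measure, so the sketch below makes an assumption: it reuses the RND bonus itself as the replay priority. The names RND, OversampledReplay, and mlp are illustrative, not from the paper.

    import numpy as np
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def mlp(inp, out):
        return nn.Sequential(nn.Linear(inp, 128), nn.ReLU(), nn.Linear(128, out))

    class RND:
        """Random Network Distillation: novelty = predictor error vs. a fixed random target."""
        def __init__(self, obs_dim, feat_dim=64, lr=1e-4):
            self.target = mlp(obs_dim, feat_dim)
            for p in self.target.parameters():
                p.requires_grad_(False)        # target net stays random and frozen
            self.predictor = mlp(obs_dim, feat_dim)
            self.opt = torch.optim.Adam(self.predictor.parameters(), lr=lr)

        def bonus(self, obs):
            with torch.no_grad():
                return F.mse_loss(self.predictor(obs), self.target(obs),
                                  reduction="none").mean(-1)

        def update(self, obs):
            loss = F.mse_loss(self.predictor(obs), self.target(obs))
            self.opt.zero_grad()
            loss.backward()
            self.opt.step()

    class OversampledReplay:
        """Replay buffer that samples transitions proportionally to a stored priority."""
        def __init__(self, capacity=100_000, alpha=0.6):
            self.capacity, self.alpha = capacity, alpha
            self.data, self.prio = [], []

        def add(self, transition, priority):
            if len(self.data) >= self.capacity:   # drop the oldest transition
                self.data.pop(0)
                self.prio.pop(0)
            self.data.append(transition)
            self.prio.append((float(priority) + 1e-6) ** self.alpha)

        def sample(self, batch_size):
            p = np.asarray(self.prio)
            idx = np.random.choice(len(self.data), size=batch_size, p=p / p.sum())
            return [self.data[i] for i in idx]

At each environment step one would compute b = rnd.bonus(obs), add b to the extrinsic reward, store the transition with priority b, and periodically call rnd.update on observations drawn from the buffer.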
Learning Representations in Model-Free Hierarchical Reinforcement Learning
Common approaches to Reinforcement Learning (RL) are seriously challenged by
large-scale applications involving huge state spaces and sparse delayed reward
feedback. Hierarchical Reinforcement Learning (HRL) methods attempt to address
this scalability issue by learning action selection policies at multiple levels
of temporal abstraction. Abstraction can be achieved by identifying a relatively small set of states that are likely to be useful as subgoals, in concert with learning corresponding skill policies to achieve those subgoals. Many
approaches to subgoal discovery in HRL depend on the analysis of a model of the
environment, but the need to learn such a model introduces its own problems of
scale. Once subgoals are identified, skills may be learned through intrinsic
motivation, introducing an internal reward signal marking subgoal attainment.
In this paper, we present a novel model-free method for subgoal discovery using
incremental unsupervised learning over a small memory of the most recent
experiences (trajectories) of the agent. When combined with an intrinsic
motivation learning mechanism, this method learns both subgoals and skills,
based on experiences in the environment. Thus, we offer an original approach to
HRL that does not require the acquisition of a model of the environment,
suitable for large-scale applications. We demonstrate the efficiency of our method on two RL problems with sparse delayed feedback: a variant of the rooms environment and the first screen of the Atari 2600 game Montezuma's Revenge.
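As a rough illustration of the subgoal-discovery idea, here is a minimal sketch using incremental (online) K-means over the small experience memory; the paper's unsupervised learner also exploits anomalous (e.g. unusually rewarding) experiences, which this sketch omits, and the class name, update rule, and +1/radius reward shape are illustrative assumptions.

    import numpy as np

    class SubgoalDiscovery:
        """Online K-means over recently visited states; centroids act as candidate subgoals."""
        def __init__(self, n_subgoals=4, lr=0.05):
            self.n_subgoals, self.lr = n_subgoals, lr
            self.centroids = None

        def update(self, recent_states):
            """One incremental pass over the small memory of recent experiences."""
            states = np.asarray(recent_states, dtype=np.float64)
            if self.centroids is None:
                # seed centroids from the first memory batch
                # (assumes the memory holds at least n_subgoals states)
                idx = np.random.choice(len(states), self.n_subgoals, replace=False)
                self.centroids = states[idx].copy()
            for s in states:   # move the nearest centroid toward each visited state
                j = np.argmin(np.linalg.norm(self.centroids - s, axis=1))
                self.centroids[j] += self.lr * (s - self.centroids[j])

        def intrinsic_reward(self, state, subgoal_idx, radius=1.0):
            """Internal reward marking subgoal attainment: +1 inside the subgoal's neighborhood."""
            d = np.linalg.norm(np.asarray(state) - self.centroids[subgoal_idx])
            return 1.0 if d < radius else 0.0

The higher-level policy picks a centroid index as the current subgoal, and the lower-level skill policy is trained on intrinsic_reward until the subgoal is attained, with no environment model required.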