Large-Scale Study of Curiosity-Driven Learning
Reinforcement learning algorithms rely on carefully engineered environment
rewards that are extrinsic to the agent. However, annotating each environment
with hand-designed, dense rewards does not scale, motivating the development
of reward functions that are intrinsic to the agent. Curiosity is a type of
intrinsic reward function that uses prediction error as the reward signal.
In this paper: (a) We perform the first large-scale study of purely
curiosity-driven learning, i.e. without any extrinsic rewards, across 54
standard benchmark environments, including the Atari game suite. Our results
show surprisingly good performance, and a high degree of alignment between the
intrinsic curiosity objective and the hand-designed extrinsic rewards of many
game environments. (b) We investigate the effect of using different feature
spaces for computing prediction error and show that random features are
sufficient for many popular RL game benchmarks, but learned features appear to
generalize better (e.g. to novel game levels in Super Mario Bros.). (c) We
demonstrate limitations of the prediction-based rewards in stochastic setups.
Game-play videos and code are at
https://pathak22.github.io/large-scale-curiosity/
Comment: First three authors contributed equally and ordered alphabetically.
Website at https://pathak22.github.io/large-scale-curiosity
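As a concrete illustration of the curiosity signal described above, the
following is a minimal sketch of a prediction-error intrinsic reward computed
in a fixed random-feature space, one of the feature spaces compared in the
paper. The class name, the linear forward model, and all hyperparameters are
illustrative assumptions; the authors' implementation uses deep networks
trained alongside a policy-gradient learner.

import numpy as np

class RandomFeatureCuriosity:
    # Hypothetical sketch: intrinsic reward = forward-model prediction
    # error in a fixed random embedding of the observations.
    def __init__(self, obs_dim, act_dim, embed_dim=32, lr=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random projection: the "random features" variant.
        self.phi = rng.normal(size=(obs_dim, embed_dim)) / np.sqrt(obs_dim)
        # Linear forward model f(phi(s), a) -> phi(s'), trained online.
        self.W = np.zeros((embed_dim + act_dim, embed_dim))
        self.lr = lr

    def intrinsic_reward(self, obs, action, next_obs):
        z = obs @ self.phi                    # embed current observation
        z_next = next_obs @ self.phi          # embed next observation
        x = np.concatenate([z, action])
        err = x @ self.W - z_next             # forward-model error
        self.W -= self.lr * np.outer(x, err)  # one SGD step on this sample
        # Error is high in rarely seen states and shrinks with familiarity.
        return float(np.sum(err ** 2))

States the agent cannot yet predict yield large rewards, so a policy that
maximizes this signal is driven toward novelty even with no extrinsic reward
at all.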
Flow-based Intrinsic Curiosity Module
In this paper, we focus on a prediction-based novelty estimation strategy
built upon the deep reinforcement learning (DRL) framework, and present a
flow-based intrinsic curiosity module (FICM) that exploits the prediction
errors of optical flow estimation as exploration bonuses. We propose the
concept of leveraging
motion features captured between consecutive observations to evaluate the
novelty of observations in an environment. FICM encourages a DRL agent to
explore observations with unfamiliar motion features, and requires only two
consecutive frames to obtain sufficient information when estimating the
novelty. We evaluate our method and compare it with a number of existing
methods on multiple benchmark environments, including Atari games, Super Mario
Bros., and ViZDoom. We demonstrate that FICM is well suited to tasks and
environments featuring moving objects, which allow it to exploit the motion
features between consecutive observations. We further analyze the encoding
efficiency of FICM through ablation studies, and discuss its applicable
domains comprehensively.
Comment: The SOLE copyright holder is IJCAI (International Joint Conferences
on Artificial Intelligence), all rights reserved. The link is provided as
follows: https://www.ijcai.org/Proceedings/2020/28
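The flow-based bonus can be pictured with a short sketch. Below, a classical
Farneback estimator (via OpenCV) stands in for FICM's learned flow predictor,
so only the warping (photometric) error computation is shown; in FICM itself
the predictor is trained, so this error shrinks for familiar motion and stays
high for novel motion. Function and parameter names are assumptions for
illustration.

import cv2
import numpy as np

def flow_warping_error(frame_prev, frame_next):
    # frames: uint8 grayscale arrays of equal shape (H, W).
    flow = cv2.calcOpticalFlowFarneback(
        frame_prev, frame_next, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    h, w = frame_prev.shape
    # Sampling grid that pulls each pixel of frame_next back along the
    # estimated flow, reconstructing frame_prev.
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    warped = cv2.remap(frame_next, map_x, map_y, cv2.INTER_LINEAR)
    # Photometric reconstruction error: motion the flow model cannot
    # explain produces a larger bonus.
    return float(np.mean((warped.astype(np.float32) -
                          frame_prev.astype(np.float32)) ** 2))

Note that only two consecutive frames are needed, matching the two-frame
requirement stated in the abstract.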
Practical Deep Reinforcement Learning Approach for Stock Trading
Stock trading strategy plays a crucial role in investment companies. However,
it is challenging to obtain an optimal strategy in the complex and dynamic
stock market. We explore the potential of deep reinforcement learning to
optimize stock trading strategies and thus maximize investment return. Thirty
stocks are selected as our trading stocks, and their daily prices are used as
the training and trading market environment. We train a deep reinforcement
learning agent and obtain an adaptive trading strategy. The agent's
performance is evaluated and compared with the Dow Jones Industrial Average
and the traditional min-variance portfolio allocation strategy. The proposed
deep reinforcement learning approach is shown to outperform both baselines in
terms of the Sharpe ratio and cumulative returns.
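For reference, the two evaluation metrics named above can be computed from a
daily-return series as follows. The annualization constant of 252 trading
days and a zero risk-free rate are common conventions assumed here, not
details taken from the paper.

import numpy as np

def sharpe_ratio(daily_returns, risk_free_rate=0.0, periods_per_year=252):
    # Annualized Sharpe ratio of a series of simple daily returns.
    excess = np.asarray(daily_returns) - risk_free_rate / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

def cumulative_return(daily_returns):
    # Total compounded return over the whole trading period.
    return float(np.prod(1.0 + np.asarray(daily_returns)) - 1.0)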
Generative Exploration and Exploitation
Sparse reward is one of the biggest challenges in reinforcement learning
(RL). In this paper, we propose a novel method called Generative Exploration
and Exploitation (GENE) to overcome the sparse-reward problem. GENE
automatically generates start states to encourage the agent to explore the
environment and to exploit received reward signals. GENE can adaptively trade
off between exploration and exploitation according to the varying
distributions of states experienced by the agent as learning progresses. GENE
requires no prior knowledge about the environment and can be combined with
any RL algorithm, whether on-policy or off-policy, single-agent or
multi-agent. Empirically, we demonstrate that GENE significantly outperforms
existing methods in three tasks with only binary rewards, including Maze,
Maze Ant, and Cooperative Navigation. Ablation studies verify the emergence
of progressive exploration and automatic reversing.
Comment: AAAI'2
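The start-state generation idea can be sketched as follows. GENE itself uses
a learned generative model over experienced states; the density-based
stand-in below only illustrates the adaptive exploration/exploitation
trade-off, and every name in it is hypothetical.

import numpy as np

def propose_start_states(visited, rewarded_mask, n_starts,
                         explore_prob=0.5, bandwidth=0.5,
                         noise=0.1, seed=0):
    # visited: (N, d) array of experienced states;
    # rewarded_mask: (N,) bool array marking states where reward was found.
    rng = np.random.default_rng(seed)
    # Crude kernel density estimate of how often each state was visited.
    d2 = ((visited[:, None, :] - visited[None, :, :]) ** 2).sum(-1)
    density = np.exp(-d2 / (2 * bandwidth ** 2)).mean(axis=1)
    starts = []
    for _ in range(n_starts):
        if rewarded_mask.any() and rng.random() > explore_prob:
            # Exploitation: restart near a state where reward was received.
            base = visited[rng.choice(np.flatnonzero(rewarded_mask))]
        else:
            # Exploration: favor rarely visited (low-density) states.
            probs = 1.0 / (density + 1e-8)
            probs /= probs.sum()
            base = visited[rng.choice(len(visited), p=probs)]
        starts.append(base + noise * rng.normal(size=base.shape))
    return np.stack(starts)

As the visit distribution shifts during training, the density term shifts
with it, reproducing the adaptive trade-off described in the abstract.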