Online Meta-learning by Parallel Algorithm Competition

Author: Baker James E.
Bertsekas D. P.
Downey Carlton
Gabillon V.
Goodfellow Ian
Mnih Volodymyr
Snoek Jasper
Springenberg Jost T.
Sutton S.
Szita I.
Unemi T.
Wu Jian
Publication date: 24/02/2017

The efficiency of reinforcement learning algorithms depends critically on a few meta-parameters that modulate the learning updates and the trade-off between exploration and exploitation. The adaptation of the meta-parameters is an open question in reinforcement learning, which arguably has become more of an issue recently with the success of deep reinforcement learning in high-dimensional state spaces. The long learning times in domains such as Atari 2600 video games make it infeasible to perform comprehensive searches for appropriate meta-parameter values. We propose the Online Meta-learning by Parallel Algorithm Competition (OMPAC) method. In the OMPAC method, several instances of a reinforcement learning algorithm are run in parallel with small differences in the initial values of the meta-parameters. After a fixed number of episodes, the instances are selected based on their performance in the task at hand. Before continuing the learning, Gaussian noise is added to the meta-parameters with a predefined probability. We validate the OMPAC method by improving the state-of-the-art results in stochastic SZ-Tetris and in standard Tetris with a smaller, 10$\times$10, board, by 31% and 84%, respectively, and by improving the results for deep Sarsa($\lambda$) agents in three Atari 2600 games by 62% or more. The experiments also show the ability of the OMPAC method to adapt the meta-parameters according to the learning progress in different tasks.

Comment: 15 pages, 10 figures. arXiv admin note: text overlap with arXiv:1702.0311
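
The loop described in the abstract (parallel instances, periodic selection by performance, Gaussian noise on the meta-parameters with a fixed probability) can be illustrated with a toy sketch. This is not the authors' implementation. The function name `ompac_sketch`, its arguments, the ±10% initial jitter, and the "keep the better half" selection rule are all assumptions made for illustration. Here `evaluate` stands in for "run a learner for a fixed number of episodes and report its performance"; a real learner would also carry its learned weights through selection.

```python
import random


def ompac_sketch(evaluate, init_params, n_instances=8, n_rounds=20,
                 noise_prob=0.5, noise_std=0.05, seed=0):
    """Toy select-and-perturb loop over meta-parameters (hypothetical API)."""
    rng = random.Random(seed)

    # Instances start from slightly different meta-parameter values.
    # Assumption: the first instance keeps the unperturbed initial values.
    population = [dict(init_params)]
    for _ in range(n_instances - 1):
        population.append({k: v * (1.0 + rng.uniform(-0.1, 0.1))
                           for k, v in init_params.items()})

    best_params, best_score = None, float("-inf")
    half = max(1, n_instances // 2)

    for _ in range(n_rounds):
        # Stand-in for running every instance for a fixed number of episodes.
        scores = [evaluate(p) for p in population]
        order = sorted(range(len(population)),
                       key=lambda i: scores[i], reverse=True)

        # Remember the best meta-parameters seen so far.
        if scores[order[0]] > best_score:
            best_score = scores[order[0]]
            best_params = dict(population[order[0]])

        # Selection (assumed scheme): copy the better half over the whole population.
        survivors = [population[i] for i in order[:half]]
        population = [dict(survivors[i % half]) for i in range(n_instances)]

        # With probability noise_prob, add Gaussian noise to each meta-parameter.
        for p in population:
            for k in p:
                if rng.random() < noise_prob:
                    p[k] += rng.gauss(0.0, noise_std)

    return best_params, best_score
```

For example, calling `ompac_sketch(lambda p: -(p["alpha"] - 0.3) ** 2, {"alpha": 0.1})` searches for a hypothetical step size `alpha` under a made-up objective. The returned score is never worse than that of the initial values, because the unperturbed instance is evaluated in the first round.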

arXiv.org e-Print Archive

Crossref