Search CORE

630 research outputs found

Online Meta-learning by Parallel Algorithm Competition

Author: Baker James E.
Bertsekas D. P.
Downey Carlton
Gabillon V.
Goodfellow Ian
Mnih Volodymyr
Snoek Jasper
Snoek Jasper
Springenberg Jost T.
Sutton S.
Sutton S.
Szita I.
Unemi T.
Wu Jian
Publication venue
Publication date: 24/02/2017
Field of study

The efficiency of reinforcement learning algorithms depends critically on a few meta-parameters that modulates the learning updates and the trade-off between exploration and exploitation. The adaptation of the meta-parameters is an open question in reinforcement learning, which arguably has become more of an issue recently with the success of deep reinforcement learning in high-dimensional state spaces. The long learning times in domains such as Atari 2600 video games makes it not feasible to perform comprehensive searches of appropriate meta-parameter values. We propose the Online Meta-learning by Parallel Algorithm Competition (OMPAC) method. In the OMPAC method, several instances of a reinforcement learning algorithm are run in parallel with small differences in the initial values of the meta-parameters. After a fixed number of episodes, the instances are selected based on their performance in the task at hand. Before continuing the learning, Gaussian noise is added to the meta-parameters with a predefined probability. We validate the OMPAC method by improving the state-of-the-art results in stochastic SZ-Tetris and in standard Tetris with a smaller, 10

\times

10, board, by 31% and 84%, respectively, and by improving the results for deep Sarsa(

\lambda

) agents in three Atari 2600 games by 62% or more. The experiments also show the ability of the OMPAC method to adapt the meta-parameters according to the learning progress in different tasks.Comment: 15 pages, 10 figures. arXiv admin note: text overlap with arXiv:1702.0311

arXiv.org e-Print Archive

Crossref

Genetic Network Programming with Reinforcement Learning and Its Application to Creating Stock Trading Rules

Author: Kotaro Hirasawa
Shingo Mabu
Yan Chen
Publication venue: 'IntechOpen'
Publication date: 01/01/2009
Field of study

IntechOpen

Reinforcement Learning for the Unit Commitment Problem

Author: Dalal Gal
Mannor Shie
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/07/2015
Field of study

In this work we solve the day-ahead unit commitment (UC) problem, by formulating it as a Markov decision process (MDP) and finding a low-cost policy for generation scheduling. We present two reinforcement learning algorithms, and devise a third one. We compare our results to previous work that uses simulated annealing (SA), and show a 27% improvement in operation costs, with running time of 2.5 minutes (compared to 2.5 hours of existing state-of-the-art).Comment: Accepted and presented in IEEE PES PowerTech, Eindhoven 2015, paper ID 46273

arXiv.org e-Print Archive

Crossref

Reinforcement learning in continuous state- and action-space

Author: Nichols B.D.
Nichols B.D.
Publication venue
Publication date: 01/01/2014
Field of study

Reinforcement learning in the continuous state-space poses the problem of the inability to store the values of all state-action pairs in a lookup table, due to both storage limitations and the inability to visit all states sufficiently often to learn the correct values. This can be overcome with the use of function approximation techniques with generalisation capability, such as artificial neural networks, to store the value function. When this is applied we can select the optimal action by comparing the values of each possible action; however, when the action-space is continuous this is not possible. In this thesis we investigate methods to select the optimal action when artificial neural networks are used to approximate the value function, through the application of numerical optimization techniques. Although it has been stated in the literature that gradient-ascent methods can be applied to the action selection [47], it is also stated that solving this problem would be infeasible, and therefore, is claimed that it is necessary to utilise a second artificial neural network to approximate the policy function [21, 55]. The major contributions of this thesis include the investigation of the applicability of action selection by numerical optimization methods, including gradient-ascent along with other derivative-based and derivative-free numerical optimization methods,and the proposal of two novel algorithms which are based on the application of two alternative action selection methods: NM-SARSA [40] and NelderMead-SARSA. We empirically compare the proposed methods to state-of-the-art methods from the literature on three continuous state- and action-space control benchmark problems from the literature: minimum-time full swing-up of the Acrobot; Cart-Pole balancing problem; and a double pole variant. We also present novel results from the application of the existing direct policy search method genetic programming to the Acrobot benchmark problem [12, 14]

WestminsterResearch

Study on stock trading and portfolio optimization using genetic network programming

Author: Chen Yan
Publication venue
Publication date: 01/01/2010
Field of study

制度:新 ; 報告番号:甲3002号 ; 学位の種類:博士(工学) ; 授与年月日: 2010/3/15 ; 早大学位記番号:新525

Waseda University Repository

Exploration of genetic network programming with two-stage reinforcement learning for mobile robot

Author: Afandi Arif Nur
Hirasawa Kotaro
Lin Hsien-I
Mahandi Yogi Dwi
Sendari Siti
Zaeni Ilham Ari Elbaith
Publication venue: 'Universitas Ahmad Dahlan'
Publication date: 01/06/2019
Field of study

This paper observes the exploration of Genetic Network Programming Two-Stage Reinforcement Learning for mobile robot navigation. The proposed method aims to observe its exploration when inexperienced environments used in the implementation. In order to deal with this situation, individuals are trained firstly in the training phase, that is, they learn the environment with ϵ-greedy policy and learning rate α parameters. Here, two cases are studied, i.e., case A for low exploration and case B for high exploration. In the implementation, the individuals implemented to get experience and learn a new environment on-line. Then, the performance of learning processes are observed due to the environmental changes

TELKOMNIKA (Telecommunication Computing Electronics and Control)

UAD Journal Management System