7 research outputs found

    Temporal difference learning for the game Tic-Tac-Toe 3D: Applying structure to neural networks

    When reinforcement learning is applied to large state spaces, such as those occurring in board games, a good function approximator for learning the value function is very important. In previous research, multilayer perceptrons have often been used quite successfully as function approximators for learning to play particular games with temporal difference learning. Given recent developments in deep learning, it is important to study whether multiple hidden layers or particular network structures can help to improve learning of the value function. In this paper, we compare five different multilayer perceptron structures for learning to play the game Tic-Tac-Toe 3D, both when training through self-play and when training against the same fixed opponent they are tested against. We compare three fully connected multilayer perceptrons with different numbers of hidden layers and/or hidden units, as well as two structured ones. The structured multilayer perceptrons have a first hidden layer that is only sparsely connected to the input layer, with units that correspond to the rows in Tic-Tac-Toe 3D. This allows them to more easily learn the contribution of specific patterns on the corresponding rows. One of the two structured multilayer perceptrons has a second hidden layer that is fully connected to the first, which allows the network to learn to non-linearly integrate the information in these detected patterns. The results on Tic-Tac-Toe 3D show that the deep structured neural network with integrated pattern detectors performs strongest among the compared multilayer perceptrons against a fixed opponent, both when trained through self-play and when trained against this fixed opponent.
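
    A minimal sketch of this structured architecture, assuming the common 4x4x4 variant of Tic-Tac-Toe 3D (76 winning lines) and a 64-value board encoding in {-1, 0, +1}; the layer sizes, initialization, and encoding are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def winning_lines(n=4):
    """Enumerate all length-n lines in an n x n x n cube: axis-aligned
    lines, planar diagonals, and space diagonals (76 lines for n=4)."""
    idx = lambda x, y, z: x * n * n + y * n + z
    r, lines = range(n), []
    for a in r:
        for b in r:
            lines.append([idx(a, b, k) for k in r])   # along z
            lines.append([idx(a, k, b) for k in r])   # along y
            lines.append([idx(k, a, b) for k in r])   # along x
    for a in r:                                       # planar diagonals
        lines.append([idx(a, k, k) for k in r])
        lines.append([idx(a, k, n - 1 - k) for k in r])
        lines.append([idx(k, a, k) for k in r])
        lines.append([idx(k, a, n - 1 - k) for k in r])
        lines.append([idx(k, k, a) for k in r])
        lines.append([idx(k, n - 1 - k, a) for k in r])
    lines.append([idx(k, k, k) for k in r])           # space diagonals
    lines.append([idx(k, k, n - 1 - k) for k in r])
    lines.append([idx(k, n - 1 - k, k) for k in r])
    lines.append([idx(n - 1 - k, k, k) for k in r])
    return lines

lines = winning_lines(4)
n_in, n_h1, n_h2 = 64, len(lines), 32    # 32 integration units: a guess

# Sparse connectivity: hidden unit j sees only the 4 cells of line j.
mask = np.zeros((n_in, n_h1))
for j, cells in enumerate(lines):
    mask[cells, j] = 1.0

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (n_in, n_h1)) * mask   # structured pattern layer
W2 = rng.normal(0.0, 0.1, (n_h1, n_h2))          # dense integration layer
w3 = rng.normal(0.0, 0.1, n_h2)

def value(board):
    """board: 64-vector in {-1, 0, +1} (opponent, empty, own pieces)."""
    h1 = np.tanh(board @ W1)   # one pattern detector per winning line
    h2 = np.tanh(h1 @ W2)      # non-linear integration across lines
    return np.tanh(h2 @ w3)    # scalar state-value estimate
```

    During temporal difference training, the gradient for W1 would be multiplied by the same mask, so the sparse row structure is preserved throughout learning; the shallow structured variant would simply omit the dense W2 layer.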

    Model-Free Deep Inverse Reinforcement Learning by Logistic Regression

    This paper proposes model-free deep inverse reinforcement learning to find nonlinear reward function structures. We formulate inverse reinforcement learning as a problem of density ratio estimation, and show that, under the framework of linearly solvable Markov decision processes, the log of the ratio between an optimal state transition and a baseline one is given by a part of the reward and the difference of the value functions. The log density ratio is efficiently estimated by binomial logistic regression, whose classifier is constructed from the reward and the state value function. The classifier tries to discriminate between samples drawn from the optimal state transition probability and those drawn from the baseline one. The estimated state value function is then used to initialize part of the deep neural network for forward reinforcement learning. The proposed deep forward and inverse reinforcement learning is applied to two benchmarks, Atari 2600 games and Reversi. Simulation results show that our method reaches the best performance substantially faster than the standard combination of forward and inverse reinforcement learning, as well as behavior cloning.
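
    To make the classification step concrete, here is a small sketch that uses linear reward and value models over state features in place of the paper's deep networks; the function name, the discount factor, and the transition-pair format are illustrative assumptions.

```python
import numpy as np

def fit_irl_logreg(opt_pairs, base_pairs, gamma=0.95, lr=0.1, epochs=500):
    """Recover reward and value weights by discriminating optimal from
    baseline state transitions with binomial logistic regression.
    opt_pairs, base_pairs: arrays of shape (N, 2, d) holding (x, y)
    feature pairs sampled from the optimal and baseline transitions."""
    d = opt_pairs.shape[-1]
    th_r = np.zeros(d)   # linear reward model:  r(x) = th_r . phi(x)
    th_v = np.zeros(d)   # linear value model:   V(x) = th_v . phi(x)

    X = np.concatenate([opt_pairs, base_pairs])
    y = np.concatenate([np.ones(len(opt_pairs)),      # optimal -> 1
                        np.zeros(len(base_pairs))])   # baseline -> 0
    for _ in range(epochs):
        phi_x, phi_y = X[:, 0], X[:, 1]
        # Log density ratio modelled as r(x) + gamma*V(y) - V(x),
        # following the linearly solvable MDP identity in the abstract.
        logit = phi_x @ th_r + gamma * (phi_y @ th_v) - phi_x @ th_v
        p = 1.0 / (1.0 + np.exp(-logit))   # P(sample is optimal)
        err = p - y                        # d(cross-entropy)/d(logit)
        th_r -= lr * (phi_x.T @ err) / len(X)
        th_v -= lr * ((gamma * phi_y - phi_x).T @ err) / len(X)
    return th_r, th_v
```

    The recovered value weights th_v would then serve to initialize the value part of the network for forward reinforcement learning, which is the warm start the abstract describes.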

    Neural-Fitted TD-Leaf Learning for Playing Othello With Structured Neural Networks

    This paper describes a methodology for quickly learning to play games at a strong level. The methodology consists of a novel combination of three techniques, and a variety of experiments on the game of Othello demonstrate their usefulness. First, structures or topologies in neural-network connectivity patterns are used to decrease the number of learning parameters and to deal more effectively with the structural credit assignment problem, i.e., changing individual network weights based on the obtained feedback. Second, the structured neural networks are trained with the novel neural-fitted temporal difference (TD) learning algorithm to create a system that can exploit most of the training experiences and enhance learning speed and performance. Finally, we use the neural-fitted TD-leaf algorithm to learn more effectively when look-ahead search is performed by the game-playing program. Our extensive experimental study clearly indicates that the proposed method outperforms linear networks, fully connected neural networks, and evaluation functions evolved with evolutionary algorithms.
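
    As a rough Python sketch of how TD-leaf targets could be assembled for one batch of neural-fitted training: the game-API callbacks (moves_fn, apply_fn), the episode format, and the lambda-return form are assumptions for illustration, not the authors' implementation.

```python
def negamax_leaf(state, depth, value_fn, moves_fn, apply_fn):
    """Depth-limited negamax returning (value, principal-variation leaf).
    value_fn(s) must score s for the player to move; moves_fn(s) and
    apply_fn(s, m) are hypothetical game-API callbacks."""
    moves = moves_fn(state)
    if depth == 0 or not moves:
        return value_fn(state), state
    best_val, best_leaf = float('-inf'), state
    for m in moves:
        v, leaf = negamax_leaf(apply_fn(state, m), depth - 1,
                               value_fn, moves_fn, apply_fn)
        if -v > best_val:                 # negamax sign flip
            best_val, best_leaf = -v, leaf
    return best_val, best_leaf

def td_leaf_dataset(episodes, value_fn, moves_fn, apply_fn,
                    depth=2, lam=0.9):
    """Turn stored games into (state, target) pairs for one neural-fitted
    batch. episodes holds (states, outcome) tuples where states are the
    positions with the learning player to move and outcome in {-1, 0, +1}
    is the final result from that player's perspective (assumed format)."""
    data = []
    for states, outcome in episodes:
        # Search value of every successor position (same perspective,
        # since the same player is to move in each stored state).
        vals = [negamax_leaf(s, depth, value_fn, moves_fn, apply_fn)[0]
                for s in states[1:]]
        vals.append(outcome)              # terminal value
        target = outcome
        for t in reversed(range(len(states))):
            # TD-leaf(lambda) target: mix the one-step search value with
            # the longer-horizon return (gamma = 1, no per-step rewards).
            target = (1 - lam) * vals[t] + lam * target
            data.append((states[t], target))
    return data
```

    The "neural-fitted" part then refits the value network on this fixed batch and alternates data collection with refitting, in the spirit of neural-fitted Q iteration, so each stored experience can be reused many times.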
