7 research outputs found

    Temporal difference learning for the game Tic-Tac-Toe 3D: Applying structure to neural networks

    When reinforcement learning is applied to large state spaces, such as those occurring in board games, a good function approximator for learning the value function is very important. In previous research, multilayer perceptrons have often been used quite successfully as function approximators for learning to play particular games with temporal difference learning. Given recent developments in deep learning, it is important to study whether multiple hidden layers or particular network structures can help to improve learning of the value function. In this paper, we compare five different multilayer perceptron structures for learning to play the game Tic-Tac-Toe 3D, both when training through self-play and when training against the same fixed opponent they are tested against. We compare three fully connected multilayer perceptrons with different numbers of hidden layers and/or hidden units, as well as two structured ones. The structured multilayer perceptrons have a first hidden layer that is only sparsely connected to the input layer, with units that correspond to the rows in Tic-Tac-Toe 3D. This allows them to more easily learn the contribution of specific patterns on the corresponding rows. One of the two structured multilayer perceptrons has a second hidden layer that is fully connected to the first, which allows the network to learn to non-linearly integrate the information in these detected patterns. The results on Tic-Tac-Toe 3D show that the deep structured neural network with integrated pattern detectors performs strongest among the compared multilayer perceptrons against a fixed opponent, both when trained through self-play and when trained against this fixed opponent.
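
    A minimal sketch of this structured architecture, assuming the common 4x4x4 variant of Tic-Tac-Toe 3D (76 winning lines) and a 64-value board encoding in {-1, 0, +1}; the layer sizes, initialization, and encoding are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def winning_lines(n=4):
    """Enumerate all length-n lines in an n x n x n cube: axis-aligned
    lines, planar diagonals, and space diagonals (76 lines for n=4)."""
    idx = lambda x, y, z: x * n * n + y * n + z
    r, lines = range(n), []
    for a in r:
        for b in r:
            lines.append([idx(a, b, k) for k in r])   # along z
            lines.append([idx(a, k, b) for k in r])   # along y
            lines.append([idx(k, a, b) for k in r])   # along x
    for a in r:                                       # planar diagonals
        lines.append([idx(a, k, k) for k in r])
        lines.append([idx(a, k, n - 1 - k) for k in r])
        lines.append([idx(k, a, k) for k in r])
        lines.append([idx(k, a, n - 1 - k) for k in r])
        lines.append([idx(k, k, a) for k in r])
        lines.append([idx(k, n - 1 - k, a) for k in r])
    lines.append([idx(k, k, k) for k in r])           # space diagonals
    lines.append([idx(k, k, n - 1 - k) for k in r])
    lines.append([idx(k, n - 1 - k, k) for k in r])
    lines.append([idx(n - 1 - k, k, k) for k in r])
    return lines

lines = winning_lines(4)
n_in, n_h1, n_h2 = 64, len(lines), 32    # 32 integration units: a guess

# Sparse connectivity: hidden unit j sees only the 4 cells of line j.
mask = np.zeros((n_in, n_h1))
for j, cells in enumerate(lines):
    mask[cells, j] = 1.0

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (n_in, n_h1)) * mask   # structured pattern layer
W2 = rng.normal(0.0, 0.1, (n_h1, n_h2))          # dense integration layer
w3 = rng.normal(0.0, 0.1, n_h2)

def value(board):
    """board: 64-vector in {-1, 0, +1} (opponent, empty, own pieces)."""
    h1 = np.tanh(board @ W1)   # one pattern detector per winning line
    h2 = np.tanh(h1 @ W2)      # non-linear integration across lines
    return np.tanh(h2 @ w3)    # scalar state-value estimate
```

    During temporal difference training, the gradient for W1 would be multiplied by the same mask, so the sparse row structure is preserved throughout learning; the shallow structured variant would simply omit the dense W2 layer.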

    Model-Free Deep Inverse Reinforcement Learning by Logistic Regression

    This paper proposes model-free deep inverse reinforcement learning to find nonlinear reward function structures. We formulate inverse reinforcement learning as a problem of density ratio estimation, and show that, under the framework of linearly solvable Markov decision processes, the log of the ratio between an optimal state transition and a baseline one is given by a part of the reward and the difference of the value functions. The log density ratio is efficiently estimated by binomial logistic regression, whose classifier is constructed from the reward and the state value function. The classifier tries to discriminate between samples drawn from the optimal state transition probability and those drawn from the baseline one. The estimated state value function is then used to initialize part of the deep neural network for forward reinforcement learning. The proposed deep forward and inverse reinforcement learning is applied to two benchmarks, Atari 2600 games and Reversi. Simulation results show that our method reaches the best performance substantially faster than the standard combination of forward and inverse reinforcement learning, as well as behavior cloning.
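
    To make the classification step concrete, here is a small sketch that uses linear reward and value models over state features in place of the paper's deep networks; the function name, the discount factor, and the transition-pair format are illustrative assumptions.

```python
import numpy as np

def fit_irl_logreg(opt_pairs, base_pairs, gamma=0.95, lr=0.1, epochs=500):
    """Recover reward and value weights by discriminating optimal from
    baseline state transitions with binomial logistic regression.
    opt_pairs, base_pairs: arrays of shape (N, 2, d) holding (x, y)
    feature pairs sampled from the optimal and baseline transitions."""
    d = opt_pairs.shape[-1]
    th_r = np.zeros(d)   # linear reward model:  r(x) = th_r . phi(x)
    th_v = np.zeros(d)   # linear value model:   V(x) = th_v . phi(x)

    X = np.concatenate([opt_pairs, base_pairs])
    y = np.concatenate([np.ones(len(opt_pairs)),      # optimal -> 1
                        np.zeros(len(base_pairs))])   # baseline -> 0
    for _ in range(epochs):
        phi_x, phi_y = X[:, 0], X[:, 1]
        # Log density ratio modelled as r(x) + gamma*V(y) - V(x),
        # following the linearly solvable MDP identity in the abstract.
        logit = phi_x @ th_r + gamma * (phi_y @ th_v) - phi_x @ th_v
        p = 1.0 / (1.0 + np.exp(-logit))   # P(sample is optimal)
        err = p - y                        # d(cross-entropy)/d(logit)
        th_r -= lr * (phi_x.T @ err) / len(X)
        th_v -= lr * ((gamma * phi_y - phi_x).T @ err) / len(X)
    return th_r, th_v
```

    The recovered value weights th_v would then serve to initialize the value part of the network for forward reinforcement learning, which is the warm start the abstract describes.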

    Neural-Fitted TD-Leaf Learning for Playing Othello With Structured Neural Networks

    This paper describes a methodology for quickly learning to play games at a strong level. The methodology consists of a novel combination of three techniques, and a variety of experiments on the game of Othello demonstrate their usefulness. First, structures or topologies in neural-network connectivity patterns are used to decrease the number of learning parameters and to deal more effectively with the structural credit assignment problem, i.e., changing individual network weights based on the obtained feedback. Second, the structured neural networks are trained with the novel neural-fitted temporal difference (TD) learning algorithm to create a system that can exploit most of the training experiences and enhance learning speed and performance. Finally, we use the neural-fitted TD-leaf algorithm to learn more effectively when look-ahead search is performed by the game-playing program. Our extensive experimental study clearly indicates that the proposed method outperforms linear networks, fully connected neural networks, and evaluation functions evolved with evolutionary algorithms.
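
    As a rough Python sketch of how TD-leaf targets could be assembled for one batch of neural-fitted training: the game-API callbacks (moves_fn, apply_fn), the episode format, and the lambda-return form are assumptions for illustration, not the authors' implementation.

```python
def negamax_leaf(state, depth, value_fn, moves_fn, apply_fn):
    """Depth-limited negamax returning (value, principal-variation leaf).
    value_fn(s) must score s for the player to move; moves_fn(s) and
    apply_fn(s, m) are hypothetical game-API callbacks."""
    moves = moves_fn(state)
    if depth == 0 or not moves:
        return value_fn(state), state
    best_val, best_leaf = float('-inf'), state
    for m in moves:
        v, leaf = negamax_leaf(apply_fn(state, m), depth - 1,
                               value_fn, moves_fn, apply_fn)
        if -v > best_val:                 # negamax sign flip
            best_val, best_leaf = -v, leaf
    return best_val, best_leaf

def td_leaf_dataset(episodes, value_fn, moves_fn, apply_fn,
                    depth=2, lam=0.9):
    """Turn stored games into (state, target) pairs for one neural-fitted
    batch. episodes holds (states, outcome) tuples where states are the
    positions with the learning player to move and outcome in {-1, 0, +1}
    is the final result from that player's perspective (assumed format)."""
    data = []
    for states, outcome in episodes:
        # Search value of every successor position (same perspective,
        # since the same player is to move in each stored state).
        vals = [negamax_leaf(s, depth, value_fn, moves_fn, apply_fn)[0]
                for s in states[1:]]
        vals.append(outcome)              # terminal value
        target = outcome
        for t in reversed(range(len(states))):
            # TD-leaf(lambda) target: mix the one-step search value with
            # the longer-horizon return (gamma = 1, no per-step rewards).
            target = (1 - lam) * vals[t] + lam * target
            data.append((states[t], target))
    return data
```

    The "neural-fitted" part then refits the value network on this fixed batch and alternates data collection with refitting, in the spirit of neural-fitted Q iteration, so each stored experience can be reused many times.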
