7 research outputs found

    Learning to Play Othello with N-Tuple Systems

    This paper investigates the use of n-tuple systems as position value functions for the game of Othello. The architecture is described, and then evaluated for use with temporal difference learning. Performance is compared with previously developed weighted piece counters and multi-layer perceptrons. The n-tuple system is able to defeat the best performing of these after just five hundred games of self-play learning. The conclusion is that n-tuple networks learn faster and better than the other, more conventional approaches.
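    The core idea lends itself to a short illustration. The following is a minimal sketch of an n-tuple value function with a TD(0)-style weight update, assuming a flat 64-square board, three-valued square encoding, and arbitrary horizontal 3-tuples; the paper's actual tuple shapes, learning rate, and self-play loop are not reproduced here.

```python
# Minimal sketch: n-tuple position value function with a TD(0) update.
# Board encoding, tuple shapes, and learning rate are illustrative assumptions.
import numpy as np

EMPTY, BLACK, WHITE = 0, 1, 2   # assumed 3-valued square encoding

class NTupleValueFunction:
    def __init__(self, tuples, n_values=3):
        # tuples: list of square-index sequences (e.g. rows, columns, diagonals)
        self.tuples = tuples
        self.n_values = n_values
        # one lookup table of weights per tuple, indexed by the tuple's pattern
        self.luts = [np.zeros(n_values ** len(t)) for t in tuples]

    def _index(self, board, squares):
        # interpret the contents of the tuple's squares as a base-3 number
        idx = 0
        for s in squares:
            idx = idx * self.n_values + board[s]
        return idx

    def value(self, board):
        # position value = sum of the indexed weights over all tuples
        return sum(lut[self._index(board, t)]
                   for t, lut in zip(self.tuples, self.luts))

    def td_update(self, board, target, alpha=0.01):
        # TD(0): move the current estimate toward the bootstrapped target
        delta = target - self.value(board)
        for t, lut in zip(self.tuples, self.luts):
            lut[self._index(board, t)] += alpha * delta

# Example: 8 horizontal 3-tuples on an 8x8 board (indices 0..63), purely illustrative
tuples = [tuple(range(r * 8, r * 8 + 3)) for r in range(8)]
vf = NTupleValueFunction(tuples)
board = np.zeros(64, dtype=int)
vf.td_update(board, target=1.0)
print(vf.value(board))
```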

    Evolving Players for an Ancient Game: Hnefatafl

    Hnefatafl is an ancient Norse game - an ancestor of chess. In this paper, we report on the development of computer players for this game. In the spirit of Blondie24, we evolve neural networks as board evaluation functions for different versions of the game. An unusual aspect of this game is that there is no general agreement on the rules: it is no longer much played, and game historians attempt to infer the rules from scraps of historical texts, with ambiguities often resolved on gut feeling as to what the rules must have been in order to achieve a balanced game. We offer the evolutionary method as a means by which to judge the merits of alternative rule sets.
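    As a rough illustration of the Blondie24-style approach mentioned above, the sketch below evolves the weights of a small evaluation network with a simple (mu+lambda) strategy. The network shape, mutation scheme, and the play_game stand-in (which scores a random feature vector instead of playing out a Hnefatafl game) are assumptions made for brevity, not details from the paper.

```python
# Minimal sketch: evolving a board-evaluation network, Blondie24-style.
import numpy as np

rng = np.random.default_rng(0)
N_INPUTS, N_HIDDEN = 49, 10   # assumed board-feature and hidden-layer sizes

def make_net():
    # one hidden layer; all weights flattened into a single parameter vector
    return rng.normal(0, 0.1, size=N_INPUTS * N_HIDDEN + N_HIDDEN)

def evaluate(board_features, params):
    W = params[:N_INPUTS * N_HIDDEN].reshape(N_HIDDEN, N_INPUTS)
    v = params[N_INPUTS * N_HIDDEN:]
    return float(np.tanh(W @ board_features) @ v)   # scalar position value

def play_game(params_a, params_b):
    # Stand-in for a full game: score one random position with both
    # evaluation functions and call the higher score the winner.
    features = rng.normal(size=N_INPUTS)
    return 1 if evaluate(features, params_a) > evaluate(features, params_b) else -1

def evolve(pop_size=10, generations=20, sigma=0.05):
    population = [make_net() for _ in range(pop_size)]
    for _ in range(generations):
        # fitness = total score from a round-robin tournament
        scores = np.zeros(pop_size)
        for i in range(pop_size):
            for j in range(pop_size):
                if i != j:
                    scores[i] += play_game(population[i], population[j])
        # keep the better half, refill with mutated copies ((mu+lambda) selection)
        survivors = [population[i] for i in np.argsort(scores)[pop_size // 2:]]
        population = survivors + [p + rng.normal(0, sigma, p.shape) for p in survivors]
    return survivors[-1]   # highest-scoring network from the final selection

best = evolve()
```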

    Temporal Difference Learning Versus Co-Evolution for Acquiring Othello Position Evaluation


    Reinforcement Learning in the Game of Othello: Learning Against a Fixed Opponent and Learning from Self-Play

    This paper compares three strategies in using reinforcement learning algorithms to let an artificial agent learn to play the game of Othello. The three strategies that are compared are: learning by self-play, learning from playing against a fixed opponent, and learning from playing against a fixed opponent while learning from the opponent's moves as well. These issues are considered for the algorithms Q-learning, Sarsa and TD-learning. These three reinforcement learning algorithms are combined with multi-layer perceptrons and trained and tested against three fixed opponents. It is found that the best strategy of learning differs per algorithm. Q-learning and Sarsa perform best when trained against the fixed opponent they are also tested against, whereas TD-learning performs best when trained through self-play. Surprisingly, Q-learning and Sarsa outperform TD-learning against the stronger fixed opponents, when all methods use their best strategy. Learning from the opponent's moves as well leads to worse results compared to learning only from the learning agent's own moves.
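    For readers unfamiliar with the three algorithms compared here, the sketch below contrasts their update targets. Tabular containers stand in for the paper's multi-layer perceptrons, and the discount factor, learning rate, and the example call are assumed values for illustration only.

```python
# Minimal sketch: the update targets of Q-learning, Sarsa, and TD(0).
from collections import defaultdict

alpha, gamma = 0.1, 0.99
Q = defaultdict(float)   # Q[(state, action)]
V = defaultdict(float)   # V[state]

def q_learning_update(s, a, r, s_next, legal_next):
    # off-policy: bootstrap from the best available next action
    target = r + gamma * max((Q[(s_next, b)] for b in legal_next), default=0.0)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(s, a, r, s_next, a_next):
    # on-policy: bootstrap from the action actually taken next
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def td0_update(s, r, s_next):
    # state-value learning, as used with self-play
    target = r + gamma * V[s_next]
    V[s] += alpha * (target - V[s])

# e.g. one Q-learning step after playing action 19 in some hashed board state
q_learning_update(s="board_hash", a=19, r=0.0, s_next="next_hash", legal_next=[20, 26, 34])
```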

    Model-Free Deep Inverse Reinforcement Learning by Logistic Regression

    This paper proposes model-free deep inverse reinforcement learning to find nonlinear reward function structures. We formulate inverse reinforcement learning as a problem of density ratio estimation, and show that the log of the ratio between an optimal state transition and a baseline one is given by part of the reward and the difference of the value functions under the framework of linearly solvable Markov decision processes. The logarithm of the density ratio is efficiently calculated by binomial logistic regression, whose classifier is constructed from the reward and the state value function. The classifier tries to discriminate between samples drawn from the optimal state transition probability and those from the baseline one. Then, the estimated state value function is used to initialize part of the deep neural networks for forward reinforcement learning. The proposed deep forward and inverse reinforcement learning is applied to two benchmark games: Atari 2600 and Reversi. Simulation results show that our method reaches the best performance substantially faster than the standard combination of forward and inverse reinforcement learning as well as behavior cloning.
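    A minimal sketch of the density-ratio idea follows, assuming linear features for the reward and value functions rather than the deep networks used in the paper: a logistic-regression classifier separates transitions drawn under the optimal policy from baseline transitions, with its logit parameterised as r(x) + gamma*V(y) - V(x). The discount factor, learning rate, and synthetic transition data are assumptions made purely to keep the example self-contained.

```python
# Minimal sketch: logistic-regression density-ratio estimation for IRL.
import numpy as np

rng = np.random.default_rng(0)
d, gamma, lr = 4, 0.95, 0.1
w_r = np.zeros(d)          # reward weights:  r(x) = w_r @ x
w_v = np.zeros(d)          # value weights:   V(x) = w_v @ x

def logit(x, y):
    # classifier logit = r(x) + gamma*V(y) - V(x)
    return w_r @ x + gamma * (w_v @ y) - (w_v @ x)

def grad_step(x, y, label):
    # one logistic-regression gradient-ascent step on the label's log-likelihood
    global w_r, w_v
    p = 1.0 / (1.0 + np.exp(-logit(x, y)))
    err = label - p
    w_r += lr * err * x
    w_v += lr * err * (gamma * y - x)

# synthetic transitions: the "optimal" ones drift toward larger state values
for _ in range(2000):
    x = rng.normal(size=d)
    y_opt = x + 0.5 + 0.1 * rng.normal(size=d)   # label 1: optimal policy
    y_base = x + 0.1 * rng.normal(size=d)        # label 0: baseline policy
    grad_step(x, y_opt, 1)
    grad_step(x, y_base, 0)

# the recovered reward and value estimates can then warm-start forward RL
print(w_r, w_v)
```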