604 research outputs found

Preference Learning for Move Prediction and Evaluation Function Approximation in Othello

Authors: Lucas Simon M; Runarsson Thomas Philip
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 11/03/2014

This paper investigates the use of preference learning as an approach to move prediction and evaluation function approximation, using the game of Othello as a test domain. Using the same sets of features, we compare our approach with least squares temporal difference learning, direct classification, and the Bradley-Terry model, fitted using minorization-maximization (MM). The results show that the exact way in which preference learning is applied is critical to achieving high performance. The best results were obtained using a combination of board inversion and pair-wise preference learning. This combination significantly outperformed the others under test, both in move prediction accuracy and in the level of play achieved when using the learned evaluation function as a move selector during game play.

University of Essex Research Repository

Queen Mary Research Online

A Survey of Monte Carlo Tree Search Methods

Authors: Browne Cameron B; Colton Simon; Cowling Peter I; Lucas Simon M; Perez Diego; Powley Edward; Rohlfshagen Philipp; Samothrakis Spyridon; Tavener Stephen; Whitehouse Daniel
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2012

Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impose some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.

University of Essex Research Repository

CiteSeerX

Maastricht University Research Portal

Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates

Authors: Browne Cameron; Piette Éric; Soemers Dennis J. N. J.; Stephenson Matthew
Publication date: 01/01/2019

In recent years, state-of-the-art game-playing agents often involve policies that are trained in self-playing processes where Monte Carlo tree search (MCTS) algorithms and trained policies iteratively improve each other. The strongest results have been obtained when policies are trained to mimic the search behaviour of MCTS by minimising a cross-entropy loss. Because MCTS, by design, includes an element of exploration, policies trained in this manner are also likely to exhibit a similar extent of exploration. In this paper, we are interested in learning policies for a project with future goals including the extraction of interpretable strategies, rather than state-of-the-art game-playing performance. For these goals, we argue that such an extent of exploration is undesirable, and we propose a novel objective function for training policies that are not exploratory. We derive a policy gradient expression for maximising this objective function, which can be estimated using MCTS value estimates, rather than MCTS visit counts. We empirically evaluate various properties of resulting policies, in a variety of board games.

Comment: Accepted at the IEEE Conference on Games (CoG) 201

arXiv.org e-Print Archive

Maastricht University Research Portal

Crossref

DIAL UCLouvain

Temporal Difference Learning Versus Co-Evolution for Acquiring Othello Position Evaluation

Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

Generating pulse width modulation (PWM) for a three-phase inverter using a digital signal processor (DSP)

Author: Hashim Nor Hasyiemah
Publication date: 01/01/2013

Recently, inverters have come into wide use in industrial applications. However, a pulse width modulation (PWM) technique is required to control the output voltage and frequency of the inverter. In this thesis, unipolar sinusoidal pulse width modulation (SPWM) for a three-phase inverter is proposed using a digital signal processor (DSP). A simulation model is developed in MATLAB Simulink to determine the unipolar SPWM program. This program is then implemented on the TMS320f28335 digital signal processor. The results show that the output voltage of the three-phase inverter can be controlled.

UTHM Institutional Repository