604 research outputs found
Preference Learning for Move Prediction and Evaluation Function Approximation in Othello
This paper investigates the use of preference learning as an approach to move prediction and evaluation function approximation, using the game of Othello as a test domain. Using the same sets of features, we compare our approach with least squares temporal difference learning, direct classification, and with the Bradley-Terry model, fitted using minorization-maximization (MM). The results show that the exact way in which preference learning is applied is critical to achieving high performance. Best results were obtained using a combination of board inversion and pair-wise preference learning. This combination significantly outperformed the others under test, both in terms of move prediction accuracy and in the level of play achieved when using the learned evaluation function as a move selector during game play.
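As a rough illustration of one baseline named in the abstract above, the following sketch fits a Bradley-Terry model with the standard minorization-maximization (MM) update. It is not the paper's implementation: the function name `fit_bradley_terry`, the `wins` matrix layout, and the fixed iteration count are assumptions made for this example, and the paper's Othello features are not modelled here.

```python
def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths with the MM update.

    wins[i][j] is the number of times item i was preferred over item j
    (hypothetical input layout chosen for this sketch).
    """
    n = len(wins)
    strengths = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Total number of comparisons won by item i.
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Comparisons involving i, weighted by the current strengths.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            # Keep the old value if item i was never compared.
            updated.append(total_wins / denom if denom > 0 else strengths[i])
        # Normalise so the strengths sum to one.
        norm = sum(updated)
        strengths = [s / norm for s in updated]
    return strengths


if __name__ == "__main__":
    # Item 0 was preferred over item 1 in three of four comparisons.
    print(fit_bradley_terry([[0, 3], [1, 0]]))
```

Under this model, the estimated probability that item i is preferred over item j is `strengths[i] / (strengths[i] + strengths[j])`.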
A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates
In recent years, state-of-the-art game-playing agents often involve policies
that are trained in self-playing processes where Monte Carlo tree search (MCTS)
algorithms and trained policies iteratively improve each other. The strongest
results have been obtained when policies are trained to mimic the search
behaviour of MCTS by minimising a cross-entropy loss. Because MCTS, by design,
includes an element of exploration, policies trained in this manner are also
likely to exhibit a similar extent of exploration. In this paper, we are
interested in learning policies for a project with future goals including the
extraction of interpretable strategies, rather than state-of-the-art
game-playing performance. For these goals, we argue that such an extent of
exploration is undesirable, and we propose a novel objective function for
training policies that are not exploratory. We derive a policy gradient
expression for maximising this objective function, which can be estimated using
MCTS value estimates, rather than MCTS visit counts. We empirically evaluate
various properties of resulting policies, in a variety of board games.
Comment: Accepted at the IEEE Conference on Games (CoG) 201
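The abstract above describes the common baseline of training a policy to mimic MCTS by minimising a cross-entropy loss. A minimal sketch of that baseline loss for a single state follows, assuming the target distribution is the normalised MCTS visit counts. The function names and the list-based inputs are hypothetical, and this is not the novel objective the paper proposes.

```python
import math


def softmax(logits):
    """Convert raw policy scores into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def visit_count_cross_entropy(visit_counts, policy_logits):
    """Cross-entropy between normalised MCTS visit counts and the policy.

    visit_counts[a] is how often the search visited move a at the root;
    policy_logits[a] is the policy's raw score for move a.
    Both layouts are assumptions made for this sketch.
    """
    total_visits = sum(visit_counts)
    target = [count / total_visits for count in visit_counts]
    probs = softmax(policy_logits)
    # Moves the search never visited contribute nothing to the loss.
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


if __name__ == "__main__":
    # The loss is lower when the policy favours the most-visited move.
    print(visit_count_cross_entropy([9, 1], [2.0, 0.0]))
    print(visit_count_cross_entropy([9, 1], [0.0, 2.0]))
```

Because the target keeps probability mass on every visited move, a policy that minimises this loss inherits the exploratory spread of the search, which is the behaviour the abstract argues against for its goals.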
Generating pulse width modulation (PWM) for a three-phase inverter using a digital signal processor (DSP)
Recently, inverters have been widely used in industrial applications.
However, a pulse width modulation (PWM) technique is needed to
control the output voltage and frequency of the inverter. In this thesis,
unipolar sinusoidal pulse width modulation (SPWM) for a three-phase inverter is
proposed using a digital signal processor (DSP). A simulation model
is developed in MATLAB Simulink to determine the unipolar sinusoidal
pulse width modulation (SPWM) program. This program is then
developed on the TMS320F28335 digital signal processor (DSP). The results
show that the output voltage of the three-phase inverter can be controlled
- …