
    Preference Learning for Move Prediction and Evaluation Function Approximation in Othello

    This paper investigates preference learning as an approach to move prediction and evaluation function approximation, using the game of Othello as a test domain. Using the same sets of features, we compare our approach with least squares temporal difference learning, direct classification, and with the Bradley-Terry model fitted using minorization-maximization (MM). The results show that the exact way in which preference learning is applied is critical to achieving high performance. The best results were obtained using a combination of board inversion and pairwise preference learning, which significantly outperformed the other methods under test, both in move prediction accuracy and in the level of play achieved when the learned evaluation function was used as a move selector during game play.
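
    The pairwise formulation the abstract refers to can be made concrete with a small sketch. Below is a minimal, hedged illustration of pairwise preference learning for move ranking with a linear evaluation over board features; the feature dimensionality, learning rate, and synthetic training data are assumptions for illustration, not the paper's setup, and the board-inversion step is omitted.

import numpy as np

rng = np.random.default_rng(0)
n_features = 64          # e.g. one weight per board square (assumption)
w = np.zeros(n_features)

def score(features):
    """Linear evaluation of an after-state feature vector."""
    return features @ w

def pairwise_update(chosen, others, lr=0.01):
    """Bradley-Terry-style update: push the expert's chosen move above each
    legal alternative by gradient ascent on log sigma(s_chosen - s_other)."""
    global w
    for other in others:
        margin = score(chosen) - score(other)
        grad = 1.0 / (1.0 + np.exp(margin))   # sigma(-margin)
        w += lr * grad * (chosen - other)

# Toy loop over synthetic positions: each sample pairs the feature vector
# of the expert's chosen after-state with those of the legal alternatives.
for _ in range(1000):
    pairwise_update(rng.normal(size=n_features),
                    rng.normal(size=(5, n_features)))

    At move-selection time, the learned weights simply score each legal after-state and the highest-scoring move is played, which is how an evaluation function acts as a move selector in the comparison above.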

    A Survey of Monte Carlo Tree Search Methods

    Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
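
    For readers new to the method, the core loop the survey derives can be summarised in a short sketch: selection by the UCB1 rule, expansion of one node, a uniform-random simulation, and backpropagation of the result. The Countdown game below is a made-up toy example so the sketch runs end to end; the negamax sign handling needed for alternating two-player games is deliberately simplified away.

import math
import random

class Countdown:
    """Toy solitaire game (assumption, for demonstration): subtract 1 or 2
    from a counter; reward 1 only for landing exactly on zero."""
    def __init__(self, n=9):
        self.n = n
    def legal_moves(self):
        return [1, 2]
    def apply(self, move):
        return Countdown(self.n - move)
    def is_terminal(self):
        return self.n <= 0
    def reward(self):
        return 1.0 if self.n == 0 else 0.0

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = []
        self.visits, self.value = 0, 0.0
        self.untried = [] if state.is_terminal() else list(state.legal_moves())

def ucb1(node, c=1.41):
    # UCB1: exploit mean value, explore under-visited children.
    return (node.value / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts(root_state, iterations=2000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend while fully expanded and non-terminal.
        while not node.untried and node.children:
            node = max(node.children, key=ucb1)
        # Expansion: add one previously unexplored child.
        if node.untried:
            move = node.untried.pop(random.randrange(len(node.untried)))
            node = Node(node.state.apply(move), parent=node)
            node.parent.children.append(node)
        # Simulation: uniform-random playout to a terminal state.
        state = node.state
        while not state.is_terminal():
            state = state.apply(random.choice(state.legal_moves()))
        reward = state.reward()
        # Backpropagation: update statistics along the selected path.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda n: n.visits)   # most-visited child

print(mcts(Countdown(9)).state.n)   # counter after the recommended move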

    Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates

    In recent years, state-of-the-art game-playing agents often involve policies that are trained in self-play processes where Monte Carlo tree search (MCTS) algorithms and trained policies iteratively improve each other. The strongest results have been obtained when policies are trained to mimic the search behaviour of MCTS by minimising a cross-entropy loss. Because MCTS, by design, includes an element of exploration, policies trained in this manner are also likely to exhibit a similar extent of exploration. In this paper, we are interested in learning policies for a project with future goals including the extraction of interpretable strategies, rather than state-of-the-art game-playing performance. For these goals, we argue that such an extent of exploration is undesirable, and we propose a novel objective function for training policies that are not exploratory. We derive a policy gradient expression for maximising this objective function, which can be estimated using MCTS value estimates rather than MCTS visit counts. We empirically evaluate various properties of the resulting policies in a variety of board games.
    Comment: Accepted at the IEEE Conference on Games (CoG) 2019.
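
    The contrast the abstract draws can be illustrated with a toy softmax policy over a handful of moves: the standard target imitates the MCTS visit distribution via a cross-entropy loss, while the alternative sketched here performs gradient ascent on the expected MCTS value estimate. This is a hedged sketch under those assumptions, not the paper's implementation; the action values below are made up for illustration.

import numpy as np

n_actions = 4
logits = np.zeros(n_actions)          # parameters of a tabular softmax policy

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def cross_entropy_update(visit_counts, lr=0.1):
    """Standard target: match the (exploratory) MCTS visit distribution."""
    global logits
    target = visit_counts / visit_counts.sum()
    logits += lr * (target - softmax(logits))    # ascent on -CE for a softmax

def value_gradient_update(q_estimates, lr=0.1):
    """Alternative: ascend E_pi[Q] estimated from MCTS values, so the policy
    concentrates on high-value moves instead of imitating exploratory visits."""
    global logits
    pi = softmax(logits)
    baseline = pi @ q_estimates
    # Gradient of sum_a pi(a) * Q(a) with respect to the softmax logits.
    logits += lr * pi * (q_estimates - baseline)

# Toy run: action 2 has the highest (made-up) MCTS value estimate, so the
# policy mass concentrates on it rather than spreading over visited actions.
for _ in range(200):
    value_gradient_update(np.array([0.1, 0.2, 0.9, 0.3]))
print(softmax(logits))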

    Temporal Difference Learning Versus Co-Evolution for Acquiring Othello Position Evaluation

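    The title contrasts two ways of acquiring an Othello position evaluator. As a point of reference, here is a minimal sketch of the temporal-difference side only: TD(0) updates to a squashed linear evaluation, with the feature encoding and hyperparameters as placeholder assumptions (the co-evolutionary alternative would instead evolve the weight vector through tournament selection).

import numpy as np

n_features = 64                       # e.g. one weight per square (assumption)
w = np.zeros(n_features)

def evaluate(features):
    """Squashed linear position evaluation in (-1, 1)."""
    return np.tanh(features @ w)

def td0_update(features, next_features, reward, done, lr=0.01):
    """TD(0): move V(s) toward reward (terminal) or V(s') (non-terminal)."""
    global w
    v = evaluate(features)
    target = reward if done else evaluate(next_features)
    # Chain rule through tanh: dV/dw = (1 - v**2) * features.
    w += lr * (target - v) * (1.0 - v * v) * features

# Toy usage with random after-state features (illustrative only).
rng = np.random.default_rng(0)
td0_update(rng.normal(size=n_features), rng.normal(size=n_features), 0.0, False)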

    Generating pulse-width modulation (PWM) for a three-phase inverter using a digital signal processor (DSP)

    Recently, inverters have been widely used in industrial applications. However, a pulse-width modulation (PWM) technique is needed to control the inverter's output voltage and frequency. In this thesis, unipolar sinusoidal pulse-width modulation (SPWM) for a three-phase inverter is proposed using a digital signal processor (DSP). A simulation model is developed in MATLAB Simulink to design the unipolar SPWM program, which is then implemented on the TMS320f28335 DSP. The results show that the output voltage of the three-phase inverter can be controlled.
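
    The modulation scheme described above can be sketched in a few lines: each switching signal comes from comparing a sinusoidal reference (or its negation) against a high-frequency triangular carrier. The sketch below covers a single phase-leg pair with illustrative frequencies and modulation index; the thesis's actual implementation runs on the TMS320f28335 DSP, and the full three-phase case uses three references shifted by 120 degrees.

import numpy as np

f_ref = 50.0          # fundamental output frequency, Hz (assumption)
f_carrier = 5000.0    # triangular carrier frequency, Hz (assumption)
m = 0.8               # modulation index (assumption)

# One fundamental period, finely sampled.
t = np.linspace(0.0, 1.0 / f_ref, 10000, endpoint=False)
reference = m * np.sin(2 * np.pi * f_ref * t)

# Symmetric triangular carrier spanning [-1, 1].
carrier = 2.0 * np.abs(2.0 * ((t * f_carrier) % 1.0) - 1.0) - 1.0

# Unipolar SPWM: one leg switches on reference vs carrier, the other on the
# negated reference vs the same carrier, so the line-to-line voltage steps
# through three levels instead of two.
leg_a = (reference > carrier).astype(int)
leg_b = (-reference > carrier).astype(int)
v_line = leg_a - leg_b                # normalised line-to-line output
print(v_line[:10])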