
    Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates

    In recent years, state-of-the-art game-playing agents have often involved policies trained through self-play, in which Monte Carlo tree search (MCTS) algorithms and trained policies iteratively improve each other. The strongest results have been obtained when policies are trained to mimic the search behaviour of MCTS by minimising a cross-entropy loss. Because MCTS, by design, includes an element of exploration, policies trained in this manner are likely to exhibit a similar extent of exploration. In this paper, we are interested in learning policies for a project whose future goals include the extraction of interpretable strategies, rather than state-of-the-art game-playing performance. For these goals, we argue that such exploration is undesirable, and we propose a novel objective function for training policies that are not exploratory. We derive a policy gradient expression for maximising this objective function, which can be estimated using MCTS value estimates rather than MCTS visit counts. We empirically evaluate various properties of the resulting policies in a variety of board games. Comment: Accepted at the IEEE Conference on Games (CoG) 2019
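    The contrast the abstract draws, training on MCTS visit counts versus training on MCTS value estimates, can be sketched with a toy softmax policy. This is only an illustration of the two kinds of training signal under assumed names and shapes (logits, visit counts, Q-values); it is not the paper's exact objective function.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# --- AlphaZero-style target: mimic the MCTS visit-count distribution ---
def cross_entropy_grad(logits, visit_counts):
    """Gradient of the cross-entropy loss between the policy and the
    normalised MCTS visit-count distribution (an exploratory target)."""
    pi_mcts = visit_counts / visit_counts.sum()
    return softmax(logits) - pi_mcts            # d(loss)/d(logits)

# --- Value-based alternative in the spirit of the abstract ---
def value_weighted_policy_grad(logits, q_values):
    """Policy-gradient-style loss gradient that pushes probability mass
    toward actions with high MCTS value estimates, not high visit counts."""
    p = softmax(logits)
    baseline = np.dot(p, q_values)               # expected value under the policy
    advantages = q_values - baseline
    return -p * advantages                       # gradient of -E_p[Q] w.r.t. logits

# Toy usage: one root state with 3 actions
logits = np.zeros(3)
visits = np.array([50.0, 30.0, 20.0])            # exploratory MCTS visit counts
q_vals = np.array([0.60, 0.20, 0.10])            # MCTS value estimates per action

logits_a = logits - 0.5 * cross_entropy_grad(logits, visits)          # visit-count target
logits_b = logits - 0.5 * value_weighted_policy_grad(logits, q_vals)  # value-based target
print(softmax(logits_a), softmax(logits_b))
```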

    Variance Reduction in Population-Based Optimization: Application to Unit Commitment

    We consider noisy optimization and some traditional variance reduction techniques aimed at improving the convergence rate, namely (i) common random numbers (CRN), which are relevant for population-based noisy optimization, and (ii) stratified sampling, which is relevant for most noisy optimization problems. We present artificial models of noise for which common random numbers are very efficient, and artificial models of noise for which common random numbers are detrimental. We then experiment on a desperately expensive unit commitment problem. As expected, stratified sampling is never detrimental. Nonetheless, in practice, common random numbers provided, by far, most of the improvement.
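    A hedged illustration of common random numbers, the first technique above: candidate solutions are compared on the same sampled noise realisations, so much of the noise cancels in the comparison. The quadratic noisy_objective below is an invented stand-in, not the unit commitment problem from the paper.

```python
import numpy as np

def noisy_objective(x, rng):
    """Toy noisy objective: a quadratic plus scenario-dependent noise."""
    return (x - 1.0) ** 2 + rng.normal(scale=2.0)

def compare(x_a, x_b, n_scenarios=64, use_crn=True, seed=0):
    """Estimate f(x_a) - f(x_b). With CRN, both candidates see the same
    noise realisations; here the noise is purely additive so it cancels
    exactly, while in general the cancellation is only partial."""
    diffs = []
    for i in range(n_scenarios):
        if use_crn:
            rng_a = np.random.default_rng(seed + i)   # same stream...
            rng_b = np.random.default_rng(seed + i)   # ...for both candidates
        else:
            rng_a = np.random.default_rng(2 * i)      # independent streams
            rng_b = np.random.default_rng(2 * i + 1)
        diffs.append(noisy_objective(x_a, rng_a) - noisy_objective(x_b, rng_b))
    return np.mean(diffs), np.std(diffs) / np.sqrt(n_scenarios)

print("with CRN   :", compare(0.5, 1.2, use_crn=True))
print("without CRN:", compare(0.5, 1.2, use_crn=False))
```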

    A Survey of Monte Carlo Tree Search Methods

    Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
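    Because the abstract refers to the core algorithm without stating it, here is a compact sketch of the generic MCTS loop with the common UCT selection rule, applied to a toy Nim game. The game, the constants, and the iteration budget are illustrative choices, not taken from the survey.

```python
import math
import random

# Toy two-player game (Nim): remove 1-3 stones; whoever takes the last stone wins.
class NimState:
    def __init__(self, stones=15, player=1):
        self.stones, self.player = stones, player        # player 1 or 2, to move
    def legal_moves(self):
        return list(range(1, min(3, self.stones) + 1))
    def apply(self, move):
        return NimState(self.stones - move, 3 - self.player)
    def is_terminal(self):
        return self.stones == 0
    def winner(self):
        return 3 - self.player                           # the last mover took the last stone

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.just_moved = None if parent is None else parent.state.player
        self.children, self.visits, self.wins = [], 0, 0.0
        self.untried = state.legal_moves()

def uct_search(root_state, iterations=5000, c=1.4):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend fully expanded nodes using the UCT rule.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ch.wins / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # 2. Expansion: add one child for a move not tried yet.
        if node.untried:
            move = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(node.state.apply(move), parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: uniformly random playout to the end of the game.
        state = node.state
        while not state.is_terminal():
            state = state.apply(random.choice(state.legal_moves()))
        winner = state.winner()
        # 4. Backpropagation: credit each node from the view of the player who moved into it.
        while node is not None:
            node.visits += 1
            if node.just_moved == winner:
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move

print(uct_search(NimState(15)))   # optimal play leaves a multiple of 4, i.e. takes 3 here
```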

    The effect of simulation bias on action selection in Monte Carlo Tree Search

    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, in fulfilment of the requirements for the degree of Master of Science, August 2016. Monte Carlo Tree Search (MCTS) is a family of directed search algorithms that has gained widespread attention in recent years. It combines a traditional tree-search approach with Monte Carlo simulations, using the outcomes of these simulations (also known as playouts or rollouts) to evaluate states in a look-ahead tree. Because MCTS does not require an evaluation function, it is particularly well-suited to the game of Go, seen by many as chess's successor as a grand challenge of artificial intelligence, with MCTS-based agents recently able to achieve expert-level play on 19×19 boards. Its domain-independent nature also makes it a focus in a variety of other fields, such as Bayesian reinforcement learning and general game playing. Despite the vast amount of research into MCTS, the dynamics of the algorithm are still not fully understood. In particular, the effect of using knowledge-heavy or biased simulations in MCTS remains unknown, with interesting results indicating that better-informed rollouts do not necessarily result in stronger agents. This research provides support for the notion that MCTS is well-suited to a class of domains possessing a smoothness property. In these domains, biased rollouts are more likely to produce strong agents. Conversely, any error due to incorrect bias is compounded in non-smooth domains, particularly for low-variance simulations. This is demonstrated empirically in a number of single-agent domains.
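    To make "biased simulations" concrete, the simulation phase of the UCT sketch above can sample moves from a heuristic-weighted distribution instead of uniformly at random. The heuristic and temperature below are placeholders; the dissertation's point is that whether such bias helps depends on the smoothness of the domain.

```python
import math
import random

def biased_rollout(state, heuristic, temperature=1.0):
    """Play out a game sampling moves in proportion to exp(heuristic / T).
    Large temperatures approach the uniform random rollout; small
    temperatures give low-variance, strongly biased simulations."""
    while not state.is_terminal():
        moves = state.legal_moves()
        weights = [math.exp(heuristic(state, m) / temperature) for m in moves]
        state = state.apply(random.choices(moves, weights=weights, k=1)[0])
    return state

# Example with the NimState sketch above and a deliberately crude heuristic
# that prefers leaving the opponent a multiple of four stones.
def nim_heuristic(state, move):
    return 1.0 if (state.stones - move) % 4 == 0 else 0.0

# final_state = biased_rollout(NimState(15), nim_heuristic, temperature=0.5)
```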

    Balancing and Analyzing Player Interaction in the ESG+P Game with Machinations

    This work explores the balancing of an educational game for teaching sustainable development in organizations, focusing on player interaction and the strategies players employ. A game's success depends on balancing the relationships among its elements. Balancing is a complex process performed over multiple iterations, starting at game conception and continuing throughout the development and testing stages. This work extends our previous case study, which did not consider player interaction in the game balancing. We built two models that contain all game mechanics using the Machinations framework. The first model includes elements that randomly produce, distribute, and consume resources, while the second model analyzes player interaction and implements four player strategies. We simulated these models in batch plays, analyzed game states, and adjusted the game economies. The random model simulation achieved a victory rate of 40%, while the interactive model simulation with player strategies increased victory rates to between 66% and 81%. These results show that player interaction and decision-making can be more decisive than randomness in achieving victory. Machinations contributed to enhancing the game, proved its usefulness for simulating complex models, and deepened our understanding of game dynamics, including player actions, potential deadlocks, and feedback mechanisms. This work supports other authors' findings by demonstrating that balancing the game as early as possible in the development process, with player interaction taken into account, keeps the design feasible; it also provides evidence that computer simulations, such as Machinations, benefit game balance and improve game design without the need to build a prototype and conduct extensive playtests.
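    The batch-simulation idea can be illustrated outside Machinations with a deliberately simplified resource economy in which a random strategy and a greedy strategy are played many times and their victory rates compared. The production rates, converter, and victory threshold below are invented for illustration and have no relation to the ESG+P game's actual economy.

```python
import random

def play_once(strategy, turns=30, rng=random):
    """One simulated play of a toy economy: each turn produces resources,
    the player converts some of them into 'points', and a random event
    drains resources. Victory = reaching 20 points."""
    resources, points = 5, 0
    for _ in range(turns):
        resources += rng.randint(1, 3)         # production source
        spend = strategy(resources, points)    # player decision
        spend = max(0, min(spend, resources))
        resources -= spend
        points += spend // 2                   # converter: 2 resources -> 1 point
        resources = max(resources - rng.randint(0, 2), 0)  # random drain
    return points >= 20

def random_strategy(resources, points):
    return random.randint(0, resources)

def greedy_strategy(resources, points):
    return resources if points < 20 else 0     # convert everything until victory

def victory_rate(strategy, plays=1000):
    return sum(play_once(strategy) for _ in range(plays)) / plays

print("random strategy:", victory_rate(random_strategy))
print("greedy strategy:", victory_rate(greedy_strategy))
```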

    Two Information Engineering Approaches to the Game of Go

    This research takes information engineering approaches to the game of Go as its theme. Its two main topics are the proposal of Simulation Adjusting, a machine learning method related to the Monte Carlo methods still used in Go today (Part I), and the proposal of playing-strength estimation using convolutional neural networks (Part II). Research on Go AI (artificial intelligence) began in the 1960s, but until around 2005 its playing strength did not reach even that of an average amateur player. Against that background, the effectiveness of Monte Carlo methods for Go began to be recognised around 2006; the strength of Go AI then grew rapidly, reaching strong amateur level around 2012. From around 2012 to around 2015, however, the strength of Go AI stagnated again. Simulation Adjusting, described in Part I, is a new learning method for the simulation component of Monte Carlo methods, proposed in an attempt to break that deadlock. It adjusts the simulation policy so that the moves returned by a Monte-Carlo-based AI match the moves of strong human players. A related existing method is Simulation Balancing by Silver et al.; whereas Simulation Balancing learns from a teacher Monte Carlo AI, Simulation Adjusting differs in that it learns from strong players' game records and problem collections. In the Simulation Adjusting experiments, we ultimately confirmed a stable decrease of the objective function and an improved rate of correct answers on a collection of 5×5-board problems. Meanwhile, in 2012 a system using DCNNs (deep convolutional neural networks) achieved a breakthrough in an image recognition competition. It was then suggested that DCNNs might also be effective for recognising Go positions, and from around 2014 studies gradually appeared showing that DCNNs are also effective for predicting moves in Go. AlphaGo, announced by Google's DeepMind team in 2016, was the first program to defeat a top human player, and DCNNs played a decisive role in that achievement. Part II describes research on estimating playing strength from a single game record, exploiting the position-recognition ability of CNNs (convolutional neural networks). If a computer could automatically estimate a player's strength from a small number of game records, this would be very useful, for example in running Internet Go servers. Moreover, strong human players are said to be able to estimate a player's strength fairly accurately from a single game record, so this research is also interesting from the perspective of artificial intelligence. Several earlier studies on computer-based strength estimation existed, including for Go, but at the time none of them used CNNs. In our research we built a strength estimation system using the DCNN library Caffe and ran experiments with 13×13-board game records from the Internet Go server "Go Quest" to investigate how accurately playing strength can be estimated. As a result, we obtained a smaller fitting error than previous work. These results show that convolutional neural networks are useful not only for improving the strength of Go AI but also for estimating playing strength from game records. The University of Electro-Communications.
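    As a hedged sketch of the Part II idea only (not the dissertation's Caffe-based system): a small convolutional network that regresses a playing-strength score from an encoded 13×13 position. The input encoding, network shape, and rank scale are placeholders; per-move predictions from a real game record would be aggregated to score the whole game.

```python
import torch
import torch.nn as nn

class StrengthNet(nn.Module):
    """Tiny CNN that regresses a scalar 'strength' from a 13x13 position.
    Placeholder input planes: own stones, opponent stones, empty points."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)          # predicted rank on some numeric scale

    def forward(self, boards):                # boards: (batch, 3, 13, 13)
        x = self.features(boards).flatten(1)
        return self.head(x).squeeze(1)

# One training step on fake data, standing in for encoded positions and
# ground-truth ranks from real game records.
model = StrengthNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
boards = torch.randn(8, 3, 13, 13)
ranks = torch.randn(8)
loss = nn.functional.mse_loss(model(boards), ranks)
loss.backward()
optimizer.step()
```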

    Monte-Carlo tree search using expert knowledge: an application to computer go and human genetics

    Monte-Carlo Tree Search (MCTS) has become the leading algorithm in many problems of artificial intelligence and computer science. This thesis analyses the incorporation of expert knowledge to improve the search. The work describes two applications: one in computer Go and another in the field of human genetics. It is an established fact that, in complex problems, MCTS requires the support of domain-specific or online-learned knowledge to improve its performance. What this work analyses are different ideas for how to do so, their results, and their implications, thereby improving our understanding of MCTS. The main contributions to the area are: an analytical model of simulations that improves our understanding of the role of simulations; a competitive framework, including code and data, for comparing methods in genetic aetiology; and three successful applications: one in the field of 19x19 Go openings called M-eval, one on simulations that learn, and one in genetic aetiology. In addition, the following are worth highlighting: a model for representing proportions through states, called WLS, with free software; a negative result about one idea for simulations; the unexpected discovery of a possible problem when using MCTS in optimization; and an original analysis of the limitations.
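    For context, one standard way of injecting expert knowledge into the selection step is a progressive-bias style prior that decays with visits; this is sketched below reusing the Node fields from the UCT sketch earlier in this list. It is not necessarily the technique evaluated in the thesis, and the prior function is hypothetical.

```python
import math

def select_child_with_prior(node, prior, c=1.4, w=2.0):
    """UCT selection with a progressive-bias style knowledge term.
    `prior(state, move)` is a hypothetical heuristic in [0, 1]; its
    influence fades as a child accumulates visits."""
    def score(child):
        exploit = child.wins / child.visits
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        knowledge = w * prior(node.state, child.move) / (1 + child.visits)
        return exploit + explore + knowledge
    return max(node.children, key=score)
```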

    Monte-Carlo Simulation Balancing in Practice

    Simulation balancing is a new technique to tune parameters of a playout policy for a Monte-Carlo game-playing program. So far, this algorithm had only been tested in a very artificial setting: it was limited to 5×5 and 6×6 Go, and required a stronger external program that served as a supervisor. In this paper, the effectiveness of simulation balancing is demonstrated in a more realistic setting. A state-of-the-art program, Erica, learned an improved playout policy on the 9×9 board, without requiring any external expert to provide position evaluations. Evaluations were collected by letting the program analyze positions by itself. The previous version of Erica learned pattern weights with the minorization-maximization algorithm. Thanks to simulation balancing, its playing strength was improved from a winning rate of 69% to 78% against Fuego 0.4.
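    A schematic sketch of a policy-gradient simulation balancing update in the spirit of the algorithm this paper builds on: playout policy parameters are moved so that the mean playout outcome of a training position tracks a target evaluation. The game interface (copy, legal_moves, apply, outcome) and the feature function are assumed placeholders, and the value and gradient estimates are merged into one batch of playouts for brevity, whereas the original algorithm uses separate batches.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def playout(state, theta, features, rng):
    """Play to the end with a softmax policy over move features.
    Returns the outcome z (e.g. +1/-1 from the first player's view)
    and the accumulated score function sum_t grad log pi(a_t | s_t)."""
    grad_log = np.zeros_like(theta)
    while not state.is_terminal():
        moves = state.legal_moves()
        phi = np.array([features(state, m) for m in moves])   # (n_moves, n_feats)
        p = softmax(phi @ theta)
        k = rng.choice(len(moves), p=p)
        grad_log += phi[k] - p @ phi                           # grad of log softmax
        state = state.apply(moves[k])
    return state.outcome(), grad_log

def simulation_balancing_step(position, target_value, theta, features,
                              rng, n_playouts=64, lr=0.01):
    """One update: estimate the playout value of `position`, then move theta
    so that the expected playout outcome tracks `target_value`."""
    outcomes, grads = [], []
    for _ in range(n_playouts):
        z, g = playout(position.copy(), theta, features, rng)
        outcomes.append(z)
        grads.append(z * g)
    v_hat = np.mean(outcomes)                 # current playout evaluation
    grad_v = np.mean(grads, axis=0)           # policy-gradient estimate of dV/dtheta
    return theta + lr * (target_value - v_hat) * grad_v
```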