
    Learning Dynamics for Monte Carlo Tree Search: Applications to the Games of Go and Impartial Solitaire Clobber

    Monte Carlo Tree Search (MCTS) was initially introduced for the game of Go but has since been applied successfully to other games, opening the way to a range of new methods such as Multiple-MCTS and Nested Monte Carlo. MCTS evaluates game states through thousands of random simulations. As the simulations are carried out, the program guides the search towards the most promising moves. Through this dynamic, MCTS achieves impressive results without an extensive need for prior knowledge. In this thesis, we choose to treat MCTS as a full learning system. Each random simulation then becomes a simulated experience, and its outcome corresponds to the observed reinforcement. From this perspective, the learning of the system results from the complex interaction of two processes: the incremental acquisition of new representations and their exploitation in subsequent simulations. We propose two different approaches to enhance these processes. The first gathers complementary representations in order to improve the relevance of the simulations. The second focuses the search on local sub-goals in order to improve the quality of the representations acquired. The methods presented in this work have been applied to the games of Go and Impartial Solitaire Clobber. The results obtained in our experiments highlight the significance of these two processes in the learning dynamic and open new perspectives for enhancing learning systems such as MCTS.
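    The abstract describes each random playout as a simulated experience whose outcome acts as a reinforcement signal. As a point of reference, here is a minimal sketch of a standard UCT-style MCTS loop in Python; it is not code from the thesis, and the `game` interface (`legal_moves`, `play`, `is_terminal`, `result`) is an assumed placeholder:

```python
import math
import random

class Node:
    """One game state in the search tree; statistics accumulate across simulations."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # move -> Node
        self.visits = 0
        self.wins = 0.0

def uct_value(child, parent_visits, c=1.4):
    # Untried children are selected first; otherwise balance the observed
    # win rate (exploitation) against the visit count (exploration).
    if child.visits == 0:
        return float("inf")
    return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts(root, game, n_simulations=1000):
    for _ in range(n_simulations):
        node = root
        # 1. Selection: descend along the best UCT values.
        while node.children and not game.is_terminal(node.state):
            parent_visits = node.visits
            node = max(node.children.values(),
                       key=lambda ch: uct_value(ch, parent_visits))
        # 2. Expansion: add one child for a random legal move.
        if not game.is_terminal(node.state):
            move = random.choice(game.legal_moves(node.state))
            node = node.children.setdefault(move, Node(game.play(node.state, move), node))
        # 3. Simulation: a random playout is the "simulated experience".
        state = node.state
        while not game.is_terminal(state):
            state = game.play(state, random.choice(game.legal_moves(state)))
        reward = game.result(state)  # the reinforcement observed for this experience
        # 4. Backpropagation: the reinforcement updates every node on the path.
        #    (Simplified: a real two-player version alternates the reward sign.)
        while node is not None:
            node.visits += 1
            node.wins += reward
            node = node.parent
    # Play the most visited move at the root.
    return max(root.children, key=lambda m: root.children[m].visits)
```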

    A Self-Acquiring Knowledge Process for MCTS

    The resources of the IN2P3 computing centre were needed to obtain some of the results presented in this paper. MCTS (Monte Carlo Tree Search) is a well-known and efficient process for covering and evaluating a large range of states in combinatorial problems. We choose to study MCTS for Computer Go, one of the most challenging problems in the field of Artificial Intelligence. For this game, a purely combinatorial approach does not always lead to a reliable evaluation of game states. To enhance the ability of MCTS to tackle such problems, one can use game-specific knowledge to increase the accuracy of the game state evaluation. Such knowledge is not easy to acquire: it is the result of a constructivist learning mechanism based on the player's experience. We therefore explore the idea of endowing MCTS with a process inspired by constructivist learning, which self-acquires knowledge from playing experience. In this paper, we propose a complementary process for MCTS called BHRF (Background History Reply Forest), which memorizes efficient patterns in order to promote their use throughout the MCTS process. Our experimental results are promising and underline how self-acquired data can be useful for MCTS-based algorithms.
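    The abstract states only that BHRF memorizes efficient patterns from playouts and promotes their reuse; the paper's actual data structure is not given here. The following is a hedged sketch of that general idea in the style of a last-good-reply table, not BHRF itself; `ReplyMemory`, `biased_playout`, and the `game` interface are all illustrative assumptions:

```python
import random

class ReplyMemory:
    """Self-acquired playout knowledge: remember the reply that followed a given
    move in a winning simulation (a deliberately simplified keying scheme)."""
    def __init__(self):
        self.good_reply = {}   # previous move -> reply that preceded a win

    def record(self, playout_moves, won):
        # Each finished playout is a new experience: reinforce its (move, reply)
        # patterns after a win, and forget a memorized pattern that just failed.
        for prev, reply in zip(playout_moves, playout_moves[1:]):
            if won:
                self.good_reply[prev] = reply
            elif self.good_reply.get(prev) == reply:
                del self.good_reply[prev]

    def suggest(self, prev_move, legal_moves):
        reply = self.good_reply.get(prev_move)
        return reply if reply in legal_moves else None

def biased_playout(game, state, memory, bias=0.9):
    """Playout that promotes memorized patterns, falling back on random moves."""
    moves, prev = [], None
    while not game.is_terminal(state):
        legal = game.legal_moves(state)
        reply = memory.suggest(prev, legal)
        if reply is not None and random.random() < bias:
            move = reply                      # promote the self-acquired pattern
        else:
            move = random.choice(legal)       # default uniform playout policy
        state = game.play(state, move)
        moves.append(move)
        prev = move
    memory.record(moves, game.result(state) > 0)
    return state
```

The design choice sketched here mirrors the abstract's learning loop: patterns are harvested from simulated experiences and fed back to bias subsequent simulations, so the playout policy improves as the search proceeds.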
