12 research outputs found

    Generalized Monte-Carlo Tree Search Extensions for General Game Playing

    No full text
    General Game Playing (GGP) agents must be capable of playing a wide variety of games skillfully. Monte-Carlo Tree Search (MCTS) has proven an effective reasoning mechanism for this challenge, as reflected by its popularity among designers of GGP agents. Providing GGP agents with knowledge relevant to the game at hand in real time is, however, a challenging task. In this paper we propose two enhancements for MCTS in the context of GGP, aimed at improving the effectiveness of the simulations in real time based on in-game statistical feedback. The first extension allows early termination of lengthy and uninformative simulations, while the second improves the action-selection strategy when both explored and unexplored actions are available. The methods are empirically evaluated in a state-of-the-art GGP agent and shown to yield an overall significant improvement in playing strength.
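The early-termination idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the game-state interface (`is_terminal`, `legal_actions`, `apply`, `result`) is hypothetical, and in the agent the cutoff would be tuned online from statistics such as observed game lengths.

```python
import random

def playout(state, cutoff_depth, default_result=0.0):
    """Run a random simulation, abandoning it if it grows too long.

    `cutoff_depth` stands in for a limit derived from in-game statistics;
    `default_result` is a neutral value returned for terminated,
    uninformative simulations (both are assumptions of this sketch).
    """
    depth = 0
    while not state.is_terminal():
        if depth >= cutoff_depth:
            return default_result  # cut off a lengthy, uninformative rollout
        state = state.apply(random.choice(state.legal_actions()))
        depth += 1
    return state.result()  # true outcome of a completed simulation
```

Returning a neutral default for cut-off simulations keeps them from biasing the tree statistics while freeing time for more informative rollouts.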

    CADIA-Player: alhliða leikjaspilari (CADIA-Player: A General Game Player)

    No full text
    The aim of General Game Playing (GGP) is to create intelligent agents that can automatically learn how to play many different games well without any human intervention, given only a description of the game rules. This forces the agents to learn a strategy without any domain-specific knowledge provided by their developers. The most successful GGP agents have so far been based on the traditional approach of game-tree search augmented with an automatically learned evaluation function for encapsulating the domain-specific knowledge. In this thesis we describe CADIA-Player, a GGP agent that instead uses a simulation-based approach to reason about its actions. More specifically, it uses Monte Carlo rollouts with upper confidence bounds for trees (UCT) as its main search procedure. CADIA-Player has already proven the effectiveness of this simulation-based approach in the context of GGP by winning the Third Annual GGP Competition. We describe its implementation as well as several algorithmic improvements for making the simulations more effective. Empirical data is presented showing that CADIA-Player outperforms a naïve Monte Carlo player with close to a 90% winning ratio on average over a wide range of games, including Checkers and Othello. We further investigate the relative importance of UCT's action-selection rule, its memory model, and the various enhancements in achieving this result.
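The UCT action-selection rule mentioned above picks, at each tree node, the action maximizing the average return plus an exploration bonus, Q(s,a) + C * sqrt(ln N(s) / N(s,a)). A minimal sketch follows; the exploration constant and the infinite-value treatment of unexplored actions are common defaults, not necessarily CADIA-Player's exact choices:

```python
import math

def uct_select(children, c=1.4):
    """Select the action maximizing Q + C*sqrt(ln N / n).

    `children` maps each action to (visits, total_reward); actions with
    zero visits are tried first. The constant C=1.4 is an assumption of
    this sketch, roughly sqrt(2), a common default.
    """
    parent_visits = sum(n for n, _ in children.values())

    def uct_value(item):
        _, (n, w) = item
        if n == 0:
            return float("inf")  # prioritize unexplored actions
        return w / n + c * math.sqrt(math.log(parent_visits) / n)

    return max(children.items(), key=uct_value)[0]
```

The bonus term shrinks as an action accumulates visits, so the rule balances exploiting high-average actions against exploring under-sampled ones.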

    Alhliða Leikjaspilun byggð á Hermunaraðferðum (Simulation-Based General Game Playing)

    No full text
    The aim of General Game Playing (GGP) is to create intelligent agents that automatically learn how to play many different games at an expert level without any human intervention. One of the main challenges such agents face is to automatically learn knowledge-based heuristics in real time, whether for evaluating game positions or for guiding search. In this thesis we approach this challenge with Monte-Carlo Tree Search (MCTS), which in recent years has become a popular and effective search method in games. For competitive play such an approach requires an effective search-control mechanism for guiding the simulation playouts, and providing GGP agents with knowledge relevant to the game at hand in real time is a challenging task. We describe our GGP agent, CADIAPLAYER, and introduce several schemes for automatically learning search guidance based on both statistical and reinforcement-learning techniques. The thesis furthermore proposes two extensions for MCTS in the context of GGP, aimed at improving the effectiveness of the simulations in real time based on in-game statistical feedback. We also present a way to extend MCTS solvers to handle simultaneous-move games. Finally, we study how various game-tree properties affect MCTS performance.
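Extending an MCTS solver to simultaneous-move games must account for both players committing to their moves at once: a node is proven won only if some action of ours is proven against every opponent reply, and proven lost only if the opponent has a reply that refutes all of our actions. The win/loss-only check below is a conservative sketch; the matrix-of-proven-values representation is an assumption, and the actual extension must also handle draws and partially proven entries.

```python
def proven_value(payoff):
    """Proven value of a simultaneous-move node for the row player.

    `payoff[i][j]` is the proven value of joint action (i, j):
    +1 (win), -1 (loss), or None (not yet proven).
    """
    # Won: some row action is proven winning against every column reply.
    if any(all(v == 1 for v in row) for row in payoff):
        return 1
    # Lost: some column reply is proven winning against every row action.
    if any(all(v == -1 for v in col) for col in zip(*payoff)):
        return -1
    return None  # undecided (the true value may require mixed strategies)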

    Learning Simulation Control in General Game-Playing Agents

    No full text
    The aim of General Game Playing (GGP) is to create intelligent agents that can automatically learn how to play many different games at an expert level without any human intervention. One of the main challenges such agents face is to automatically learn knowledge-based heuristics in real time, whether for evaluating game positions or for guiding search. In recent years, GGP agents that use Monte-Carlo simulations to reason about their actions have become increasingly popular. For competitive play such an approach requires an effective search-control mechanism for guiding the simulation playouts. Here we introduce several schemes for automatically learning search guidance and compare them empirically. We show that by combining schemes one can improve upon the current state of the art of simulation-based search control in GGP.
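One scheme in this line of work is Move-Average Sampling (MAST), which biases playout moves toward actions that have earned high average returns anywhere in earlier simulations, typically sampling through a Gibbs (softmax) distribution. The sketch below is illustrative only: the temperature, the optimistic default for unseen moves, and the function names are assumptions.

```python
import math
import random

def mast_choose(actions, qh, tau=1.0):
    """Sample a playout action with probability proportional to
    exp(Qh(a) / tau), where Qh(a) is the average return observed for
    action `a` across earlier simulations (the MAST idea).

    `tau` controls greediness (lower = greedier); unseen actions get an
    optimistic default value. Both choices are assumptions of this sketch.
    """
    default = 1.0  # optimistic default so unseen moves still get tried
    weights = [math.exp(qh.get(a, default) / tau) for a in actions]
    r = random.random() * sum(weights)
    for action, w in zip(actions, weights):
        r -= w
        if r <= 0:
            return action
    return actions[-1]  # guard against floating-point rounding
```

Because the statistics are keyed by the move alone rather than by the full state, they transfer across positions, which is what makes the scheme cheap enough to learn in real time.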

    Alpha-beta pruning for games with simultaneous moves

    No full text
    Alpha-Beta pruning is one of the most powerful and fundamental minimax search improvements. It was designed for sequential two-player zero-sum perfect-information games. In this paper we introduce an Alpha-Beta-like sound pruning method for the more general class of “stacked matrix games”, which allow simultaneous moves by both players. This is accomplished by maintaining upper and lower bounds on the achievable payoffs in states with simultaneous actions, and by pruning dominated actions based on the feasibility of certain linear programs. Empirical data shows considerable savings in expanded nodes compared to naive depth-first move computation without pruning.
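The dominance test in the paper is phrased as the feasibility of linear programs, which can detect actions dominated even by mixed strategies. The simpler special case of pure-strategy dominance can be sketched without an LP solver; this is an illustration of the concept, not the paper's method:

```python
def dominated_rows(payoff):
    """Indices of row actions weakly dominated by another pure row.

    Row i is dominated if some other row j does at least as well against
    every column action and strictly better against at least one. The
    general LP-based test also catches domination by mixtures of rows,
    which this pure-strategy check misses.
    """
    dominated = set()
    for i, row in enumerate(payoff):
        for j, other in enumerate(payoff):
            if i == j or j in dominated:
                continue  # skip self and already-eliminated dominators
            if (all(o >= r for o, r in zip(other, row))
                    and any(o > r for o, r in zip(other, row))):
                dominated.add(i)
                break
    return dominated
```

Pruning a dominated action is sound because no optimal (maximin) strategy ever needs to play it, so the node's game value is unchanged.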