6 research outputs found

    Machine Learning Techniques for the Development of a Stratego Bot

    Stratego is a two-player, non-stochastic, imperfect-information strategy game in which players try to locate and capture the opponent’s flag. At the outset of each game, players deploy their pieces in any arrangement they choose. Throughout play, each player knows the positions of the opponent’s pieces, but not their specific identities. The game therefore involves deduction, bluffing, and a degree of invention in addition to the sort of planning familiar from games like chess or backgammon. Developing a strong A.I. player presents three major challenges. Firstly, a Stratego program must maintain states of belief about the opponent’s pieces as well as beliefs about the opponent’s beliefs. Beliefs must be updated according to in-game events. We propose to solve this using Bayesian probability theory and Bayesian networks. Secondly, any turn-based game-playing program must perform tree search as part of its planning and move-making routine. Search in perfect-information games such as chess has been studied extensively and has produced a wealth of algorithms and heuristics to expedite the process. Stochastic and imperfect-information games, however, have received less general attention, though Schaeffer et al. have made a significant effort to revisit this domain. Interestingly, the same family of algorithms (Ballard’s Star-1 and Star-2) used in the stochastic perfect-information game of backgammon can be used in the deterministic, imperfect-information domain of Stratego. The technical challenge here, just as in the stochastic domain, is to optimize node cutoffs. Thirdly, a strong Stratego program should have some degree of inventiveness so that it can avoid predictable play. The game’s intricacy comes from information being concealed from the players. A program which plays too predictably (that is, according to known or obvious tactics) has a significant disadvantage against a more creative opponent. There is a balance, however, between tactics being novel and being foolish. Current strong Stratego programs have been developed by human experts (such as Vincent deBoer), whose tactical preferences are hard-coded into those programs. Since we claim no especial talent for Stratego ourselves, part of the development challenge will be to allow the program to discover tactical preferences and advantages on its own. Withholding explicitly programmed heuristics and allowing machines to discover tactics on their own has led to original and powerful computer play in the past (note Tesauro’s success with TD-Gammon). We hope our program will likewise learn to play competitively without depending on instruction from a mediocre or predictable player. Various techniques from machine learning, including both supervised and unsupervised learning, are applied to this objective. At our disposal are more than 50,000 match records from an online Stratego site. Part of developing a strong player will involve separating the truly advantageous features in these data from features which are merely frequent. The learning process must be objective enough to avoid bias and predictability, yet robust enough to exploit utility. We introduce a modeling method which allows partial instruction as guidelines for feature detection.
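The belief-maintenance idea in the abstract above can be illustrated with a single Bayes-rule update. This is only a minimal sketch, not the thesis's method: it assumes the classic 40-piece army, treats one unknown piece in isolation (no Bayesian network, no beliefs about the opponent's beliefs), and uses a hypothetical likelihood in which bombs and flags never move and every other rank is equally likely to move. All function and variable names are illustrative.

```python
from typing import Dict

# Piece counts in the classic 40-piece Stratego army.
PIECE_COUNTS: Dict[str, int] = {
    "Flag": 1, "Bomb": 6, "Spy": 1, "Scout": 8, "Miner": 5,
    "Sergeant": 4, "Lieutenant": 4, "Captain": 4, "Major": 3,
    "Colonel": 2, "General": 1, "Marshal": 1,
}


def prior_from_counts(counts: Dict[str, int]) -> Dict[str, float]:
    """Prior belief about an unknown piece: proportional to how many of each rank exist."""
    total = sum(counts.values())
    return {rank: n / total for rank, n in counts.items()}


def bayes_update(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Posterior P(rank | event) = P(event | rank) * P(rank) / P(event)."""
    unnormalized = {rank: prior[rank] * likelihood.get(rank, 0.0) for rank in prior}
    evidence = sum(unnormalized.values())
    if evidence == 0.0:
        raise ValueError("observation has zero probability under the prior")
    return {rank: p / evidence for rank, p in unnormalized.items()}


# Assumed likelihood of the event "this piece moved": bombs and flags
# cannot move; all other ranks are treated as equally likely to move.
MOVED_LIKELIHOOD = {
    rank: 0.0 if rank in ("Bomb", "Flag") else 1.0 for rank in PIECE_COUNTS
}

prior = prior_from_counts(PIECE_COUNTS)
posterior = bayes_update(prior, MOVED_LIKELIHOOD)
```

After the update, the probability mass that the prior gave to bombs and the flag is redistributed over the 33 movable pieces. A fuller model would also condition on the other pieces already revealed and on how far the piece moved.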

    Outperforming Game Theoretic Play with Opponent Modeling in Two Player Dominoes

    Dominoes is a partially observable extensive-form game with chance. The rules are simple; however, the complexity and uncertainty of the game make it difficult to solve with standard game-theoretic methods. This thesis combines strategy-prediction opponent modeling with game-theoretic search algorithms in two-player dominoes. It also computes an upper bound on the potential value that predicting a strategy can provide against specific strategy types. Furthermore, the actual values are computed according to the accuracy of a trained classifier. Empirical results show a potential gain in score over a Nash equilibrium player in both fully and partially observable environments for specific strategy types. In a fully observable environment, the actual value gained is positive for score and for total wins and ties. Otherwise, the actual value gained over the Nash equilibrium player from the opponent model exists only for score, while the opponent modeler demonstrates a higher potential to win and/or tie than a pure game-theoretic agent.

    Approximate universal artificial intelligence and self-play learning for games

    This thesis is split into two independent parts. The first is an investigation of some practical aspects of Marcus Hutter's Universal Artificial Intelligence theory. The main contribution is to show how a very general agent can be built and analysed using the mathematical tools of this theory. Before the work presented in this thesis, it was an open question whether this theory was of any relevance to reinforcement learning practitioners. This work suggests that it is indeed relevant and worthy of future investigation. The second part of this thesis looks at self-play learning in two-player, deterministic, adversarial turn-based games. The main contribution is the introduction of a new technique for training the weights of a heuristic evaluation function from data collected by classical game tree search algorithms. This method is shown to outperform previous self-play training routines based on Temporal Difference learning when applied to the game of Chess. In particular, the technique was used to construct a Chess program that learnt to play master-level Chess by tuning a set of initially random weights from self-play games.

    Generalized asset integrity games

    Generalized assets represent a class of multi-scale adaptive state-transition systems with domain-oblivious performance criteria. The governance of such assets must proceed without exact specifications, objectives, or constraints. Decision making must rapidly scale in the presence of uncertainty, complexity, and intelligent adversaries. This thesis formulates an architecture for generalized asset planning. Assets are modelled as dynamical graph structures which admit topological performance indicators, such as dependability, resilience, and efficiency. These metrics are used to construct robust model configurations. A normalized compression distance (NCD) is computed between a given active/live asset model and a reference configuration to produce an integrity score. The utility derived from the asset increases monotonically with this integrity score, which represents the proximity to ideal conditions. The present work considers the interaction between an asset manager and an intelligent adversary, who act within a stochastic environment to control the integrity state of the asset. A generalized asset integrity game engine (GAIGE) is developed, which implements anytime algorithms to solve a stochastically perturbed two-player zero-sum game. The resulting planning strategies seek to stabilize deviations from minimax trajectories of the integrity score. Results demonstrate the performance and scalability of the GAIGE. This approach represents a first step towards domain-oblivious architectures for complex asset governance and anytime planning.
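The normalized compression distance mentioned in the abstract above has a standard definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The sketch below approximates C with zlib. Two things in it are assumptions and are not stated in the abstract: the asset models are taken to be serialized as byte strings, and the integrity score is taken to be 1 - NCD so that a higher score means closer to the reference. Function names are illustrative.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, used as a stand-in for C(.)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def integrity_score(live: bytes, reference: bytes) -> float:
    """Assumed mapping: score = 1 - NCD, so similar models score higher."""
    return 1.0 - ncd(live, reference)


# Hypothetical serialized asset graphs (edge lists as bytes).
reference_model = b"A-B;B-C;C-D;D-E;" * 20
unrelated_model = bytes(range(256))

score_same = integrity_score(reference_model, reference_model)
score_other = integrity_score(unrelated_model, reference_model)
```

With a real compressor the distance between identical inputs is small but not exactly zero, so scores are best compared with each other rather than read as absolute values.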

    The Characterization of Chance and Skill in Games

    This thesis addresses the classification of games between the categories "game of chance" and "game of strategy". To this end, the influence of individual moves is described mathematically. Some "moves" result from the outcome of random events (e.g. dice rolls). The other moves can be attributed to the various players. This attribution makes it possible to compare the influence of chance with the influence of the various players. Based on this comparison, the thesis introduces the measures "chanciness" and "controllability". The classification of a game by these measures depends not only on the rules of the game but also on the participating players. Moreover, the classification can differ between the perspectives of the participants. Two measures are needed to capture this complexity. The practical computation of "chanciness" and "controllability" is illustrated using computer-controlled players.

    The dark side of the board: advances in chess Kriegspiel

    While imperfect-information games are an excellent model of real-world problems and tasks, they are often difficult for computer programs to play at a high level of proficiency, especially if they involve major uncertainty and a very large state space. Kriegspiel, a chess variant that resembles a wargame, is a perfect example: while the game was studied for decades from a game-theoretical viewpoint, it was only very recently that the first practical algorithms for playing it began to appear. This thesis presents, documents and tests a multi-sided effort towards making a strong Kriegspiel player, using heuristic search, retrograde analysis and Monte Carlo tree search algorithms to achieve increasingly higher levels of play. The resulting program is currently the strongest computer player in the world and plays at an above-average human level.