53 research outputs found

    A Refinement-Based Heuristic Method for Decision Making in the Context of Ayo Game

    Games of strategy, such as chess, have served as a convenient test of skill at devising efficient search algorithms, formalizing knowledge, and bringing the power of computation to bear on “intractable” problems. Minimax search has generally been the fundamental approach to obtaining solutions to game problems. However, a number of limitations are associated with using minimax search to solve the Ayo game, among them: (i) the difficulty of designing a suitable evaluator for moves before the moves are made, and (ii) the inability to select a correct move without assuming that players will play optimally. This study investigated the extent to which minimax search could be enhanced with a refinement-based heuristic method for playing Ayo. This is complemented by the CDG (an end-game strategy) for generating procedures such that only good moves are produced at any point in the game, taking into account the opponent's strategy of play. The study was motivated by the need to advance the African board game Ayo so that it could be played by people across the globe, by creating both a theoretical and a product-oriented framework. This framework supports local Ayo promotion initiatives in line with state-of-the-art practice in the global game-playing domain. To accomplish this task, both theoretical and empirical approaches were used. The theoretical approach reveals some mathematical properties of the Ayo game, with specific emphasis on the CDG as an end-game strategy and on means of obtaining the minimal and maximal CDG configurations. Similarly, a theoretical analysis of minimax search was given and enhanced with the refinement-based heuristics. For the empirical approach, we simulated Ayo play on a digital computer, studied the behaviour of the various heuristic metrics used, and compared the play strategies of the simulation with AWALE (the internationally known standard Ayo-playing software). Furthermore, expert play of Ayo was assessed empirically as a means of evaluating the performance of the heuristics used to evolve the Ayo player in the simulation, which allows for statistical interpretation. This provides a novel means of solving the decision-making problem of move selection in computer play of Ayo. The study shows how an indigenous game like Ayo can generate integer sequences and, consequently, self-replicating patterns that recur at different iterations. More importantly, the study provides efficient and usable operational support tools in a prototype simulation of Ayo play that improves on AWALE.
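
    For reference, the abstract above contrasts plain minimax with a refinement-based approach. The sketch below is a minimal, generic depth-limited minimax move selector with a pluggable heuristic evaluator; the functions `legal_moves`, `apply_move`, and `evaluate` are hypothetical placeholders and do not implement the thesis's refinement-based heuristics or the CDG end-game strategy.

```python
# Minimal, generic depth-limited minimax for a two-player, zero-sum board game.
# Illustrative sketch only: `legal_moves`, `apply_move`, and `evaluate` are
# hypothetical placeholders supplied by the caller.

def minimax(state, depth, maximizing, legal_moves, apply_move, evaluate):
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state), None          # heuristic value of a leaf position
    best_move = None
    if maximizing:
        best_value = float("-inf")
        for move in moves:
            value, _ = minimax(apply_move(state, move), depth - 1, False,
                               legal_moves, apply_move, evaluate)
            if value > best_value:
                best_value, best_move = value, move
    else:
        best_value = float("inf")
        for move in moves:
            value, _ = minimax(apply_move(state, move), depth - 1, True,
                               legal_moves, apply_move, evaluate)
            if value < best_value:
                best_value, best_move = value, move
    return best_value, best_move
```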

    Advances in decision-theoretic AI : limited rationality and abstract search

    Thesis (M.S.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994. Includes bibliographical references (p. 153-165). By Michael Patrick Frank.

    Expert iteration

    In this thesis, we study how reinforcement learning algorithms can tackle classical board games without recourse to human knowledge. Specifically, we develop a framework and algorithms which learn to play the board game Hex starting from random play. We first describe Expert Iteration (ExIt), a novel reinforcement learning framework which extends Modified Policy Iteration. ExIt explicitly decomposes the reinforcement learning problem into two parts: planning and generalisation. A planning algorithm explores possible move sequences starting from a particular position to find good strategies from that position, while a parametric function approximator is trained to predict those plans, generalising to states not yet seen. Subsequently, planning is improved by using the approximated policy to guide search, increasing the strength of new plans. This decomposition allows ExIt to combine the benefits of both planning methods and function approximation methods. We demonstrate the effectiveness of the ExIt paradigm by implementing ExIt with two different planning algorithms. First, we develop a version based on Monte Carlo Tree Search (MCTS), a search algorithm which has been successful both in specific games, such as Go, Hex and Havannah, and in general game playing competitions. We then develop a new planning algorithm, Policy Gradient Search (PGS), which uses a model-free reinforcement learning algorithm for online planning. Unlike MCTS, PGS does not require an explicit search tree. Instead, PGS uses function approximation within a single search, allowing it to be applied to problems with larger branching factors. Both MCTS-ExIt and PGS-ExIt defeated MoHex 2.0, the most recent Hex Olympiad winner to be open-sourced, in 9 × 9 Hex. More importantly, whereas MoHex makes use of many Hex-specific improvements and knowledge, all our programs were trained tabula rasa using general reinforcement learning methods. This bodes well for ExIt’s applicability to both other games and real-world decision-making problems.
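
    The planning/generalisation decomposition described above can be summarised as a short loop. The following is a sketch under assumptions: `expert_plan` stands for any search-based improvement operator (for example MCTS or PGS guided by the current network), and the `apprentice` object with a `fit` method and the `sample_positions` helper are hypothetical; this is not the authors' implementation.

```python
# Illustrative sketch of the Expert Iteration (ExIt) loop: an expert planner
# improves on the apprentice's policy by search, and the apprentice is trained
# to imitate the expert's improved plans. All names here (expert_plan,
# apprentice, sample_positions) are hypothetical placeholders.

def expert_iteration(apprentice, expert_plan, sample_positions,
                     iterations=100, games_per_iter=1_000):
    for _ in range(iterations):
        dataset = []
        for state in sample_positions(apprentice, games_per_iter):
            # Planning: search from `state`, guided by the current apprentice,
            # to produce a stronger target policy for that position.
            target_policy = expert_plan(state, apprentice)
            dataset.append((state, target_policy))
        # Generalisation: train the apprentice to predict the expert's plans,
        # so the improvement transfers to positions not searched explicitly.
        apprentice.fit(dataset)
    return apprentice
```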

    On forward pruning in game-tree search

    Ph.D. thesis (Doctor of Philosophy).

    Symbolic Search in Planning and General Game Playing

    Search is an important topic in many areas of AI. Search problems often result in an immense number of states. This work addresses this by using a special data structure, binary decision diagrams (BDDs), which can represent large sets of states efficiently, often saving space compared to explicit representations. The first part is concerned with an analysis of the complexity of BDDs for some search problems, resulting in lower or upper bounds on BDD sizes for them. The second part is concerned with action planning, an area where the programmer does not know in advance what the search problem will look like. This part presents symbolic algorithms for finding optimal solutions in two different settings, classical and net-benefit planning, as well as several improvements to these algorithms. The resulting planner was able to win the International Planning Competition IPC 2008. The third part is concerned with general game playing, which is similar to planning in that the programmer does not know in advance what game will be played. This work proposes algorithms for instantiating the input and solving games symbolically. For playing, a hybrid player based on UCT and the solver is presented.
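
    To make the idea of searching over sets of states concrete, here is a minimal breadth-first reachability sketch in which ordinary Python sets stand in for BDD-encoded state sets; in a real symbolic planner the sets would be Boolean functions over state variables and the image step would be computed by relational product. The names `initial_states`, `goal_test`, and `successors` are placeholders for a concrete problem, not part of the work described above.

```python
# Set-based stand-in for symbolic breadth-first search. In a BDD-based planner
# `reached` and `frontier` would be represented symbolically, but the control
# structure is the same.

def symbolic_bfs(initial_states, goal_test, successors):
    reached = set(initial_states)
    frontier = set(initial_states)
    depth = 0
    while frontier:
        if any(goal_test(s) for s in frontier):
            return depth                      # length of a shortest plan
        # Image computation: all states reachable in one step from the frontier.
        image = {t for s in frontier for t in successors(s)}
        frontier = image - reached            # keep only newly reached states
        reached |= frontier
        depth += 1
    return None                               # no goal state is reachable
```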

    Generalized asset integrity games

    Generalized assets represent a class of multi-scale adaptive state-transition systems with domain-oblivious performance criteria. The governance of such assets must proceed without exact specifications, objectives, or constraints. Decision making must rapidly scale in the presence of uncertainty, complexity, and intelligent adversaries. This thesis formulates an architecture for generalized asset planning. Assets are modelled as dynamical graph structures which admit topological performance indicators, such as dependability, resilience, and efficiency. These metrics are used to construct robust model configurations. A normalized compression distance (NCD) is computed between a given active/live asset model and a reference configuration to produce an integrity score. The utility derived from the asset is monotonically proportional to this integrity score, which represents the proximity to ideal conditions. The present work considers the situation between an asset manager and an intelligent adversary, who act within a stochastic environment to control the integrity state of the asset. A generalized asset integrity game engine (GAIGE) is developed, which implements anytime algorithms to solve a stochastically perturbed two-player zero-sum game. The resulting planning strategies seek to stabilize deviations from minimax trajectories of the integrity score. Results demonstrate the performance and scalability of the GAIGE. This approach represents a first step towards domain-oblivious architectures for complex asset governance and anytime planning.
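
    The normalized compression distance mentioned above has the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of s. A minimal sketch using zlib as the compressor is given below; the serialisation of asset models into byte strings, and the exact mapping from NCD to an integrity score, are assumptions made for illustration only.

```python
import zlib

def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (the C(.) term in NCD)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two serialised models.

    NCD(x, y) = (C(x + y) - min(C(x), C(y))) / max(C(x), C(y)).
    Values near 0 indicate similarity; an integrity score could then be
    defined, for example, as 1 - ncd(live_model, reference_model).
    """
    cx, cy, cxy = compressed_len(x), compressed_len(y), compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```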

    Learning search decisions


    Application of temporal difference learning and supervised learning in the game of Go.

    by Horace Wai-Kit Chan. Thesis (M.Phil.), Chinese University of Hong Kong, 1996. Includes bibliographical references (leaves 109-112).
    Acknowledgement
    Abstract
    Chapter 1  Introduction
        1.1  Overview
        1.2  Objective
        1.3  Organization of This Thesis
    Chapter 2  Background
        2.1  Definitions
            2.1.1  Theoretical Definition of Solving a Game
            2.1.2  Definition of Computer Go
        2.2  State of the Art of Computer Go
        2.3  A Framework for Computer Go
            2.3.1  Evaluation Function
            2.3.2  Plausible Move Generator
        2.4  Problems Tackled in this Research
    Chapter 3  Application of TD in Game Playing
        3.1  Introduction
        3.2  Reinforcement Learning and TD Learning
            3.2.1  Models of Learning
            3.2.2  Temporal Difference Learning
        3.3  TD Learning and Game-playing
            3.3.1  Game-Playing as a Delay-reward Prediction Problem
            3.3.2  Previous Work of TD Learning in Backgammon
            3.3.3  Previous Work of TD Learning in Go
        3.4  Design of this Research
            3.4.1  Limitations of the Previous Research
            3.4.2  Motivation
            3.4.3  Objective and Methodology
    Chapter 4  Deriving a New Updating Rule to Apply TD Learning in Multi-layer Perceptrons
        4.1  Multi-layer Perceptron (MLP)
        4.2  Derivation of the TD(λ) Learning Rule for MLP
            4.2.1  Notations
            4.2.2  A New Generalized Delta Rule
            4.2.3  Updating Rule for TD(λ) Learning
        4.3  Algorithm for Training an MLP using TD(λ)
            4.3.1  Definitions of Variables in the Algorithm
            4.3.2  Training Algorithm
            4.3.3  Description of the Algorithm
    Chapter 5  Experiments
        5.1  Introduction
        5.2  Experiment 1: Training an Evaluation Function for 7 x 7 Go Games by TD(λ) with Self-playing
            5.2.1  Introduction
            5.2.2  7 x 7 Go
            5.2.3  Experimental Designs
            5.2.4  Performance Testing for Trained Networks
            5.2.5  Results
            5.2.6  Discussions
            5.2.7  Limitations
        5.3  Experiment 2: Training an Evaluation Function for 9 x 9 Go Games by TD(λ) Learning from Human Games
            5.3.1  Introduction
            5.3.2  9 x 9 Go
            5.3.3  Training Data Preparation
            5.3.4  Experimental Designs
            5.3.5  Results
            5.3.6  Discussion
            5.3.7  Limitations
        5.4  Experiment 3: Life Status Determination in the Go Endgame
            5.4.1  Introduction
            5.4.2  Training Data Preparation
            5.4.3  Experimental Designs
            5.4.4  Results
            5.4.5  Discussion
            5.4.6  Limitations
        5.5  A Postulated Model
    Chapter 6  Conclusions
        6.1  Future Directions of Research
    Appendix A  An Introduction to Go
        A.1  A Brief Introduction
            A.1.1  What is Go?
            A.1.2  History of Go
            A.1.3  Equipment Used in a Go Game
        A.2  Basic Rules of Go
            A.2.1  A Go Game
            A.2.2  Liberty and Capture
            A.2.3  Ko
            A.2.4  Eyes, Life and Death
            A.2.5  Seki
            A.2.6  Endgame and Scoring
            A.2.7  Rank and Handicap Games
        A.3  Strategies and Tactics in Go
            A.3.1  Strategy vs Tactics
            A.3.2  Open-game
            A.3.3  Middle-game
            A.3.4  End-game
    Appendix B  Mathematical Model of Connectivity
        B.1  Introduction
        B.2  Basic Definitions
        B.3  Adjacency and Connectivity
        B.4  String and Link
            B.4.1  String
            B.4.2  Link
        B.5  Liberty and Atari
            B.5.1  Liberty
            B.5.2  Atari
        B.6  Ko
        B.7  Prohibited Move
        B.8  Path and Distance
    Bibliography
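
    As general background for the TD(λ) chapters listed above, the following is a minimal, generic TD(λ) update with eligibility traces for a linear value function. It only illustrates the shape of the update rule; the thesis derives the corresponding rule for a multi-layer perceptron, which is not reproduced here. The episode encoding (a list of feature-vector/reward pairs) is an assumption for this sketch.

```python
import numpy as np

# Generic TD(lambda) with eligibility traces for a *linear* value function
# V(s) = w . phi(s). `episode` is assumed to be a list of (features, reward)
# pairs for one game, where reward[t + 1] is received on the transition t -> t+1.

def td_lambda_update(w, episode, alpha=0.01, gamma=1.0, lam=0.7):
    e = np.zeros_like(w)                      # eligibility trace
    for t in range(len(episode) - 1):
        phi_t, _ = episode[t]
        phi_next, reward = episode[t + 1]
        # TD error: reward plus discounted next value, minus current value.
        delta = reward + gamma * np.dot(w, phi_next) - np.dot(w, phi_t)
        e = gamma * lam * e + phi_t           # decay and accumulate the trace
        w = w + alpha * delta * e             # update all weights via the trace
    return w
```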