    A learning framework for zero-knowledge game playing agents

    The subjects of perfect information games, machine learning and computational intelligence combine in an experiment that investigates a method to build the skill of a game-playing agent from zero game knowledge. The skill of a playing agent is determined by two aspects: the first is the quantity and quality of the knowledge it uses, and the second is its search capacity. This thesis introduces a novel representation language that combines symbolic and numeric elements to capture game knowledge. As far as search is concerned, an extension to an existing knowledge-based search method is developed. Empirical tests show an improvement over alpha-beta, especially in learning conditions where the knowledge may be weak. Current machine learning techniques as applied to game agents are reviewed, and from these techniques a learning framework is established. The data-mining algorithm ID3 and the computational intelligence technique Particle Swarm Optimisation (PSO) form the key learning components of this framework. The classification trees produced by ID3 are subjected to new post-pruning processes specifically defined for the mentioned representation language. Different combinations of these pruning processes are tested and a dominant combination is chosen for use in the learning framework. As an extension to PSO, tournaments are introduced as a relative fitness function. A variety of alternative tournament methods are described and some experiments are conducted to evaluate these. The final design decisions are incorporated into the learning framework configuration, and learning experiments are conducted on Checkers and some variations of Checkers. These experiments show that learning has occurred, but also highlight the need for further development and experimentation. Some ideas in this regard conclude the thesis. Dissertation (MSc), University of Pretoria, 2007.
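
    A minimal sketch of the PSO extension described above, in which a round-robin tournament serves as a relative fitness function: each particle encodes an agent's evaluation weights, and its fitness is its win count against the rest of the swarm. All names and parameters here (play_game in particular) are illustrative stand-ins, not the thesis's actual code.

```python
import random

# Sketch of PSO with a round-robin tournament as a relative fitness function.
# play_game() and every constant are hypothetical stand-ins; a real particle
# would encode a Checkers position evaluator.

DIM, SWARM_SIZE, GENERATIONS = 10, 20, 50
W, C1, C2 = 0.7, 1.5, 1.5      # inertia and acceleration coefficients

def play_game(weights_a, weights_b):
    """Toy stand-in for a full game: the 'stronger' vector wins."""
    return 0 if sum(weights_a) >= sum(weights_b) else 1

def tournament_fitness(swarm):
    """Each particle's fitness is its win count against the rest of the swarm.
    The fitness is *relative*: it only ranks particles within one round."""
    wins = [0] * len(swarm)
    for i in range(len(swarm)):
        for j in range(i + 1, len(swarm)):
            winner_index = i if play_game(swarm[i], swarm[j]) == 0 else j
            wins[winner_index] += 1
    return wins

swarm = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(SWARM_SIZE)]
velocity = [[0.0] * DIM for _ in range(SWARM_SIZE)]
pbest = [p[:] for p in swarm]
pbest_fit = tournament_fitness(swarm)

for _ in range(GENERATIONS):
    fit = tournament_fitness(swarm)
    for i in range(SWARM_SIZE):
        if fit[i] > pbest_fit[i]:          # noisy comparison across rounds
            pbest[i], pbest_fit[i] = swarm[i][:], fit[i]
    g = pbest[max(range(SWARM_SIZE), key=lambda i: pbest_fit[i])]
    for i in range(SWARM_SIZE):
        for d in range(DIM):
            r1, r2 = random.random(), random.random()
            velocity[i][d] = (W * velocity[i][d]
                              + C1 * r1 * (pbest[i][d] - swarm[i][d])
                              + C2 * r2 * (g[d] - swarm[i][d]))
            swarm[i][d] += velocity[i][d]
```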

    Virtual player design using self-learning via competitive coevolutionary algorithms

    The Google Artificial Intelligence (AI) Challenge is an international contest whose objective is to program the AI in a two-player real-time strategy (RTS) game. This AI is an autonomous computer program that governs the actions that one of the two players executes during the game according to the state of play. The entries are evaluated via a competition mechanism consisting of two-player rounds where each entry is tested against others. This paper describes the use of competitive coevolutionary (CC) algorithms for the automatic generation of winning game strategies in Planet Wars, the RTS game associated with the 2010 contest. Three different versions of a prime algorithm have been tested. Their common nexus is not only the use of a Hall-of-Fame (HoF) to keep a record of the winners of past coevolutions, but also the employment of an archive of experienced players, termed the Hall-of-Celebrities (HoC), that puts pressure on the optimization process and guides the search to increase the strength of the solutions; their differences come from the periodical updating of the HoF on the basis of quality and diversity metrics. The goal is to optimize the AI by means of a self-learning process guided by coevolutionary search and competitive evaluation. An empirical study on the performance of a number of variants of the proposed algorithms is described, and a statistical analysis of the results is conducted. In addition to the attainment of competitive bots, we also conclude that the incorporation of the HoC inside the primary algorithm helps to reduce the effects of cycling caused by the use of the HoF in CC algorithms. This work is partially supported by Spanish MICINN under Project ANYSELF (TIN2011-28627-C04-01), by Junta de Andalucía under Project P10-TIC-6083 (DNEMESIS) and by Universidad de Málaga, Campus de Excelencia Internacional Andalucía Tech.
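
    A hedged sketch of the Hall-of-Fame mechanism the paper builds on: each generation's champion is archived, and fitness is measured against both current peers and archived past winners, which is what counteracts cycling. The match function beats() and all sizes are illustrative assumptions, not the paper's implementation.

```python
import random

# Competitive coevolution with a Hall-of-Fame archive. A real individual would
# be a Planet Wars strategy, not a float; beats() is a hypothetical match.

POP_SIZE, GENERATIONS, HOF_LIMIT = 30, 100, 10

def beats(a, b):
    """Toy stand-in for a two-player round: the higher value wins."""
    return a > b

def evaluate(individual, population, hall_of_fame):
    """Fitness = wins against peers plus wins against archived past champions.
    Playing the archive pressures the search to stay strong against old
    winners, mitigating the cycling that pure peer-vs-peer evaluation causes."""
    opponents = population + hall_of_fame
    return sum(beats(individual, o) for o in opponents if o is not individual)

population = [random.random() for _ in range(POP_SIZE)]
hall_of_fame = []

for gen in range(GENERATIONS):
    scored = sorted(population,
                    key=lambda x: evaluate(x, population, hall_of_fame),
                    reverse=True)
    hall_of_fame.append(scored[0])            # archive this generation's champion
    hall_of_fame = hall_of_fame[-HOF_LIMIT:]  # periodic pruning of the archive
    # Truncation selection plus Gaussian mutation produces the next generation.
    parents = scored[:POP_SIZE // 2]
    population = parents + [p + random.gauss(0, 0.1) for p in parents]
```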

    A Survey of Monte Carlo Tree Search Methods

    Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
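
    The core algorithm the survey outlines, UCT (MCTS with UCB1 selection), can be summarized in a short sketch. The Python version below demonstrates the four canonical phases on the toy game of Nim; the game choice and constants are illustrative, not taken from the survey.

```python
import math, random

# Compact UCT sketch on Nim (take 1-3 stones; taking the last stone wins).
# The four phases: selection (UCB1), expansion, random rollout, backpropagation.

C = math.sqrt(2)  # exploration constant in the UCB1 formula

class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones, self.player = stones, player   # stones left, player to move
        self.parent, self.move = parent, move
        self.children, self.wins, self.visits = [], 0, 0

    def untried_moves(self):
        tried = {c.move for c in self.children}
        return [m for m in (1, 2, 3) if m <= self.stones and m not in tried]

def ucb1(child, parent_visits):
    return (child.wins / child.visits
            + C * math.sqrt(math.log(parent_visits) / child.visits))

def uct_search(root_stones, player, iterations=5000):
    root = Node(root_stones, player)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while fully expanded and non-terminal.
        while node.stones > 0 and not node.untried_moves():
            node = max(node.children, key=lambda c: ucb1(c, node.visits))
        # 2. Expansion: add one untried child.
        if node.stones > 0:
            m = random.choice(node.untried_moves())
            node = Node(node.stones - m, 1 - node.player, node, m)
            node.parent.children.append(node)
        # 3. Rollout: random play to the end.
        stones, p = node.stones, node.player
        while stones > 0:
            stones -= random.randint(1, min(3, stones))
            p = 1 - p
        winner = 1 - p  # the player who just took the last stone
        # 4. Backpropagation: credit wins from each parent's point of view.
        while node:
            node.visits += 1
            node.wins += (winner != node.player)
            node = node.parent
    return max(root.children, key=lambda c: c.visits).move

print(uct_search(21, player=0))  # suggested opening move for 21 stones
```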

    USING COEVOLUTION IN COMPLEX DOMAINS

    Genetic algorithms are a computational model inspired by Darwin's theory of evolution. They have a broad range of applications, from function optimization to solving robotic control problems. Coevolution is an extension of genetic algorithms in which more than one population is evolved at the same time. Coevolution can be done in two ways: cooperatively, in which populations jointly try to solve an evolutionary problem, or competitively. Coevolution has been shown to be useful in solving many problems, yet its application in complex domains still needs to be demonstrated. Robotic soccer is a complex domain that has a dynamic and noisy environment. Many Reinforcement Learning techniques have been applied to the robotic soccer domain, since it is a great test bed for many machine learning methods. However, the success of Reinforcement Learning methods has been limited due to the huge state space of the domain. Evolutionary Algorithms have also been used to tackle this domain; nevertheless, their application has been limited to a small subset of the domain, and no attempt has been shown to solve the whole problem successfully. This thesis will try to answer the question of whether coevolution can be applied successfully to complex domains. Three techniques are introduced to tackle the robotic soccer problem. First, an incremental learning algorithm is used to achieve a desirable performance on some soccer tasks. Second, a hierarchical coevolution paradigm is introduced to allow coevolution to scale up in solving the problem. Third, an orchestration mechanism is utilized to manage the learning processes.
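
    The cooperative mode of coevolution mentioned above can be illustrated with a minimal sketch: two populations each evolve one component of a joint solution, and an individual is scored by pairing it with the other population's current representative. The objective function is a toy assumption, not one of the thesis's soccer tasks.

```python
import random

# Cooperative coevolution sketch: pop_x and pop_y evolve the two components of
# a joint solution; each is evaluated against the other's representative.

def joint_fitness(x, y):
    """Toy objective: the two components jointly home in on (3, -1)."""
    return -((x - 3.0) ** 2 + (y + 1.0) ** 2)

def evolve(pop, score):
    """Truncation selection plus Gaussian mutation over one population."""
    scored = sorted(pop, key=score, reverse=True)
    parents = scored[:len(pop) // 2]
    return parents + [p + random.gauss(0, 0.2) for p in parents]

pop_x = [random.uniform(-10, 10) for _ in range(20)]
pop_y = [random.uniform(-10, 10) for _ in range(20)]
rep_x, rep_y = pop_x[0], pop_y[0]       # current representatives

for _ in range(100):
    pop_x = evolve(pop_x, lambda x: joint_fitness(x, rep_y))
    rep_x = max(pop_x, key=lambda x: joint_fitness(x, rep_y))
    pop_y = evolve(pop_y, lambda y: joint_fitness(rep_x, y))
    rep_y = max(pop_y, key=lambda y: joint_fitness(rep_x, y))

print(rep_x, rep_y)   # converges towards (3, -1)
```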

    Spatial-temporal reasoning applications of computational intelligence in the game of Go and computer networks

    Spatial-temporal reasoning is the ability to reason with spatial images or information about space over time. In this dissertation, computational intelligence techniques are applied to computer Go and computer network applications. Among four experiments, the first three are related to the game of Go, and the last one concerns the routing problem in computer networks. The first experiment presents the first modified cellular simultaneous recurrent network (CSRN) trained with cellular particle swarm optimization (PSO). Another contribution is a comprehensive theoretical study of a 2x2 Go research platform, conducted with a certified 5-dan Go expert. The proposed architecture successfully trains a 2x2 game tree. The contribution of the second experiment is the development of a computational intelligence algorithm called collective cooperative learning (CCL). CCL learns the group size of Go stones on a Go board with zero knowledge by communicating only with the immediate neighbors. An analysis determines the lower bound of a design parameter that guarantees a solution. The contribution of the third experiment is the proposal of a unified system architecture for a Go robot. A prototype Go robot is implemented for the first time in the literature. The last experiment tackles a disruption-tolerant routing problem for a network suffering from link disruption. This experiment represents the first time that the disruption-tolerant routing problem has been formulated as a Markov Decision Process. In addition, the packet delivery rate has been improved under a range of link disruption levels via a reinforcement learning approach.
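
    The neighbor-only communication idea behind CCL can be illustrated with a small sketch: each stone repeatedly adopts the smallest label among its same-color 4-neighbors until nothing changes, identifying groups through purely local exchanges. This is an assumption-laden illustration; the actual CCL algorithm also derives the group size through local messages, whereas this sketch tallies sizes globally at the end for brevity.

```python
from collections import Counter

# Group identification on a tiny board via neighbor-only label propagation.
# 'B' and 'W' are stones; '.' is empty. Board and size are illustrative.

BOARD = [
    "BB..",
    ".B.W",
    ".WWW",
    "....",
]
SIZE = len(BOARD)

# Every stone starts with a unique label: its own coordinates.
label = {(r, c): (r, c)
         for r in range(SIZE) for c in range(SIZE) if BOARD[r][c] != "."}

changed = True
while changed:
    changed = False
    for (r, c) in label:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # immediate neighbors
            n = (r + dr, c + dc)
            if (n in label and BOARD[n[0]][n[1]] == BOARD[r][c]
                    and label[n] < label[(r, c)]):
                label[(r, c)] = label[n]
                changed = True

# Global tally for display only; see the caveat in the lead-in above.
for leader, size in sorted(Counter(label.values()).items()):
    print(f"group at {leader} ({BOARD[leader[0]][leader[1]]}): {size} stones")
```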

    Investigating evolutionary checkers by incorporating individual and social learning, N-tuple systems and a round robin tournament

    In recent years, much research attention has been paid to evolving self-learning game players. Fogel's Blondie24 is just one demonstration of a real success in this field, and it has inspired many other scientists. In this thesis, artificial neural networks are employed to evolve game-playing strategies for the game of checkers by introducing a league structure into the learning phase of a system based on Blondie24. We believe that this helps eliminate some of the randomness in the evolution. The best player obtained is tested against an evolutionary checkers program based on Blondie24, and the results are promising. In addition, we introduce an individual and social learning mechanism into the learning phase of the evolutionary checkers system. The best player obtained is tested against an implementation of an evolutionary checkers program, and also against a player that utilises a round robin tournament; the results are promising. N-tuple systems are also investigated and are used as position value functions for the game of checkers. The n-tuple architecture utilises temporal difference learning. The best player obtained is compared with an implementation of an evolutionary checkers program based on Blondie24, and also against a Blondie24-inspired player that utilises a round robin tournament; the results are promising. We also address the question of whether piece difference and look-ahead depth are important factors in the Blondie24 architecture. Our experiments show that piece difference and look-ahead depth have a significant effect on learning abilities.
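
    A minimal sketch of an n-tuple position value function trained with temporal difference learning, the combination used above. The board encoding, tuple layout and learning constants are illustrative assumptions, not the thesis's configuration.

```python
import random

# N-tuple value function with a TD(0) update. Each n-tuple watches a fixed set
# of squares and owns a lookup table with one weight per joint configuration.

BOARD_SQUARES = 32          # playable squares on a checkers board
PIECE_STATES = 5            # empty, white man/king, black man/king
N_TUPLES = 8                # number of n-tuples (illustrative)
TUPLE_LEN = 4               # squares sampled by each tuple (illustrative)
ALPHA, GAMMA = 0.01, 1.0

tuples = [random.sample(range(BOARD_SQUARES), TUPLE_LEN) for _ in range(N_TUPLES)]
tables = [[0.0] * PIECE_STATES ** TUPLE_LEN for _ in range(N_TUPLES)]

def index(board, squares):
    """Pack the watched squares' contents into a single table index."""
    i = 0
    for s in squares:
        i = i * PIECE_STATES + board[s]
    return i

def value(board):
    """Position value = sum of each tuple's table entry for this position."""
    return sum(tables[t][index(board, tuples[t])] for t in range(N_TUPLES))

def td_update(board, next_board, reward):
    """TD(0): move V(s) toward reward + gamma * V(s')."""
    delta = reward + GAMMA * value(next_board) - value(board)
    for t in range(N_TUPLES):
        tables[t][index(board, tuples[t])] += ALPHA * delta

# Illustrative call on two random successive positions (reward 0 mid-game):
b0 = [random.randrange(PIECE_STATES) for _ in range(BOARD_SQUARES)]
b1 = [random.randrange(PIECE_STATES) for _ in range(BOARD_SQUARES)]
td_update(b0, b1, reward=0.0)
```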

    Efficient Evolution of Neural Networks

    This thesis addresses the study of evolutionary methods for the synthesis of neural network controllers. Chapter 1 introduces the research area, reviews the state of the art, discusses promising research directions, and presents the two major scientific objectives of the thesis. The first objective, which is covered in Chapter 2, is to verify the efficacy of some of the most promising neuro-evolutionary methods proposed in the literature, including two new methods that I developed. This was done by designing an extended version of the double-pole balancing problem, which can be used to benchmark alternative algorithms more properly, by studying the effect of critical parameters, and by conducting several series of comparative experiments. The obtained results indicate that some methods perform better with respect to all the considered criteria, i.e. performance, robustness to environmental variations, and capability to scale up to more complex problems. The second objective, which is targeted in Chapter 3, consists of the design of a new hybrid algorithm that combines evolution and learning by demonstration. The combination of these two processes is appealing since it potentially allows the adaptive agent to exploit a richer training feedback, constituted by both a scalar performance objective (reinforcement signal or fitness measure) and a detailed description of a suitable behaviour (demonstration). The proposed method has been successfully evaluated on two qualitatively different robotic problems. Chapter 4 summarizes the results obtained and describes the major contributions of the thesis.
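
    The basic neuroevolutionary loop the benchmarked methods share can be sketched briefly: a fixed-topology network whose weight vector is the genome, evolved by truncation selection and Gaussian mutation. The dynamics and fitness below are a toy stand-in for a pole-balancing episode, not the thesis's extended benchmark.

```python
import math, random

# Direct weight evolution of a tiny fixed-topology controller network.

INPUTS, HIDDEN = 4, 6                   # e.g. cart and pole state variables
GENOME_LEN = INPUTS * HIDDEN + HIDDEN   # input->hidden plus hidden->output

def policy(genome, obs):
    """Tiny tanh network mapping an observation to a scalar action."""
    w_ih, w_ho = genome[:INPUTS * HIDDEN], genome[INPUTS * HIDDEN:]
    hidden = [math.tanh(sum(obs[i] * w_ih[h * INPUTS + i] for i in range(INPUTS)))
              for h in range(HIDDEN)]
    return math.tanh(sum(h * w for h, w in zip(hidden, w_ho)))

def fitness(genome):
    """Toy episode: reward keeping a noisy 'pole angle' near zero."""
    angle, velocity, total = 0.1, 0.0, 0.0
    for _ in range(200):
        action = policy(genome, [angle, velocity, 0.0, 0.0])
        velocity += 0.05 * angle - 0.1 * action   # crude unstable dynamics
        angle += 0.1 * velocity
        total += 1.0 - min(abs(angle), 1.0)       # reward small angles
    return total

population = [[random.gauss(0, 1) for _ in range(GENOME_LEN)] for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]                      # truncation selection
    population = parents + [[w + random.gauss(0, 0.1)
                             for w in random.choice(parents)]
                            for _ in range(15)]   # Gaussian mutation
```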

    Searching by learning: Exploring artificial general intelligence on small board games by deep reinforcement learning

    In deep reinforcement learning, searching and learning techniques are two important components. They can be used independently and in combination to deal with different problems in AI. These results have inspired research into artificial general intelligence (AGI). We study table-based classic Q-learning on the General Game Playing (GGP) system, showing that classic Q-learning works on GGP, although convergence is slow and it is computationally expensive to learn complex games. This dissertation uses an AlphaZero-like self-play framework to explore AGI on small games. By tuning different hyper-parameters, the role, effects and contributions of searching and learning are studied. A further experiment shows that search techniques can contribute as experts to generate better training examples to speed up the start phase of training. In order to extend the AlphaZero-like self-play approach to complex single-player games, the game of Morpion Solitaire is implemented by combining the approach with the Ranked Reward method. Our first AlphaZero-based approach is able to achieve a near-human best record. Overall, in this thesis, both searching and learning techniques are studied (by themselves and in combination) in GGP and AlphaZero-like self-play systems. We do so for the purpose of making steps towards artificial general intelligence, towards systems that exhibit intelligent behavior in more than one domain.
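
    Table-based classic Q-learning, as studied in the dissertation, reduces to a single update rule. The sketch below applies it to a toy chain environment rather than a GGP game description; the environment and all constants are illustrative assumptions.

```python
import random
from collections import defaultdict

# Tabular Q-learning on a chain of states: step left or right, reward at GOAL.

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = (-1, +1)
N_STATES, GOAL = 10, 9

Q = defaultdict(float)        # Q[(state, action)], zero-initialised

def choose(state):
    """Epsilon-greedy selection with random tie-breaking over the Q-table."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

for episode in range(500):
    s = 0
    for _ in range(1000):     # step cap keeps early episodes bounded
        a = choose(s)
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0
        # Classic update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2
        if s == GOAL:
            break
```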