114 research outputs found

    PSO-based coevolutionary Game Learning

    Games have been investigated as computationally complex problems since the inception of artificial intelligence in the 1950s. Originally, search-based techniques were applied to create a competent (and sometimes even expert) game player. These techniques, such as game trees, made use of human-defined knowledge to evaluate the current game state and recommend the best move to make next. Recent research has shown that neural networks can be evolved as game state evaluators, thereby removing the human intelligence factor completely. This study builds on the initial research that made use of evolutionary programming to evolve neural networks in the game learning domain. Particle Swarm Optimisation (PSO) is applied inside a coevolutionary training environment to evolve the weights of the neural network. The training technique is applied to both the zero-sum and non-zero-sum game domains, with specific application to Tic-Tac-Toe, Checkers and the Iterated Prisoner's Dilemma (IPD). The influence of the various PSO parameters on playing performance is experimentally examined, and the overall performance of three different neighbourhood information sharing structures is compared. A new coevolutionary scoring scheme and particle dispersement operator are defined, inspired by Formula One Grand Prix racing. Finally, the PSO is applied in three novel ways to evolve strategies for the IPD, the first application of its kind in the PSO field. The PSO-based coevolutionary learning technique described and examined in this study shows promise in evolving intelligent evaluators for the aforementioned games, and further study will be conducted to analyse its scalability to larger search spaces and games of varying complexity. Dissertation (MSc), University of Pretoria, 2005. Computer Science. Unrestricted.

    Fuzzy PSO: A Generalization of Particle Swarm Optimization

    In standard particle swarm optimization (PSO), the best particle in each neighborhood exerts its influence over other particles in the neighborhood. In this paper, we propose fuzzy PSO, a generalization which differs from standard PSO in the following respect: charisma is defined to be a fuzzy variable, so more than one particle in each neighborhood can have a non-zero degree of charisma and is consequently allowed to influence others to a degree that depends on its charisma. We evaluate our model on the weighted maximum satisfiability (MAX-SAT) problem, comparing performance to standard PSO and to WalkSAT.
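The contrast drawn in this abstract can be sketched in a few lines. The first function is the textbook PSO velocity update, in which only the neighbourhood best attracts the particle. The second is a hypothetical charisma-weighted variant written for illustration only: the weighting scheme, the function names and the default coefficients are assumptions, not the formula from the paper.

```python
import random


def pso_velocity(v, x, pbest, nbest, w=0.7, c1=1.4, c2=1.4, rng=random):
    """Textbook PSO update: only the neighbourhood best pulls the particle."""
    return [
        w * vi + c1 * rng.random() * (pi - xi) + c2 * rng.random() * (ni - xi)
        for vi, xi, pi, ni in zip(v, x, pbest, nbest)
    ]


def fuzzy_pso_velocity(v, x, pbest, neighbours, charisma,
                       w=0.7, c1=1.4, c2=1.4, rng=random):
    """Hypothetical variant: each neighbour k pulls in proportion to charisma[k].

    `neighbours` is a list of position vectors and `charisma` a list of
    degrees in [0, 1]; both names are illustrative assumptions.
    """
    new_v = []
    for d, (vi, xi, pi) in enumerate(zip(v, x, pbest)):
        # Sum the social pull of every neighbour, scaled by its charisma.
        social = sum(
            psi * c2 * rng.random() * (nb[d] - xi)
            for nb, psi in zip(neighbours, charisma)
        )
        new_v.append(w * vi + c1 * rng.random() * (pi - xi) + social)
    return new_v
```

Under this sketch, giving one neighbour a charisma of 1 and all others 0 reduces the variant to the standard update, which is the sense in which it generalizes it.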

    A learning framework for zero-knowledge game playing agents

    The subjects of perfect information games, machine learning and computational intelligence combine in an experiment that investigates a method to build the skill of a game-playing agent from zero game knowledge. The skill of a playing agent is determined by two aspects: the first is the quantity and quality of the knowledge it uses, and the second is its search capacity. This thesis introduces a novel representation language that combines symbols and numeric elements to capture game knowledge. Insofar as search is concerned, an extension to an existing knowledge-based search method is developed. Empirical tests show an improvement over alpha-beta, especially in learning conditions where the knowledge may be weak. Current machine learning techniques as applied to game agents are reviewed. From these techniques a learning framework is established. The data-mining algorithm, ID3, and the computational intelligence technique, Particle Swarm Optimisation (PSO), form the key learning components of this framework. The classification trees produced by ID3 are subjected to new post-pruning processes specifically defined for the mentioned representation language. Different combinations of these pruning processes are tested and a dominant combination is chosen for use in the learning framework. As an extension to PSO, tournaments are introduced as a relative fitness function. A variety of alternative tournament methods are described and some experiments are conducted to evaluate these. The final design decisions are incorporated into the learning framework configuration, and learning experiments are conducted on Checkers and some variations of Checkers. These experiments show that learning has occurred, but also highlight the need for further development and experimentation. Some ideas in this regard conclude the thesis. Dissertation (MSc), University of Pretoria, 2007. Computer Science. MSc. Unrestricted.
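The idea of a tournament as a relative fitness function can be illustrated with a minimal round-robin sketch. It is not the thesis's tournament scheme: the scoring (1 for a win, 0.5 for a draw, 0 for a loss) and the `play` callback are assumptions made for the example.

```python
def round_robin_fitness(agents, play):
    """Score each agent by playing every other agent once.

    `play(a, b)` is a hypothetical game function returning 1 if `a` wins,
    -1 if `b` wins and 0 for a draw. Fitness is relative: it depends only
    on results against the other agents in the same population.
    """
    # Points awarded to (first player, second player) for each outcome.
    points = {1: (1.0, 0.0), 0: (0.5, 0.5), -1: (0.0, 1.0)}
    scores = [0.0] * len(agents)
    for i in range(len(agents)):
        for j in range(i + 1, len(agents)):
            first, second = points[play(agents[i], agents[j])]
            scores[i] += first
            scores[j] += second
    return scores
```

In a PSO setting the returned scores would stand in for an absolute objective value when personal and neighbourhood bests are updated. For games where moving first matters, each pair would normally play twice with the seats swapped.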

    Using particle swarm optimization to evolve two-player game agents

    Computer game-playing agents are almost as old as computers themselves, and people have been developing agents since the 1950s. Unfortunately, the techniques for game-playing agents have remained basically the same for almost half a century, an eternity in computer time. Recently developed approaches have shown that it is possible to develop game-playing agents with the help of learning algorithms. This study is based on the concept of algorithms that learn how to play board games from zero initial knowledge about playing strategies. A coevolutionary approach, where a neural network is used to assess the desirability of leaf nodes in a game tree and evolutionary algorithms are used to train neural networks in competition, is reviewed. This thesis then presents an alternative approach in which particle swarm optimization (PSO) is used to train the neural networks. Different variations of the PSO are implemented and compared. The results of the PSO approaches are also compared with those of an evolutionary programming approach. The performance of the PSO algorithms is investigated for different values of the PSO control parameters. This study shows that the PSO approach can be applied successfully to train game-playing agents. Dissertation (MSc), University of Pretoria, 2007. Computer Science. Unrestricted.

    Automatic Generation of Evaluation Features for Computer Game Players

    Co-evolutionary and Reinforcement Learning Techniques Applied to Computer Go players

    The objective of this thesis is to model natural processes such as evolution and co-evolution, and to propose techniques that ensure these learning processes actually take place and are useful for solving complex problems such as the game of Go. Go is an ancient and very complex game with simple rules that remains a challenge for artificial intelligence. This dissertation covers some approaches that have been applied to this problem and proposes solving it with competitive and cooperative co-evolutionary learning methods and other techniques proposed by the author. To study, implement and evaluate these methods, several neural network structures and a freely available framework were used, and many programs were coded. The author coded the proposed techniques, performed many experiments to find the configuration that best ensures co-evolution is progressing, and discussed the results. Co-evolutionary learning processes can exhibit pathologies that hinder co-evolutionary progress. This dissertation introduces techniques to address pathologies such as loss of gradients, cycling dynamics and forgetting. According to some authors, one remedy for these pathologies is to introduce more diversity into the evolving populations. This thesis proposes techniques for introducing more diversity, along with diversity measurements for neural network structures to monitor diversity during co-evolution. The evolved genotype diversity was analyzed in terms of its impact on the global fitness of the evolved strategies and on their generalization. Additionally, a memory mechanism was introduced into the neural network structures to reinforce strategies in the genes of the evolved neurons, so that good strategies already learned are not forgotten. The dissertation also presents work by other authors in which cooperative and competitive co-evolution has been applied. The Go board size used in this thesis was 9x9, but the approach can easily be scaled to larger boards. The author believes that the programs coded and the techniques introduced in this dissertation can be used in other domains.

    A hybridisation technique for game playing using the upper confidence for trees algorithm with artificial neural networks

    In the domain of strategic game playing, the use of statistical techniques such as the Upper Confidence for Trees (UCT) algorithm has become the norm, as they offer many benefits over classical algorithms. These benefits include requiring no game-specific strategic knowledge and time-scalable performance. UCT does not incorporate any strategic information specific to the game considered, but instead uses repeated sampling to effectively brute-force search through the game tree or search space. The lack of game-specific knowledge in UCT is thus both a benefit and a strategic disadvantage. Pattern recognition techniques, specifically Neural Networks (NN), were identified as a means of addressing the lack of game-specific knowledge in UCT. Through a novel hybridisation technique which combines UCT and trained NNs for pruning, the UCT-NN algorithm was derived. The NN component of UCT-NN was trained using a UCT self-play scheme to generate game-specific knowledge without the need to construct and manage game databases for training purposes. The UCT-NN algorithm is outlined for pruning in the game of Go-Moku as a candidate case study for this research. The UCT-NN algorithm contained three major parameters which emerged from the UCT algorithm, the use of NNs and the pruning schemes considered. Suitable methods for finding candidate values for these three parameters were outlined and applied to the game of Go-Moku on a 5 by 5 board. An empirical investigation of the playing performance of UCT-NN was conducted in comparison to UCT through three benchmarks. The benchmarks comprise a common randomly moving opponent, a common UCTmax player which is given a large amount of playing time, and a pair-wise tournament between UCT-NN and UCT. The results of the performance evaluation for 5 by 5 Go-Moku were promising, which prompted an evaluation of a larger 9 by 9 Go-Moku board. 
The results of both evaluations indicate that the time allocated to the UCT-NN algorithm directly affects its performance when compared to UCT. The UCT-NN algorithm generally performs better than UCT in games with very tight time constraints in all benchmarks considered, except when playing against a randomly moving player in 9 by 9 Go-Moku. In real-time and near-real-time Go-Moku games, UCT-NN provides statistically significant improvements compared to UCT. The findings of this research contribute to the realisation of applying game-specific knowledge to the UCT algorithm.
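The repeated sampling that UCT relies on is usually steered by the UCB1 rule, which trades off a child's average reward against how rarely it has been visited. The sketch below shows that selection step only; it is a generic illustration, not the UCT-NN algorithm, and the function names and the default exploration constant are assumptions.

```python
import math


def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: mean reward plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = total_reward / visits
    return mean + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, parent_visits, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    `children` is a list of (total_reward, visits) pairs.
    """
    return max(
        range(len(children)),
        key=lambda i: ucb1(children[i][0], children[i][1], parent_visits, c),
    )
```

A pruning step such as the one the abstract describes would, under this sketch, remove entries from `children` before `select_child` is called, so that simulations are spent only on moves the network considers plausible.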

    Advanced Methods for Photovoltaic Output Power Forecasting: A Review

    Forecasting is a crucial task for successfully integrating photovoltaic (PV) output power into the grid. The design of accurate photovoltaic output forecasters remains a challenging issue, particularly for multistep-ahead prediction. Accurate PV output power forecasting is critical in a number of applications, such as micro-grids (MGs), energy optimization and management, PV integrated in smart buildings, and electrical vehicle chartering. Over the last decade, a vast literature has been produced on this topic, investigating numerical and probabilistic methods, physical models, and artificial intelligence (AI) techniques. This paper aims to provide a complete and critical review of the recent applications of AI techniques; we will focus particularly on machine learning (ML), deep learning (DL), and hybrid methods, as these branches of AI are becoming increasingly attractive. Special attention will be paid to the recent development of the application of DL, as well as to the future trends in this topic.

    Incorporating prior knowledge into deep neural network controllers of legged robots

    Incorporating Memory and Learning Mechanisms Into Meta-RaPS

    Due to the rapid increase of dimensions and complexity of real life problems, it has become more difficult to find optimal solutions using only exact mathematical methods. The need to find near-optimal solutions in an acceptable amount of time is a challenge when developing more sophisticated approaches. A proper answer to this challenge can be through the implementation of metaheuristic approaches. However, a more powerful answer might be reached by incorporating intelligence into metaheuristics. Meta-RaPS (Metaheuristic for Randomized Priority Search) is a metaheuristic that creates high quality solutions for discrete optimization problems. It is proposed that incorporating memory and learning mechanisms into Meta-RaPS, which is currently classified as a memoryless metaheuristic, can help the algorithm produce higher quality results. The proposed Meta-RaPS versions were created by taking different perspectives of learning. The first approach taken is Estimation of Distribution Algorithms (EDA), a stochastic learning technique that creates a probability distribution for each decision variable to generate new solutions. The second Meta-RaPS version was developed by utilizing a machine learning algorithm, Q Learning, which has been successfully applied to optimization problems whose output is a sequence of actions. In the third Meta-RaPS version, Path Relinking (PR) was implemented as a post-optimization method in which the new algorithm learns the good attributes by memorizing best solutions, and follows them to reach better solutions. The fourth proposed version of Meta-RaPS presented another form of learning with its ability to adaptively tune parameters. The efficiency of these approaches motivated us to redesign Meta-RaPS by removing the improvement phase and adding a more sophisticated Path Relinking method. The new Meta-RaPS could solve even the largest problems in much less time while keeping up the quality of its solutions. 
To evaluate their performance, all introduced versions were tested using the 0-1 Multidimensional Knapsack Problem (MKP). After comparing the proposed algorithms, Meta-RaPS PR and Meta-RaPS Q Learning appeared to be the algorithms with the best and worst performance, respectively. Nevertheless, all of them outperformed other approaches to the 0-1 MKP in the literature.
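The abstract describes an Estimation of Distribution Algorithm as building a probability distribution for each decision variable and sampling new solutions from it. For 0-1 variables, the simplest univariate form of that idea can be sketched as follows. This is a generic illustration, not the Meta-RaPS EDA; the function names and the choice of independent per-variable marginals are assumptions.

```python
import random


def estimate_marginals(elite):
    """Estimate P(x_j = 1) for each variable from a list of good 0-1 solutions."""
    n = len(elite)
    # zip(*elite) iterates over columns, i.e. one decision variable at a time.
    return [sum(column) / n for column in zip(*elite)]


def sample_solution(probs, rng=random):
    """Draw a new 0-1 solution, setting each variable independently."""
    return [1 if rng.random() < p else 0 for p in probs]
```

In a knapsack setting, sampled solutions would still need a feasibility check or repair step against the capacity constraints, which this sketch leaves out.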