
    The Layered Learning Method and its Application to Generation of Evaluation Functions for the Game

    Abstract. In this paper we describe and analyze a Computational Intelligence (CI)-based approach to creating evaluation functions for two-player mind games (i.e. classical turn-based board games that require mental skills, such as chess, checkers, Go, Othello, etc.). The method allows gradual, step-by-step training, starting with end-game positions and moving towards the root of the game tree. In each phase a new training set is generated based on the results of the previous training stages, and any supervised learning method can be used for the actual development of the evaluation function. We validate the usefulness of the approach by employing it to develop heuristics for the game of checkers. Since in previous experiments we applied it to training evaluation functions encoded as linear combinations of game state statistics, this time we concentrate on the development of artificial neural network (ANN)-based heuristics. Games provide cheap, reproducible environments suitable for testing new search algorithms, pattern-based evaluation methods or learning concepts. Since the seminal papers devoted to programming chess [1-3] and checkers ... Most examples of the application of CI methods to mind game playing make use of either reinforcement learning methods, neural network-based approaches, evolutionary methods or hybrid neuro-genetic solutions, e.g. in chess ... The main focus of this paper is on testing the efficacy of what we call Layered Learning, a generally applicable approach to building the evaluation function for two-player games (checkers here), which can be implemented either in the evolutionary mode or as gradient backpropagation-type neural network training. The method, originally proposed i
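The phase-by-phase idea described in this abstract (label end-game positions first, then use the evaluator trained so far to label positions one step closer to the root) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy subtraction game stands in for checkers, the lookup-table `fit` stands in for "any supervised learning method", and the names `MOVES`, `fit` and `layered_learning` are hypothetical rather than taken from the paper.

```python
from typing import Callable, List, Tuple

# Toy stand-in game (an assumption, not checkers): players alternately take
# 1 or 2 stones, and whoever takes the last stone wins.
MOVES = (1, 2)

Example = Tuple[int, float]  # (stones left, value for the side to move)


def fit(examples: List[Example]) -> Callable[[int], float]:
    """Stand-in for any supervised learner: memorise the labelled examples."""
    table = dict(examples)
    return lambda stones: table[stones]


def layered_learning(max_stones: int) -> Callable[[int], float]:
    """Train layer by layer, starting at the end of the game."""
    # Layer 0: terminal position, the side to move has already lost.
    examples: List[Example] = [(0, -1.0)]
    evaluate = fit(examples)
    for stones in range(1, max_stones + 1):
        # The new layer is labelled by a one-ply lookahead that relies only
        # on the evaluator produced by the previous training phases.
        label = max(-evaluate(stones - m) for m in MOVES if m <= stones)
        examples.append((stones, label))
        evaluate = fit(examples)  # retrain on the enlarged training set
    return evaluate


if __name__ == "__main__":
    evaluator = layered_learning(9)
    print([evaluator(n) for n in range(10)])
```

In a realistic setting the table would be replaced by a learned model (for example a linear combination of features or a neural network), so the labels of later layers would inherit the approximation error of earlier ones.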

    Beyond Monte Carlo Tree Search: Playing Go with Deep Alternative Neural Network and Long-Term Evaluation

    Monte Carlo tree search (MCTS), which determines each action by running an enormous number of simulations in a broad and deep search tree, is extremely popular in computer Go. However, human experts select most actions by pattern analysis and careful evaluation rather than by brute-force search of millions of future interactions. In this paper, we propose a computer Go system that follows the experts' way of thinking and playing. Our system consists of two parts. The first part is a novel deep alternative neural network (DANN) used to generate candidates for the next move. Compared with the existing deep convolutional neural network (DCNN), DANN inserts a recurrent layer after each convolutional layer and stacks them in an alternating manner. We show that this setting preserves more context of local features and their evolution, which is beneficial for move prediction. The second part is a long-term evaluation (LTE) module used to provide a reliable evaluation of candidates rather than a single probability from the move predictor. This is consistent with the way human experts play, since they can foresee tens of steps to give an accurate estimation of candidates. In our system, for each candidate, LTE calculates a cumulative reward after several future interactions, once local variations are settled. Combining the criteria from the two parts, our system determines the optimal choice of the next move. For more comprehensive experiments, we introduce a new professional Go dataset (PGD), consisting of 253,233 professional records. Experiments on the GoGoD and PGD datasets show that DANN can substantially improve the performance of move prediction over a pure DCNN. When combined with LTE, our system outperforms most relevant approaches and open engines based on MCTS. Comment: AAAI 201

    Improved Reinforcement Learning with Curriculum

    Humans tend to learn complex abstract concepts faster if examples are presented in a structured manner. For instance, when learning how to play a board game, one of the first concepts learned is usually how the game ends, i.e. the actions that lead to a terminal state (win, lose or draw). The advantage of learning end-games first is that once the actions which lead to a terminal state are understood, it becomes possible to incrementally learn the consequences of actions that are further away from a terminal state; we call this an end-game-first curriculum. Currently the state-of-the-art machine learning player for general board games, AlphaZero by Google DeepMind, does not employ a structured training curriculum; instead, it learns from the entire game at all times. By employing an end-game-first training curriculum to train an AlphaZero-inspired player, we empirically show that the rate of learning of an artificial player can be improved during the early stages of training when compared to a player not using a training curriculum. Comment: Draft prior to submission to IEEE Trans on Games. Changed paper slightl
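One way to picture an end-game-first curriculum is as a filter on the positions of each self-play game: early in training only the last few positions are used, and the window widens until whole games are included. The sketch below is a minimal illustration under stated assumptions; the linear schedule and the names `curriculum_window` and `select_training_positions` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_window(game_length: int, progress: float) -> int:
    """Number of final positions to train on.

    `progress` is the fraction of training completed, clamped to [0, 1].
    The linear growth used here is an assumed schedule for illustration.
    """
    progress = min(max(progress, 0.0), 1.0)
    return max(1, math.ceil(progress * game_length))


def select_training_positions(positions: Sequence[T], progress: float) -> List[T]:
    """Keep only the positions closest to the terminal state."""
    keep = curriculum_window(len(positions), progress)
    return list(positions[-keep:])


if __name__ == "__main__":
    game = list(range(10))  # positions of one game, index 9 is the last one
    for p in (0.0, 0.5, 1.0):
        print(p, select_training_positions(game, p))
```

A player trained without a curriculum corresponds to calling `select_training_positions` with `progress=1.0` from the start.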

    SAI, a Sensible Artificial Intelligence that plays Go

    We propose a multiple-komi modification of the AlphaGo Zero/Leela Zero paradigm. The winrate as a function of the komi is modeled with a two-parameter sigmoid function, so that the neural network must predict just one more variable to assess the winrate for all komi values. A second novel feature is that training is based on self-play games that occasionally branch, with changed komi, when the position is uneven. With this setting, reinforcement learning is shown to work on 7x7 Go, obtaining very strong playing agents. As a useful byproduct, the sigmoid parameters given by the network make it possible to estimate the score difference on the board and to evaluate how far the game is decided. Comment: Updated for IJCNN 2019 conferenc
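To make the two-parameter sigmoid concrete, here is a minimal sketch of a winrate curve over komi. The exact parameterization is an assumption made for illustration (the abstract does not state it): `alpha` is taken to be the komi at which the game is even, and `beta` the steepness of the curve, with the winrate expressed from Black's point of view.

```python
import math


def winrate(komi: float, alpha: float, beta: float) -> float:
    """Assumed form: sigmoid(beta * (alpha - komi)), from Black's perspective.

    alpha: komi at which the winrate is 50% (an estimate of the score
           difference on the board, under this assumed parameterization).
    beta:  steepness; larger values mean the outcome is more decided.
    """
    return 1.0 / (1.0 + math.exp(-beta * (alpha - komi)))


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the curve's shape.
    for komi in (3.5, 5.5, 7.5, 9.5):
        print(komi, round(winrate(komi, alpha=7.5, beta=0.8), 3))
```

Under this assumed form, a single pair (`alpha`, `beta`) yields a winrate for every komi value, which is why one extra network output beyond the usual winrate is enough.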