Exemplar-Based Direct Policy Search with Evolutionary Optimization
In this paper, an exemplar-based policy optimization framework for direct policy search is presented. In this approach, the policy to be optimized is composed of a set of exemplars and a case-based action selector. An implementation using a state-action-based policy representation and an evolutionary algorithm optimizer is shown to provide favorable search performance on two higher-dimensional problems.
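As a minimal sketch of this idea, the snippet below pairs a Euclidean nearest-neighbor action selector with a simple truncation-selection evolutionary loop; the exemplar encoding, the mutation scheme, and the fitness interface are illustrative assumptions, not the paper's implementation.

import math
import random

def select_action(exemplars, state):
    # Case-based selector: act as the nearest exemplar does (Euclidean distance).
    nearest = min(exemplars, key=lambda ex: math.dist(ex[0], state))
    return nearest[1]

def mutate(exemplars, sigma=0.1, actions=(0, 1)):
    # Perturb exemplar states and occasionally resample their actions.
    child = []
    for state, action in exemplars:
        state = tuple(s + random.gauss(0.0, sigma) for s in state)
        if random.random() < 0.1:
            action = random.choice(actions)
        child.append((state, action))
    return child

def evolve(fitness, n_exemplars=8, dim=4, pop_size=20, generations=50):
    # Each individual is one policy: a set of (state, action) exemplars.
    population = [[(tuple(random.uniform(-1.0, 1.0) for _ in range(dim)),
                    random.choice((0, 1)))
                   for _ in range(n_exemplars)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[:pop_size // 2]
        population = elite + [mutate(p) for p in elite]
    return max(population, key=fitness)

Here fitness(policy) would roll the policy out in the target task (choosing actions via select_action) and return the cumulative reward.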
Three types of forward pruning techniques to apply the alpha beta algorithm to turn-based strategy games
Turn-based strategy games are interesting testbeds for developing artificial players because their rules present developers with several challenges. Currently, Monte-Carlo tree search variants are often used to address these challenges. However, we consider it worthwhile to introduce minimax search variants with pruning techniques, because turn-based strategy games are in some respects similar to chess and Shogi, in which minimax variants are known to be effective. We therefore introduce three forward-pruning techniques that make alpha-beta search (a minimax search variant) applicable to turn-based strategy games: fixing unit action orders, generating unit actions selectively, and limiting the number of moving units in a search. We applied the proposed pruning methods in an alpha-beta-based artificial player implemented on our institute's Turn-based strategy Academic Package (TUBSTAP) open platform. This player competed against the first- and second-ranked players of the 2016 TUBSTAP AI competition and won against the other players on five different maps with an average winning ratio exceeding 70%.
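A compact sketch of how the three pruning ideas can sit inside plain alpha-beta search; the state interface (units, legal_actions, heuristic_score, apply, evaluate, is_terminal) and the cut-off parameters max_units and top_k are assumed placeholders rather than TUBSTAP's actual API.

def ordered_actions(state, max_units=3, top_k=5):
    # Forward pruning: (1) fix the order in which units act, (2) limit the
    # number of moving units considered, (3) keep only the top_k candidate
    # actions per unit, ranked by a cheap heuristic.
    actions = []
    for unit in sorted(state.units, key=lambda u: u.id)[:max_units]:
        candidates = sorted(state.legal_actions(unit),
                            key=state.heuristic_score, reverse=True)
        actions.extend(candidates[:top_k])
    return actions

def alphabeta(state, depth, alpha, beta, maximizing):
    if depth == 0 or state.is_terminal():
        return state.evaluate()
    if maximizing:
        value = float("-inf")
        for action in ordered_actions(state):
            value = max(value, alphabeta(state.apply(action), depth - 1,
                                         alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cut-off
        return value
    value = float("inf")
    for action in ordered_actions(state):
        value = min(value, alphabeta(state.apply(action), depth - 1,
                                     alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break  # alpha cut-off
    return value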
State Evaluation Strategy for Exemplar-Based Policy Optimization of Dynamic Decision Problems
Direct policy search (DPS), which optimizes the parameters of a decision-making model, combined with evolutionary algorithms that enable robust optimization, is a promising approach to dynamic decision problems. Exemplar-based policy (EBP) optimization is a novel framework for DPS in which the policy is composed of a set of exemplars and a case-based action selector, with the set of exemplars refined and evolved by a genetic algorithm (GA). In this paper, state-evaluation-type EBP representations are proposed for the class of problems whose state transitions can be predicted. For example, the vector-real representation defines pairs of a feature vector and its desirability as exemplars and evaluates the predicted next states using those exemplars. The state-evaluation-type EBP optimization procedures are shown to be superior to conventional state-action-type EBP optimization in an application to the game of Tetris.
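A minimal sketch of the state-evaluation idea, assuming a distance-weighted k-nearest-exemplar value estimate and a one-step transition model; predict and featurize are hypothetical helpers standing in for the problem's predictable state transitions.

import math

def state_value(exemplars, features, k=3):
    # Exemplars are (feature_vector, desirability) pairs; the value of a state
    # is the distance-weighted desirability of its k nearest exemplars.
    nearest = sorted(exemplars, key=lambda ex: math.dist(ex[0], features))[:k]
    weights = [1.0 / (1e-6 + math.dist(ex[0], features)) for ex in nearest]
    return sum(w * ex[1] for w, ex in zip(weights, nearest)) / sum(weights)

def select_action(exemplars, state, actions, predict, featurize):
    # Predict the successor state of each action and act greedily on its value.
    return max(actions,
               key=lambda a: state_value(exemplars, featurize(predict(state, a))))

In Tetris terms, predict would place the current piece and featurize would extract board features, while the GA evolves the exemplar set itself.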
High-performance Algorithms using Deep Learning in Turn-based Strategy Games
The development of AlphaGo has increased researchers' interest in applying deep learning and reinforcement learning to games. However, using the AlphaZero algorithm on games with complex data structures and vast search spaces, such as turn-based strategy games, poses technical challenges: the complex data representations required of the neural networks result in very long learning times. This study discusses methods that accelerate neural-network learning by solving this data-representation problem with a search tree. The proposed algorithm performs better than existing methods such as Monte Carlo Tree Search (MCTS). The automatic generation of learning data by self-play removes the need for a large training database, and the algorithm also achieves excellent match results, with a win rate of more than 85% against conventional algorithms on a new map that was not used for learning. 12th International Conference on Agents and Artificial Intelligence (ICAART 2020), February 2020, Valletta, Malta.
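The abstract does not spell out the training loop, but an AlphaZero-style procedure of the kind it builds on looks roughly like the sketch below; game, net, and search are assumed interfaces (with search combining the tree and the network), and the batch sizes are arbitrary.

import random

def self_play_episode(game, net, search):
    # Generate (state, search_policy, outcome) training tuples from one game.
    history, state = [], game.initial_state()
    while not game.is_terminal(state):
        policy = search(state, net)  # tree search sharpens the raw net policy
        history.append((state, policy))
        moves, probs = zip(*policy.items())
        state = game.apply(state, random.choices(moves, weights=probs)[0])
    z = game.outcome(state)
    return [(s, p, z) for s, p in history]

def train(game, net, search, iterations=100, games_per_iter=32):
    # No pre-built database: all training data comes from self-play.
    for _ in range(iterations):
        data = []
        for _ in range(games_per_iter):
            data.extend(self_play_episode(game, net, search))
        net.fit(data)  # regress the net toward (search_policy, outcome)
    return net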
Generation of Diverse Stages in Turn-Based RPG using Reinforcement Learning
This study focuses on procedural content generation (PCG) using reinforcement learning (RL): the generation of game content tailored to a designed evaluation function using RL models, one example of PCG via machine learning. Compared with other content-generation areas such as computer vision and natural language processing, supervised generative methods such as variational autoencoders, PixelCNN, and generative adversarial networks are difficult to apply to games because, during the development of a new game, the content data needed for training is typically insufficient. Hence, we consider RL as a method for PCG. In particular, the stages of a turn-based RPG are selected as our research target because they comprise discrete sections whose parameters are closely related, which makes generating desirable stages challenging; the main goal is to generate various stages guided by the designed evaluation function. Two RL models, Deep Q-Network and Deep Deterministic Policy Gradient, are selected, and the stages they generate are scored 0.78 and 0.85, respectively, by our designed function. By applying a stochastic noise policy, diverse stages are successfully obtained, and their diversity is evaluated by parameter MSE and the number of distinct valid strategies. IEEE Conference on Games (CoG), London, UK, 20th August 2019.
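As a much-simplified stand-in for the paper's DQN and DDPG agents, the sketch below frames stage generation as filling discrete parameter slots one at a time, with an epsilon-greedy noise term that doubles as the source of stage diversity; the slot/value encoding and eval_fn are illustrative, not the paper's setup.

import random

def generate_stage(q, eval_fn, n_slots=10, values=(0, 1, 2, 3, 4),
                   eps=0.2, lr=0.1):
    # Build a stage one parameter slot at a time; the epsilon-greedy noise
    # both explores during learning and yields diverse finished stages.
    stage = []
    for slot in range(n_slots):
        if random.random() < eps:
            choice = random.choice(values)
        else:
            choice = max(values, key=lambda v: q.get((slot, v), 0.0))
        stage.append(choice)
    reward = eval_fn(stage)  # the designed evaluation function scores the stage
    for slot, v in enumerate(stage):
        q[(slot, v)] = q.get((slot, v), 0.0) + lr * (reward - q.get((slot, v), 0.0))
    return stage, reward

Starting from q = {} and calling generate_stage repeatedly trains the value estimates while the residual noise keeps the emitted stages varied.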
Production of Various Strategies and Position Control for Monte-Carlo Go: Entertaining Human Players
Thanks to the continued development of tree-search algorithms, more precise evaluation functions, and faster hardware, computer Go and computer Shogi have now reached a level of strength sufficient for most amateur players. However, research on entertaining and coaching human players of board games is still very limited. In this paper, we first try to define the requirements for entertaining human players in computer board games. Then, we describe the different approaches that we have experimented with in the case of Monte-Carlo computer Go.
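One plausible reading of the position control mentioned in the title is to steer playouts toward a small score margin instead of maximizing the win probability; a hypothetical Monte-Carlo version of that idea, with the rollout interface and target margin as assumptions:

def controlled_value(playout_margins, target=2.0):
    # Position control: prefer moves whose average playout score margin stays
    # close to a small target, keeping the game interesting for the opponent.
    mean = sum(playout_margins) / len(playout_margins)
    return -abs(mean - target)

def choose_move(moves, rollout, n_playouts=100):
    # rollout(move) plays one random game and returns the final score margin.
    return max(moves,
               key=lambda m: controlled_value([rollout(m)
                                               for _ in range(n_playouts)]))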
Playing Good-Quality Games with Weak Players by Combining Programs with Different Roles
Computer programs have become stronger than top-rated human players in several games. However, weak players may not enjoy playing against these strong programs. In this study, we propose combining two programs with different roles to create programs suitable for weak players. We use a superhuman program that generates candidate moves and evaluates how good they are, together with a program that evaluates the moves' naturalness. We implement an instance for Go that employs a superhuman program, KataGo, and a neural network trained on human games. Experiments show that the proposed method is promising for playing good-quality games with weak players. 2022 IEEE Conference on Games (CoG 2022), 23rd August 2022.
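A minimal sketch of one way such a combination can be wired together; the strength and naturalness callables stand in for the superhuman program's move evaluation and the human-trained network, and the threshold and weight are illustrative values, not ones from the paper.

def pick_move(candidates, strength, naturalness,
              min_strength=-1.0, weight=0.5):
    # Keep only moves the superhuman program rates as not losing too much
    # value, then prefer the most human-like move among them.
    viable = [m for m in candidates if strength(m) >= min_strength]
    pool = viable or candidates  # fall back if the filter removes everything
    return max(pool,
               key=lambda m: weight * strength(m)
                             + (1.0 - weight) * naturalness(m))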