Regular Boardgames
We propose a new General Game Playing (GGP) language called Regular Boardgames (RBG), which is based on the theory of regular languages. The objective of RBG is to combine key properties such as expressiveness, efficiency, and naturalness of description in one GGP formalism, compensating for certain drawbacks of the existing languages. This often makes RBG more suitable for various research and practical developments in GGP. While intended mostly for describing board games, RBG is universal for the class of all finite deterministic turn-based games with perfect information. We establish the foundations of RBG and analyze it theoretically and experimentally, focusing on the efficiency of reasoning. Regular Boardgames is the first GGP language that allows efficiently encoding and playing games with complex rules and a large branching factor (e.g., amazons, arimaa, large chess variants, go, international checkers, paper soccer).
Comment: AAAI 201
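The abstract does not detail RBG's syntax, so the following is purely an illustration of the underlying idea — describing legal sequences of game actions with a regular language. The toy actions and function name are invented here; RBG's actual formalism is far richer and operates on board structures.

```python
import re

# Illustrative only: encode legal turn sequences of a toy game as a regular
# language over primitive actions, in the spirit of rule description via
# regular expressions (RBG itself is not modeled here).
TURN = r"(move|capture)(promote)?"          # one turn: a move or capture,
                                            # optionally followed by a promotion
GAME = re.compile(rf"({TURN})*", re.ASCII)  # a game is any sequence of turns

def is_legal_sequence(actions):
    # A sequence of primitive actions is legal iff the concatenated string
    # is generated by the game's regular expression.
    return GAME.fullmatch("".join(actions)) is not None
```

Checking membership of an action string in the regular language then doubles as a legality check for a whole play-out.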
Dynamic Difficulty Adjustment
One of the challenges that a computer game developer faces when creating a new game is getting the difficulty right. Giving a game the ability to automatically scale its difficulty to the current player would make games more engaging over a longer period of time. In this work we aim at a dynamic difficulty adjustment algorithm that can be used as a black box: universal, non-intrusive, and with guarantees on its performance. While there are a few commercial games that claim to have such a system, as well as a few published results on this topic, to the best of our knowledge none of them satisfy all three of these properties.

As a first step, we consider a game as an interaction between a player and her opponent. In this context, assuming their goals are mutually exclusive, difficulty adjustment consists of tuning the skill of the opponent to match the skill of the player. We propose a way to estimate the latter and adjust the former based on ranking the moves available to each player. Two sets of empirical experiments demonstrate the power, but also the limitations, of this approach. Most importantly, the assumptions we make restrict the class of games it can be applied to.

Looking for universality, we then drop the constraints on the types of games we consider. We rely on the power of supervised learning and use data collected from game testers to learn models of difficulty adjustment, as well as a mapping from game traces to models. Given a short game trace, the corresponding model tells the game which difficulty adjustment should be used. Using a self-developed game, we show that the predicted adjustments match players' preferences. The quality of the difficulty models depends on the quality of the existing training data; the desire to dispense with the need for it leads us to the last approach.
We propose a formalization of dynamic difficulty adjustment as a novel learning problem in the context of online learning and provide an algorithm to solve it, together with an upper bound on its performance. We show empirical results obtained in simulation and in two qualitatively different games with human participants. Due to its general nature, this algorithm can indeed be used as a black box for dynamic difficulty adjustment: it is applicable to any game with various difficulty states; it does not interfere with the player's experience; and it has a theoretical guarantee on how many mistakes it can possibly make.
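The move-ranking idea from the first approach can be sketched in a few lines. This is a hypothetical illustration, not the thesis's actual algorithm: `estimate_skill` and `pick_opponent_move` are names invented here, player skill is approximated by the average rank of the moves the player actually chose, and the opponent then plays the move whose rank matches that estimate.

```python
def estimate_skill(chosen_ranks):
    # Skill proxy: average rank (0 = best available move) of the moves the
    # player actually chose; lower values indicate a stronger player.
    return sum(chosen_ranks) / len(chosen_ranks)

def pick_opponent_move(scored_moves, target_rank):
    # scored_moves: list of (move, score) pairs, higher score = better for
    # the opponent. Sort best-first, then play the move whose rank is
    # closest to the player's estimated skill, so the opponent's strength
    # tracks the player's.
    ranked = sorted(scored_moves, key=lambda ms: -ms[1])
    idx = min(int(round(target_rank)), len(ranked) - 1)
    return ranked[idx][0]
```

Tuning the opponent then amounts to feeding the running skill estimate back in as `target_rank` after each of the player's turns.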
Pgx: Hardware-accelerated Parallel Game Simulators for Reinforcement Learning
We propose Pgx, a suite of board game reinforcement learning (RL)
environments written in JAX and optimized for GPU/TPU accelerators. By
leveraging auto-vectorization and Just-In-Time (JIT) compilation of JAX, Pgx
can efficiently scale to thousands of parallel executions over accelerators. In
our experiments on a DGX-A100 workstation, we discovered that Pgx can simulate
RL environments 10-100x faster than existing Python RL libraries. Pgx includes
RL environments commonly used as benchmarks in RL research, such as backgammon,
chess, shogi, and Go. Additionally, Pgx offers miniature game sets and baseline
models to facilitate rapid research cycles. We demonstrate the efficient
training of the Gumbel AlphaZero algorithm with Pgx environments. Overall, Pgx
provides high-performance environment simulators for researchers to accelerate
their RL experiments. Pgx is available at https://github.com/sotetsuk/pgx.
Comment: 9 page
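The speedups come from stepping many environments in lockstep. Below is a minimal, framework-free sketch of that batching pattern, with a plain Python loop standing in for `jax.vmap`; the toy `counter_step` environment and both function names are assumptions for illustration, not Pgx's API.

```python
def step_batch(step_fn, states, actions):
    # Apply a single-environment step function to every (state, action)
    # pair in a batch. This is the role jax.vmap plays for Pgx, where the
    # mapped function is additionally JIT-compiled so the whole batch runs
    # as one fused kernel on a GPU/TPU.
    return [step_fn(s, a) for s, a in zip(states, actions)]

def counter_step(state, action):
    # Toy environment: the state is a counter and the action increments it.
    return state + action
```

In Pgx the batched, compiled step is what lets thousands of parallel environments execute on a single accelerator.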
A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
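At the heart of the most widely used MCTS variant (UCT) is the UCB1 selection rule, which the survey derives. A minimal sketch, with function names chosen here for illustration:

```python
import math

def ucb1(value_sum, visits, parent_visits, c=math.sqrt(2)):
    # UCB1: mean value (exploitation) plus a bonus that grows for rarely
    # visited children (exploration). Unvisited children score infinity,
    # so every child is tried at least once.
    if visits == 0:
        return float("inf")
    return value_sum / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children):
    # children: list of (value_sum, visit_count) statistics per child node;
    # the parent's visit count is the sum of its children's visits.
    # Returns the index of the child maximizing UCB1.
    parent_visits = sum(v for _, v in children)
    scores = [ucb1(val, vis, parent_visits) for val, vis in children]
    return scores.index(max(scores))
```

Repeating select → expand → simulate → backpropagate with this rule concentrates simulations on the most promising lines while still sampling the rest.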
CH-Go: Online Go System Based on Chunk Data Storage
The training and running of an online Go system require effective data management to deal with vast amounts of data, such as the initial Go game records, the feature data set obtained by representation learning, the experience data set from self-play, the randomly sampled Monte Carlo tree, and so on. Previous work has rarely addressed this problem, yet the capability and efficiency of the data management system determine the accuracy and speed of the Go system. To tackle this issue, we propose an online Go game system based on a chunk data storage method (CH-Go), which processes the 160k Go game records released by the Kiseido Go Server (KGS) and provides a Go encoder with 11 planes, together with a parallel processor and generator, for better memory performance. Specifically, we store the data in chunks, take a chunk size of 1024 as a batch, and save the features and labels of each chunk as binary files. A small set of data is then randomly sampled for each round of neural network training and accessed batch by batch through the yield method. The training part of the prototype includes three modules: a supervised learning module, a reinforcement learning module, and an online module. First, we apply Zobrist-guided hash coding to speed up Go board construction. Then we train a supervised learning policy network on the 160k KGS game records to initialize self-play for the generation of experience data. Finally, we conduct reinforcement learning based on the REINFORCE algorithm. Experiments show that the training accuracy of CH-Go on the sampled 150 games is 99.14%, and its accuracy on the test set is as high as 98.82%. Under the constraints of limited local computing power and time, we have achieved a comparatively good level of intelligence. In contrast to classical systems such as GOLAXY, which are not free and open, CH-Go is fully open on the Internet.
Comment: The 8th International Conference on Data Science and System