10 research outputs found

    Enhancing the Monte Carlo Tree Search Algorithm for Video Game Testing

    Full text link
    In this paper, we study the effects of several Monte Carlo Tree Search (MCTS) modifications for video game testing. Although MCTS modifications are widely studied for game playing, their impact on bug finding remains largely unexplored. Our previous study focused on bug finding: we introduced synthetic and human-like test goals and used them in Sarsa and MCTS agents to find bugs. In this study, we extend the MCTS agent with several modifications for game testing purposes and present a novel tree reuse strategy. We experiment with these modifications on three testbed games, four levels each, containing 45 bugs in total. We use the General Video Game Artificial Intelligence (GVG-AI) framework to create the testbed games and collect 427 human tester trajectories with it. We analyze the proposed modifications in three parts: their effect on the agents' bug finding performance, their success under two different computational budgets, and their effect on the human-likeness of the human-like agent. Our results show that the MCTS modifications improve the bug finding performance of the agents.
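    The tree reuse idea mentioned above can be illustrated with a short sketch: between consecutive moves, the subtree under the action actually taken is kept as the new search root instead of being discarded. This is a minimal illustration of basic subtree reuse, not the paper's own novel strategy; the class and function names are assumptions made for illustration.

```python
# Minimal sketch of MCTS subtree reuse: after the agent commits to an action,
# the child node for that action becomes the new search root instead of
# discarding the whole tree. Names are illustrative only; the paper's novel
# reuse strategy is not reproduced here.

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}      # action -> Node
        self.visits = 0
        self.value_sum = 0.0


def reuse_subtree(root, taken_action):
    """Return the subtree under the taken action as the new search root.

    Returns None when the action was never expanded, in which case the caller
    builds a fresh tree from the next observed state.
    """
    child = root.children.get(taken_action)
    if child is None:
        return None
    child.parent = None         # detach so the discarded tree can be freed
    return child
```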

    Fast deep reinforcement learning using online adjustments from the past

    Get PDF
    We propose Ephemeral Value Adjustments (EVA): a means of allowing deep reinforcement learning agents to rapidly adapt to experience in their replay buffer. EVA shifts the value predicted by a neural network with an estimate of the value function found by planning over experience tuples from the replay buffer near the current state. EVA brings together a number of recent ideas on integrating episodic memory-like structures into reinforcement learning agents: slot-based storage, content-based retrieval, and memory-based planning. We show that EVA is performant on a demonstration task and on Atari games. Comment: Accepted at NIPS 2018.
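    As a rough, hedged sketch of the value-shifting idea in this abstract: the action values used for acting blend the network's estimate with a non-parametric estimate computed from replay-buffer trajectories retrieved near the current state. The helper names, the crude n-step backup, and the fixed mixing weight below are simplifications standing in for the paper's trajectory-centric planning, not the authors' implementation.

```python
# Hedged sketch of EVA-style value shifting: blend the parametric Q-values
# with a non-parametric estimate derived from replay trajectories retrieved
# near the current state. Retrieval, the simple backup, and the mixing weight
# are illustrative assumptions.

import numpy as np


def nonparametric_q(trajectory, num_actions, gamma=0.99):
    """Backward discounted-return backup along one retrieved trajectory slice.

    trajectory: list of (state_embedding, action, reward) tuples.
    """
    q = np.zeros(num_actions)
    ret = 0.0
    for _, action, reward in reversed(trajectory):
        ret = reward + gamma * ret
        q[action] = ret          # illustrative backup, not the paper's planner
    return q


def eva_q_values(q_net_values, neighbour_trajectories, num_actions, lam=0.5):
    """Act with lam * Q_theta + (1 - lam) * Q_NP, the blending idea sketched above."""
    if not neighbour_trajectories:
        return np.asarray(q_net_values)
    q_np = np.mean(
        [nonparametric_q(t, num_actions) for t in neighbour_trajectories], axis=0
    )
    return lam * np.asarray(q_net_values) + (1.0 - lam) * q_np
```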

    ME-MCTS: Online generalization by combining multiple value estimators

    Get PDF
    This paper addresses the challenge of online generalization in tree search. We propose Multiple Estimator Monte Carlo Tree Search (ME-MCTS), with a two-fold contribution: first, we introduce a formalization of online generalization that can represent existing techniques such as “history heuristics”, “RAVE”, or “OMA” – contextual action value estimators or abstractors that generalize across specific contexts. Second, we incorporate recent advances in estimator averaging that enable guiding search by combining the online action value estimates of any number of such abstractors or similar types of action value estimators. Unlike previous work, which usually proposed a single abstractor for either the selection or the rollout phase of MCTS simulations, our approach focuses on the combination of multiple estimators and applies them to all move choices in MCTS simulations. As the MCTS tree itself is just another value estimator – unbiased, but without abstraction – this blurs the traditional distinction between action choices inside and outside of the MCTS tree. Experiments with three abstractors in four board games show significant improvements of ME-MCTS over MCTS using only a single abstractor, both for MCTS with random rollouts as well as for MCTS with static evaluation functions. While we used deterministic, fully observable games, ME-MCTS naturally extends to more challenging settings.
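    A small sketch of the combination step may help make this concrete: several action-value estimators (the tree node's own statistics plus abstractors such as a history heuristic) each report a mean and a sample count, and move selection uses a weighted average of their estimates. The sample-count weighting and the tabular estimator interface below are simplifying assumptions, not the paper's exact averaging scheme.

```python
# Sketch of combining multiple action-value estimators, in the spirit of
# ME-MCTS: every estimator (tree statistics, history heuristic, RAVE-like
# abstractor, ...) contributes its mean value weighted by its sample count.
# The weighting rule here is a simple assumption, not the paper's method.

from collections import defaultdict


class TabularEstimator:
    """One action-value estimator keyed by whatever context it abstracts over."""

    def __init__(self):
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def update(self, action, value):
        self.sums[action] += value
        self.counts[action] += 1

    def estimate(self, action):
        n = self.counts[action]
        return (self.sums[action] / n, n) if n else (0.0, 0)


def combined_value(action, estimators, prior=0.0, prior_weight=1.0):
    """Sample-count-weighted average of every estimator's mean for the action."""
    total_w, total_v = prior_weight, prior * prior_weight
    for est in estimators:
        mean, n = est.estimate(action)
        total_w += n
        total_v += mean * n
    return total_v / total_w


def select_action(actions, estimators):
    """Greedy move choice over combined estimates (exploration terms omitted)."""
    return max(actions, key=lambda a: combined_value(a, estimators))
```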

    An Integrated Framework Integrating Monte Carlo Tree Search and Supervised Learning for Train Timetabling Problem

    Full text link
    The single-track railway train timetabling problem (TTP) is an important and complex problem. This article proposes an integrated Monte Carlo Tree Search (MCTS) computing framework that combines heuristic methods, unsupervised learning methods, and supervised learning methods for solving TTP in discrete action spaces. The article first describes the mathematical model and simulation system dynamics of TTP, analyzes the characteristics of the solution from the perspective of MCTS, and proposes some heuristic methods to improve MCTS; these methods serve as planners in the proposed framework. Secondly, it uses deep convolutional neural networks to approximate the value of nodes and applies them in the MCTS search process; these are referred to as learners. The experiments show that the proposed heuristic MCTS method is beneficial for solving TTP; that the framework integrating planners and learners improves the data efficiency of solving TTP; and that the proposed method provides a new paradigm for solving TTP.
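    The planner/learner split described above can be illustrated with a very small sketch: during search, a trained value model scores leaf states, optionally mixed with a heuristic rollout. The function and parameter names are placeholders; the paper's learners are deep convolutional networks over its own TTP state representation.

```python
# Illustrative sketch of the "learner" role: a trained value model evaluates
# leaf nodes during MCTS, optionally blended with a heuristic "planner" rollout.
# value_model and rollout_fn are placeholders supplied by the caller.

def evaluate_leaf(state, value_model, rollout_fn=None, mix=0.5):
    """Return a leaf evaluation blending learned and simulated estimates."""
    learned = value_model(state)       # learner: approximate node value
    if rollout_fn is None:
        return learned
    simulated = rollout_fn(state)      # planner: heuristic-guided simulation
    return mix * learned + (1.0 - mix) * simulated
```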

    Decision making in an uncertain world

    Get PDF

    Graph-Based Mapping for Knowledge Transfer in General Game Playing

    Get PDF
    General game playing (GGP) is a field of reinforcement learning (RL) in which the rules of a game (i.e. the state and dynamics of an RL domain) are not specified until runtime. A GGP agent must therefore be able to play any possible game at an acceptable level given an initialization time on the order of seconds. This time restriction promotes generality, precludes the use of the deep learning methods that are popular in the RL literature, and has led to the widespread use of Monte Carlo Tree Search (MCTS) as a planning strategy. A typical MCTS planner builds a search tree from scratch for every new game, but this leaves usable information on the table. Over its full history of play, an agent may have previously encountered a similar game from which it could draw insights into its current challenge. However, recognizing similarity between games and effectively transferring knowledge from past experience is a non-trivial task. In this thesis, we develop methods for automatically identifying similar features in two related games by finding an approximate edit distance between the graphs generated from their rules. We use that information to guide MCTS in one game with general heuristics initialized via transfer from a previously played game. Despite the computational cost of doing so, we show that the more efficient search granted by this approach can lead to better performance than either UCT (a standard variant of MCTS) or a non-transfer MCTS agent with access to the same general heuristics. We examine the circumstances under which transfer is most effective, and we also identify and create solutions for the cases where it is not.
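    As a hedged illustration of how a feature mapping obtained from an approximate graph edit distance might seed heuristics in the target game, the sketch below converts source-game feature values into soft priors weighted by match similarity. The data layout and prior-visit weighting are assumptions made for illustration, not the thesis' actual transfer mechanism.

```python
# Sketch: seed target-game heuristics from a previously played source game,
# using a feature mapping (e.g. derived from an approximate edit distance
# between rule graphs). Layout and weighting are illustrative assumptions.

def transfer_heuristics(source_values, mapping, prior_visits=10):
    """Build soft priors for the target game from source-game estimates.

    source_values: {source_feature: learned value}
    mapping:       {target_feature: (source_feature, similarity in [0, 1])}
    Returns        {target_feature: (seed_value, seed_visits)}
    """
    seeded = {}
    for target_feature, (source_feature, similarity) in mapping.items():
        if source_feature in source_values:
            seeded[target_feature] = (
                source_values[source_feature],
                int(round(prior_visits * similarity)),
            )
    return seeded
```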

    Memory-Augmented Monte Carlo Tree Search

    No full text
    This paper proposes and evaluates Memory-Augmented Monte Carlo Tree Search (M-MCTS), which provides a new approach to exploiting generalization in online real-time search. The key idea of M-MCTS is to augment MCTS with a memory structure in which each entry contains information about a particular state. This memory is used to generate an approximate value estimate by combining the estimates of similar states. We show that the memory-based value approximation is better than the vanilla Monte Carlo estimation with high probability under mild conditions. We evaluate M-MCTS in the game of Go. Experimental results show that M-MCTS outperforms the original MCTS given the same number of simulations.
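    A brief, hedged sketch of the memory-based value approximation described above: a memory stores (feature, value) pairs for visited states, and a query state's estimate is a similarity-weighted combination of the values stored for similar states. The feature space, cosine-softmax weighting, and capacity handling below are illustrative assumptions, not the paper's exact construction.

```python
# Sketch of a memory-based value approximation in the spirit of M-MCTS:
# a query state's value is a similarity-weighted average over stored entries.
# The cosine-softmax weighting and fixed capacity are illustrative choices.

import numpy as np


class ValueMemory:
    def __init__(self, capacity=10000):
        self.features, self.values = [], []
        self.capacity = capacity

    def add(self, feature, value):
        """Store one (feature vector, value) entry while capacity remains."""
        if len(self.features) < self.capacity:
            self.features.append(np.asarray(feature, dtype=float))
            self.values.append(float(value))

    def estimate(self, feature, temperature=0.1):
        """Similarity-weighted average of stored values for the query state."""
        if not self.features:
            return 0.0
        feats = np.stack(self.features)
        query = np.asarray(feature, dtype=float)
        sims = feats @ query / (
            np.linalg.norm(feats, axis=1) * np.linalg.norm(query) + 1e-8
        )
        weights = np.exp(sims / temperature)
        weights /= weights.sum()
        return float(weights @ np.asarray(self.values))
```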