Swarm intelligence in cooperative environments: n-step dynamic tree search algorithm overview
Reinforcement learning tree-based planning methods have been gaining popularity in recent years due to their success in single-agent domains where a perfect simulator model is available, for example the strategic board games Go and chess. This paper aims to extend tree search algorithms to the multi-agent setting in a decentralized structure, addressing scalability issues and the exponential growth of computational resources. The n-step dynamic tree search combines forward planning with direct temporal-difference updates, markedly outperforming conventional tabular algorithms such as Q-learning and state-action-reward-state-action (SARSA). Future state transitions and rewards are predicted with a model built and learned from real interactions between the agents and the environment. This paper analyzes the developed algorithm in the hunter–pursuit cooperative game against stochastic and intelligent evaders. The n-step dynamic tree search aims to adapt single-agent tree search learning methods to the multi-agent setting and is demonstrated to be a remarkable advance over conventional temporal-difference techniques.
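The combination described in the abstract (forward planning over a learned model plus direct temporal-difference updates) can be illustrated with a minimal sketch. All class and function names below are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: n-step lookahead over a learned tabular model,
# blended into a standard TD update (not the authors' code).
from collections import defaultdict

class LearnedModel:
    """Tabular model estimated from real agent-environment interactions."""
    def __init__(self):
        self.transitions = {}  # (state, action) -> (next_state, reward)

    def update(self, state, action, next_state, reward):
        self.transitions[(state, action)] = (next_state, reward)

    def predict(self, state, action):
        return self.transitions.get((state, action))

def n_step_lookahead_target(model, Q, state, action, n, gamma, actions):
    """Roll the learned model forward up to n steps (greedy w.r.t. Q)
    and return the discounted n-step return, bootstrapping on Q."""
    total, discount = 0.0, 1.0
    s, a = state, action
    for _ in range(n):
        pred = model.predict(s, a)
        if pred is None:  # unmodelled transition: bootstrap early
            return total + discount * max(Q[(s, b)] for b in actions)
        s_next, r = pred
        total += discount * r
        discount *= gamma
        s = s_next
        a = max(actions, key=lambda b: Q[(s, b)])  # greedy action in the model
    return total + discount * max(Q[(s, b)] for b in actions)

# Usage: plan with the model, then apply a TD-style update to Q.
Q = defaultdict(float)
model = LearnedModel()
alpha, gamma, actions = 0.1, 0.95, ["up", "down", "left", "right"]
s, a, s_next, r = "s0", "up", "s1", 1.0
model.update(s, a, s_next, r)
target = n_step_lookahead_target(model, Q, s, a, n=3, gamma=gamma, actions=actions)
Q[(s, a)] += alpha * (target - Q[(s, a)])
```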
Swarm intelligence in cooperative environments: N-step dynamic tree search algorithm extended analysis
Reinforcement learning tree-based planning methods have been gaining popularity in recent years due to their success in single-agent domains where a perfect simulator model is available, e.g., the strategic board games Go and chess. This paper aims to extend tree search algorithms to the multi-agent setting in a decentralized structure, addressing scalability issues and the exponential growth of computational resources. The N-Step Dynamic Tree Search combines forward planning and direct temporal-difference updates, markedly outperforming state-of-the-art algorithms such as Q-Learning and SARSA. Future state transitions and rewards are predicted with a model built and learned from real interactions between the agents and the environment. As an extension of previous work, this paper analyses the developed algorithm in the Hunter-Pursuit cooperative game against intelligent evaders. The N-Step Dynamic Tree Search aims to adapt the most successful single-agent learning methods to the multi-agent setting and is demonstrated to be a remarkable advance compared to conventional temporal-difference techniques.
Engineering and Physical Sciences Research Council (EPSRC): 2454254. BAE Systems.
Competitive multi-agent search
While evolutionary computation is well suited for automatic discovery in engineering, it can also be used to gain insight into how humans and organizations could perform more effectively. Using a real-world problem of innovation search in organizations as the motivating example, this dissertation formalizes human creative problem solving as competitive multi-agent search. It differs from existing single-agent and team-search problems in that the agents interact through knowledge of other agents' searches and through the dynamic changes in the search landscape caused by these searches. The main hypothesis is that evolutionary computation can be used to discover effective strategies for competitive multi-agent search. This hypothesis is verified in experiments using an abstract domain based on the NK model, i.e., partially correlated and tunably rugged fitness landscapes, and a concrete domain in the form of a social innovation game. In both domains, different specialized strategies are evolved for each competitive environment, as well as strategies that generalize across environments. Strategies evolved in the abstract domain are more effective and more complex than hand-designed strategies and one based on traditional tree search. Using a novel spherical visualization of the fitness landscapes of the abstract domain, insight is gained about how successful strategies work, e.g., by tracking positive changes in the landscape. In the concrete game domain, human players were modeled using backpropagation and used as opponents to create environments for evolution. Evolved strategies scored significantly higher than the human models by using a different proportion of actions, providing insights into how performance could be improved in social innovation domains. The work thus provides a possible framework for studying various human creative activities as competitive multi-agent search in the future.
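The NK model referenced in the abstract is a standard construction for tunably rugged fitness landscapes. The following is a minimal sketch of one, assuming illustrative parameter names and interfaces; it is not the dissertation's code.

```python
# Hypothetical sketch of an NK fitness landscape: N binary loci, each locus's
# contribution depends on itself plus K other loci; higher K means more rugged.
import random

def make_nk_landscape(N, K, seed=0):
    rng = random.Random(seed)
    # Each locus i depends on itself plus K other randomly chosen loci.
    neighbours = [
        [i] + rng.sample([j for j in range(N) if j != i], K) for i in range(N)
    ]
    # Contribution tables: one random value per (locus, local configuration).
    tables = [
        {cfg: rng.random() for cfg in range(2 ** (K + 1))} for _ in range(N)
    ]

    def fitness(genome):
        """genome: tuple of N bits; fitness is the mean locus contribution."""
        total = 0.0
        for i in range(N):
            cfg = 0
            for bit_index, locus in enumerate(neighbours[i]):
                cfg |= genome[locus] << bit_index
            total += tables[i][cfg]
        return total / N

    return fitness

# Usage: evaluate a random genome on a moderately rugged landscape.
f = make_nk_landscape(N=10, K=3)
genome = tuple(random.Random(1).randint(0, 1) for _ in range(10))
print(f(genome))
```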
Self-adaptive MCTS for General Video Game Playing
Monte-Carlo Tree Search (MCTS) has shown particular success in General Game Playing (GGP) and General Video Game Playing (GVGP), and many enhancements and variants have been developed. Recently, an on-line adaptive parameter tuning mechanism for MCTS agents has been proposed that almost achieves the same performance as off-line tuning in GGP. In this paper we apply the same approach to GVGP and use the popular General Video Game AI (GVGAI) framework, in which the time allowed to make a decision is only 40 ms. We design three Self-Adaptive MCTS (SA-MCTS) agents that optimize on-line the parameters of a standard non-self-adaptive MCTS agent of GVGAI. The three agents select the parameter values using Naïve Monte-Carlo, an evolutionary algorithm and an N-tuple bandit evolutionary algorithm, respectively, and are tested on 20 single-player games of GVGAI. The SA-MCTS agents achieve more robust results on the tested games. With the same time setting, they perform similarly to the baseline standard MCTS agent in the games for which the baseline agent performs well, and significantly improve the win rate in the games for which the baseline agent performs poorly. As validation, we also test the performance of non-self-adaptive MCTS instances that use the most sampled parameter settings during the on-line tuning of each of the three SA-MCTS agents for each game. Results show that these parameter settings improve the win rate on the games Wait for Breakfast and Escape by 4 times and 150 times, respectively.
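A minimal sketch of on-line, per-dimension parameter selection in the spirit of Naïve Monte-Carlo follows. The parameter names, candidate values, epsilon schedule and the run_mcts placeholder are assumptions, not the GVGAI agents' actual implementation.

```python
# Hypothetical sketch: each decision step, sample one parameter combination,
# run the search with it, and credit the observed reward back to each
# parameter value independently (per-dimension statistics).
import random

class NaiveMonteCarloTuner:
    def __init__(self, search_space, epsilon=0.5):
        self.space = search_space            # dict: param name -> candidate values
        self.epsilon = epsilon
        self.stats = {p: {v: [0.0, 0] for v in vals}  # value -> [sum reward, count]
                      for p, vals in search_space.items()}

    def suggest(self):
        """Pick a value per dimension: explore uniformly or exploit the best mean."""
        combo = {}
        for p, vals in self.space.items():
            if random.random() < self.epsilon:
                combo[p] = random.choice(vals)
            else:
                combo[p] = max(vals, key=lambda v: self._mean(p, v))
        return combo

    def update(self, combo, reward):
        """Credit the reward to every dimension's chosen value."""
        for p, v in combo.items():
            s = self.stats[p][v]
            s[0] += reward
            s[1] += 1

    def _mean(self, p, v):
        total, count = self.stats[p][v]
        return total / count if count else float("inf")  # try unseen values first

# Usage inside the 40 ms decision loop; run_mcts is a stand-in for the agent.
tuner = NaiveMonteCarloTuner({"C": [0.7, 1.0, 1.4], "rollout_depth": [5, 10, 15]})
for _ in range(100):
    params = tuner.suggest()
    reward = random.random()   # placeholder for the value returned by run_mcts(params)
    tuner.update(params, reward)
```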
The effect of simulation bias on action selection in Monte Carlo Tree Search
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, in fulfilment of the requirements for the degree of Master of Science. August 2016.
Monte Carlo Tree Search (MCTS) is a family of directed search algorithms that has gained widespread attention in recent years. It combines a traditional tree-search approach with Monte Carlo simulations, using the outcome of these simulations (also known as playouts or rollouts) to evaluate states in a look-ahead tree. That MCTS does not require an evaluation function makes it particularly well-suited to the game of Go (seen by many as chess's successor as a grand challenge of artificial intelligence), with MCTS-based agents recently able to achieve expert-level play on 19×19 boards. Furthermore, its domain-independent nature also makes it a focus in a variety of other fields, such as Bayesian reinforcement learning and general game-playing.
Despite the vast amount of research into MCTS, the dynamics of the algorithm are still not fully understood. In particular, the effect of using knowledge-heavy or biased simulations in MCTS remains unknown, with interesting results indicating that better-informed rollouts do not necessarily result in stronger agents. This research provides support for the notion that MCTS is well-suited to a class of domains possessing a smoothness property. In these domains, biased rollouts are more likely to produce strong agents. Conversely, any error due to incorrect bias is compounded in non-smooth domains, and in particular for low-variance simulations. This is demonstrated empirically in a number of single-agent domains.
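A minimal sketch of the kind of biased simulation step studied in such work is given below, assuming a generic environment interface (legal_actions, step, is_terminal, reward), a toy number-line game, and a softmax bias; none of this is taken from the dissertation's code.

```python
# Hypothetical sketch: MCTS rollout step with an optional heuristic bias.
import math
import random

class LineWalkEnv:
    """Toy environment (an assumption for the example): walk a number line,
    terminal once |state| >= 5, reward 1 only if the positive end is reached."""
    def legal_actions(self, state):
        return [+1, -1]
    def step(self, state, action):
        return state + action
    def is_terminal(self, state):
        return abs(state) >= 5
    def reward(self, state):
        return 1.0 if state >= 5 else 0.0

def biased_rollout(state, env, heuristic=None, temperature=1.0, max_depth=50):
    """Play out to a terminal state (or depth limit) and return the outcome.
    heuristic=None gives the uniform-random baseline; otherwise actions are
    drawn from a softmax over heuristic scores, i.e. a biased rollout."""
    depth = 0
    while not env.is_terminal(state) and depth < max_depth:
        actions = env.legal_actions(state)
        if heuristic is None:
            action = random.choice(actions)              # unbiased baseline
        else:
            scores = [heuristic(state, a) / temperature for a in actions]
            m = max(scores)
            weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
            action = random.choices(actions, weights=weights, k=1)[0]
        state = env.step(state, action)
        depth += 1
    return env.reward(state)

# Uniform vs. biased rollouts from the start state; the bias prefers moving right.
env = LineWalkEnv()
uniform = sum(biased_rollout(0, env) for _ in range(1000)) / 1000
biased = sum(biased_rollout(0, env, heuristic=lambda s, a: a) for _ in range(1000)) / 1000
print(uniform, biased)
```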
Fast Approximate Max-n Monte Carlo Tree Search for Ms Pac-Man
We present an application of Monte Carlo tree search (MCTS) to the game of Ms Pac-Man. Contrary to most applications of MCTS to date, Ms Pac-Man requires almost real-time decision making and does not have a natural end state. We approached the problem by performing Monte Carlo tree searches on a five-player max-n tree representation of the game with limited tree search depth. We performed a number of experiments using both the MCTS game agents (for Ms Pac-Man and the ghosts) and agents used in previous work (for the ghosts). Performance-wise, our approach achieves excellent scores, outperforming previous non-MCTS approaches to the game by up to two orders of magnitude. © 2011 IEEE.
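The max-n idea underlying that tree representation is standard: node values are per-player reward vectors, and the player to move maximizes its own component. The sketch below is an illustrative, depth-limited fragment with an assumed Node structure, not the paper's Ms Pac-Man implementation.

```python
# Hypothetical sketch of a max^n backup over a five-player tree
# (Ms Pac-Man plus four ghosts).
NUM_PLAYERS = 5

class Node:
    def __init__(self, children=None, value=None):
        self.children = children or []
        self.value = value   # reward vector at leaves, one entry per player

def maxn_value(node, player_to_move, evaluate, depth):
    """Return a reward vector of length NUM_PLAYERS for this node."""
    if depth == 0 or not node.children:
        return evaluate(node)                 # heuristic/simulation value vector
    best = None
    for child in node.children:
        value = maxn_value(child, (player_to_move + 1) % NUM_PLAYERS,
                           evaluate, depth - 1)
        if best is None or value[player_to_move] > best[player_to_move]:
            best = value                       # current player keeps its best line
    return best

# Usage: Ms Pac-Man (player 0) picks the branch maximizing her own component.
root = Node(children=[Node(value=(10, 0, 0, 0, 0)), Node(value=(3, 5, 1, 1, 1))])
print(maxn_value(root, 0, lambda n: n.value, depth=2))   # -> (10, 0, 0, 0, 0)
```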
Shallow decision-making analysis in General Video Game Playing
The General Video Game AI competitions have been the testing ground for several game-playing techniques, such as evolutionary computation, tree search algorithms, and hyper-heuristic-based or knowledge-based algorithms. So far, the metrics used to evaluate the performance of agents have been win ratio, game score and length of games. In this paper we provide a wider set of metrics and a comparison method for evaluating and comparing agents. The metrics and the comparison method give shallow introspection into the agent's decision-making process and can be applied to any agent regardless of its algorithmic nature. In this work, the metrics and the comparison method are used to measure the impact of the terms that compose the tree policy of an MCTS-based agent, comparing it with several baseline agents. The results clearly show how promising such a general approach is and how useful it can be for understanding the behaviour of an AI agent; in particular, the comparison with baseline agents can help in understanding the shape of the agent's decision landscape. The presented metrics and comparison method represent a step toward more descriptive ways of logging and analysing agents' behaviours.
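For context, a typical MCTS tree policy is a UCB1-style rule whose terms (exploitation vs. exploration) are the kind of components whose impact such metrics could measure. The sketch below assumes that standard formulation, an illustrative constant C and node fields; it is not the agent evaluated in the paper.

```python
# Hypothetical sketch: UCB1-style tree policy split into its two terms.
import math
from dataclasses import dataclass

@dataclass
class ChildStats:
    total_reward: float
    visits: int

def uct_terms(child, parent_visits, C=1.41):
    exploitation = child.total_reward / child.visits                    # mean value term
    exploration = C * math.sqrt(math.log(parent_visits) / child.visits) # uncertainty term
    return exploitation, exploration

def select_child(children, parent_visits, C=1.41):
    """Standard UCT selection: maximize the sum of the two terms."""
    return max(children, key=lambda ch: sum(uct_terms(ch, parent_visits, C)))

# Usage: the rarely visited child wins here because its exploration term dominates.
kids = [ChildStats(5.0, 10), ChildStats(2.0, 2)]
print(select_child(kids, parent_visits=12))
```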
- …