109 research outputs found
Opponent modelling in the game of tron using reinforcement learning
In this paper we propose the use of vision grids as a state representation for learning to play the game Tron using neural networks and reinforcement learning. This approach speeds up learning by significantly reducing the number of unique states. Furthermore, we introduce a novel opponent-modelling technique, which is used to predict the opponent's next move. The learned model of the opponent is subsequently used in Monte-Carlo roll-outs, in which the game is simulated n steps ahead in order to determine the expected value of conducting a certain action. Finally, we compare the performance of two different activation functions in the multi-layer perceptron, namely the sigmoid and the exponential linear unit (ELU). The results show that the ELU activation function outperforms the sigmoid activation function in most cases. Furthermore, vision grids significantly increase learning speed, and in most cases they also increase the agent's performance compared to using the full grid as the state representation. Finally, the opponent-modelling technique allows the agent to learn a predictive model of the opponent's actions, which in combination with Monte-Carlo roll-outs significantly increases the agent's performance.
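The core idea of the abstract above can be sketched in a few lines: predict the opponent's move with a learned model, simulate n steps ahead, and evaluate the resulting state. This is an illustrative sketch only; the names `opponent_model`, `value_fn`, and `simulate_step` are placeholders, not the paper's actual code.

```python
import random

def rollout_value(state, action, opponent_model, value_fn, simulate_step, n_steps):
    """Estimate the value of taking `action` in `state` by simulating
    n steps ahead, predicting the opponent's move at each step with the
    learned opponent model."""
    # First step uses the candidate action under evaluation.
    state = simulate_step(state, action, opponent_model(state))
    for _ in range(n_steps - 1):
        my_action = random.choice([0, 1, 2, 3])  # placeholder rollout policy
        state = simulate_step(state, my_action, opponent_model(state))
    return value_fn(state)  # evaluate the state reached after the rollout
```

In the paper this value estimate would come from the trained multi-layer perceptron; here `value_fn` stands in for it.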
Learning from Monte Carlo Rollouts with Opponent Models for Playing Tron
This paper describes a novel reinforcement learning system for learning to play the game of Tron. The system combines Q-learning, multi-layer perceptrons, vision grids, opponent modelling, and Monte Carlo rollouts in a novel way. By learning an opponent model, Monte Carlo rollouts can be effectively applied to generate state trajectories for all possible actions, from which improved action estimates can be computed. This makes it possible to extend experience replay so that the state-action values of all actions in a given game state are updated simultaneously. The results show that experience replay that updates the Q-values of all actions simultaneously strongly outperforms conventional experience replay, which only updates the Q-value of the performed action. The results also show that using short or long rollout horizons during training leads to similarly good performance against two fixed opponents.
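The all-actions replay update described above can be illustrated with a minimal tabular sketch, assuming one rollout-based return estimate is available per action. This is not the paper's implementation (which uses a multi-layer perceptron); the tabular form just makes the "update every action at once" idea concrete.

```python
import numpy as np

def replay_update_all_actions(Q, state, rollout_targets, lr=0.1):
    """Q: (n_states, n_actions) value table.
    rollout_targets: one estimated return per action, e.g. from
    opponent-model Monte Carlo rollouts.
    Moves every Q-value of `state` toward its target simultaneously,
    instead of updating only the action that was actually taken."""
    targets = np.asarray(rollout_targets, dtype=float)
    Q[state] += lr * (targets - Q[state])  # update all actions at once
    return Q
```

Conventional experience replay would apply this update to a single column of `Q[state]`; the contrast is exactly the comparison the abstract reports.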
A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
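At the heart of the MCTS family surveyed above is the UCT selection rule: descend the tree by picking the child that maximises its mean value plus an exploration bonus. A minimal sketch, with child statistics stored as `(total_value, visit_count)` pairs (a representation chosen for this example, not mandated by the survey):

```python
import math

def uct_select(children, parent_visits, c=1.41421356):
    """children: dict mapping action -> (total_value, visits).
    Returns the action with the highest UCB1 score; unvisited
    children are always selected first."""
    def score(stats):
        total, visits = stats
        if visits == 0:
            return float("inf")  # expand unvisited nodes before exploiting
        # exploitation term + exploration bonus
        return total / visits + c * math.sqrt(math.log(parent_visits) / visits)
    return max(children, key=lambda a: score(children[a]))
```

A full MCTS iteration would wrap this selection step with expansion, a random rollout, and backpropagation of the result up the visited path.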
Sampled Policy Gradient for Learning to Play the Game Agar.io
In this paper, a new offline actor-critic learning algorithm is introduced: Sampled Policy Gradient (SPG). SPG samples in the action space to calculate an approximated policy gradient by using the critic to evaluate the samples. This sampling allows SPG to search the action-Q-value space more globally than deterministic policy gradient (DPG), enabling it to theoretically avoid more local optima. SPG is compared to Q-learning and the actor-critic algorithms CACLA and DPG in a pellet-collection task and a self-play environment in the game Agar.io. The online game Agar.io has become massively popular on the internet due to its intuitive game design and the ability to instantly compete against players around the world. From the point of view of artificial intelligence, this game is also very intriguing: it has continuous input and action spaces and allows diverse agents with complex strategies to compete against each other. The experimental results show that Q-learning and CACLA outperform a pre-programmed greedy bot in the pellet-collection task, but all algorithms fail to outperform this bot in a fighting scenario. Analysis shows that SPG is highly extensible through offline exploration, and it matches DPG in performance even in its basic form without extensive sampling.
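The sampling step that gives SPG its name can be sketched as follows: perturb the actor's proposed action, score every sample with the critic Q(s, a), and use the best-scoring sample as a regression target for the actor. This is a hedged illustration of the idea, not the paper's code; `critic` and the Gaussian perturbation scheme are assumptions for the example.

```python
import numpy as np

def spg_target(state, actor_action, critic, n_samples=10, sigma=0.3, rng=None):
    """Return the sampled action with the highest critic value.
    The actor would then be trained to regress toward this target
    (the supervised update itself is not shown)."""
    rng = rng or np.random.default_rng(0)
    candidates = [actor_action]  # the actor's own action is always a candidate
    for _ in range(n_samples):
        # Gaussian perturbations of the actor's action (one sampling choice)
        candidates.append(actor_action + rng.normal(0.0, sigma, size=actor_action.shape))
    scores = [critic(state, a) for a in candidates]
    return candidates[int(np.argmax(scores))]
```

Because the actor's own action is included among the candidates, the chosen target is never worse than the current policy according to the critic, which is what lets the search move "more globally" than DPG's local gradient step.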
Programming Robosoccer agents by modelling human behavior
The Robosoccer simulator is a challenging environment for artificial intelligence, in which a human has to program a team of agents and introduce it into a virtual soccer environment. Usually, Robosoccer agents are programmed by hand. In some cases, agents make use of machine learning (ML) to adapt to and predict the behavior of the opposing team, but the bulk of the agent is preprogrammed. The main aim of this paper is to transform Robosoccer into an interactive game and let a human control a Robosoccer agent. ML techniques can then be used to model his/her behavior from training instances generated during play. This model is later used to control a Robosoccer agent, thus imitating the human behavior. We have focused our research on low-level behaviors, such as looking for the ball, conducting the ball towards the goal, or scoring in the presence of opponent players. The results show that, indeed, Robosoccer agents can be controlled by programs that model human play.
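The approach above is essentially behavioural cloning: log (perception, action) pairs while the human plays, fit a model, and reuse it as a controller. A minimal sketch under invented encodings (the paper does not specify these feature vectors, and this nearest-neighbour imitator stands in for whatever ML technique was actually used):

```python
import math

def fit_human_model(perceptions, actions):
    """perceptions: list of numeric feature vectors logged during human
    play (e.g. ball distance and angle); actions: the human's chosen
    low-level command for each perception.
    Returns a controller that replays the human's action for the most
    similar logged perception (1-nearest-neighbour imitation)."""
    data = list(zip(perceptions, actions))

    def controller(perception):
        # Find the logged situation closest to the current one and
        # imitate the action the human took there.
        nearest, action = min(data, key=lambda pa: math.dist(pa[0], perception))
        return action

    return controller
```

Any classifier trained on the same pairs could replace the nearest-neighbour lookup; the point is only that the controller is induced from human play rather than hand-programmed.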
Opponent Learning Awareness and Modelling in Multi-Objective Normal Form Games
Many real-world multi-agent interactions consider multiple distinct criteria,
i.e. the payoffs are multi-objective in nature. However, the same
multi-objective payoff vector may lead to different utilities for each
participant. Therefore, it is essential for an agent to learn about the
behaviour of other agents in the system. In this work, we present the first
study of the effects of such opponent modelling on multi-objective multi-agent
interactions with non-linear utilities. Specifically, we consider two-player
multi-objective normal form games with non-linear utility functions under the
scalarised expected returns optimisation criterion. We contribute novel
actor-critic and policy gradient formulations to allow reinforcement learning
of mixed strategies in this setting, along with extensions that incorporate
opponent policy reconstruction and learning with opponent learning awareness
(i.e., learning while considering the impact of one's policy when anticipating
the opponent's learning step). Empirical results in five different MONFGs
demonstrate that opponent learning awareness and modelling can drastically
alter the learning dynamics in this setting. When equilibria are present,
opponent modelling can confer significant benefits on agents that implement it.
When there are no Nash equilibria, opponent learning awareness and modelling
allows agents to still converge to meaningful solutions that approximate
equilibria.Comment: Under review since 14 November 202
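The scalarised expected returns (SER) criterion mentioned above applies the agent's (possibly non-linear) utility function to the *expected* payoff vector of a strategy profile, rather than taking the expectation of scalarised payoffs. A minimal sketch for a two-player MONFG, with a toy payoff tensor and utility chosen for illustration:

```python
import numpy as np

def ser(payoffs, pi1, pi2, utility):
    """payoffs: (n_actions1, n_actions2, n_objectives) array holding one
    player's multi-objective payoffs; pi1, pi2: the players' mixed
    strategies. Returns u(E[payoff vector]) - the SER criterion."""
    # Expected payoff vector under the joint mixed strategy.
    expected_vec = np.einsum("i,j,ijo->o", pi1, pi2, payoffs)
    # Utility is applied AFTER the expectation (SER), not inside it (ESR).
    return utility(expected_vec)
```

With a non-linear utility these two orders of operation generally differ, which is why mixed strategies (and hence the paper's actor-critic and policy-gradient formulations) matter in this setting.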
Correcting and improving imitation models of humans for Robosoccer agents
Proceedings of: 2005 IEEE Congress on Evolutionary Computation (CEC'05), Edinburgh, 2-5 Sept. 2005. The Robosoccer simulator is a challenging environment in which a human introduces a team of agents into a virtual football environment. Typically, agents are programmed by hand, but it would be a great advantage to transfer human experience to football agents. The first aim of this paper is to use machine learning techniques to obtain models of humans playing Robosoccer. These models can later be used to control a Robosoccer agent. However, the models did not play as smoothly or as optimally as the human. To solve this problem, the second goal of this paper is to incrementally correct the models by means of evolutionary techniques, and to adapt them against more difficult opponents than the ones beatable by the human.
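The incremental correction step above can be sketched as a simple (1+1) evolutionary loop: mutate the current imitation model and keep the mutant whenever it plays at least as well. The `mutate` and `fitness` functions here are placeholders for model perturbation and evaluation by match play; the paper's actual evolutionary operators are not specified in this abstract.

```python
import random

def evolve(model, mutate, fitness, generations=50, rng=None):
    """(1+1)-style evolutionary correction: repeatedly mutate the
    incumbent model and accept the mutant if its fitness (e.g. match
    performance) is at least as good."""
    rng = rng or random.Random(0)
    best, best_fit = model, fitness(model)
    for _ in range(generations):
        child = mutate(best, rng)
        child_fit = fitness(child)
        if child_fit >= best_fit:  # greedy acceptance, never regresses
            best, best_fit = child, child_fit
    return best
```

Starting from the human-derived model rather than a random one is what makes this a *correction* of the imitation model instead of learning from scratch.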