Automated Video Game Testing Using Synthetic and Human-Like Agents
In this paper, we present a new methodology that employs tester agents to
automate video game testing. We introduce two types of agents, synthetic and
human-like, and two distinct approaches to create them. Our agents are derived
from Reinforcement Learning (RL) and Monte Carlo Tree Search (MCTS) agents, but
focus on finding defects. The synthetic agent uses test goals generated from
game scenarios, and these goals are further modified to examine the effects of
unintended game transitions. The human-like agent uses test goals extracted by
our proposed multiple greedy-policy inverse reinforcement learning (MGP-IRL)
algorithm from tester trajectories. MGP-IRL captures the multiple policies
executed by human testers, whose aim is to find defects by interacting with the
game to break it, which differs considerably from ordinary game playing.
We present interaction states to model such interactions. We use our agents to
produce test sequences, run the game with these sequences, and check the game
for each run with an automated test oracle. We analyze the proposed method in
two parts: we compare the success of human-like and synthetic agents in bug
finding, and we evaluate the similarity between human-like agents and human
testers. We collected 427 trajectories from human testers using the General
Video Game Artificial Intelligence (GVG-AI) framework and created three games
with 12 levels that contain 45 bugs. Our experiments reveal that human-like and
synthetic agents are competitive with human testers in bug finding. Moreover,
we show that MGP-IRL increases the human-likeness of agents while improving
their bug finding performance.
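The test loop described above lends itself to a compact sketch. The following
is a minimal illustration of driving a game with a tester agent and checking
each transition with an automated oracle; the env, agent, and oracle interfaces
are hypothetical stand-ins, not the authors' GVG-AI code.

def run_test(env, agent, oracle, max_steps=500):
    """Let a tester agent play one episode and report oracle violations."""
    state = env.reset()
    violations = []
    for step in range(max_steps):
        action = agent.act(state)           # synthetic or human-like policy
        next_state, done = env.step(action)
        # The oracle checks each transition against the game specification,
        # e.g. "the avatar never leaves the walkable area".
        for rule in oracle.rules:
            if not rule.holds(state, action, next_state):
                violations.append((step, rule.name))
        state = next_state
        if done:
            break
    return violations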
Optimizing exploration parameter in dueling deep Q-networks for complex gaming environment
Reinforcement learning is being used to solve a wide range of tasks. Complex environments are a current challenge for reinforcement learning, in which an agent interacts with its surroundings and learns to solve the task at hand. To solve a complex environment efficiently with a reinforcement learning agent, many parameters must be considered. Every action the agent takes has a consequence in the form of a reward, and based on this reward signal the agent develops a policy, generally one that maximizes the cumulative reward. The policy in turn depends on the exploration strategy the agent employs, so reinforcement learning architectures rely on both the policy and the exploration strategy to solve an environment efficiently. This research consists of two parts. First, a deep reinforcement learning architecture, the dueling deep Q-network (Dueling DQN), is optimized by improving its exploration strategy: a recent and novel exploration technique, curiosity-driven intrinsic motivation, is combined with the Dueling DQN, and the performance of this Curious Dueling DQN is compared with that of the existing Dueling DQN. Second, the Curious Dueling DQN is validated against a Noisy Dueling DQN, a combination of the Dueling DQN with another recent exploration strategy called Noisy Nets, in order to find an optimal exploration strategy. Both solutions are evaluated in the Super Mario Bros environment in terms of mean score and estimation loss. The proposed model improves the mean score threefold, while the loss increases by 28%.
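For readers unfamiliar with the architecture being optimized, the sketch below
shows the standard dueling Q-network head (separate value and advantage streams
recombined into Q-values); the feature and layer sizes are illustrative, not
the paper's exact configuration.

import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Dueling DQN head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, feature_dim, n_actions):
        super().__init__()
        self.value = nn.Sequential(nn.Linear(feature_dim, 256), nn.ReLU(),
                                   nn.Linear(256, 1))
        self.advantage = nn.Sequential(nn.Linear(feature_dim, 256), nn.ReLU(),
                                       nn.Linear(256, n_actions))

    def forward(self, features):
        v = self.value(features)      # state value V(s), shape (batch, 1)
        a = self.advantage(features)  # advantages A(s, a), shape (batch, n_actions)
        # Subtracting the mean advantage keeps the V/A decomposition identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

In the curious variant, an intrinsic prediction-error bonus would be added to
the extrinsic reward before the usual Q-learning update; the paper's exact
bonus formulation is not reproduced here.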
Agentes com aprendizagem automática para jogos de computador
In recent years, new Reinforcement Learning algorithms have been developed.
These algorithms use Deep Neural Networks to represent the agent’s knowledge.
After reaching earlier Artificial Intelligence (AI) milestones, such as chess
and Go, these Deep Reinforcement Learning (DRL) methods were able to surpass the
human level in very complex games like Dota 2, where long-term planning is required
and in which professional teams of human players train daily to win e-sports
competitions. These algorithms start from scratch, do not use examples of human
behavior, and can be applied in various domains. By learning from experience,
they have discovered new and better behaviors, which indicates great potential.
However, they require substantial computational power and training time.
Computer games are used in an AI course at the University of Aveiro as an application
domain of the AI knowledge acquired by students. Students must develop software
agents for these games and try to achieve the best scores. The objective
of this dissertation is to develop agents using the latest DRL techniques and
to compare their performance with the agents developed by students.
To begin with, DRL agents were developed for a simpler game, Tic-Tac-Toe,
where various learning options were explored until a robust agent capable of
playing against multiple opponents was created.
Then, DRL agents capable of playing the version of Pac-Man used in the
University of Aveiro course in the 2018/19 academic year were developed through
a series of experiments in which the learning-process parameters were varied
to obtain better scores.
The best-scoring agent developed is able to play in all game configurations
used in the course evaluation and ranked in the top 7 among more than 50 agents
developed by students using hard-coded strategies
with pathfinding algorithms.
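The abstract does not list which learning-process parameters were varied, but a
typical DQN-style sweep for such a Pac-Man agent touches knobs like the
following; all names and values here are illustrative assumptions, not the
dissertation's actual settings.

# Hypothetical hyperparameters of the kind varied across experiments.
config = {
    "learning_rate": 1e-4,
    "discount_gamma": 0.99,
    "replay_buffer_size": 100_000,
    "batch_size": 32,
    "target_update_steps": 1_000,
    "epsilon_start": 1.0,
    "epsilon_final": 0.05,
    "epsilon_decay_steps": 200_000,
}

def epsilon_at(step, cfg=config):
    """Linearly anneal exploration over training, a common DQN schedule."""
    frac = min(step / cfg["epsilon_decay_steps"], 1.0)
    return cfg["epsilon_start"] + frac * (cfg["epsilon_final"] - cfg["epsilon_start"])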
Exploration methods for connectionist Q-learning in bomberman
In this paper, we investigate which exploration method yields the best performance in the game Bomberman. In Bomberman, the controlled agent has to kill opponents by placing bombs. The agent is represented by a multi-layer perceptron that learns to play the game using Q-learning. We introduce two novel exploration strategies, Error-Driven-ε and Interval-Q, which base their explorative behavior on the temporal-difference error of Q-learning. The learning capabilities of these exploration strategies are compared to five existing methods: Random-Walk, Greedy, ε-Greedy, Diminishing ε-Greedy, and Max-Boltzmann. The results show that the methods that combine exploration with exploitation perform much better than the Random-Walk and Greedy strategies, which select only exploration or only exploitation actions. Furthermore, Max-Boltzmann exploration performs best overall among the tested techniques. The Error-Driven-ε exploration strategy also performs very well, but suffers from unstable learning behavior.
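As a point of reference, Max-Boltzmann, the best-performing strategy in this
comparison, can be sketched as follows: with probability ε the agent samples an
action from a softmax over its Q-values, and otherwise it acts greedily. The ε
and temperature values below are illustrative, not the paper's settings.

import numpy as np

def max_boltzmann(q_values, epsilon=0.1, temperature=1.0, rng=np.random):
    """Max-Boltzmann action selection over a 1-D array of Q-values."""
    if rng.random() < epsilon:
        # Explore: a softmax over Q-values favors promising actions
        # (max-subtraction keeps the exponentials numerically stable).
        prefs = np.exp((q_values - q_values.max()) / temperature)
        probs = prefs / prefs.sum()
        return int(rng.choice(len(q_values), p=probs))
    return int(np.argmax(q_values))  # exploit: greedy action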