
    Automated Video Game Testing Using Synthetic and Human-Like Agents

    In this paper, we present a new methodology that employs tester agents to automate video game testing. We introduce two types of agents, synthetic and human-like, and two distinct approaches to create them. Our agents are derived from Reinforcement Learning (RL) and Monte Carlo Tree Search (MCTS) agents, but focus on finding defects. The synthetic agent uses test goals generated from game scenarios, and these goals are further modified to examine the effects of unintended game transitions. The human-like agent uses test goals extracted from tester trajectories by our proposed multiple greedy-policy inverse reinforcement learning (MGP-IRL) algorithm. MGP-IRL captures the multiple policies executed by human testers. These testers aim to find defects by interacting with the game in order to break it, which is considerably different from ordinary game playing; we present interaction states to model such interactions. We use our agents to produce test sequences, run the game with these sequences, and check each run with an automated test oracle. We analyze the proposed method in two parts: we compare the bug-finding success of human-like and synthetic agents, and we evaluate the similarity between human-like agents and human testers. We collected 427 trajectories from human testers using the General Video Game Artificial Intelligence (GVG-AI) framework and created three games with 12 levels that contain 45 bugs. Our experiments reveal that human-like and synthetic agents are competitive with human testers in bug finding. Moreover, we show that MGP-IRL increases the human-likeness of the agents while improving bug-finding performance.
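
    A minimal sketch of the test-sequence/oracle loop described above is given below. The GameEnv interface, the run_test driver, and the example invariants are hypothetical stand-ins for illustration; they are not the authors' GVG-AI implementation.

    from typing import Callable, List, Tuple

    class GameEnv:
        """Hypothetical wrapper around a GVG-AI-style game (not the authors' API)."""
        def reset(self) -> dict: ...
        def step(self, action: int) -> Tuple[dict, bool]: ...  # -> (observed state, episode done)

    def run_test(env: GameEnv,
                 test_sequence: List[int],
                 oracle: Callable[[dict], List[str]]) -> List[str]:
        """Replay one test sequence and collect every oracle violation (candidate bug)."""
        bugs: List[str] = []
        state = env.reset()
        bugs.extend(oracle(state))          # check the initial state as well
        for action in test_sequence:
            state, done = env.step(action)
            bugs.extend(oracle(state))      # check the game's intended rules after each transition
            if done:
                break
        return bugs

    def example_oracle(state: dict) -> List[str]:
        """Toy invariants: the avatar must stay inside the level and keep non-negative health."""
        violations: List[str] = []
        x, y = state.get("avatar_pos", (0, 0))
        w, h = state.get("level_size", (1, 1))
        if not (0 <= x < w and 0 <= y < h):
            violations.append(f"avatar left the level bounds at {(x, y)}")
        if state.get("health", 0) < 0:
            violations.append("avatar health dropped below zero")
        return violations

    In this view the agent (RL, MCTS, or a replayed human trajectory) only has to supply the action sequence; the oracle encodes the game's intended rules, so any violation it reports is a candidate bug.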

    Optimizing exploration parameter in dueling deep Q-networks for complex gaming environment

    Reinforcement Learning is being used to solve a wide variety of tasks. Solving a complex environment is a recent problem for Reinforcement Learning, in which an agent interacts with its surroundings and learns to accomplish whatever task is required. To solve a complex environment efficiently with a Reinforcement Learning agent, many parameters must be kept in perspective. Every action the agent takes has a consequence in the form of a reward, and based on this reward the agent develops a policy for solving the environment. The policy is generally developed to maximize the cumulative reward, and the agent follows an exploration strategy while doing so. Reinforcement Learning architectures therefore rely on the agent's policy and exploration strategy to solve the environment efficiently. This research has two parts. First, a Deep Reinforcement Learning architecture, the Dueling Deep Q-Network (Dueling DQN), is optimized by improving its exploration strategy: a recent and novel exploration technique, curiosity-driven intrinsic motivation, is combined with the Dueling DQN. The performance of this Curious Dueling DQN is assessed by comparing it with the existing Dueling DQN. Second, the Curious Dueling DQN is validated against the Noisy Dueling DQN, a combination of the Dueling DQN with another recent exploration strategy called Noisy Nets, in order to find an optimal exploration strategy. Both solutions are evaluated in the Super Mario Bros environment using mean score and estimation loss as metrics. The proposed model improves the mean score threefold, while the estimation loss increases by 28%.
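
    The two ingredients combined in this work can be sketched in PyTorch as follows: a dueling Q-network head and a curiosity-style intrinsic reward derived from a forward-model prediction error. Layer sizes, the reward scale eta, and the forward-model design are illustrative assumptions, not the paper's exact configuration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DuelingQNet(nn.Module):
        def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
            super().__init__()
            self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
            self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
            self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

        def forward(self, obs: torch.Tensor) -> torch.Tensor:
            h = self.trunk(obs)
            v, a = self.value(h), self.advantage(h)
            # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
            return v + a - a.mean(dim=1, keepdim=True)

    class ForwardModel(nn.Module):
        """Predicts the next observation from (obs, action); its error drives curiosity."""
        def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
            super().__init__()
            self.n_actions = n_actions
            self.net = nn.Sequential(
                nn.Linear(obs_dim + n_actions, hidden), nn.ReLU(),
                nn.Linear(hidden, obs_dim))

        def intrinsic_reward(self, obs, action, next_obs, eta: float = 0.1) -> torch.Tensor:
            one_hot = F.one_hot(action, self.n_actions).float()
            pred = self.net(torch.cat([obs, one_hot], dim=-1))
            # Larger prediction error -> more "surprising" transition -> larger exploration bonus.
            return eta * (pred - next_obs).pow(2).mean(dim=-1)

    During training, the extrinsic game reward and the intrinsic bonus are simply summed before the usual Dueling DQN update, so transitions the forward model cannot yet predict are visited more often.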

    Agentes com aprendizagem automática para jogos de computador (Machine Learning Agents for Computer Games)

    In recent years, new Reinforcement Learning algorithms have been developed. These algorithms use Deep Neural Networks to represent the agent's knowledge. After surpassing previous Artificial Intelligence (AI) milestones such as Chess and Go, these Deep Reinforcement Learning (DRL) methods were able to surpass human level in very complex games like Dota 2, where long-term planning is required and where professional teams of human players train daily to win e-sports competitions. These algorithms start from scratch, do not use examples of human behavior, and can be applied in various domains. By learning from experience, they discovered new and better behaviors, indicating great potential. However, they require a lot of computational power and training time. Computer games are used in an AI course at the University of Aveiro as an application domain for the AI knowledge acquired by students: the students develop software agents for these games and try to obtain the best scores. The objective of this dissertation is to develop agents using the latest DRL techniques and to compare their performance with the agents developed by students. To begin with, DRL agents were developed for a simpler game, Tic-Tac-Toe, where various learning options were addressed until a robust agent capable of playing against multiple opponents was created. Then, DRL agents capable of playing the version of Pac-Man used in the University of Aveiro course in the 2018/19 academic year were developed through a series of experiments in which the parameters of the learning process were varied to obtain better scores. The best-scoring agent is able to play all game configurations used in the course evaluation and reached the top 7 of the ranking, among more than 50 agents developed by students using hard-coded strategies with pathfinding algorithms.
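
    The kind of training loop tuned in such experiments can be sketched as a plain DQN with experience replay and ε-greedy action selection. The environment interface (reset/step returning a 3-tuple), the network size, and the hyper-parameters below are illustrative assumptions, not the values used for Tic-Tac-Toe or Pac-Man in the dissertation.

    import random
    from collections import deque

    import torch
    import torch.nn as nn

    def train_dqn(env, obs_dim: int, n_actions: int,
                  episodes: int = 500, gamma: float = 0.99,
                  epsilon: float = 0.1, batch_size: int = 64):
        q_net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                              nn.Linear(128, n_actions))
        optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
        replay = deque(maxlen=10_000)           # experience replay buffer

        for _ in range(episodes):
            obs, done = env.reset(), False
            while not done:
                # epsilon-greedy action selection
                if random.random() < epsilon:
                    action = random.randrange(n_actions)
                else:
                    with torch.no_grad():
                        action = int(q_net(torch.as_tensor(obs, dtype=torch.float32)).argmax())
                next_obs, reward, done = env.step(action)   # assumed 3-tuple interface
                replay.append((obs, action, reward, next_obs, done))
                obs = next_obs

                if len(replay) >= batch_size:
                    batch = random.sample(replay, batch_size)
                    o, a, r, o2, d = (torch.as_tensor(xs, dtype=torch.float32)
                                      for xs in zip(*batch))
                    q = q_net(o).gather(1, a.long().unsqueeze(1)).squeeze(1)
                    with torch.no_grad():
                        target = r + gamma * (1 - d) * q_net(o2).max(dim=1).values
                    loss = nn.functional.mse_loss(q, target)
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
        return q_net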

    Exploration methods for connectionist Q-learning in bomberman

    In this paper, we investigate which exploration method yields the best performance in the game Bomberman. In Bomberman the controlled agent has to kill opponents by placing bombs. The agent is represented by a multi-layer perceptron that learns to play the game with the use of Q-learning. We introduce two novel exploration strategies: Error-Driven-ε and Interval-Q, which base their explorative behavior on the temporal-difference error of Q-learning. The learning capabilities of these exploration strategies are compared to five existing methods: Random-Walk, Greedy, ε-Greedy, Diminishing ε-Greedy, and Max-Boltzmann. The results show that the methods that combine exploration with exploitation perform much better than the Random-Walk and Greedy strategies, which only select exploration or exploitation actions. Furthermore, the results show that Max-Boltzmann exploration performs best overall among the different techniques. The Error-Driven-ε exploration strategy also performs very well, but suffers from unstable learning behavior.
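
    Max-Boltzmann, the best-performing strategy in this comparison, mixes the two behaviors: with probability ε the action is sampled from a softmax (Boltzmann) distribution over the Q-values, otherwise the greedy action is taken. The sketch below uses illustrative ε and temperature values; the paper's novel Error-Driven-ε and Interval-Q strategies, which adapt exploration to the temporal-difference error, are not reproduced here.

    import numpy as np

    def max_boltzmann_action(q_values: np.ndarray,
                             epsilon: float = 0.1,
                             temperature: float = 1.0) -> int:
        if np.random.random() < epsilon:
            # Exploration step: sample from a Boltzmann (softmax) distribution over Q-values.
            prefs = q_values / temperature
            prefs = prefs - prefs.max()               # subtract max for numerical stability
            probs = np.exp(prefs) / np.exp(prefs).sum()
            return int(np.random.choice(len(q_values), p=probs))
        # Exploitation step: take the greedy action.
        return int(np.argmax(q_values))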