13 research outputs found

    Action Guidance with MCTS for Deep Reinforcement Learning

    Deep reinforcement learning has achieved great successes in recent years, but one main challenge is its sample inefficiency. In this paper, we focus on how to use action guidance from a non-expert demonstrator to improve sample efficiency in a domain with sparse, delayed, and possibly deceptive rewards: the recently proposed multi-agent benchmark Pommerman. We propose a new framework where even a non-expert simulated demonstrator, e.g., a planning algorithm such as Monte Carlo tree search with a small number of rollouts, can be integrated within asynchronous distributed deep reinforcement learning methods. Compared to a vanilla deep RL algorithm, our proposed methods both learn faster and converge to better policies on a two-player mini version of the Pommerman game. Comment: AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE'19). arXiv admin note: substantial text overlap with arXiv:1904.05759, arXiv:1812.0004

    Effects of curriculum learning on maze exploring DRL agent using Unity ML-Agents

    Although the number of studies on machine learning in video games has increased, few of them use curriculum learning. This thesis aims to show the benefits that curriculum learning, even in an unoptimized state, can provide to deep reinforcement learning when used with the Unity ML-Agents toolkit. The thesis contains two case studies of machine learning agents whose task is to find the goal area in a maze. Each case study has two Agents: one that uses curriculum learning and one that does not. In the first case study the Agents use their built-in Vector Sensor, and in the second they use the Raycast Perception Sensor. The data gathered in the case studies comes from the training and the evaluation of the two Agent types. The results show that adding curriculum learning can increase the stability of training and improve the evaluation results. However, training is unstable in both case studies, which makes it impossible to determine the benefits of curriculum learning precisely.

    Applied Machine Learning for Games: A Graduate School Course

    The game industry is moving into an era where old-style game engines are being replaced by re-engineered systems with embedded machine learning technologies for the operation, analysis and understanding of game play. In this paper, we describe our machine learning course designed for graduate students interested in applying recent advances in deep learning and reinforcement learning to gaming. The course serves as a bridge to foster interdisciplinary collaboration among graduate schools and does not require prior experience designing or building games. Graduate students enrolled in the course apply techniques from different fields of machine learning, such as computer vision, natural language processing, computer graphics, human-computer interaction, robotics and data analysis, to solve open challenges in gaming. Student projects cover use cases such as training AI bots in gaming benchmark environments and competitions, understanding human decision patterns in gaming, and creating intelligent non-playable characters or environments to foster engaging gameplay. Project demos can help students open doors to an industry career, aim for publications, or lay the foundations of a future product. Our students gained hands-on experience in applying state-of-the-art machine learning techniques to solve real-life problems in gaming. Comment: The Eleventh Symposium on Educational Advances in Artificial Intelligence (EAAI-21)

    Aprendizado por reforço assistido por imitação para jogos digitais [Imitation-assisted reinforcement learning for digital games]

    Reinforcement Learning (RL) and Imitation Learning (IL) are branches of Artificial Intelligence that enable learning through interaction with the environment and through observation of examples, respectively. They have applications in several areas, such as autonomous vehicles, robot control and games. Games are widely used to test the performance of Reinforcement Learning models, usually built on deep neural networks, because they provide a controlled environment capable of exposing the model to a wide variety of problems and contexts. The present work therefore aims to propose control models for the game Sonic The Hedgehog using Imitation Learning and Deep Reinforcement Learning. In addition, we seek to analyze the performance of imitation models based on adversarial strategies, investigate the impact of imitation on the model's behavior and performance, and verify whether Imitation Learning can be a viable alternative to creating reward functions. Experiments were carried out comparing different IL methods to verify whether they could generate good controllers for the game. Then the IL methods of behavioral cloning, Generative Adversarial Imitation Learning and Adversarial Inverse Reinforcement Learning were used to initialize the RL, with the hypothesis that the prior domain knowledge provided by imitation helps the model achieve better results. The results showed that IL can be used to generate digital game controllers and that initializing the RL step with Imitation Learning can help the model obtain better performance.