
    O-MuZero: abstract planning models induced by options on the MuZero algorithm

    Training reinforcement learning agents that learn both the value function and the environment model can be very time consuming; one of the main reasons is that these agents learn through primitive actions, one step at a time, while humans learn in a more abstract way. In this work we introduce O-MuZero: a method for guiding Monte-Carlo Tree Search through the use of options (temporally-extended actions). Most related work uses options to guide planning but acts only with primitive actions; our method, in contrast, plans and acts with the same options used for planning. To achieve this, we modify the Monte-Carlo Tree Search structure so that each node of the tree still represents a state, but each edge is an option transition. We expect this to allow the agent to see further into the state space and therefore produce higher-quality plans. We show that our method can be combined with state-of-the-art on-line planning algorithms that use a learned model. We evaluate different variations of our technique on previously established grid-world benchmarks and compare against the MuZero baseline, an algorithm that plans with a learned model and traditionally does not use options. Our method not only helps the agent learn faster but also yields better results during on-line execution with limited time budgets. We empirically show that our method also improves model robustness, i.e. the ability of the model to play in environments slightly different from the one on which it was trained.
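    The structural change described above, tree edges that carry options rather than single primitive actions, can be illustrated with a small sketch. This is not the authors' implementation: the model interface `model.step(state, action)`, the option format (a list of primitive actions), and the constants are assumptions made for illustration.

```python
# Minimal sketch (not the authors' code): an MCTS tree whose edges are
# options (temporally-extended actions) instead of primitive actions.
# `model.step(state, action)` is a hypothetical stand-in for a learned
# MuZero-style dynamics model returning (next_state, reward).
import math

class Edge:
    def __init__(self, option):
        self.option = option          # sequence of primitive actions
        self.visit_count = 0
        self.total_value = 0.0
        self.child = None             # node reached after the whole option

class Node:
    def __init__(self, state):
        self.state = state
        self.edges = []               # one Edge per available option

def expand(node, options):
    """Attach one outgoing edge per option available in this state."""
    node.edges = [Edge(option) for option in options]

def rollout_option(model, state, option, gamma=0.997):
    """Apply every primitive action of an option in the learned model,
    accumulating the discounted reward of the whole option transition.
    The final discount is used to weight the value of the reached state."""
    total, discount = 0.0, 1.0
    for action in option:
        state, reward = model.step(state, action)   # hypothetical API
        total += discount * reward
        discount *= gamma
    return state, total, discount

def select_edge(node, c_uct=1.25):
    """UCB-style selection over option edges."""
    n_total = sum(e.visit_count for e in node.edges) + 1
    def score(e):
        q = e.total_value / e.visit_count if e.visit_count else 0.0
        return q + c_uct * math.sqrt(math.log(n_total) / (e.visit_count + 1))
    return max(node.edges, key=score)
```

    Acting with the full selected option, rather than only its first primitive action, is what the abstract highlights as the difference from related work that uses options for planning alone.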

    Reinforcement learning strategies using Monte-Carlo to solve the blackjack problem

    Blackjack is a classic casino game in which the player attempts to outsmart the dealer by drawing a combination of cards whose face values add up to just under or equal to 21 while still beating the hand the dealer manages to come up with. This study considers a simplified variation of blackjack in which the dealer plays no active role after the first two draws. A different game regime is modeled for every one to ten multiples of the conventional 52-card deck. Irrespective of the number of standard decks utilized, the game is played as a randomized discrete-time process. To determine the optimal course of action in terms of policy, we teach an agent (a decision maker) to optimize across the decision space of the game, treating the procedure as a finite Markov decision process. To choose the most effective course of action, we mainly study Monte Carlo-based reinforcement learning approaches and compare them with Q-learning, dynamic programming, and temporal-difference learning. The performance of the distinct model-free policy iteration techniques is presented in this study, framing the game as a reinforcement learning problem
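    One of the model-free policy iteration techniques the study compares, first-visit Monte Carlo control with an epsilon-greedy policy, can be sketched as follows. The environment interface (`reset()` returning a state and `step(action)` returning `(next_state, reward, done)`) is a hypothetical stand-in, not the environment used in the paper.

```python
# Minimal sketch of tabular first-visit Monte Carlo control with an
# epsilon-greedy policy. `env` is a hypothetical blackjack environment.
import random
from collections import defaultdict

ACTIONS = [0, 1]  # 0 = stick, 1 = hit

def run_episode(env, Q, epsilon):
    """Play one hand, recording (state, action, reward) triples."""
    episode, state, done = [], env.reset(), False
    while not done:
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = env.step(action)
        episode.append((state, action, reward))
        state = next_state
    return episode

def mc_control(env, num_episodes=500_000, epsilon=0.1, gamma=1.0):
    """First-visit Monte Carlo control with incremental-mean Q updates."""
    Q = defaultdict(float)
    counts = defaultdict(int)
    for _ in range(num_episodes):
        episode = run_episode(env, Q, epsilon)
        visited = set()
        for i, (state, action, _) in enumerate(episode):
            if (state, action) in visited:
                continue               # first-visit: ignore later occurrences
            visited.add((state, action))
            # Return following the first occurrence of (state, action).
            G = sum(gamma**k * r for k, (_, _, r) in enumerate(episode[i:]))
            counts[(state, action)] += 1
            Q[(state, action)] += (G - Q[(state, action)]) / counts[(state, action)]
    return Q
```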

    Building Machines That Learn and Think Like People

    Recent progress in artificial intelligence (AI) has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end in tasks such as object recognition, video games, and board games, achieving performance that equals or even beats humans in some respects. Despite their biological inspiration and performance achievements, these systems differ from human intelligence in crucial ways. We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn, and how they learn it. Specifically, we argue that these machines should (a) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (b) ground learning in intuitive theories of physics and psychology, to support and enrich the knowledge that is learned; and (c) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. We suggest concrete challenges and promising routes towards these goals that can combine the strengths of recent neural network advances with more structured cognitive models.
    Comment: In press at Behavioral and Brain Sciences. Open call for commentary proposals (until Nov. 22, 2016). https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/information/calls-for-commentary/open-calls-for-commentar

    Application of Retrograde Analysis to Fighting Games

    With the advent of the fighting game AI competition, there has been recent interest in two-player fighting games. Monte-Carlo Tree-Search approaches currently dominate the competition, but it is unclear if this is the best approach for all fighting games. In this thesis we study the design of two-player fighting games and the consequences of that design for the types of AI that should be used for playing the game, and we formally define the state space that fighting games are based on. Additionally, we characterize how an AI can solve the game under a simultaneous-action game model, in order to understand the characteristics of the solved AI and its impact on game design
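    Retrograde analysis, the solution technique named in the title, works backwards from terminal positions, labelling each state once enough of its successors are known. The sketch below is a simplified alternating-move version; the thesis' simultaneous-action model would additionally require solving a small matrix game at each state. All function names are illustrative assumptions.

```python
# Minimal sketch (a simplification, not the thesis implementation) of
# retrograde analysis on an enumerable two-player game: terminal states
# are labelled first, then values propagate backwards to predecessors.
from collections import defaultdict, deque

def retrograde_solve(states, successors, terminal_value, to_move):
    """states: iterable of hashable states (assumed closed under successors)
    successors(s): list of states reachable from s (no duplicates)
    terminal_value(s): +1 / -1 / 0 if s is terminal, else None
    to_move(s): +1 if the maximizing player moves at s, else -1"""
    states = list(states)
    value, remaining = {}, {}
    parents = defaultdict(list)
    queue = deque()

    for s in states:
        v = terminal_value(s)
        if v is not None:
            value[s] = v
            queue.append(s)
        else:
            succ = successors(s)
            remaining[s] = len(succ)
            for t in succ:
                parents[t].append(s)

    while queue:
        s = queue.popleft()
        for p in parents[s]:
            if p in value:
                continue
            if value[s] == to_move(p):
                # A winning successor solves the parent immediately.
                value[p] = value[s]
                queue.append(p)
                continue
            remaining[p] -= 1
            if remaining[p] == 0:
                # All successors known and none is a win: take the best one.
                succ_vals = [value[t] for t in successors(p)]
                value[p] = max(succ_vals) if to_move(p) == 1 else min(succ_vals)
                queue.append(p)
    # States never dequeued (cycles) remain unlabelled, i.e. drawn by repetition.
    return value
```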

    Game State and Action Abstracting Monte Carlo Tree Search for General Strategy Game-Playing

    When implementing intelligent agents for strategy games, we observe that search-based methods struggle with the complexity of such games. To tackle this problem, we propose a new variant of Monte Carlo Tree Search which can incorporate action and game state abstractions. Focusing on the latter, we developed a game state encoding for turn-based strategy games that allows for a flexible abstraction. Using an optimization procedure, we optimize the agent's action and game state abstraction to maximize its performance against a rule-based agent. Furthermore, we compare different combinations of abstractions and their impact on the agent's performance based on the Kill the King game of the Stratega framework. Our results show that action abstractions have improved the performance of our agent considerably. In contrast, game state abstractions have not shown much impact. While these results may be limited to the tested game, they are in line with previous research on abstractions of simple Markov Decision Processes. The higher complexity of strategy games may require more intricate methods, such as hierarchical or time-based abstractions, to further improve the agent's performance
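    A game state abstraction of the kind described above can be pictured as an encoding function that buckets detailed states into coarse feature tuples, with all states sharing an encoding pooling their MCTS statistics. The features, thresholds, and state fields below are illustrative assumptions, not the encoding used with the Stratega framework.

```python
# Minimal sketch (hypothetical) of a game-state abstraction for MCTS:
# a detailed state is encoded into a coarse feature tuple, and all
# states sharing an encoding share one set of visit/value statistics.
from collections import defaultdict

def abstract_state(state, bucket=5):
    """Example encoding: bucketed unit counts and coarse king health.
    The feature choice is illustrative; the paper optimizes it."""
    return (
        min(state["own_units"] // bucket, 4),
        min(state["enemy_units"] // bucket, 4),
        state["own_king_hp"] // 10,
        state["enemy_king_hp"] // 10,
    )

class AbstractStats:
    """Shared MCTS statistics for every concrete state in one abstract class."""
    def __init__(self):
        self.visits = 0
        self.value_sum = 0.0

class AbstractionTable:
    def __init__(self):
        self.table = defaultdict(AbstractStats)

    def update(self, state, value):
        """Back up a simulation result into the state's abstract class."""
        stats = self.table[abstract_state(state)]
        stats.visits += 1
        stats.value_sum += value

    def value(self, state):
        stats = self.table[abstract_state(state)]
        return stats.value_sum / stats.visits if stats.visits else 0.0
```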

    Discovering optimal strategy in tactical combat scenarios through the evolution of behaviour trees

    In this paper we address the problem of automatically discovering optimal tactics in a combat scenario in which two opposing sides control a number of fighting units. Our approach is based on the evolution of behaviour trees, combined with simulation-based evaluation of solutions to drive the evolution. Our behaviour trees use a small set of possible actions that can be assigned to a combat unit, along with standard behaviour tree constructs and a novel approach for selecting which action from the tree is performed. A set of test scenarios was designed for which an optimal strategy is known from the literature. These scenarios were used to explore and evaluate our approach. The results indicate that it is possible, from the small set of possible unit actions, for a complex strategy to emerge through evolution. Combat units with different capabilities were observed exhibiting coordinated team work and exploiting aspects of the environment
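    The two ingredients combined in the paper, behaviour trees built from standard constructs and an evolutionary loop driven by simulation-based evaluation, can be sketched as follows. The action set, the `world.execute` call, and the `simulate` fitness function are hypothetical placeholders rather than the paper's actual representation or operators.

```python
# Minimal sketch (hypothetical): behaviour trees from Sequence/Selector
# constructs plus action leaves, evolved with subtree mutation and
# simulation-based fitness.
import copy
import random

ACTIONS = ["move_to_enemy", "attack_weakest", "retreat", "hold_position"]

class Action:
    def __init__(self, name):
        self.name = name
    def tick(self, unit, world):
        return world.execute(unit, self.name)        # hypothetical API

class Sequence:
    """Succeeds only if every child succeeds, ticked left to right."""
    def __init__(self, children):
        self.children = children
    def tick(self, unit, world):
        return all(c.tick(unit, world) for c in self.children)

class Selector:
    """Succeeds as soon as one child succeeds."""
    def __init__(self, children):
        self.children = children
    def tick(self, unit, world):
        return any(c.tick(unit, world) for c in self.children)

def random_tree(depth=2):
    if depth == 0 or random.random() < 0.3:
        return Action(random.choice(ACTIONS))
    node_type = random.choice([Sequence, Selector])
    return node_type([random_tree(depth - 1) for _ in range(2)])

def mutate(tree, p=0.2):
    """Replace a randomly chosen subtree with a freshly generated one."""
    tree = copy.deepcopy(tree)
    if isinstance(tree, Action) or random.random() < p:
        return random_tree()
    i = random.randrange(len(tree.children))
    tree.children[i] = mutate(tree.children[i], p)
    return tree

def evolve(simulate, population_size=50, generations=100):
    """Truncation selection: keep the best half, refill with mutants."""
    population = [random_tree() for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=simulate, reverse=True)
        parents = ranked[: population_size // 2]
        population = parents + [mutate(random.choice(parents)) for _ in parents]
    return max(population, key=simulate)
```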