
    Macro action selection with deep reinforcement learning in StarCraft

    StarCraft (SC) is one of the most popular and successful Real Time Strategy (RTS) games. In recent years, SC has also been widely accepted as a challenging testbed for AI research because of its enormous state space, partial observability, multi-agent collaboration, and so on. With the help of the annual AIIDE and CIG competitions, a growing number of SC bots have been proposed and continuously improved. However, a large gap remains between the top-level bots and professional human players. One vital reason is that current SC bots mainly rely on predefined rules to select macro actions during their games. These rules are not scalable or efficient enough to cope with the enormous yet partially observed state space of the game. In this paper, we propose a deep reinforcement learning (DRL) framework to improve the selection of macro actions. Our framework combines Ape-X DQN with a Long Short-Term Memory (LSTM). We use this framework to build our bot, named LastOrder. Our evaluation, based on training against all bots from the AIIDE 2017 StarCraft AI competition set, shows that LastOrder achieves an 83% win rate, outperforming 26 of the 28 entrants.
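
    A minimal sketch (PyTorch) of what an LSTM-backed Q-network for macro action selection might look like, in the spirit of the Ape-X DQN plus LSTM combination described above; the observation size, layer widths, and number of macro actions are illustrative assumptions, not values from the paper.

```python
# Sketch only: an LSTM-based Q-network over sequences of partial observations.
# All dimensions below are assumptions for illustration.
import torch
import torch.nn as nn

class MacroLSTMQNet(nn.Module):
    def __init__(self, obs_dim=128, hidden_dim=256, n_macro_actions=20):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, n_macro_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim) sequence of partial observations
        x = torch.relu(self.encoder(obs_seq))
        x, hidden = self.lstm(x, hidden)
        return self.q_head(x), hidden        # Q-values at every time step

net = MacroLSTMQNet()
obs = torch.randn(1, 8, 128)                 # one game, 8 macro decision points
q, _ = net(obs)
greedy_macro = q[:, -1].argmax(dim=-1)       # macro action chosen at the latest step
```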

    Online Build-Order Optimization for Real-Time Strategy Agents Using Multi-Objective Evolutionary Algorithms

    This investigation introduces a novel approach to online build-order optimization in real-time strategy (RTS) games. The goal of our research is to develop an artificial intelligence (AI) RTS planning agent for military critical decision-making education with the ability to perform at an expert human level, as well as to assess a player's critical decision-making ability or skill level. Build-order optimization is modeled as a multi-objective problem (MOP), and solutions are generated using a multi-objective evolutionary algorithm (MOEA) that provides a set of good build-orders to an RTS planning agent. We define three research objectives: (1) design, implement, and validate a capability to determine the skill level of an RTS player; (2) design, implement, and validate a strategic planning tool that produces near expert-level build-orders, which are ordered sequences of actions a player can issue to achieve a goal; and (3) integrate the strategic planning tool into our existing RTS agent framework and an RTS game engine. The skill-level metric we selected provides an original and needed method of evaluating an RTS player's skill level during game play. This metric is a high-level description of how quickly a player executes a strategy relative to known players executing the same strategy. Our strategic planning tool combines a game simulator and an MOEA to produce a set of diverse and good build-orders for an RTS agent. Through the integration of case-based reasoning (CBR), planning goals are derived and expert build-orders are injected into the MOEA population. The MOEA then produces a diverse approximate Pareto front that is integrated into our AI RTS agent framework. Thus, the planning tool provides an innovative online approach to strategic planning in RTS games. Experimentation with the Spring Engine Balanced Annihilation game reveals that the strategic planner is able to discover build-orders that are better than those of an expert-scripted agent and thus achieve faster strategy execution times.
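
    To make the multi-objective view concrete, here is a toy sketch of the selection step: candidate build-orders are scored on two competing objectives and only the non-dominated (Pareto) set is kept. The action names and the scoring stub stand in for the paper's game simulator and are assumptions for illustration only.

```python
# Toy sketch of multi-objective build-order selection. Both objectives are
# minimized; the scoring function is a placeholder, not a game simulator.
import random

ACTIONS = ["worker", "barracks", "marine", "supply"]

def random_build_order(length=10):
    return [random.choice(ACTIONS) for _ in range(length)]

def evaluate(build_order):
    # Placeholder objectives: (completion time, negated army value).
    time = 10 * len(build_order) + 5 * build_order.count("barracks")
    army = build_order.count("marine")
    return (time, -army)

def dominates(a, b):
    # a dominates b if it is no worse in every objective and better in one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(population):
    scored = [(bo, evaluate(bo)) for bo in population]
    return [bo for bo, s in scored
            if not any(dominates(t, s) for _, t in scored)]

population = [random_build_order() for _ in range(200)]
print(len(pareto_front(population)), "non-dominated build-orders found")
```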

    Macro action selection with deep reinforcement learning in StarCraft

    StarCraft (SC) is one of the most popular and successful Real Time Strategy (RTS) games. In recent years, SC has also been considered a testbed for AI research due to its enormous state space, hidden information, multi-agent collaboration, and so on. Thanks to the annual AIIDE and CIG competitions, a growing number of bots have been proposed and are being continuously improved. However, a big gap still remains between the top bots and professional human players. One vital reason is that current bots mainly rely on predefined rules to perform macro actions. These rules are not scalable or efficient enough to cope with the large but partially observed macro state space in SC. In this paper, we propose a DRL-based framework for macro action selection. Our framework combines the reinforcement learning approach Ape-X DQN with a Long Short-Term Memory (LSTM) to improve macro action selection in the bot. We evaluate our bot, named LastOrder, against the AIIDE 2017 StarCraft AI competition bot set. Our bot achieves an overall 83% win rate, outperforming 26 of the 28 entrants.
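
    A minimal sketch of the prioritized experience replay at the core of Ape-X-style learners, where transitions are sampled in proportion to their TD-error priority; the priority exponent and the transition format are illustrative assumptions, not values from the paper.

```python
# Sketch of proportional prioritized replay sampling (Ape-X-style).
# alpha and the transition contents are assumptions for illustration.
import random

class PrioritizedReplay:
    def __init__(self, alpha=0.6):
        self.alpha = alpha
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error):
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, k):
        # Weighted sampling with replacement, proportional to priority.
        return random.choices(self.buffer, weights=self.priorities, k=k)

replay = PrioritizedReplay()
for step in range(100):
    replay.add({"step": step}, td_error=random.random())
batch = replay.sample(32)
```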

    Neural networks that express multiple strategies in the video game StarCraft 2.

    Using neural networks and supervised learning, we have created models capable of solving problems at a superhuman level. Nevertheless, this training process results in models that learn policies averaging the plethora of behaviors usually found in datasets. In this thesis we present and study the Behavioral Repertoires Imitation Learning (BRIL) technique. In BRIL, the user designs a behavior space, projects this behavior space into low-dimensional coordinates, and uses these coordinates as input to the model. Upon deployment, the user can adjust the model to express a behavior by specifying fixed coordinates for these inputs. The main research question concerns the relationship between the dimensionality reduction algorithm and how well the trained models are able to replicate behaviors. We study three dimensionality reduction algorithms: Principal Component Analysis (PCA), Isometric Feature Mapping (Isomap), and Uniform Manifold Approximation and Projection (UMAP). We design and embed a behavior space in the video game StarCraft 2, train different models for each embedding, and test the ability of each model to express multiple strategies. Results show that with BRIL we are able to train models that express the multiple behaviors present in the dataset. The geometric structures these methods preserve induce different separations of behaviors, and these separations are reflected in the models' conduct.
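
    A small sketch of the BRIL input construction described above: a hand-designed behavior descriptor is reduced to low-dimensional coordinates (PCA shown here, one of the three methods studied) and concatenated to the observation fed to the model; the descriptor contents, dimensions, and data are illustrative assumptions.

```python
# Sketch of BRIL-style inputs: behavior coordinates appended to observations.
# Descriptor contents and sizes are assumptions for illustration.
import numpy as np
from sklearn.decomposition import PCA

# One behavior descriptor per replay, e.g. unit-composition statistics.
behavior_descriptors = np.random.rand(500, 20)

pca = PCA(n_components=2)
behavior_coords = pca.fit_transform(behavior_descriptors)   # (500, 2)

# Training input: observation features plus the replay's behavior coordinates.
observations = np.random.rand(500, 64)
model_inputs = np.concatenate([observations, behavior_coords], axis=1)

# At deployment, fixing the coordinates selects which behavior to express.
desired_behavior = behavior_coords[0]
test_obs = np.random.rand(1, 64)
deploy_input = np.concatenate([test_obs, desired_behavior[None, :]], axis=1)
```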

    A Real-time Strategy Agent Framework and Strategy Classifier for Computer Generated Forces

    This research effort is concerned with the advancement of computer-generated forces AI for Department of Defense (DoD) military training and education. The vision of this work is agents capable of perceiving and intelligently responding to opponent strategies in real time. Our research goal is to lay the foundations for such an agent. Six research objectives are defined: 1) formulate a strategy definition schema effective in defining a range of RTS strategies; 2) create eight strategy definitions via the schema; 3) design a real-time agent framework that plays the game according to a given strategy definition; 4) generate an RTS data set; 5) create an accurate and fast-executing strategy classifier; and 6) find the best counterstrategies for each strategy definition. The agent framework is used to play the eight strategies against each other and generate a data set of game observations. To classify the data, we first perform feature reduction using principal component analysis or linear discriminant analysis. Two classifier techniques are employed: k-means clustering with k-nearest neighbor, and a support vector machine. The resulting classifier is 94.1% accurate with an average classification execution time of 7.14 µs. Our research effort has successfully laid the foundations for a dynamic strategy agent.
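
    A compact sketch of the classification pipeline named above, pairing PCA feature reduction with an SVM (one of the two classifier techniques studied); the feature dimensions, synthetic data, and split are illustrative assumptions rather than the paper's data set.

```python
# Sketch: PCA feature reduction followed by an SVM strategy classifier.
# Feature counts and labels are synthetic, for illustration only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X = np.random.rand(800, 50)                 # game-observation feature vectors
y = np.random.randint(0, 8, size=800)       # one label per strategy definition

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = make_pipeline(PCA(n_components=10), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```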

    Self Monitoring Goal Driven Autonomy Agents

    The growing abundance of autonomous systems is driving the need for robust performance. Most current systems are not fully autonomous and often fail when placed in real environments. Via self-monitoring, agents can identify when their own, or externally given, boundaries are violated, thereby increasing their performance and reliability. Specifically, self-monitoring is the identification of unexpected situations that either (1) prohibit the agent from reaching its goal(s) or (2) result in the agent acting outside of its boundaries. Increasingly complex and open environments warrant the use of such robust autonomy (e.g., self-driving cars, delivery drones, and all types of future digital and physical assistants). The techniques presented herein advance the current state of the art in self-monitoring, demonstrating improved performance in a variety of challenging domains. In these domains, it is impossible to plan for all possible situations. In many cases, not all aspects of a domain are known beforehand, and, even if they were, the cost of encoding them is high. Self-monitoring agents are able to identify and then respond to previously unexpected or never-before-encountered situations. When dealing with unknown situations, one must start from expected behavior and use it to derive the unexpected. The representation of expectations varies among domains: in a real-time strategy game like StarCraft, it could be logically inferred concepts; in a Mars rover domain, it could be an accumulation of actions' effects. Nonetheless, explicit expectations are necessary to identify the unexpected. This thesis lays the foundation for self-monitoring in goal-driven autonomy agents in both rich and expressive domains and in partially observable domains. We introduce multiple techniques for handling such environments. We show how inferred expectations are needed to enable high-level planning in real-time strategy games. We show how a hierarchical structure of Goal-Driven Autonomy (GDA) enables agents to operate within large state spaces. Within Hierarchical Task Network planning, we show how informed expectations identify states that are likely to prevent an agent from reaching its goals in dynamic domains. Finally, we give a model of expectations for self-monitoring at the meta-cognitive level, and empirical results for agents equipped with and without metacognitive expectations.
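
    A minimal sketch of expectation-based self-monitoring in a GDA-style agent: the expectation is the current state updated with an action's modeled effects, and any mismatch with the observed state is flagged as a discrepancy for the agent to respond to. The state representation, effects, and discrepancy handling are illustrative assumptions, not the thesis's formal model.

```python
# Sketch of discrepancy detection from explicit expectations (GDA-style).
# State and effect dictionaries are assumptions for illustration.
def expected_state(state, action_effects):
    # Expectation = current state updated with the action's modeled effects.
    return {**state, **action_effects}

def detect_discrepancies(expectation, observed):
    # Return each expected key whose observed value differs.
    return {k: (v, observed.get(k))
            for k, v in expectation.items() if observed.get(k) != v}

state = {"barracks": 1, "marines": 4}
effects = {"marines": 5}                      # training one marine
expectation = expected_state(state, effects)

observed = {"barracks": 0, "marines": 5}      # the barracks was destroyed
discrepancies = detect_discrepancies(expectation, observed)
if discrepancies:
    # A GDA agent would formulate a new goal here (e.g., rebuild the barracks).
    print("unexpected:", discrepancies)
```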