The StarCraft Multi-Agent Challenge
In the last few years, deep multi-agent reinforcement learning (RL) has
become a highly active area of research. A particularly challenging class of
problems in this area is partially observable, cooperative, multi-agent
learning, in which teams of agents must learn to coordinate their behaviour
while conditioning only on their private observations. This is an attractive
research area since such problems are relevant to a large number of real-world
systems and are also more amenable to evaluation than general-sum problems.
Standardised environments such as the ALE and MuJoCo have allowed single-agent
RL to move beyond toy domains, such as grid worlds. However, there is no
comparable benchmark for cooperative multi-agent RL. As a result, most papers
in this field use one-off toy problems, making it difficult to measure real
progress. In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC)
as a benchmark problem to fill this gap. SMAC is based on the popular real-time
strategy game StarCraft II and focuses on micromanagement challenges where each
unit is controlled by an independent agent that must act based on local
observations. We offer a diverse set of challenge maps and recommendations for
best practices in benchmarking and evaluations. We also open-source a deep
multi-agent RL framework including state-of-the-art algorithms. We
believe that SMAC can provide a standard benchmark environment for years to
come. Videos of our best agents for several SMAC scenarios are available at:
https://youtu.be/VZ7zmQ_obZ0
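The decentralised control setting SMAC poses can be sketched as a simple per-agent loop. The stub below is our own toy stand-in (`ToyMicroEnv` is not part of SMAC); its methods only mirror the kind of interface described above, where each agent conditions on a private observation and chooses from its own currently available actions:

```python
import random

class ToyMicroEnv:
    """Toy stand-in with a SMAC-like per-agent interface (illustrative only)."""
    def __init__(self, n_agents=3, n_actions=5, horizon=10):
        self.n_agents, self.n_actions, self.horizon = n_agents, n_actions, horizon
        self.t = 0

    def reset(self):
        self.t = 0

    def get_obs(self):
        # one private (local) observation per agent; here just noise
        return [[random.random() for _ in range(4)] for _ in range(self.n_agents)]

    def get_avail_agent_actions(self, agent_id):
        # binary mask of the actions currently legal for this agent
        return [1] * self.n_actions

    def step(self, actions):
        self.t += 1
        reward = float(sum(actions))          # a single shared team reward
        terminated = self.t >= self.horizon
        return reward, terminated, {}

env = ToyMicroEnv()
env.reset()
terminated, episode_reward = False, 0.0
while not terminated:
    obs = env.get_obs()                       # agents see local observations only
    actions = []
    for agent_id in range(env.n_agents):
        avail = env.get_avail_agent_actions(agent_id)
        legal = [a for a, ok in enumerate(avail) if ok]
        actions.append(random.choice(legal))  # random decentralised policy
    reward, terminated, _ = env.step(actions)
    episode_reward += reward
```

A learning agent would replace the random choice with a policy conditioned on its own observation history, which is exactly the coordination problem SMAC is designed to benchmark.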
Component-action deep Q-learning for real-time strategy game AI
Real-time Strategy (RTS) games provide a challenging environment for AI research, due
to their large state and action spaces, hidden information, and real-time gameplay. The
RTS game StarCraft II has become a new test-bed for deep reinforcement learning (RL)
systems using the StarCraft II Learning Environment (SC2LE). Recently the full game of
StarCraft II has been approached with a complex multi-agent RL system only possible
with extremely large financial investments. In this thesis we describe existing work
in RTS AI and motivate our adaptation of the deep Q-learning (DQN) algorithm to
accommodate the multi-dimensional action space of the SC2LE. We then present the results
of our experiments using custom combat scenarios. First, we compare methods for calculating
DQN training loss with action components. Second, we show that policies trained with
component-action DQN for five hours perform comparably to scripted policies in smaller
scenarios and outperform them in larger scenarios. Third, we explore several ways to transfer
policies between scenarios, and show that it is a viable method to reduce training time. We
show that policies trained on scenarios with fewer units can be applied to larger scenarios
and to scenarios with different unit types with only a small loss in performance.
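One way to form a single TD loss from a factored action can be sketched as follows. This is our own illustrative variant (averaging the squared TD error across components), not necessarily the exact formulation the thesis settles on; the component names mimic an SC2LE-style action made of a function identifier plus spatial arguments:

```python
def component_q_loss(q_per_component, chosen, td_target):
    """Mean squared TD error over action components (illustrative sketch).

    q_per_component: {component name: Q-values over that component's options}
    chosen:          {component name: index of the option actually taken}
    td_target:       scalar bootstrap target shared by all components
    """
    errs = []
    for name, qs in q_per_component.items():
        q_sa = qs[chosen[name]]               # Q-value of the chosen option
        errs.append((q_sa - td_target) ** 2)  # per-component squared TD error
    return sum(errs) / len(errs)              # average across components

# Example: a factored action (function id, x coordinate, y coordinate)
q = {"function": [0.2, 1.0], "x": [0.5, 0.1, 0.3], "y": [0.0, 0.4, 0.9]}
a = {"function": 1, "x": 0, "y": 2}
loss = component_q_loss(q, a, td_target=0.8)  # -> ~0.0467
```

Alternatives the thesis's comparison suggests, such as summing component losses or combining the chosen components' Q-values before taking a single error, differ only in how the gradient is distributed across components.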
Multi-Agent Reinforcement Learning in Large Complex Environments
Multi-agent reinforcement learning (MARL) has seen much success in the past decade. However, these methods have yet to find wide application in large-scale real-world problems, for two important reasons. First, MARL algorithms have poor sample efficiency: many data samples must be obtained through interaction with the environment to learn meaningful policies, even in small environments. Second, MARL algorithms do not scale to environments with many agents, since their complexity is typically exponential in the number of agents. This dissertation aims to address both of these challenges, with the goal of making MARL applicable to a variety of real-world environments.
Towards improving sample efficiency, an important observation is that many real-world environments already deploy sub-optimal or heuristic approaches for generating policies. A natural question is how best to use such approaches as advisors to improve reinforcement learning in multi-agent domains. In this dissertation, we provide a principled framework for incorporating action recommendations from online sub-optimal advisors in multi-agent settings. To this end, we propose a general model for learning from external advisors in MARL and show that desirable theoretical properties, such as convergence to a unique solution concept and reasonable finite-sample complexity bounds, hold under a set of common assumptions. Furthermore, extensive experiments illustrate that these algorithms can be used in a variety of environments, perform favourably compared to related baselines, scale to large state-action spaces, and are robust to poor advice from advisors.
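A minimal sketch of one way to mix advisor recommendations into action selection follows. This is our own illustration, not the dissertation's framework, which is more general and carries convergence guarantees: with probability `p_advice` the agent follows the external advisor, and otherwise it acts epsilon-greedily on its own Q-estimates. In practice `p_advice` would be decayed over training so that poor advice is eventually ignored.

```python
import random

def advisor_q_step(Q, state, advisor, n_actions, p_advice=0.5, eps=0.1):
    """Choose one action, mixing a (possibly sub-optimal) advisor with
    epsilon-greedy selection on the agent's own Q-table.

    Q:       dict mapping (state, action) -> estimated value
    advisor: callable state -> recommended action (external heuristic)
    """
    if random.random() < p_advice:
        return advisor(state)                 # follow the external recommendation
    if random.random() < eps:
        return random.randrange(n_actions)    # explore on our own
    # exploit the agent's current estimates
    return max(range(n_actions), key=lambda a: Q.get((state, a), 0.0))

adv = lambda s: 0                             # toy advisor: always recommends action 0
a = advisor_q_step({}, "s0", adv, n_actions=4, p_advice=1.0)  # -> 0
```

The learned Q-values are still updated from the environment's rewards in the usual way, so the advisor only shapes exploration rather than replacing learning.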
Towards scaling MARL, we explore the use of mean field theory. Mean field theory provides an effective way of scaling multi-agent reinforcement learning to environments with many agents, where the other agents are abstracted by a single virtual mean agent. Prior work has applied mean field theory in MARL, but it relies on several stringent assumptions, such as fully homogeneous agents, full observability of the environment, and centralized learning, that prevent wide application in practical environments. In this dissertation, we extend mean field methods to environments with heterogeneous agents and to partially observable settings. Further, we extend mean field methods to include decentralized approaches. We provide novel mean-field-based MARL algorithms that outperform previous methods on a set of large games with many agents. Theoretically, we provide bounds on the information loss incurred by using the mean field, and further provide fixed-point guarantees for Q-learning-based algorithms in each of these settings.
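The core mean field idea can be sketched in tabular form: each agent's Q-function conditions on its own action and on the empirical mean action (the average of its neighbours' one-hot actions) rather than on the joint action of every agent, which is what removes the exponential dependence on agent count. The function names and the rounding used to discretise the mean action are our own illustrative choices:

```python
def mean_action(neighbour_actions, n_actions):
    """Empirical mean action: the average one-hot action of the neighbours."""
    counts = [0] * n_actions
    for a in neighbour_actions:
        counts[a] += 1
    n = len(neighbour_actions)
    return [c / n for c in counts]

def mf_q_update(Q, state, action, abar, reward, next_value,
                alpha=0.1, gamma=0.99):
    """Tabular mean-field Q update: move Q(s, a, abar) toward
    r + gamma * next_value. abar is rounded so it can key a dict."""
    key = (state, action, tuple(round(p, 2) for p in abar))
    q = Q.get(key, 0.0)
    Q[key] = q + alpha * (reward + gamma * next_value - q)
    return Q[key]

abar = mean_action([0, 0, 1, 2], n_actions=3)   # -> [0.5, 0.25, 0.25]
Q = {}
v = mf_q_update(Q, "s", 1, abar, reward=1.0, next_value=0.0)
```

However many neighbours an agent has, `abar` stays a fixed-length vector, so the Q-table (or a Q-network taking `abar` as input) does not grow with the number of agents.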
Subsequently, we combine our work on mean field learning and learning from advisors to obtain MARL algorithms that are more suitable for real-world environments than prior approaches. This method uses an attention mechanism to perform per-agent modelling of others in the locality, in addition to using the mean field for global responses. Notably, we demonstrate applications in several real-world multi-agent environments, such as the Ising model, the ride-pool matching problem, and the massively multiplayer online (MMO) game setting (currently a multi-billion-dollar market).
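The per-agent modelling of nearby agents can be sketched as plain scaled dot-product attention over neighbour features. This is a generic illustration under our own assumptions, not the dissertation's exact architecture: each agent scores its neighbours against its own query vector, normalises the scores with a softmax, and pools the neighbour features by those weights.

```python
import math

def attention_pool(query, neighbour_feats):
    """Scaled dot-product attention over local neighbours (sketch).

    query:           this agent's feature vector
    neighbour_feats: one feature vector per nearby agent
    Returns the attention-weighted sum of neighbour features.
    """
    d = len(query)
    # relevance of each neighbour to this agent, scaled by sqrt(d)
    scores = [sum(q * k for q, k in zip(query, f)) / math.sqrt(d)
              for f in neighbour_feats]
    m = max(scores)                           # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    w = [e / z for e in exps]                 # softmax attention weights
    return [sum(w[i] * neighbour_feats[i][j] for i in range(len(w)))
            for j in range(d)]

pooled = attention_pool([1.0, 0.0], [[2.0, 3.0], [4.0, 1.0]])
```

The pooled local summary would then be consumed alongside the global mean action, giving fine-grained modelling of nearby agents while the mean field keeps the global interaction tractable.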