16 research outputs found
The StarCraft Multi-Agent Challenge
In the last few years, deep multi-agent reinforcement learning (RL) has
become a highly active area of research. A particularly challenging class of
problems in this area is partially observable, cooperative, multi-agent
learning, in which teams of agents must learn to coordinate their behaviour
while conditioning only on their private observations. This is an attractive
research area since such problems are relevant to a large number of real-world
systems and are also more amenable to evaluation than general-sum problems.
Standardised environments such as the ALE and MuJoCo have allowed single-agent
RL to move beyond toy domains, such as grid worlds. However, there is no
comparable benchmark for cooperative multi-agent RL. As a result, most papers
in this field use one-off toy problems, making it difficult to measure real
progress. In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC)
as a benchmark problem to fill this gap. SMAC is based on the popular real-time
strategy game StarCraft II and focuses on micromanagement challenges where each
unit is controlled by an independent agent that must act based on local
observations. We offer a diverse set of challenge maps and recommendations for
best practices in benchmarking and evaluations. We also open-source a deep
multi-agent RL framework including state-of-the-art algorithms. We
believe that SMAC can provide a standard benchmark environment for years to
come. Videos of our best agents for several SMAC scenarios are available at:
https://youtu.be/VZ7zmQ_obZ0
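The decentralised, fully cooperative setting SMAC targets can be illustrated with a toy sketch (this is not the SMAC API, just a minimal stand-in): two independent Q-learners play a one-step coordination game, each conditioning only on its own Q-table and never observing the other's choice, yet the shared team reward drives them to a common action.

```python
import random

random.seed(0)

N_ACTIONS = 3              # toy stand-in for a unit's action set
EPISODES = 5000
ALPHA, EPS = 0.1, 0.2

# One Q-table per agent; neither agent ever sees the other's choice,
# mimicking the decentralised, partially observable setting.
q = [[0.0] * N_ACTIONS for _ in range(2)]

def payoff(a0, a1):
    # Fully cooperative reward: the team scores only when the agents coordinate.
    return 1.0 if a0 == a1 else 0.0

def pick(qi):
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)         # explore
    return max(range(N_ACTIONS), key=lambda a: qi[a])  # exploit

for _ in range(EPISODES):
    a0, a1 = pick(q[0]), pick(q[1])
    r = payoff(a0, a1)
    for i, a in ((0, a0), (1, a1)):
        q[i][a] += ALPHA * (r - q[i][a])   # one-step game: no bootstrap term

greedy = [max(range(N_ACTIONS), key=lambda a: q[i][a]) for i in range(2)]
print(greedy, payoff(*greedy))             # the learners settle on a common action
```

Matches between exploratory actions reinforce the same action in both tables, so the independent learners break symmetry and converge on a shared convention, the same coordination pressure SMAC scenarios exert at much larger scale.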
Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots
We present Habitat 3.0: a simulation platform for studying collaborative
human-robot tasks in home environments. Habitat 3.0 offers contributions across
three dimensions: (1) Accurate humanoid simulation: addressing challenges in
modeling complex deformable bodies and diversity in appearance and motion, all
while ensuring high simulation speed. (2) Human-in-the-loop infrastructure:
enabling real human interaction with simulated robots via mouse/keyboard or a
VR interface, facilitating evaluation of robot policies with human input. (3)
Collaborative tasks: studying two collaborative tasks, Social Navigation and
Social Rearrangement. Social Navigation investigates a robot's ability to
locate and follow humanoid avatars in unseen environments, whereas Social
Rearrangement addresses collaboration between a humanoid and robot while
rearranging a scene. These contributions allow us to study end-to-end learned
and heuristic baselines for human-robot collaboration in-depth, as well as
evaluate them with humans in the loop. Our experiments demonstrate that learned
robot policies lead to efficient task completion when collaborating with unseen
humanoid agents and human partners that might exhibit behaviors that the robot
has not seen before. Additionally, we observe emergent behaviors during
collaborative task execution, such as the robot yielding space when obstructing
a humanoid agent, thereby allowing the effective completion of the task by the
humanoid agent. Furthermore, our experiments using the human-in-the-loop tool
demonstrate that our automated evaluation with humanoids can provide an
indication of the relative ordering of different policies when evaluated with
real human collaborators. Habitat 3.0 unlocks interesting new features in
simulators for Embodied AI, and we hope it paves the way for a new frontier of
embodied human-AI interaction capabilities.
Project page: http://aihabitat.org/habitat
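The Social Navigation behaviour described above, following a humanoid while yielding space rather than crowding it, can be sketched as a toy grid heuristic (illustrative only; this is not the Habitat 3.0 API, and `keep_dist` is a made-up parameter):

```python
def step_toward(robot, target, keep_dist=1):
    """Move `robot` one grid cell toward `target`, but stop once within
    `keep_dist` cells (Chebyshev distance) so the follower never crowds
    the humanoid it is tracking."""
    (rx, ry), (tx, ty) = robot, target
    if max(abs(tx - rx), abs(ty - ry)) <= keep_dist:
        return robot                 # close enough: yield space, don't advance
    dx = (tx > rx) - (tx < rx)       # sign of the x offset (-1, 0, or +1)
    dy = (ty > ry) - (ty < ry)
    return (rx + dx, ry + dy)

# The robot trails a scripted humanoid path, closing in but stopping one cell away.
human_path = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
robot = (-3, 0)
for human in human_path:
    robot = step_toward(robot, human)
print(robot)   # ends adjacent to the humanoid's final cell (2, 2), not on it
```

The stopping rule is the toy analogue of the emergent yielding behaviour the abstract reports; Habitat 3.0's learned policies acquire it from training rather than from a hand-coded distance threshold.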
Training Intelligent Red Team Agents via Deep Reinforcement Learning
NPS NRP Technical Report
Wargames are an essential tool for education, training and formulation of strategy. They are especially important in the evaluation of threats from, and strategies against, trained adversaries who present significant risk to friendly forces. We propose to develop a wargame adversary trained to defeat the current strategy of friendly forces, thereby allowing the evaluation of alternate strategies against an intelligent, simulated opponent. We will investigate the use of deep neural network (DNN) algorithms to solve a constrained stochastic reward-collecting path problem. Agents from a friendly (blue) team and an adversarial (red) team will be placed within a discrete environment. The blue team will be challenged to obtain a reward by achieving a fixed goal using a pre-determined strategy. Then, reinforcement learning will be used to train the red team to overcome the blue team's current strategy. Having thus trained a competent red team, the blue team's strategy can be altered to evaluate the efficacy of new strategies. This research will seek to evaluate the ability of different DNN algorithms to train the red team against various blue team strategies, in terms of both efficacy and efficiency, and the resiliency of the trained red team to subsequent changes in blue team strategy. We anticipate the results of this research to be summarized in a research poster and executive summary, in addition to a presentation and full technical report deliverable to the Topic Sponsor.
Sponsor: Marine Corps Systems Command (MARCORSYSCOM)
This research is supported by funding from the Naval Postgraduate School, Naval Research Program (PE 0605853N/2098). https://nps.edu/nrp
Chief of Naval Operations (CNO)
Approved for public release. Distribution is unlimited.
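The train-red-against-a-fixed-blue-strategy loop the proposal describes can be sketched with a minimal tabular example (a hypothetical corridor-ambush setup of my own, not the report's environment, and a one-step bandit rather than a DNN):

```python
import random

random.seed(1)

CORRIDORS = 3                 # hypothetical: blue can advance along 3 routes
ALPHA, EPS, EPISODES = 0.2, 0.1, 500

def train_red(blue_corridor):
    """One-step tabular Q-learning: red learns which corridor to ambush,
    given that blue's fixed strategy always uses `blue_corridor`."""
    q = [0.0] * CORRIDORS
    for _ in range(EPISODES):
        if random.random() < EPS:
            a = random.randrange(CORRIDORS)                # explore
        else:
            a = max(range(CORRIDORS), key=lambda c: q[c])  # exploit
        r = 1.0 if a == blue_corridor else 0.0   # ambush succeeds on a match
        q[a] += ALPHA * (r - q[a])
    return q

q = train_red(blue_corridor=0)
best = max(range(CORRIDORS), key=lambda c: q[c])
print("red counters blue's current route:", best)

# Blue alters its strategy: red must be retrained to stay effective,
# which is exactly the evaluate-then-adapt cycle the proposal envisions.
q2 = train_red(blue_corridor=2)
best2 = max(range(CORRIDORS), key=lambda c: q2[c])
print("after blue changes strategy, retrained red guards:", best2)
```

The second training run illustrates the resiliency question the report raises: a red policy trained against one blue strategy is brittle to a strategy change until it is retrained.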
Accelerating learning in multiagent domains through experience sharing
Master's dissertation, Universidade de Brasília, Instituto de Ciências Exatas,
Departamento de Ciência da Computação, 2019.
This dissertation is a contribution to the burgeoning field of artificial intelligence and
machine learning. Learning is a core component of human behaviour, the faculty behind
our ability to adapt. It is the singular characteristic that differentiates humans from other
species and has allowed us to persevere and dominate the world as we know it. Through
learning algorithms, we seek to imbue artificial agents with the same capacity, so they
can as well learn and adapt by interacting with the environment, thus enhancing their
potential to achieve their goals.
In this work, we address the hard problem of how multiple cooperative agents learning
concurrently to achieve a goal can benefit from sharing knowledge with each other. Key
to our evolution is our ability to share learned knowledge with each other instantaneously
and through generations. It follows that knowledge sharing between autonomous and
independent agents could as well become the key to accelerate learning in cooperative
multiagent settings. Pursuing this line of inquiry, we investigate methods of knowledge
sharing that can effectively lead to faster learning.
We focus on the approach of transferring knowledge by experience sharing. The proposed
MultiAgent Cooperative Experience Sharing (MACES) model defines an architecture
that allows experience sharing between concurrently learning cooperative agents.
Within MACES, we investigate different methods of experience sharing that can lead to
accelerated learning.
The proposed model is validated in two different reinforcement learning settings, a
classical control and a navigation problem. The results show that MACES more than
halves the number of episodes required to complete a task with the cooperation of
only two agents, compared to a single-agent baseline. The model is applicable
to deep reinforcement learning agents.
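The experience-sharing idea can be sketched as two concurrently learning tabular Q-agents drawing updates from one shared replay buffer (an illustrative guess at the mechanism on a toy chain task; the actual MACES architecture and experiments differ):

```python
import random

random.seed(0)

N = 6                      # chain states 0..5; reward only at the goal, state 5
GAMMA, ALPHA, EPS = 0.95, 0.3, 0.2
buffer = []                # shared replay: both agents' transitions land here

def env_step(s, a):        # a: 0 = left, 1 = right
    s2 = min(s + 1, N - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N - 1 else 0.0), s2 == N - 1

# One independent Q-table per agent; only experiences are shared.
qs = [[[0.0, 0.0] for _ in range(N)] for _ in range(2)]

for _ in range(200):
    for agent, q in enumerate(qs):           # the agents act in alternation
        s, done = 0, False
        while not done:
            a = random.randrange(2) if random.random() < EPS else int(q[s][1] >= q[s][0])
            s2, r, done = env_step(s, a)
            buffer.append((s, a, r, s2, done))
            s = s2
        # Every agent updates from the shared pool, not just its own rollouts.
        for q_i in qs:
            for bs, ba, br, bs2, bd in random.sample(buffer, min(32, len(buffer))):
                target = br if bd else br + GAMMA * max(q_i[bs2])
                q_i[bs][ba] += ALPHA * (target - q_i[bs][ba])

policy = [int(qs[0][s][1] >= qs[0][s][0]) for s in range(N - 1)]
print(policy)              # expect 1 (move right) in every non-goal state
```

Because each agent learns from the union of everyone's transitions, each effectively gathers data at twice its own rate, which is the intuition behind the reported reduction in episodes to task completion.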