Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games
Many artificial intelligence (AI) applications require multiple
intelligent agents to work collaboratively. Efficient learning for
intra-agent communication and coordination is an indispensable step towards
general AI. In this paper, we take the StarCraft combat game as a case study, where
the task is to coordinate multiple agents as a team to defeat their enemies. To
maintain a scalable yet effective communication protocol, we introduce a
Multiagent Bidirectionally-Coordinated Network (BiCNet ['bIknet]) with a
vectorised extension of the actor-critic formulation. We show that BiCNet can
handle different types of combat with arbitrary numbers of AI agents on both
sides. Our analysis demonstrates that without any supervision such as human
demonstrations or labelled data, BiCNet can learn various types of advanced
coordination strategies that have been commonly used by experienced game
players. In our experiments, we evaluate our approach against multiple
baselines under different scenarios; it shows state-of-the-art performance and
has potential value for large-scale real-world applications.
Comment: 10 pages, 10 figures. Previously as title: "Multiagent
Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat
Games", Mar 201
Guided Deep Reinforcement Learning for Swarm Systems
In this paper, we investigate how to learn to control a group of cooperative
agents with limited sensing capabilities, such as robot swarms. The agents have
only very basic sensor capabilities, yet in a group they can accomplish
sophisticated tasks, such as distributed assembly or search and rescue tasks.
Learning a policy for a group of agents is difficult due to distributed partial
observability of the state. Here, we follow a guided approach where a critic
has central access to the global state during learning, which simplifies the
policy evaluation problem from a reinforcement learning point of view. For
example, we can get the positions of all robots of the swarm using a camera
image of a scene. This camera image is only available to the critic and not to
the control policies of the robots. We follow an actor-critic approach, where
the actors base their decisions only on locally sensed information. In
contrast, the critic is learned based on the true global state. Our algorithm
uses deep reinforcement learning to approximate both the Q-function and the
policy. The performance of the algorithm is evaluated on two tasks with simple
simulated 2D agents: 1) finding and maintaining a certain distance to each
other and 2) locating a target.
Comment: 15 pages, 8 figures, accepted at the AAMAS 2017 Autonomous Robots and
Multirobot Systems (ARMS) Worksho
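
The information split described in this abstract, with actors that see only local observations and a critic that sees the global state during learning, can be illustrated with a minimal sketch. This is not the authors' implementation: the linear maps stand in for the deep networks, and the swarm size, observation and action dimensions, and all names (`actor`, `critic`, `W_actor`, `w_critic`) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
N_AGENTS = 4               # swarm size
OBS_DIM = 3                # size of each agent's local observation
ACT_DIM = 2                # e.g. a 2D velocity command
STATE_DIM = 2 * N_AGENTS   # global state: (x, y) position of every agent

# Linear stand-ins for the deep networks mentioned in the abstract.
W_actor = rng.normal(scale=0.1, size=(OBS_DIM, ACT_DIM))
w_critic = rng.normal(scale=0.1, size=STATE_DIM + N_AGENTS * ACT_DIM)


def actor(local_obs):
    """Shared policy: maps ONE agent's local observation to its action."""
    return np.tanh(local_obs @ W_actor)


def critic(global_state, joint_action):
    """Centralized Q-estimate from the global state and all agents' actions."""
    features = np.concatenate([global_state.ravel(), joint_action.ravel()])
    return float(features @ w_critic)


# Global state (e.g. positions from a camera image): visible to the critic only.
positions = rng.uniform(-1.0, 1.0, size=(N_AGENTS, 2))
# Local sensor readings: the only input each actor receives.
local_obs = rng.normal(size=(N_AGENTS, OBS_DIM))

# Each agent acts on its own observation; the critic scores the joint result.
joint_action = np.stack([actor(obs) for obs in local_obs])
q_value = critic(positions, joint_action)
```

At execution time only `actor` is needed, so each agent can run on local sensing alone; `critic` and `positions` are used only while learning.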