Overview of deep reinforcement learning in partially observable multi-agent environment of competitive online video games
In the late 2010s, the classical games of Go, Chess, and Shogi came to be considered 'solved' by deep reinforcement learning AI agents. Competitive online video games may offer a new, more challenging environment for deep reinforcement learning and serve as a stepping stone on the path to real-world applications. This thesis aims to give a short introduction to the concepts of reinforcement learning, deep networks, and deep reinforcement learning. It then examines a few popular competitive online video games and the general problems of AI development in these types of games. The deep reinforcement learning algorithms, techniques, and architectures used to develop highly competitive AI agents in Starcraft 2, Dota 2, and Quake 3 are surveyed. Finally, the results are examined and discussed.
Counterfactual Multi-Agent Policy Gradients
Cooperative multi-agent systems can be naturally used to model many real
world problems, such as network packet routing and the coordination of
autonomous vehicles. There is a great need for new reinforcement learning
methods that can efficiently learn decentralised policies for such systems. To
this end, we propose a new multi-agent actor-critic method called
counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised
critic to estimate the Q-function and decentralised actors to optimise the
agents' policies. In addition, to address the challenges of multi-agent credit
assignment, it uses a counterfactual baseline that marginalises out a single
agent's action, while keeping the other agents' actions fixed. COMA also uses a
critic representation that allows the counterfactual baseline to be computed
efficiently in a single forward pass. We evaluate COMA in the testbed of
StarCraft unit micromanagement, using a decentralised variant with significant
partial observability. COMA significantly improves average performance over
other multi-agent actor-critic methods in this setting, and the best performing
agents are competitive with state-of-the-art centralised controllers that get
access to the full state.
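The counterfactual baseline described in this abstract can be illustrated with a minimal numerical sketch. This is not the paper's implementation; the function name `coma_advantage` and its inputs are hypothetical, and in practice the Q-values for all of one agent's actions come from the centralised critic in a single forward pass:

```python
import numpy as np

def coma_advantage(q_values, policy_probs, chosen_action):
    """Counterfactual advantage for one agent (illustrative sketch).

    q_values:      Q(s, (u^{-a}, u'^a)) for each candidate action u'^a of
                   agent a, with the other agents' actions u^{-a} held fixed.
    policy_probs:  agent a's policy pi^a(u'^a | tau^a) over those actions.
    chosen_action: index of the action agent a actually took.
    """
    q_values = np.asarray(q_values, dtype=float)
    policy_probs = np.asarray(policy_probs, dtype=float)
    # Counterfactual baseline: marginalise out agent a's own action
    # under its current policy, keeping the other agents' actions fixed.
    baseline = np.dot(policy_probs, q_values)
    # Advantage of the chosen joint action relative to that baseline.
    return q_values[chosen_action] - baseline

# Example: three candidate actions for agent a.
adv = coma_advantage([1.0, 2.0, 3.0], [0.2, 0.3, 0.5], chosen_action=2)
```

In the example, the baseline is 0.2·1.0 + 0.3·2.0 + 0.5·3.0 = 2.3, so the chosen action's advantage is 3.0 − 2.3 = 0.7: the gradient credits agent a only for the improvement its own action made over its policy's expectation.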