Overview of deep reinforcement learning in partially observable multi-agent environment of competitive online video games
In the late 2010s, the classical games of Go, Chess, and Shogi came to be considered 'solved' by deep reinforcement learning AI agents. Competitive online video games may offer a new, more challenging environment for deep reinforcement learning and serve as a stepping stone on the path to real-world applications. This thesis aims to give a short introduction to the concepts of reinforcement learning, deep networks, and deep reinforcement learning. It then examines a few popular competitive online video games and the general problems of AI development in these types of games. The deep reinforcement learning algorithms, techniques, and architectures used to develop highly competitive AI agents in Starcraft 2, Dota 2, and Quake 3 are surveyed. Finally, the results are examined and discussed.
Counterfactual Multi-Agent Policy Gradients
Cooperative multi-agent systems can be naturally used to model many real
world problems, such as network packet routing and the coordination of
autonomous vehicles. There is a great need for new reinforcement learning
methods that can efficiently learn decentralised policies for such systems. To
this end, we propose a new multi-agent actor-critic method called
counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised
critic to estimate the Q-function and decentralised actors to optimise the
agents' policies. In addition, to address the challenges of multi-agent credit
assignment, it uses a counterfactual baseline that marginalises out a single
agent's action, while keeping the other agents' actions fixed. COMA also uses a
critic representation that allows the counterfactual baseline to be computed
efficiently in a single forward pass. We evaluate COMA in the testbed of
StarCraft unit micromanagement, using a decentralised variant with significant
partial observability. COMA significantly improves average performance over
other multi-agent actor-critic methods in this setting, and the best performing
agents are competitive with state-of-the-art centralised controllers that get
access to the full state.
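The counterfactual baseline described in this abstract can be illustrated with a minimal numerical sketch. This is not the paper's implementation; the function name `coma_advantage` and its inputs are hypothetical, and in practice the Q-values for all of one agent's actions come from the centralised critic in a single forward pass:

```python
import numpy as np

def coma_advantage(q_values, policy_probs, chosen_action):
    """Counterfactual advantage for one agent (illustrative sketch).

    q_values:      Q(s, (u^{-a}, u'^a)) for each candidate action u'^a of
                   agent a, with the other agents' actions u^{-a} held fixed.
    policy_probs:  agent a's policy pi^a(u'^a | tau^a) over those actions.
    chosen_action: index of the action agent a actually took.
    """
    q_values = np.asarray(q_values, dtype=float)
    policy_probs = np.asarray(policy_probs, dtype=float)
    # Counterfactual baseline: marginalise out agent a's own action
    # under its current policy, keeping the other agents' actions fixed.
    baseline = np.dot(policy_probs, q_values)
    # Advantage of the chosen joint action relative to that baseline.
    return q_values[chosen_action] - baseline

# Example: three candidate actions for agent a.
adv = coma_advantage([1.0, 2.0, 3.0], [0.2, 0.3, 0.5], chosen_action=2)
```

In the example, the baseline is 0.2·1.0 + 0.3·2.0 + 0.5·3.0 = 2.3, so the chosen action's advantage is 3.0 − 2.3 = 0.7: the gradient credits agent a only for the improvement its own action made over its policy's expectation.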