
    Overview of deep reinforcement learning in partially observable multi-agent environment of competitive online video games

    In the late 2010s, the classical games of Go, Chess and Shogi came to be considered 'solved' by deep reinforcement learning AI agents. Competitive online video games may offer a new, more challenging environment for deep reinforcement learning and serve as a stepping stone on the path to real-world applications. This thesis aims to give a short introduction to the concepts of reinforcement learning, deep networks and deep reinforcement learning. The thesis then examines a few popular competitive online video games and the general problems of AI development in these types of games. Deep reinforcement learning algorithms, techniques and architectures used in the development of highly competitive AI agents in Starcraft 2, Dota 2 and Quake 3 are overviewed. Finally, the results are reviewed and discussed.

    Counterfactual Multi-Agent Policy Gradients

    Cooperative multi-agent systems can be naturally used to model many real-world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.
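    The core idea in the abstract above is the counterfactual baseline: for each agent, the expected Q-value under that agent's own policy is subtracted from the Q-value of the action actually taken, with the other agents' actions held fixed. The sketch below illustrates that computation in Python with NumPy; the function name coma_advantage and the toy inputs are illustrative assumptions rather than the authors' implementation, and the critic forward pass producing the per-action Q-values is assumed to have already happened.

        import numpy as np

        def coma_advantage(q_values: np.ndarray, policy: np.ndarray, action: int) -> float:
            """Counterfactual advantage for a single agent (illustrative sketch).

            q_values : Q(s, (u^-a, u'_a)) for every candidate action u'_a of this
                       agent, with the other agents' actions fixed -- in COMA all
                       of these come from one forward pass of the centralised critic.
            policy   : the agent's current policy pi^a over the same actions.
            action   : index of the action the agent actually took.
            """
            # Counterfactual baseline: marginalise out only this agent's action
            # by taking the expectation of Q under its own policy.
            baseline = float(np.dot(policy, q_values))
            # Advantage that feeds the decentralised actor's policy-gradient update.
            return float(q_values[action]) - baseline

        # Toy usage: three candidate actions, agent chose action 1.
        q = np.array([1.0, 2.0, 0.5])
        pi = np.array([0.2, 0.7, 0.1])
        print(coma_advantage(q, pi, action=1))  # 2.0 - 1.65 = 0.35

    Because the critic outputs Q-values for all of one agent's actions at once, the baseline requires no extra critic evaluations, which is the single-forward-pass efficiency point the abstract makes.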