A Generalized Training Approach for Multiagent Learning

Author: Graepel Thore
Heess Nicolas
Hennes Daniel
Hughes Edward
Lanctot Marc
Lever Guy
Liu Siqi
Marris Luke
Muller Paul
Munos Remi
Omidshafiei Shayegan
Perolat Julien
Rowland Mark
Tuyls Karl
Wang Zhe
Publication date: 01/01/2020

This paper investigates a population-based training regime based on game-theoretic principles called Policy-Space Response Oracles (PSRO). PSRO is general in the sense that it (1) encompasses well-known algorithms such as fictitious play and double oracle as special cases, and (2) in principle applies to general-sum, many-player games. Despite this, prior studies of PSRO have focused on two-player zero-sum games, a regime wherein Nash equilibria are tractably computable. In moving from two-player zero-sum games to more general settings, computing Nash equilibria quickly becomes infeasible. Here, we extend the theoretical underpinnings of PSRO by considering an alternative solution concept, α-Rank, which is unique (and thus, unlike Nash, faces no equilibrium selection issues) and applies readily to general-sum, many-player settings. We establish convergence guarantees in several classes of games and identify links between Nash equilibria and α-Rank. We demonstrate the competitive performance of α-Rank-based PSRO against an exact Nash solver-based PSRO in two-player Kuhn and Leduc Poker. We then go beyond the reach of prior PSRO applications by considering 3- to 5-player poker games, yielding instances where α-Rank achieves faster convergence than approximate Nash solvers, thus establishing it as a favorable solver for general games. We also carry out an initial empirical validation in MuJoCo soccer, illustrating the feasibility of the proposed approach in another complex domain.
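The abstract notes that PSRO contains fictitious play as a special case. A minimal sketch of that special case follows, assuming a symmetric two-player zero-sum matrix game (rock-paper-scissors), a single shared population of pure strategies, an exact best-response oracle, and a uniform meta-solver over the population. Under these assumptions each PSRO iteration adds a best response to the population's average play, which is fictitious play. The α-Rank meta-solver studied in the paper is not implemented here; the `meta_solver` argument marks where variants such as uniform, Nash, or α-Rank would differ. All names are hypothetical.

```python
from typing import Callable, List

# Payoff to the player choosing the row action against the column action in
# rock-paper-scissors (0 = rock, 1 = paper, 2 = scissors). The game is
# symmetric and zero-sum, so one matrix describes both players.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
N_ACTIONS = len(PAYOFF)

# A meta-solver maps the current population to weights over its members.
MetaSolver = Callable[[List[int]], List[float]]


def uniform_meta_solver(population: List[int]) -> List[float]:
    """Weight every population member equally; this choice yields fictitious play."""
    return [1.0 / len(population)] * len(population)


def mixture(population: List[int], weights: List[float]) -> List[float]:
    """Collapse weighted population members into one distribution over actions."""
    dist = [0.0] * N_ACTIONS
    for action, w in zip(population, weights):
        dist[action] += w
    return dist


def action_values(dist: List[float]) -> List[float]:
    """Expected payoff of each pure action against the mixed strategy `dist`."""
    return [sum(PAYOFF[a][b] * p for b, p in enumerate(dist)) for a in range(N_ACTIONS)]


def best_response(dist: List[float]) -> int:
    """Exact best-response oracle (ties broken toward the lowest action index)."""
    values = action_values(dist)
    return max(range(N_ACTIONS), key=lambda a: values[a])


def exploitability(dist: List[float]) -> float:
    """Best-response value against `dist`; it is 0 exactly at the uniform Nash equilibrium."""
    return max(action_values(dist))


def psro(meta_solver: MetaSolver, iterations: int, initial: int = 0) -> List[float]:
    """Single-population PSRO loop: solve the meta-game, add a best response, repeat."""
    population = [initial]
    for _ in range(iterations):
        weights = meta_solver(population)
        population.append(best_response(mixture(population, weights)))
    # Return the meta-strategy over the final population as a distribution over actions.
    return mixture(population, meta_solver(population))
```

For example, `exploitability(psro(uniform_meta_solver, 1000))` should be close to zero, since fictitious play converges in two-player zero-sum games; a pure strategy such as always playing rock has exploitability 1.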

arXiv.org e-Print Archive

UCL Discovery

Socially Aware Motion Planning with Deep Reinforcement Learning

Author: Chen Yu Fan
Everett Michael
How Jonathan P.
Liu Miao
Publication date: 21/03/2018

For robotic vehicles to navigate safely and efficiently in pedestrian-rich environments, it is important to model subtle human behaviors and navigation rules (e.g., passing on the right). However, while instinctive to humans, socially compliant navigation is still difficult to quantify due to the stochasticity of people's behaviors. Existing work mostly uses feature-matching techniques to describe and imitate human paths, but these often do not generalize well, since the feature values can vary from person to person and even from run to run. This work notes that while it is challenging to directly specify the details of what to do (the precise mechanisms of human navigation), it is straightforward to specify what not to do (violations of social norms). Specifically, using deep reinforcement learning, this work develops a time-efficient navigation policy that respects common social norms. The proposed method is shown to enable fully autonomous navigation of a robotic vehicle moving at human walking speed in an environment with many pedestrians.

Comment: 8 pages
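The abstract's key idea is to encode what not to do as a penalty during reinforcement learning rather than imitating human paths. The abstract does not give the reward design, so the following is a hypothetical sketch, not the paper's method: a shaping term that penalizes one violation of the pass-on-the-right convention, namely an oncoming pedestrian ahead of the robot on the robot's right-hand side. The function names, the 4 m lookahead, the 120° oncoming threshold, and the 0.05 penalty are all assumptions chosen for illustration.

```python
import math


def norm_violation_penalty(
    robot_xy, robot_heading, ped_xy, ped_heading,
    penalty=0.05, lookahead=4.0, oncoming_cos=-0.5,
):
    """Hypothetical shaping term: return -penalty if an oncoming pedestrian is
    ahead of the robot (within `lookahead` metres) on its right-hand side,
    otherwise 0.0. Headings are in radians, measured counter-clockwise."""
    dx = ped_xy[0] - robot_xy[0]
    dy = ped_xy[1] - robot_xy[1]
    # Rotate the offset into the robot frame: +x points forward, +y points left.
    c, s = math.cos(robot_heading), math.sin(robot_heading)
    x_rel = c * dx + s * dy
    y_rel = -s * dx + c * dy
    # Oncoming: headings differ by more than 120 degrees (cos below -0.5).
    oncoming = math.cos(ped_heading - robot_heading) < oncoming_cos
    ahead = 0.0 < x_rel < lookahead
    on_right = y_rel < 0.0
    return -penalty if (oncoming and ahead and on_right) else 0.0


def shaped_reward(task_reward, robot_xy, robot_heading, pedestrians):
    """Task reward (e.g. goal and collision terms, assumed given) plus one
    norm-violation penalty per offending pedestrian.
    `pedestrians` is a list of ((x, y), heading) pairs."""
    return task_reward + sum(
        norm_violation_penalty(robot_xy, robot_heading, p_xy, p_heading)
        for p_xy, p_heading in pedestrians
    )
```

A policy trained on `shaped_reward` instead of the raw task reward would be nudged to keep oncoming pedestrians on its left; a practical system would need further rules (overtaking, crossing) and tuned magnitudes.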

arXiv.org e-Print Archive

DSpace@MIT

Crossref