A Generalized Training Approach for Multiagent Learning
This paper investigates a population-based training regime based on
game-theoretic principles called Policy-Space Response Oracles (PSRO). PSRO is
general in the sense that it (1) encompasses well-known algorithms such as
fictitious play and double oracle as special cases, and (2) in principle
applies to general-sum, many-player games. Despite this, prior studies of PSRO
have been focused on two-player zero-sum games, a regime wherein Nash
equilibria are tractably computable. In moving from two-player zero-sum games
to more general settings, computation of Nash equilibria quickly becomes
infeasible. Here, we extend the theoretical underpinnings of PSRO by
considering an alternative solution concept, α-Rank, which is unique
(thus faces no equilibrium selection issues, unlike Nash) and applies readily
to general-sum, many-player settings. We establish convergence guarantees in
several game classes, and identify links between Nash equilibria and
α-Rank. We demonstrate the competitive performance of
α-Rank-based PSRO against an exact Nash solver-based PSRO in 2-player
Kuhn and Leduc Poker. We then go beyond the reach of prior PSRO applications by
considering 3- to 5-player poker games, yielding instances where α-Rank
achieves faster convergence than approximate Nash solvers, thus establishing it
as a favorable general games solver. We also carry out an initial empirical
validation in MuJoCo soccer, illustrating the feasibility of the proposed
approach in another complex domain.
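The abstract above names fictitious play as one of the special cases that PSRO encompasses. As a rough illustration of that special case only (not the paper's PSRO or α-Rank procedure), the following minimal sketch runs simultaneous fictitious play on a two-player zero-sum matrix game. The function name `fictitious_play`, the rock-paper-scissors payoff matrix, the iteration count, and the initial actions are all assumptions chosen for this example.

```python
import numpy as np


def fictitious_play(payoff, iterations=20000):
    """Simultaneous fictitious play on a two-player zero-sum matrix game.

    payoff[i, j] is the row player's payoff; the column player gets -payoff[i, j].
    Returns the empirical (time-averaged) strategies of both players.
    """
    n_rows, n_cols = payoff.shape
    row_counts = np.zeros(n_rows)
    col_counts = np.zeros(n_cols)
    # Arbitrary initial actions (an assumption of this sketch).
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(iterations):
        # Each player best-responds to the opponent's empirical action counts.
        row_br = np.argmax(payoff @ col_counts)  # row player maximizes payoff
        col_br = np.argmin(row_counts @ payoff)  # column player minimizes it
        row_counts[row_br] += 1
        col_counts[col_br] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()


# Rock-paper-scissors (rows and columns ordered rock, paper, scissors).
# Its unique Nash equilibrium is uniform play.
RPS = np.array([[0, -1, 1],
                [1, 0, -1],
                [-1, 1, 0]], dtype=float)

row_strategy, col_strategy = fictitious_play(RPS)
print(row_strategy, col_strategy)
```

In this zero-sum example the empirical strategies drift toward the uniform Nash equilibrium. In the general-sum, many-player settings the abstract targets, that kind of Nash-based guarantee is what becomes hard to obtain, which is the stated motivation for using α-Rank instead.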
Socially Aware Motion Planning with Deep Reinforcement Learning
For robotic vehicles to navigate safely and efficiently in pedestrian-rich
environments, it is important to model subtle human behaviors and navigation
rules (e.g., passing on the right). However, while instinctive to humans,
socially compliant navigation is still difficult to quantify due to the
stochasticity in people's behaviors. Existing works are mostly focused on using
feature-matching techniques to describe and imitate human paths, but often do
not generalize well since the feature values can vary from person to person,
and even run to run. This work notes that while it is challenging to directly
specify the details of what to do (precise mechanisms of human navigation), it
is straightforward to specify what not to do (violations of social norms).
Specifically, using deep reinforcement learning, this work develops a
time-efficient navigation policy that respects common social norms. The
proposed method is shown to enable fully autonomous navigation of a robotic
vehicle moving at human walking speed in an environment with many pedestrians.