Partially Observable Multi-agent RL with (Quasi-)Efficiency: The Blessing of Information Sharing
We study provable multi-agent reinforcement learning (MARL) in the general
framework of partially observable stochastic games (POSGs). To circumvent the
known hardness results and the use of computationally intractable oracles, we
advocate leveraging the potential \emph{information-sharing} among agents, a
common practice in empirical MARL, and a standard model for multi-agent control
systems with communication. We first establish several computational complexity
results showing that both information-sharing and the observability assumption
that has enabled quasi-efficient single-agent RL with partial observations are
necessary for computational efficiency in solving POSGs. We then
propose to further \emph{approximate} the shared common information to
construct an \emph{approximate model} of the POSG, in which planning an
approximate equilibrium (with respect to the original POSG) can be done
quasi-efficiently, i.e., in quasi-polynomial time, under the aforementioned
assumptions.
Furthermore, we develop a partially observable MARL algorithm that is both
statistically and computationally quasi-efficient. We hope our study opens
up possibilities for leveraging and even designing different
\emph{information structures} to develop both sample- and
computation-efficient partially observable MARL.
Comment: International Conference on Machine Learning (ICML) 202
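The computational benefit of approximating the shared common information can be illustrated with a toy counting argument (a hypothetical sketch, not the paper's construction): planning over full joint observation histories requires a number of information states that grows exponentially in the horizon, whereas compressing the common information to a fixed-length window of recent shared observations caps that number.

```python
# Hypothetical illustration: why compressing shared common information
# shrinks the planning problem. With num_obs possible shared observations
# per step, full histories grow exponentially in t, while a length-k
# window of recent observations yields a fixed-size approximate model.
def history_count(num_obs: int, t: int) -> int:
    """Distinct full common-information histories after t steps."""
    return num_obs ** t

def compressed_count(num_obs: int, t: int, k: int) -> int:
    """Distinct states when only the last k shared observations are kept."""
    return num_obs ** min(t, k)

# 4 observations, horizon 10: 4**10 = 1,048,576 exact information states,
# versus 4**2 = 16 states in the window-k=2 approximate model.
full = history_count(4, 10)
approx = compressed_count(4, 10, k=2)
```

The quasi-polynomial guarantees in the paper come from a more careful choice of approximation and the observability assumption; the sketch only shows the state-count collapse that any such compression buys.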
Technical Report: Cooperative Multi-Target Localization With Noisy Sensors
This technical report is an extended version of the paper 'Cooperative
Multi-Target Localization With Noisy Sensors' accepted to the 2013 IEEE
International Conference on Robotics and Automation (ICRA).
This paper addresses the task of searching for an unknown number of static
targets within a known obstacle map using a team of mobile robots equipped with
noisy, limited field-of-view sensors. Such sensors may fail to detect a subset
of the visible targets or return false positive detections. These measurement
sets are used to localize the targets using the Probability Hypothesis Density,
or PHD, filter. Robots communicate with each other on a local peer-to-peer
basis and with a server or the cloud via access points, exchanging measurements
and poses to update their belief about the targets and plan future actions. The
server provides a mechanism to collect and synthesize information from all
robots and to share the global, albeit time-delayed, belief state to robots
near access points. We design a decentralized control scheme that exploits this
communication architecture and the PHD representation of the belief state.
Specifically, robots move to maximize the mutual information between the target
set and the measurements, both those collected locally and those retrieved from
the server, balancing local exploration with sharing knowledge across the team.
Furthermore, robots coordinate their actions with other robots exploring the
same local region of the environment.
Comment: Extended version of paper accepted to 2013 IEEE International
Conference on Robotics and Automation (ICRA)
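The core of the PHD representation used above is a measurement update on a target intensity function that jointly accounts for missed detections, false positives, and true detections. The following is a minimal single-sensor sketch on a 1-D grid (the paper's multi-robot, cloud-synchronized filter and mutual-information controller are far richer; constants and the Gaussian sensor model here are illustrative assumptions):

```python
# Toy single-sensor PHD measurement update on a 1-D grid.
# Illustrative constants, not from the paper:
import math

P_DETECT = 0.9   # probability a visible target is detected
CLUTTER = 0.01   # uniform false-positive (clutter) intensity kappa(z)

def likelihood(z: float, x: float, sigma: float = 1.0) -> float:
    """Gaussian sensor model g(z | x): assumed for illustration."""
    return math.exp(-0.5 * ((z - x) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def phd_update(intensity, cells, measurements):
    """One PHD measurement update over grid cells.

    Missed-detection term keeps (1 - pD) * v(x); each measurement adds a
    detection term normalized by clutter plus the predicted measurement mass,
    which is how the filter absorbs false positives.
    """
    updated = [(1 - P_DETECT) * v for v in intensity]
    for z in measurements:
        weights = [P_DETECT * likelihood(z, x) * v for x, v in zip(cells, intensity)]
        denom = CLUTTER + sum(weights)
        updated = [u + w / denom for u, w in zip(updated, weights)]
    return updated

cells = [float(i) for i in range(10)]
prior = [0.1] * 10                      # expected ~1 target spread over the map
post = phd_update(prior, cells, [3.0])  # a single detection near cell 3
# sum(post) is the expected number of targets after the update
```

The intensity integrates to the expected target count, which is what makes the PHD filter convenient for localizing an unknown number of targets: no explicit data association is needed, and beliefs from multiple robots can be fused at the server in the same representation.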