52 research outputs found
Influence-Optimistic Local Values for Multiagent Planning --- Extended Version
Recent years have seen the development of methods for multiagent planning
under uncertainty that scale to tens or even hundreds of agents. However, most
of these methods either make restrictive assumptions on the problem domain, or
provide approximate solutions without any guarantees on quality. Methods in the
former category typically build on heuristic search using upper bounds on the
value function. Unfortunately, no techniques exist to compute such upper bounds
for problems with non-factored value functions. To allow for meaningful
benchmarking through measurable quality guarantees on a very general class of
problems, this paper introduces a family of influence-optimistic upper bounds
for factored decentralized partially observable Markov decision processes
(Dec-POMDPs) that do not have factored value functions. Intuitively, we derive
bounds on very large multiagent planning problems by subdividing them into
sub-problems, and at each of these sub-problems making optimistic assumptions
with respect to the influence that will be exerted by the rest of the system.
We numerically compare the different upper bounds and demonstrate how we can
achieve a non-trivial guarantee that a heuristic solution for problems with
hundreds of agents is close to optimal. Furthermore, we provide evidence that
the upper bounds may improve the effectiveness of heuristic influence search,
and discuss further potential applications to multiagent planning. Comment:
Long version of IJCAI 2015 paper (and extended abstract at AAMAS 2015)
Improving Policies via Search in Cooperative Partially Observable Games
Recent superhuman results in games have largely been achieved in a variety of
zero-sum settings, such as Go and Poker, in which agents need to compete
against others. However, just like humans, real-world AI systems have to
coordinate and communicate with other agents in cooperative partially
observable environments as well. These settings commonly require participants
to both interpret the actions of others and to act in a way that is informative
when being interpreted. Those abilities are typically summarized as theory of
mind and are seen as crucial for social interactions. In this paper we propose
two different search techniques that can be applied to improve an arbitrary
agreed-upon policy in a cooperative partially observable game. The first one,
single-agent search, effectively converts the problem into a single agent
setting by making all but one of the agents play according to the agreed-upon
policy. In contrast, in multi-agent search all agents carry out the same
common-knowledge search procedure whenever doing so is computationally
feasible, and fall back to playing according to the agreed-upon policy
otherwise. We prove that these search procedures are theoretically guaranteed
to at least maintain the original performance of the agreed-upon policy (up to
a bounded approximation error). In the benchmark challenge problem of Hanabi,
our search technique greatly improves the performance of every agent we tested
and when applied to a policy trained using RL achieves a new state-of-the-art
score of 24.61 / 25 in the game, compared to a previous-best of 24.08 / 25
Apprendre à agir dans un Dec-POMDP
We address a long-standing open problem of reinforcement learning in decentralized partially observable Markov decision processes. Previous attempts focussed on different forms of generalized policy iteration, which at best led to local optima. In this paper, we restrict attention to plans, which are simpler to store and update than policies. We derive, under certain conditions, the first near-optimal cooperative multi-agent reinforcement learning algorithm. To achieve significant scalability gains, we replace the greedy maximization by mixed-integer linear programming. Experiments show our approach can learn to act near-optimally in many finite domains from the literature
Optimally Solving Two-Agent Decentralized POMDPs Under One-Sided Information Sharing
Optimally solving decentralized partially observable Markov decision processes (Dec-POMDPs) under either full or no information sharing has received significant attention in recent years. However, little is known about how partial information sharing affects existing theory and algorithms. This paper addresses this question for a team of two agents, with one-sided information sharing, i.e. both agents have imperfect information about the state of the world, but only one has access to what the other sees and does. From the perspective of a central planner, we show that the original problem can be reformulated into an equivalent information-state Markov decision process and solved as such. Besides, we prove that the optimal value function exhibits a specific form of uniform continuity. We also present heuristic search algorithms utilizing this property and providing the first results for this family of problems
- …