Policy gradient with value function approximation for collective multiagent planning
National Research Foundation (NRF) Singapore under Corp Lab @ University scheme; Fujitsu Lt
Apprendre à agir dans un Dec-POMDP (Learning to Act in a Dec-POMDP)
We address a long-standing open problem of reinforcement learning in decentralized partially observable Markov decision processes. Previous attempts focussed on different forms of generalized policy iteration, which at best led to local optima. In this paper, we restrict attention to plans, which are simpler to store and update than policies. We derive, under certain conditions, the first near-optimal cooperative multi-agent reinforcement learning algorithm. To achieve significant scalability gains, we replace the greedy maximization by mixed-integer linear programming. Experiments show our approach can learn to act near-optimally in many finite domains from the literature.