7 research outputs found

    Policy gradient with value function approximation for collective multiagent planning

    National Research Foundation (NRF) Singapore under Corp Lab @ University scheme; Fujitsu Lt

    Apprendre à agir dans un Dec-POMDP (Learning to act in a Dec-POMDP)

    We address a long-standing open problem of reinforcement learning in decentralized partially observable Markov decision processes. Previous attempts focussed on different forms of generalized policy iteration, which at best led to local optima. In this paper, we restrict attention to plans, which are simpler to store and update than policies. We derive, under certain conditions, the first near-optimal cooperative multi-agent reinforcement learning algorithm. To achieve significant scalability gains, we replace the greedy maximization by mixed-integer linear programming. Experiments show our approach can learn to act near-optimally in many finite domains from the literature.