
    Scalable Planning and Learning for Multiagent POMDPs: Extended Version

    Online, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. This is particularly problematic in multiagent POMDPs, where the action and observation spaces grow exponentially with the number of agents. To combat this intractability, we propose a novel scalable approach based on sample-based planning and factored value functions that exploits structure present in many multiagent settings. This approach applies not only in the planning case, but also in the Bayesian reinforcement learning setting. Experimental results show that we are able to provide high-quality solutions to large multiagent planning and learning problems.
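    A hedged sketch of the factored-value idea such approaches exploit (the notation below is illustrative, not necessarily the paper's): the joint action-value for a sampled history is approximated as a sum of local components, each defined over a small subset of agents, so the maximization over joint actions decomposes across factors instead of ranging over the exponentially large joint action space.

```latex
% Illustrative notation (an assumption, not the paper's): \mathcal{E} is a set of
% agent subsets (factors), \mathbf{a}_e the local joint action of factor e, and h
% the sampled joint history.
\[
  Q(h, \mathbf{a}) \;\approx\; \sum_{e \in \mathcal{E}} Q_e(h, \mathbf{a}_e),
  \qquad
  \mathbf{a}^{*} \in \arg\max_{\mathbf{a}} \sum_{e \in \mathcal{E}} Q_e(h, \mathbf{a}_e).
\]
```

    Because each local component depends only on a few agents' actions, the maximization can be carried out with techniques such as variable elimination on the underlying coordination graph rather than by enumerating joint actions.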

    Influence-Optimistic Local Values for Multiagent Planning --- Extended Version

    Recent years have seen the development of methods for multiagent planning under uncertainty that scale to tens or even hundreds of agents. However, most of these methods either make restrictive assumptions on the problem domain, or provide approximate solutions without any guarantees on quality. Methods in the former category typically build on heuristic search using upper bounds on the value function. Unfortunately, no techniques exist to compute such upper bounds for problems with non-factored value functions. To allow for meaningful benchmarking through measurable quality guarantees on a very general class of problems, this paper introduces a family of influence-optimistic upper bounds for factored decentralized partially observable Markov decision processes (Dec-POMDPs) that do not have factored value functions. Intuitively, we derive bounds on very large multiagent planning problems by subdividing them into sub-problems and, for each sub-problem, making optimistic assumptions about the influence that the rest of the system will exert on it. We numerically compare the different upper bounds and demonstrate how we can achieve a non-trivial guarantee that a heuristic solution for problems with hundreds of agents is close to optimal. Furthermore, we provide evidence that the upper bounds may improve the effectiveness of heuristic influence search, and discuss further potential applications to multiagent planning.
    Comment: Long version of IJCAI 2015 paper (and extended abstract at AAMAS 2015).
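    One way to picture the influence-optimistic construction (the symbols below are illustrative rather than the paper's exact formulation): the optimal value of the full problem is upper-bounded by the sum, over sub-problems, of the best value each sub-problem could attain under the most favorable influence the rest of the system could exert on it.

```latex
% Illustrative: the problem is partitioned into sub-problems k = 1, ..., K, and
% I_k ranges over the possible influences of the rest of the system on sub-problem k.
\[
  V^{*} \;\le\; \sum_{k=1}^{K} \max_{I_k \in \mathcal{I}_k} V_k(I_k).
\]
```

    Comparing such an optimistic sum against the value of any heuristic joint policy then yields a measurable, if possibly loose, bound on its distance from optimal.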

    Learning to Act in a Dec-POMDP (Apprendre Ă  agir dans un Dec-POMDP)

    We address a long-standing open problem of reinforcement learning in decentralized partially observable Markov decision processes. Previous attempts focused on different forms of generalized policy iteration, which at best led to local optima. In this paper, we restrict attention to plans, which are simpler to store and update than policies. We derive, under certain conditions, the first near-optimal cooperative multi-agent reinforcement learning algorithm. To achieve significant scalability gains, we replace the greedy maximization by mixed-integer linear programming. Experiments show our approach can learn to act near-optimally in many finite domains from the literature.
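    As a loose illustration of that last point only (not the paper's formulation), the sketch below uses the PuLP library to select a deterministic decision rule, one action per agent per private observation, by solving a small mixed-integer linear program. The agent names, observations, and the additive value surrogate q are hypothetical simplifications; in the actual work the objective couples the agents through the joint value function.

```python
# Toy sketch: decision-rule selection as a MILP (illustrative, not the paper's model).
import pulp

agents = ["a1", "a2"]
observations = {"a1": ["o1", "o2"], "a2": ["o1", "o2"]}
actions = ["left", "right"]

# Hypothetical additive per-(agent, observation, action) value weights.
q = {(i, o, a): 1.0 for i in agents for o in observations[i] for a in actions}
q[("a1", "o1", "left")] = 2.5
q[("a2", "o2", "right")] = 3.0

prob = pulp.LpProblem("decision_rule_selection", pulp.LpMaximize)

# Binary variable x[i, o, a] = 1 iff agent i takes action a after observing o.
x = {
    (i, o, a): pulp.LpVariable(f"x_{i}_{o}_{a}", cat="Binary")
    for i in agents for o in observations[i] for a in actions
}

# Each agent commits to exactly one action per private observation.
for i in agents:
    for o in observations[i]:
        prob += pulp.lpSum(x[i, o, a] for a in actions) == 1

# Simplified additive objective; the coupling through the joint value is omitted here.
prob += pulp.lpSum(q[i, o, a] * x[i, o, a] for (i, o, a) in x)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
rule = {(i, o): a for (i, o, a) in x if x[i, o, a].value() == 1}
print(rule)  # e.g. {('a1', 'o1'): 'left', ...}
```

    The point of the swap is that an off-the-shelf MILP solver can search the combinatorial space of decision rules far more efficiently than explicit greedy enumeration.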

    Optimally Solving Two-Agent Decentralized POMDPs Under One-Sided Information Sharing

    Optimally solving decentralized partially observable Markov decision processes (Dec-POMDPs) under either full or no information sharing has received significant attention in recent years. However, little is known about how partial information sharing affects existing theory and algorithms. This paper addresses this question for a team of two agents with one-sided information sharing, i.e., both agents have imperfect information about the state of the world, but only one has access to what the other sees and does. From the perspective of a central planner, we show that the original problem can be reformulated into an equivalent information-state Markov decision process and solved as such. In addition, we prove that the optimal value function exhibits a specific form of uniform continuity. We also present heuristic search algorithms that exploit this property, providing the first results for this family of problems.
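    A rough sketch of the flavor of that reformulation, in our own notation rather than the paper's: write h^2_t for the history that is effectively shared (the less-informed agent's own observations and actions, which the informed agent also sees) and h^1_t for the informed agent's private history; a plausible information state is then a conditional distribution over the hidden state and the private history given the shared history.

```latex
% Illustrative information state (our notation): a conditional distribution over the
% hidden state and the informed agent's private history, given the shared history.
\[
  \sigma_t \;=\; \Pr\!\left(s_t,\, h^{1}_{t} \mid h^{2}_{t}\right)
\]
```

    Such a statistic can serve as the state of a fully observable decision process whose decisions prescribe behaviour for both agents, which is the sense in which the problem becomes an information-state MDP.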

    Agent Interactions In Decentralized Environments

    The decentralized partially observable Markov decision process (Dec-POMDP) is a powerful formal model for studying multiagent problems where cooperative, coordinated action is optimal, but each agent acts based on local data alone. Unfortunately, it is known that Dec-POMDPs are fundamentally intractable: they are NEXP-complete in the worst case, and have been empirically observed to be beyond the reach of feasible optimal solution methods. To get around these obstacles, researchers have focused on special classes of the general Dec-POMDP problem, restricting the degree to which agents' actions can interact with one another. In some cases, it has been proven that these structured forms of interaction can in fact reduce worst-case complexity. Where formal proofs have been lacking, empirical observations suggest that this may also be true for other cases, although less is known precisely. This thesis unifies a range of this existing work, extending the analysis to establish novel complexity results for some popular restricted-interaction models. We also establish new results concerning cases for which reduced complexity has been proven, showing correspondences between basic structural features and the potential for dimensionality reduction when employing mathematical programming techniques. As our new complexity results establish that worst-case intractability is more widespread than previously known, we look to new ways of analyzing the potential average-case difficulty of Dec-POMDP instances. As this would be extremely difficult using the tools of traditional complexity theory, we take a more empirical approach. In doing so, we identify new analytical measures that apply to all Dec-POMDPs, whatever their structure. These measures allow us to identify problems that are potentially easier to solve on average, and we validate this claim empirically. As we show, the performance of well-known optimal dynamic programming methods correlates with our new measure of difficulty. Finally, we explore the approximate case, showing that our measure works well as a predictor of difficulty there, too, and provides a means of setting algorithm parameters to achieve far more efficient performance.

    Embodied Evolution in Collective Robotics: A Review

    This paper provides an overview of evolutionary robotics techniques applied to on-line distributed evolution for robot collectives -- namely, embodied evolution. It provides a definition of embodied evolution as well as a thorough description of the underlying concepts and mechanisms. The paper also presents a comprehensive summary of research published in the field since its inception (1999-2017), providing various perspectives to identify the major trends. In particular, we identify a shift from considering embodied evolution as a parallel search method within small robot collectives (fewer than 10 robots) to embodied evolution as an on-line distributed learning method for designing collective behaviours in swarm-like collectives. The paper concludes with a discussion of applications and open questions, providing a milestone for past research and an inspiration for future work.
    Comment: 23 pages, 1 figure, 1 table.