142 research outputs found

    IST Austria Technical Report

    Get PDF
    DEC-POMDPs extend POMDPs to a multi-agent setting, where several agents operate in an uncertain environment independently to achieve a joint objective. DEC-POMDPs have been studied with finite-horizon and infinite-horizon discounted-sum objectives, and there exist solvers both for exact and approximate solutions. In this work we consider Goal-DEC-POMDPs, where given a set of target states, the objective is to ensure that the target set is reached with minimal cost. We consider the indefinite-horizon (infinite-horizon with either discounted-sum, or undiscounted-sum, where absorbing goal states have zero-cost) problem. We present a new method to solve the problem that extends methods for finite-horizon DEC- POMDPs and the RTDP-Bel approach for POMDPs. We present experimental results on several examples, and show our approach presents promising results

    Stick-Breaking Policy Learning in Dec-POMDPs

    Get PDF
    Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to maxima that are far from optimal. This paper considers a variable-size FSC to represent the local policy of each agent. These variable-size FSCs are constructed using a stick-breaking prior, leading to a new framework called \emph{decentralized stick-breaking policy representation} (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without having to assume that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods

    Sur les politiques Markoviennes pour les Dec-POMDPs

    Get PDF
    This paper formulates the optimal decentralized control problem for a class of mathematicalmodels in which the system to be controlled is characterized by a finite-state discrete-time Markov process.The states of this internal process are not directly observable by the agents; rather, they have available a setof observable outputs that are only probabilistically related to the internal state of the system. The paperdemonstrates that, if there are only a finite number of control intervals remaining, then the optimal payofffunction of a Markov policy is a piecewise-linear, convex function of the current observation probabilitiesof the internal partially observable Markov process. In addition, algorithms for utilizing this property tocalculate either the optimal or an error-bounded Markov policy and payoff function for any finite horizon isoutlined.Cet article formule le problème du contrôle optimal décentralisé pour une classe de modèlesmathématiques dans laquelle le système à contrôler est caractérisé par un processus de Markov à tempsdiscret et à états finis. Les états de ce processus ne sont pas directement observables par les agents; cesderniers ont à leur disposition un ensemble d’observations lié de manière probabiliste à l’état du système.L’article démontre que, s’il ne reste qu’un nombre fini de pas de décision, la mesure de performanceoptimale d’une politique Markovienne est une fonction convexe, linéaire par morceaux, des probabilitésd’observation courantes. En outre, sont décrits les algorithmes approchés d’exploitation de cette propriétépour le calcul de politiques Markoviennes et la mesure de performance associée pour tout horizon fin

    Energy Efficient Execution of POMDP Policies

    Get PDF
    Recent advances in planning techniques for partially observable Markov decision processes have focused on online search techniques and offline point-based value iteration. While these techniques allow practitioners to obtain policies for fairly large problems, they assume that a non-negligible amount of computation can be done between each decision point. In contrast, the recent proliferation of mobile and embedded devices has lead to a surge of applications that could benefit from state of the art planning techniques if they can operate under severe constraints on computational resources. To that effect, we describe two techniques to compile policies into controllers that can be executed by a mere table lookup at each decision point. The first approach compiles policies induced by a set of alpha vectors (such as those obtained by point-based techniques) into approximately equivalent controllers, while the second approach performs a simulation to compile arbitrary policies into approximately equivalent controllers. We also describe an approach to compress controllers by removing redundant and dominated nodes, often yielding smaller and yet better controllers. Further compression and higher value can sometimes be obtained by considering stochastic controllers. The compilation and compression techniques are demonstrated on benchmark problems as well as a mobile application to help persons with Alzheimer's to way-find. The battery consumption of several POMDP policies is compared against finite-state controllers learned using methods introduced in this paper. Experiments performed on the Nexus 4 phone show that finite-state controllers are the least battery consuming POMDP policies

    Perseus: Randomized Point-based Value Iteration for POMDPs

    Full text link
    Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agents belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems

    A Novel Point-based Algorithm for Multi-agent Control Using the Common Information Approach

    Full text link
    The Common Information (CI) approach provides a systematic way to transform a multi-agent stochastic control problem to a single-agent partially observed Markov decision problem (POMDP) called the coordinator's POMDP. However, such a POMDP can be hard to solve due to its extraordinarily large action space. We propose a new algorithm for multi-agent stochastic control problems, called coordinator's heuristic search value iteration (CHSVI), that combines the CI approach and point-based POMDP algorithms for large action spaces. We demonstrate the algorithm through optimally solving several benchmark problems.Comment: 11 pages, 4 figure

    Producing efficient error-bounded solutions for transition independent decentralized MDPs

    Get PDF
    pages 539-546International audienceThere has been substantial progress on algorithms for single-agent sequential decision making problems represented as partially observable Markov decision processes (POMDPs). A number of efficient algorithms for solving POMDPs share two desirable properties: error-bounds and fast convergence rates. Despite significant efforts, no algorithms for solving decentralized POMDPs benefit from these properties, leading to either poor solution quality or limited scalability. This paper presents the first approach for solving transition independent decentralized Markov decision processes (MDPs), that inherits these properties. Two related algorithms illustrate this approach. The first recasts the original problem as a finite-horizon deterministic and completely observable Markov decision process. In this form, the original problem is solved by combining heuristic search with constraint optimization to quickly converge into a near-optimal policy. This algorithm also provides the foundation for the first algorithm for solving infinite-horizon transition independent decentralized MDPs. We demonstrate that both methods outperform state-of-the-art algorithms by multiple orders of magnitude, and for infinite-horizon decentralized MDPs, the algorithm is able to construct more concise policies by searching cyclic policy graphs
    • …
    corecore