    Near-Optimal Adversarial Policy Switching for Decentralized Asynchronous Multi-Agent Systems

    A key challenge in multi-robot and multi-agent systems is generating solutions that are robust to other self-interested or even adversarial parties who actively try to prevent the agents from achieving their goals. The practicality of existing works addressing this challenge is limited to small-scale synchronous decision-making scenarios or to a single agent planning its best response against a single adversary with fixed, procedurally characterized strategies. In contrast, this paper considers a more realistic class of problems where a team of asynchronous agents with limited observation and communication capabilities must compete against multiple strategic adversaries with changing strategies. This problem necessitates agents that can coordinate to detect changes in adversary strategies and plan the best response accordingly. Our approach first optimizes a set of stratagems that represent these best responses. These optimized stratagems are then integrated into a unified policy that can detect and respond when the adversaries change their strategies. The near-optimality of the proposed framework is established theoretically as well as demonstrated empirically in simulation and hardware

    Stick-Breaking Policy Learning in Dec-POMDPs

    Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to maxima that are far from optimal. This paper considers a variable-size FSC to represent the local policy of each agent. These variable-size FSCs are constructed using a stick-breaking prior, leading to a new framework called decentralized stick-breaking policy representation (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without having to assume that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods

    Decentralized Control of Partially Observable Markov Decision Processes using Belief Space Macro-actions

    The focus of this paper is on solving multi-robot planning problems in continuous spaces with partial observability. Decentralized partially observable Markov decision processes (Dec-POMDPs) are general models for multi-robot coordination problems, but representing and solving Dec-POMDPs is often intractable for large problems. To allow for a high-level representation that is natural for multi-robot problems and scalable to large discrete and continuous problems, this paper extends the Dec-POMDP model to the decentralized partially observable semi-Markov decision process (Dec-POSMDP). The Dec-POSMDP formulation allows asynchronous decision-making by the robots, which is crucial in multi-robot domains. We also present an algorithm for solving this Dec-POSMDP which is much more scalable than previous methods since it can incorporate closed-loop belief space macro-actions in planning. These macro-actions are automatically constructed to produce robust solutions. The proposed method's performance is evaluated on a complex multi-robot package delivery problem under uncertainty, showing that our approach can naturally represent multi-robot problems and provide high-quality solutions for large-scale problems

    Decentralized Control of Cooperative Systems: Categorization and Complexity Analysis

    Decentralized control of cooperative systems captures the operation of a group of decision makers that share a single global objective. The difficulty in solving such problems optimally arises when the agents lack full observability of the global state of the system when they operate. The general problem has been shown to be NEXP-complete. In this paper, we identify classes of decentralized control problems whose complexity ranges between NEXP and P. In particular, we study problems characterized by independent transitions, independent observations, and goal-oriented objective functions. Two algorithms are shown to optimally solve useful classes of goal-oriented decentralized processes in polynomial time. This paper also studies information sharing among the decision-makers, which can improve their performance. We distinguish between three ways in which agents can exchange information: indirect communication, direct communication and sharing state features that are not controlled by the agents. Our analysis shows that for every class of problems we consider, introducing direct or indirect communication does not change the worst-case complexity. The results provide a better understanding of the complexity of decentralized control problems that arise in practice and facilitate the development of planning algorithms for these problems

    IST Austria Technical Report

    DEC-POMDPs extend POMDPs to a multi-agent setting, where several agents operate in an uncertain environment independently to achieve a joint objective. DEC-POMDPs have been studied with finite-horizon and infinite-horizon discounted-sum objectives, and there exist solvers both for exact and approximate solutions. In this work we consider Goal-DEC-POMDPs, where given a set of target states, the objective is to ensure that the target set is reached with minimal cost. We consider the indefinite-horizon (infinite-horizon with either discounted-sum, or undiscounted-sum, where absorbing goal states have zero-cost) problem. We present a new method to solve the problem that extends methods for finite-horizon DEC-POMDPs and the RTDP-Bel approach for POMDPs. We present experimental results on several examples, and show our approach presents promising results

    Energy Efficient Execution of POMDP Policies

    Recent advances in planning techniques for partially observable Markov decision processes have focused on online search techniques and offline point-based value iteration. While these techniques allow practitioners to obtain policies for fairly large problems, they assume that a non-negligible amount of computation can be done between each decision point. In contrast, the recent proliferation of mobile and embedded devices has led to a surge of applications that could benefit from state-of-the-art planning techniques if they can operate under severe constraints on computational resources. To that effect, we describe two techniques to compile policies into controllers that can be executed by a mere table lookup at each decision point. The first approach compiles policies induced by a set of alpha vectors (such as those obtained by point-based techniques) into approximately equivalent controllers, while the second approach performs a simulation to compile arbitrary policies into approximately equivalent controllers. We also describe an approach to compress controllers by removing redundant and dominated nodes, often yielding smaller and yet better controllers. Further compression and higher value can sometimes be obtained by considering stochastic controllers. The compilation and compression techniques are demonstrated on benchmark problems as well as a mobile application to help persons with Alzheimer's to way-find. The battery consumption of several POMDP policies is compared against finite-state controllers learned using methods introduced in this paper. Experiments performed on the Nexus 4 phone show that finite-state controllers are the least battery-consuming POMDP policies

    Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey

    Wireless sensor networks (WSNs) consist of autonomous and resource-limited devices. The devices cooperate to monitor one or more physical phenomena within an area of interest. WSNs operate as stochastic systems because of randomness in the monitored environments. For long service time and low maintenance cost, WSNs require adaptive and robust methods to address data exchange, topology formulation, resource and power optimization, sensing coverage and object detection, and security challenges. In these problems, sensor nodes are to make optimized decisions from a set of accessible strategies to achieve design goals. This survey reviews numerous applications of the Markov decision process (MDP) framework, a powerful decision-making tool to develop adaptive algorithms and protocols for WSNs. Furthermore, various solution methods are discussed and compared to serve as a guide for using MDPs in WSNs

    TaxiWorld: Developing and Evaluating Solution Methods for Multi-Agent Planning Domains

    TaxiWorld is a Matlab simulation of a city with a fleet of taxis which operate within it, with the goal of transporting passengers to their destinations. The size of the city, as well as the number of available taxis and the frequency and general locations of fare appearances can all be set on a scenario-by-scenario basis. The taxis must attempt to service the fares as quickly as possible, by picking each one up and carrying it to its drop-off location. The TaxiWorld scenario is formally modeled using both Decentralized Partially-Observable Markov Decision Processes (Dec-POMDPs) and Multi-agent Markov Decision Processes (MMDPs). The purpose of developing formal models is to learn how to build and use formal Markov models, such as can be given to planners to solve for optimal policies in problem domains. However, finding optimal solutions for Dec-POMDPs is NEXP-Complete, so an empirical algorithm was also developed as an improvement to the method already in use on the simulator, and the methods were compared in identical scenarios to determine which is more effective. The empirical method is of course not optimal - rather, it attempts to simply account for some of the most important factors to achieve an acceptable level of effectiveness while still retaining a reasonable level of computational complexity for online solving. Dissertation/Thesis. M.S. Computer Science 201