
    Near-Optimal Adversarial Policy Switching for Decentralized Asynchronous Multi-Agent Systems

    A key challenge in multi-robot and multi-agent systems is generating solutions that are robust to other self-interested or even adversarial parties who actively try to prevent the agents from achieving their goals. The practicality of existing works addressing this challenge is limited to small-scale synchronous decision-making scenarios, or to a single agent planning its best response against a single adversary with fixed, procedurally characterized strategies. In contrast, this paper considers a more realistic class of problems in which a team of asynchronous agents with limited observation and communication capabilities must compete against multiple strategic adversaries with changing strategies. This problem necessitates agents that can coordinate to detect changes in adversary strategies and plan the best responses accordingly. Our approach first optimizes a set of stratagems that represent these best responses. These optimized stratagems are then integrated into a unified policy that can detect and respond when the adversaries change their strategies. The near-optimality of the proposed framework is established theoretically and demonstrated empirically in simulation and on hardware.
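    The detect-and-respond idea in this abstract can be sketched as a Bayesian belief over a known set of adversary strategies, with the agent switching to the pre-optimized best response (the "stratagem") against the currently most likely strategy. All strategy names, observation likelihoods, and responses below are illustrative assumptions, not the paper's actual model.

```python
# Hypothetical sketch: a belief over adversary strategies is updated from
# observations; the agent then executes the pre-optimized best response
# against the most likely strategy. Numbers and labels are illustrative.

# Likelihood of each observation under each candidate adversary strategy.
STRATEGIES = {
    "aggressive": {"attack": 0.8, "retreat": 0.2},
    "evasive":    {"attack": 0.2, "retreat": 0.8},
}
# Pre-optimized best-response "stratagem" per adversary strategy.
BEST_RESPONSE = {"aggressive": "defend", "evasive": "pursue"}

def update_belief(belief, observation):
    """One Bayesian update of the belief over adversary strategies."""
    posterior = {s: belief[s] * STRATEGIES[s][observation] for s in belief}
    z = sum(posterior.values())
    return {s: p / z for s, p in posterior.items()}

def select_stratagem(belief):
    """Switch to the best response against the most likely strategy."""
    likely = max(belief, key=belief.get)
    return BEST_RESPONSE[likely]

belief = {"aggressive": 0.5, "evasive": 0.5}
for obs in ["attack", "attack", "retreat"]:
    belief = update_belief(belief, obs)
print(select_stratagem(belief))  # "defend": aggressive remains most likely
```

    A change in the adversary's strategy shows up as a drift in this belief, triggering a switch of stratagem; the paper's contribution is doing this near-optimally for a decentralized, asynchronous team rather than a single agent.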

    Decentralized Multi-Robot Cooperation with Auctioned POMDPs

    Planning under uncertainty faces a scalability problem when considering multi-robot teams, as the information space scales exponentially with the number of robots. To address this issue, this paper proposes to decentralize multi-agent Partially Observable Markov Decision Processes (POMDPs) while maintaining cooperation between robots through POMDP policy auctions. Auctions provide a flexible way of coordinating individual policies modeled by POMDPs and have low communication requirements. Additionally, communication models in the multi-agent POMDP literature match poorly with real inter-robot communication. We address this issue by applying a decentralized data fusion method to efficiently maintain a joint belief state among the robots. The paper focuses on a cooperative tracking application, in which several robots must jointly track a moving target of interest. The proposed ideas are illustrated in real multi-robot experiments, showcasing the flexible and robust coordination that our techniques can provide.
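    The coordination mechanism described here can be sketched as a sequential auction: each robot bids the expected value of executing its own candidate policy for a task, and each task goes to the highest unassigned bidder, so only bids (not full beliefs or policies) need to be communicated. The robot names, task names, and bid values below are illustrative stand-ins for computed POMDP policy values, not the paper's actual auction.

```python
# Hypothetical sketch of a sequential policy auction with low
# communication cost: robots exchange only scalar bids, where each bid
# stands in for the expected value of that robot's own POMDP policy
# for the task. All names and values are illustrative.

def sequential_auction(tasks, values):
    """Assign each task to the unassigned robot bidding the highest
    expected policy value for it (values[robot][task])."""
    assignment = {}
    free = set(values)
    for task in tasks:
        winner = max(free, key=lambda r: values[r][task])
        assignment[task] = winner
        free.remove(winner)
    return assignment

values = {
    "robot_a": {"track_target": 4.2, "guard_exit": 3.0},
    "robot_b": {"track_target": 7.5, "guard_exit": 2.0},
}
print(sequential_auction(["track_target", "guard_exit"], values))
# track_target goes to robot_b (bid 7.5); guard_exit falls to robot_a
```

    In the paper's setting the bids come from evaluating each robot's POMDP policy against a joint belief maintained by decentralized data fusion, rather than from fixed numbers.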

    Planning delayed-response queries and transient policies under reward uncertainty

    We address situations in which an agent with uncertainty in rewards can selectively query another agent or human to improve its knowledge of rewards and thus its policy. When there is a time delay between posing the query and receiving the response, the agent must determine how to behave in the transient phase while waiting for the response. Thus, in order to act optimally, the agent must jointly optimize its transient policy along with its query. In this paper, we formalize this joint optimization problem and provide a new algorithm, JQTP, for optimizing the Joint Query and Transient Policy. In addition, we provide a clustering technique that can be used in JQTP to flexibly trade performance for reduced computation. We illustrate our algorithms on a machine configuration task.
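    The joint optimization described above can be sketched as enumerating (query, transient policy) pairs and scoring each by the reward accumulated while waiting for the response plus the expected value the response unlocks. The query names, delays, and reward rates below are illustrative assumptions; the actual JQTP algorithm operates on full decision processes with reward uncertainty, not on fixed scalars.

```python
import itertools

# Hypothetical sketch: jointly pick a query and a transient policy by
# scoring every pair. The delay couples the two choices: a longer delay
# weights the transient policy's per-step reward more heavily.
# All names and numbers are illustrative.

QUERIES = {  # query -> (response delay in steps, expected post-response value)
    "ask_reward_A": (2, 10.0),
    "ask_reward_B": (1, 8.0),
}
TRANSIENT = {"cautious": 1.0, "greedy": 2.5}  # per-step reward while waiting

def joint_value(query, policy):
    """Transient reward during the delay plus expected value after the response."""
    delay, post_value = QUERIES[query]
    return TRANSIENT[policy] * delay + post_value

best = max(itertools.product(QUERIES, TRANSIENT),
           key=lambda qp: joint_value(*qp))
print(best)  # ("ask_reward_A", "greedy"): 2.5 * 2 + 10.0 = 15.0
```

    In this toy version the scoring is a flat product; JQTP's contribution is making the joint optimization tractable when the transient behavior is itself a policy over states, which is where the clustering technique trades performance for computation.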

    Multiagent Planning under Uncertainty with Stochastic Communication Delays

    We consider the problem of cooperative multiagent planning under uncertainty, formalized as a decentralized partially observable Markov decision process (Dec-POMDP). Unfortunately, optimal planning in these models is provably intractable. By communicating their local observations before they take actions, agents synchronize their knowledge of the environment, and the planning problem reduces to a centralized POMDP; relying on communication therefore significantly reduces the complexity of planning. In the real world, however, such communication might fail temporarily. We present a step towards more realistic communication models for Dec-POMDPs by proposing a model that (1) allows communication to be delayed by one or more time steps, and (2) explicitly considers future probabilities of successful communication. For our model, we discuss how to efficiently compute an (approximate) value function and corresponding policies, and we demonstrate our theoretical results with encouraging experiments.
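    The role of the communication-success probability in this model can be sketched with a simple expectation: each step, a message arrives on time with some probability, and every step of delay costs value relative to the synchronized (centralized-POMDP) case. The function, its parameters, and the per-step loss model below are illustrative assumptions; the paper computes value functions over full Dec-POMDP stages.

```python
# Hypothetical sketch: expected value when each step's communication
# succeeds with probability p_sync, each step of delay costs
# per_step_loss relative to the synchronized value, and after max_delay
# steps the worst-case delayed value applies. Illustrative only.

def value_with_delays(p_sync, v_sync, per_step_loss, max_delay):
    """Expectation over the geometric number of delayed steps."""
    v, stay = 0.0, 1.0  # stay = probability the message is still delayed
    for k in range(max_delay):
        v += stay * p_sync * (v_sync - k * per_step_loss)
        stay *= (1 - p_sync)
    v += stay * (v_sync - max_delay * per_step_loss)  # residual worst case
    return v

# With reliable communication the team attains the centralized value;
# unreliability degrades it smoothly rather than all at once.
print(value_with_delays(1.0, 10.0, 2.0, 2))  # 10.0
print(value_with_delays(0.5, 10.0, 2.0, 2))  # 8.5
```

    This mirrors the abstract's point: explicitly modeling future probabilities of successful communication lets planning interpolate between the easy centralized case and the intractable fully decentralized one.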