9 research outputs found

    DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs

    Full text link
    A major difficulty of solving continuous POMDPs is to infer the multi-modal distribution of the unobserved true states and to make the planning algorithm dependent on the perceived uncertainty. We cast POMDP filtering and planning problems as two closely related Sequential Monte Carlo (SMC) processes, one over the real states and the other over the future optimal trajectories, and combine the merits of these two parts in a new model named the DualSMC network. In particular, we first introduce an adversarial particle filter that leverages the adversarial relationship between its internal components. Based on the filtering results, we then propose a planning algorithm that extends the previous SMC planning approach [Piche et al., 2018] to continuous POMDPs with an uncertainty-dependent policy. Crucially, not only can DualSMC handle complex observations such as image inputs, but it also remains highly interpretable. It is shown to be effective in three continuous POMDP domains: the floor positioning domain, the 3D light-dark navigation domain, and a modified Reacher domain. Comment: IJCAI 2020.
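    The filtering half of this dual-SMC picture is, at its core, a particle filter. Below is a minimal bootstrap-particle-filter step sketching that half only; the paper's adversarial proposal/discriminator is omitted, and `transition` and `likelihood` are hypothetical stand-ins for a domain's dynamics and observation models.

```python
# Minimal sketch of one SMC filtering step (the state-tracking half of a
# DualSMC-style pipeline). `transition` and `likelihood` are assumed callables.
import numpy as np

def pf_step(particles, weights, action, observation, transition, likelihood, rng):
    """One bootstrap SMC update: propagate, reweight, and resample."""
    # Propagate each particle through the (stochastic) dynamics model.
    particles = transition(particles, action, rng)
    # Reweight by the likelihood of the new observation under each particle.
    weights = weights * likelihood(observation, particles)
    weights = weights / weights.sum()
    # Resample when the effective sample size collapses.
    ess = 1.0 / np.sum(weights ** 2)
    if ess < 0.5 * len(particles):
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights
```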

    Near-Optimal Adversarial Policy Switching for Decentralized Asynchronous Multi-Agent Systems

    Full text link
    A key challenge in multi-robot and multi-agent systems is generating solutions that are robust to other self-interested or even adversarial parties who actively try to prevent the agents from achieving their goals. The practicality of existing works addressing this challenge is limited to only small-scale synchronous decision-making scenarios, or to a single agent planning its best response against a single adversary with fixed, procedurally characterized strategies. In contrast, this paper considers a more realistic class of problems in which a team of asynchronous agents with limited observation and communication capabilities needs to compete against multiple strategic adversaries with changing strategies. This problem necessitates agents that can coordinate to detect changes in adversary strategies and plan the best response accordingly. Our approach first optimizes a set of stratagems that represent these best responses. These optimized stratagems are then integrated into a unified policy that can detect and respond when the adversaries change their strategies. The near-optimality of the proposed framework is established theoretically, as well as demonstrated empirically in simulation and on hardware.
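    To make the switching idea concrete, here is a toy sketch assuming a finite set of known adversary strategies: maintain a belief over them from observed adversary behavior and play the precomputed best response to the most probable one. `strategy_likelihoods` and `best_responses` are hypothetical placeholders, not the paper's actual constructions.

```python
# Toy strategy-switching sketch: Bayesian detection over a finite strategy set,
# followed by selection of a precomputed best-response stratagem.
import numpy as np

def switch_policy(belief, adversary_action, strategy_likelihoods, best_responses):
    # Bayesian update of the belief over adversary strategies from the
    # most recently observed adversary action.
    belief = belief * np.array([lik(adversary_action) for lik in strategy_likelihoods])
    belief = belief / belief.sum()
    # Respond with the stratagem optimized against the most likely strategy.
    return best_responses[int(np.argmax(belief))], belief
```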

    Learning for Multi-robot Cooperation in Partially Observable Stochastic Environments with Macro-actions

    Get PDF
    This paper presents a data-driven approach for multi-robot coordination in partially observable domains based on Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) and macro-actions (MAs). Dec-POMDPs provide a general framework for cooperative sequential decision making under uncertainty, and MAs allow temporally extended and asynchronous action execution. To date, most methods assume the underlying Dec-POMDP model is known a priori or that a full simulator is available during planning time. Previous methods that aim to address these issues suffer from local optimality and sensitivity to initial conditions. Additionally, few hardware demonstrations involving a large team of heterogeneous robots with long planning horizons exist. This work addresses these gaps by proposing an iterative sampling-based Expectation-Maximization algorithm (iSEM) to learn policies using only trajectory data containing observations, MAs, and rewards. Our experiments show the algorithm is able to achieve better solution quality than state-of-the-art learning-based methods. We implement two variants of multi-robot Search and Rescue (SAR) domains (with and without obstacles) on hardware to demonstrate that the learned policies can effectively control a team of distributed robots to cooperate in a partially observable stochastic environment. Comment: accepted to the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017).
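    The shape of such a learner is a reward-weighted EM loop over logged trajectories. The sketch below is schematic only: `e_step` and `m_step` are hypothetical callables (responsibilities over controller nodes and a refit of action/transition probabilities, respectively), and iSEM's iterative re-sampling machinery is reduced to plain restarts.

```python
# Schematic EM loop for learning a stochastic controller from trajectories of
# (observation, macro-action, reward). All callables are assumed placeholders.
def learn_controller(trajectories, init_params, e_step, m_step, score,
                     iters=50, restarts=5):
    best, best_score = None, float("-inf")
    for _ in range(restarts):          # restarts mitigate local optima and
        params = init_params()         # sensitivity to initialization
        for _ in range(iters):
            # E-step: reward-weighted responsibilities under current params.
            responsibilities = e_step(params, trajectories)
            # M-step: refit controller probabilities to the responsibilities.
            params = m_step(responsibilities, trajectories)
        s = score(params, trajectories)
        if s > best_score:
            best, best_score = params, s
    return best
```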

    Semantic-level decentralized multi-robot decision-making using probabilistic macro-observations

    Get PDF
    Robust environment perception is essential for decision-making on robots operating in complex domains. Intelligent task execution requires principled treatment of uncertainty sources in a robot's observation model. This is important not only for low-level observations (e.g., accelerometer data), but also for high-level observations such as semantic object labels. This paper formalizes the concept of macro-observations in Decentralized Partially Observable Semi-Markov Decision Processes (Dec-POSMDPs), allowing scalable semantic-level multi-robot decision making. A hierarchical Bayesian approach is used to model noise statistics of low-level classifier outputs, while simultaneously allowing sharing of domain noise characteristics between classes. Classification accuracy of the proposed macro-observation scheme, called Hierarchical Bayesian Noise Inference (HBNI), is shown to exceed existing methods. The macro-observation scheme is then integrated into a Dec-POSMDP planner, with hardware experiments running onboard a team of dynamic quadrotors in a challenging domain where noise-agnostic filtering fails. To the best of our knowledge, this is the first demonstration of a real-time, convolutional-neural-net-based classification framework running fully onboard a team of quadrotors in a multi-robot decision-making domain. (Boeing Company)
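    A much-simplified, non-hierarchical stand-in for the HBNI idea is to treat each row of a classifier's confusion matrix as Dirichlet-distributed and fuse a sequence of noisy labels into a posterior over the true class. The shared hierarchical prior across classes from the paper is collapsed here into a single pseudo-count `alpha`; all names are illustrative.

```python
# Simplified classifier-noise fusion: Dirichlet-smoothed confusion matrix plus
# a Bayes filter over the true class given a stream of noisy labels.
import numpy as np

def posterior_over_true_class(noisy_labels, counts, prior, alpha=1.0):
    """counts[t, o]: times true class t was labeled o in training data."""
    # Posterior-mean confusion matrix under a symmetric Dirichlet prior.
    confusion = (counts + alpha) / (counts + alpha).sum(axis=1, keepdims=True)
    # Accumulate log-likelihoods of the label sequence per candidate class.
    log_post = np.log(prior)
    for o in noisy_labels:
        log_post += np.log(confusion[:, o])
    log_post -= log_post.max()          # normalize in log space for stability
    p = np.exp(log_post)
    return p / p.sum()
```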

    Scalable accelerated decentralized multi-robot policy search in continuous observation spaces

    Get PDF
    This paper presents the first-ever approach for solving continuous-observation Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) and their semi-Markovian counterparts, Dec-POSMDPs. This contribution is especially important in robotics, where a vast number of sensors provide continuous observation data. A continuous-observation policy representation is introduced using Stochastic Kernel-based Finite State Automata (SK-FSAs). An SK-FSA search algorithm titled Entropy-based Policy Search using Continuous Kernel Observations (EPSCKO) is introduced and applied to the first-ever continuous-observation Dec-POMDP/Dec-POSMDP domain, where it significantly outperforms state-of-the-art discrete approaches. The methodology is equally applicable to Dec-POMDPs and Dec-POSMDPs, though the empirical analysis presented focuses on Dec-POSMDPs due to their higher scalability. To improve convergence, an entropy-injection policy search acceleration approach for both the continuous- and discrete-observation cases is also developed and shown to improve convergence rates without degrading policy quality. (Boeing Company)
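    One plausible reading of a kernel-based FSA transition is sketched below: each controller node holds Gaussian kernels over the continuous observation space, and the successor node is sampled from kernel responsibilities. This is a guess at the flavor of SK-FSAs, not the paper's exact parameterization; every name here is illustrative.

```python
# Illustrative stochastic-kernel FSA transition over continuous observations.
import numpy as np

def sk_fsa_transition(obs, centers, bandwidth, next_nodes, rng):
    """centers[k]: kernel center; next_nodes[k]: successor node for kernel k."""
    # Gaussian-kernel responsibilities for the continuous observation.
    d2 = np.sum((centers - obs) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / bandwidth ** 2)
    w = w / w.sum()
    # Sample a kernel, hence a successor node (the "stochastic kernel" part).
    k = rng.choice(len(centers), p=w)
    return next_nodes[k]
```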

    POMDPs in Continuous Time and Discrete Spaces

    Full text link
    Many processes, such as discrete event systems in engineering or population dynamics in biology, evolve in discrete space and continuous time. We consider the problem of optimal decision making in such discrete state and action space systems under partial observability. This places our work at the intersection of optimal filtering and optimal control. At the current state of research, a mathematical description of simultaneous decision making and filtering in continuous time with finite, countable state and action spaces is still missing. In this paper, we give a mathematical description of a continuous-time POMDP. By leveraging optimal filtering theory, we derive an HJB-type equation that characterizes the optimal solution. Using techniques from deep learning, we approximately solve the resulting partial integro-differential equation. We present (i) an approach that solves the decision problem offline by learning an approximation of the value function and (ii) an online algorithm that provides a solution in belief space using deep reinforcement learning. We demonstrate the applicability on a set of toy examples, which pave the way for future methods providing solutions for high-dimensional problems. Comment: published at the Conference on Neural Information Processing Systems (NeurIPS) 2020.
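    The filtering ingredient for a discrete-state, continuous-time model can be sketched in a few lines: between observations the belief follows the master equation db/dt = Q^T b (with Q the generator matrix), and a Bayes update applies when an observation arrives. The paper's joint filtering/control treatment is far more involved; this is only the belief-dynamics skeleton, under standard CTMC assumptions.

```python
# Belief evolution for a continuous-time Markov chain with generator Q.
import numpy as np

def propagate_belief(b, Q, dt):
    # Forward-Euler step of the master equation db/dt = Q^T b
    # (the exact flow over dt would be expm(Q.T * dt) @ b).
    b = b + dt * (Q.T @ b)
    return b / b.sum()

def correct_belief(b, likelihoods):
    # Bayes update at an observation time; likelihoods[s] = p(y | state s).
    b = b * likelihoods
    return b / b.sum()
```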

    Graph-based Cross Entropy method for solving multi-robot decentralized POMDPs

    No full text
    This paper introduces a probabilistic algorithm for multi-robot decision-making under uncertainty, which can be posed as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP). Dec-POMDPs are inherently synchronous decision-making frameworks that require significant computational resources to solve, making them infeasible for many real-world robotics applications. The Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP) was recently introduced as an extension of the Dec-POMDP that uses high-level macro-actions to allow large-scale, asynchronous decision-making. However, existing Dec-POSMDP solution methods have limited scalability or perform poorly as the problem size grows. This paper proposes a cross-entropy-based Dec-POSMDP algorithm motivated by the combinatorial optimization literature. The algorithm is applied to a constrained package delivery domain, where it significantly outperforms existing Dec-POSMDP solution methods. (Boeing Company)
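    For readers unfamiliar with the underlying optimizer, here is a generic cross-entropy method loop of the kind such algorithms adapt to policy search: sample candidate policy parameters, keep an elite fraction, and refit the sampling distribution. `evaluate` (expected return of a parameter vector) is a hypothetical placeholder; the paper's graph-based policy encoding is not modeled here.

```python
# Generic cross-entropy method over a Gaussian sampling distribution.
import numpy as np

def cross_entropy_search(evaluate, dim, iters=50, pop=100, elite_frac=0.1, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    mu, sigma = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(elite_frac * pop))
    for _ in range(iters):
        # Sample candidates, score them, and keep the elite set.
        samples = rng.normal(mu, sigma, size=(pop, dim))
        scores = np.array([evaluate(s) for s in samples])
        elites = samples[np.argsort(scores)[-n_elite:]]
        # Refit the Gaussian to the elites (small floor keeps sigma positive).
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu
```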

    Coordinating Team Tactics for Swarm-vs.-Swarm Adversarial Games

    Get PDF
    While swarms of UAVs have received much attention in the last few years, adversarial swarms (i.e., competitive, swarm-vs.-swarm games) have been less well studied. In this dissertation, I investigate the factors influential in team-vs.-team UAV aerial combat scenarios, elucidating the impacts of force concentration and opponent spread in the engagement space. Specifically, this dissertation makes the following contributions: (1) Tactical Analysis: identifies conditions under which either explicitly coordinating tactics or decentralized, greedy tactics are superior in engagements as small as 2-vs.-2 and as large as 10-vs.-10, and examines how these patterns change with the quality of the teams' weapons; (2) Coordinating Tactics: introduces and demonstrates a deep-reinforcement-learning framework that equips agents to learn to use their own and their teammates' situational context to decide which pre-scripted tactics to employ in which situations, and which teammates, if any, to coordinate with throughout the engagement; agents using the neural network trained within this framework outperform teams employing baseline tactics in N-vs.-N engagements for N as small as 2 and as large as 64; and (3) Bio-Inspired Coordination: discovers, through Monte Carlo agent-based simulations, the importance of prioritizing the team's force concentration against the most threatening opponent agents, but also of preserving some resources by deploying a smaller defense force and defending against lower-penalty threats in addition to high-priority threats, so as to maximize the fuel remaining in the defending team's reservoir. (Ph.D. dissertation)
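    In the spirit of contribution (2), a minimal tactic selector maps a situational-context feature vector to a score per pre-scripted tactic and picks the best, with epsilon-greedy exploration. A linear scorer stands in for the dissertation's neural network; all names below are illustrative.

```python
# Toy epsilon-greedy selector over a fixed menu of pre-scripted tactics.
import numpy as np

def select_tactic(context, weights, epsilon, rng):
    """weights: (num_tactics, feature_dim) scorer; context: feature vector."""
    if rng.random() < epsilon:            # explore a random tactic
        return int(rng.integers(len(weights)))
    scores = weights @ context            # one score per tactic
    return int(np.argmax(scores))         # exploit the highest-scoring tactic
```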