Near-Optimal Adversarial Policy Switching for Decentralized Asynchronous Multi-Agent Systems
A key challenge in multi-robot and multi-agent systems is generating
solutions that are robust to other self-interested or even adversarial parties
who actively try to prevent the agents from achieving their goals. The
practicality of existing works addressing this challenge is limited to only
small-scale synchronous decision-making scenarios or a single agent planning
its best response against a single adversary with fixed, procedurally
characterized strategies. In contrast, this paper considers a more realistic
class of problems where a team of asynchronous agents with limited observation
and communication capabilities need to compete against multiple strategic
adversaries with changing strategies. This problem necessitates agents that can
coordinate to detect changes in adversary strategies and plan the best response
accordingly. Our approach first optimizes a set of stratagems that represent
these best responses. These optimized stratagems are then integrated into a
unified policy that can detect and respond when the adversaries change their
strategies. The near-optimality of the proposed framework is established
theoretically as well as demonstrated empirically in simulation and hardware
Learning Augmented, Multi-Robot Long-Horizon Navigation in Partially Mapped Environments
We present a novel approach for efficient and reliable goal-directed
long-horizon navigation for a multi-robot team in a structured, unknown
environment by predicting statistics of unknown space. Building on recent work
in learning-augmented model-based planning under uncertainty, we introduce a
high-level state and action abstraction that lets us approximate the
challenging Dec-POMDP as a tractable stochastic MDP. Our Multi-Robot Learning
over Subgoals Planner (MR-LSP) guides agents towards coordinated exploration of
regions more likely to reach the unseen goal. We demonstrate improvement in
cost against other multi-robot strategies; in simulated office-like
environments, we show that our approach saves 13.29% (2 robot) and 4.6% (3
robot) average cost versus standard non-learned optimistic planning and a
learning-informed baseline. Comment: 7 pages, 7 figures, ICRA202
Semantic-level decentralized multi-robot decision-making using probabilistic macro-observations
Robust environment perception is essential for decision-making on robots operating in complex domains. Intelligent task execution requires principled treatment of uncertainty sources in a robot's observation model. This is important not only for low-level observations (e.g., accelerometer data), but also for high-level observations such as semantic object labels. This paper formalizes the concept of macro-observations in Decentralized Partially Observable Semi-Markov Decision Processes (Dec-POSMDPs), allowing scalable semantic-level multi-robot decision making. A hierarchical Bayesian approach is used to model noise statistics of low-level classifier outputs, while simultaneously allowing sharing of domain noise characteristics between classes. Classification accuracy of the proposed macro-observation scheme, called Hierarchical Bayesian Noise Inference (HBNI), is shown to exceed existing methods. The macro-observation scheme is then integrated into a Dec-POSMDP planner, with hardware experiments running onboard a team of dynamic quadrotors in a challenging domain where noise-agnostic filtering fails. To the best of our knowledge, this is the first demonstration of a real-time, convolutional neural net-based classification framework running fully onboard a team of quadrotors in a multi-robot decision-making domain. Boeing Compan
Scalable accelerated decentralized multi-robot policy search in continuous observation spaces
This paper presents the first-ever approach for solving continuous-observation Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) and their semi-Markovian counterparts, Dec-POSMDPs. This contribution is especially important in robotics, where a vast number of sensors provide continuous observation data. A continuous-observation policy representation is introduced using Stochastic Kernel-based Finite State Automata (SK-FSAs). An SK-FSA search algorithm titled Entropy-based Policy Search using Continuous Kernel Observations (EPSCKO) is introduced and applied to the first-ever continuous-observation Dec-POMDP/Dec-POSMDP domain, where it significantly outperforms state-of-the-art discrete approaches. This methodology is equally applicable to Dec-POMDPs and Dec-POSMDPs, though the empirical analysis presented focuses on Dec-POSMDPs due to their higher scalability. To improve convergence, an entropy injection policy search acceleration approach for both continuous and discrete observation cases is also developed and shown to improve convergence rates without degrading policy quality. Boeing Compan
Hybrid control of a multi-agent UAV fleet for formation flight with Dec-POMDP
Formation flight and cooperative control of multiple UAVs have been areas of great interest in recent research. While many methods are being developed for fine reference and formation tracking, collision avoidance, and disturbance rejection, many obstacles still need to be overcome, such as decentralization, reliable communication, task division, obstacle avoidance, and autonomy. In this scenario, this work proposes a hybrid control system for formation flight of multiple fixed-wing UAVs, increasing the group's performance and efficiency by allowing it to plan and control the fleet using both discrete and continuous commands. To overcome the centralization problem, the Dec-POMDP planning method is used to avoid reliance on a central decision node, such as a leader or a ground station. By using this algorithm, the approach also accounts for stochastic transitions and observations, allowing effective decision-making in noisy and uncertain environments. In addition, implementing the system in an outer loop reduces computation time. In simulations, the proposed system, a switching topology between the Dec-POMDP policy and PID controllers, was compared with other methods from the literature and showed satisfactory performance for formation flight. CAPES