11 research outputs found

Near-Optimal Adversarial Policy Switching for Decentralized Asynchronous Multi-Agent Systems

Author: Amato Christopher
Hoang Trong Nghia
How Jonathan
Sivakumar Kavinayan
Xiao Yuchen
Publication date: 17/10/2017

A key challenge in multi-robot and multi-agent systems is generating solutions that are robust to other self-interested or even adversarial parties who actively try to prevent the agents from achieving their goals. The practicality of existing work addressing this challenge is limited to small-scale synchronous decision-making scenarios or to a single agent planning its best response against a single adversary with fixed, procedurally characterized strategies. In contrast, this paper considers a more realistic class of problems where a team of asynchronous agents with limited observation and communication capabilities needs to compete against multiple strategic adversaries with changing strategies. This problem necessitates agents that can coordinate to detect changes in adversary strategies and plan the best response accordingly. Our approach first optimizes a set of stratagems that represent these best responses. These optimized stratagems are then integrated into a unified policy that can detect and respond when the adversaries change their strategies. The near-optimality of the proposed framework is established theoretically as well as demonstrated empirically in simulation and hardware.

arXiv.org e-Print Archive

Crossref

DSpace@MIT

Learning Augmented, Multi-Robot Long-Horizon Navigation in Partially Mapped Environments

Author: Khanal Abhish
Stein Gregory J.
Publication date: 29/03/2023

We present a novel approach for efficient and reliable goal-directed long-horizon navigation for a multi-robot team in a structured, unknown environment by predicting statistics of unknown space. Building on recent work in learning-augmented model-based planning under uncertainty, we introduce a high-level state and action abstraction that lets us approximate the challenging Dec-POMDP as a tractable stochastic MDP. Our Multi-Robot Learning over Subgoals Planner (MR-LSP) guides agents towards coordinated exploration of regions more likely to lead to the unseen goal. We demonstrate improvement in cost against other multi-robot strategies; in simulated office-like environments, we show that our approach saves 13.29% (2 robots) and 4.6% (3 robots) average cost versus standard non-learned optimistic planning and a learning-informed baseline. Comment: 7 pages, 7 figures, ICRA202

arXiv.org e-Print Archive

Semantic-level decentralized multi-robot decision-making using probabilistic macro-observations

Author: Amato Christopher
Everett Michael F
How Jonathan P
Liu Miao
Liu Shih-Yuan
Lopez Brett Thomas
Omidshafiei Shayegan
Vian John
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 16/03/2017

Robust environment perception is essential for decision-making on robots operating in complex domains. Intelligent task execution requires principled treatment of uncertainty sources in a robot's observation model. This is important not only for low-level observations (e.g., accelerometer data), but also for high-level observations such as semantic object labels. This paper formalizes the concept of macro-observations in Decentralized Partially Observable Semi-Markov Decision Processes (Dec-POSMDPs), allowing scalable semantic-level multi-robot decision-making. A hierarchical Bayesian approach is used to model noise statistics of low-level classifier outputs, while simultaneously allowing sharing of domain noise characteristics between classes. Classification accuracy of the proposed macro-observation scheme, called Hierarchical Bayesian Noise Inference (HBNI), is shown to exceed existing methods. The macro-observation scheme is then integrated into a Dec-POSMDP planner, with hardware experiments running onboard a team of dynamic quadrotors in a challenging domain where noise-agnostic filtering fails. To the best of our knowledge, this is the first demonstration of a real-time, convolutional neural net-based classification framework running fully onboard a team of quadrotors in a multi-robot decision-making domain. Boeing Compan

arXiv.org e-Print Archive

DSpace@MIT

Crossref

Scalable accelerated decentralized multi-robot policy search in continuous observation spaces

Author: Amato Christopher
Everett Michael F
How Jonathan P
Liu Miao
Omidshafiei Shayegan
Vian John
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 16/03/2017

This paper presents the first-ever approach for solving continuous-observation Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) and their semi-Markovian counterparts, Dec-POSMDPs. This contribution is especially important in robotics, where a vast number of sensors provide continuous observation data. A continuous-observation policy representation is introduced using Stochastic Kernel-based Finite State Automata (SK-FSAs). An SK-FSA search algorithm titled Entropy-based Policy Search using Continuous Kernel Observations (EPSCKO) is introduced and applied to the first-ever continuous-observation Dec-POMDP/Dec-POSMDP domain, where it significantly outperforms state-of-the-art discrete approaches. This methodology is equally applicable to Dec-POMDPs and Dec-POSMDPs, though the empirical analysis presented focuses on Dec-POSMDPs due to their higher scalability. To improve convergence, an entropy injection policy search acceleration approach for both continuous and discrete observation cases is also developed and shown to improve convergence rates without degrading policy quality. Boeing Compan

arXiv.org e-Print Archive

DSpace@MIT

Crossref

Hybrid control of a multi-agent UAV fleet for formation flight with Dec-POMDP

Author: Floriano Bruno Rodolfo de Oliveira
Publication date: 17/12/2019

Formation flight and cooperative control of multiple UAVs have been areas of great interest in recent research. While many methods are being developed for fine reference and formation tracking, collision avoidance, and disturbance rejection, many hurdles still need to be overcome, such as decentralization, reliable communication, task division, obstacle avoidance, and autonomy. In this scenario, this work proposes a hybrid control system for formation flight of multiple fixed-wing UAVs, increasing the group's performance and efficiency by allowing it to plan and control the fleet using both discrete and continuous commands. To overcome the centralization problem, the Dec-POMDP planning method is used in order to avoid reliance on a central decision node, such as a leader or a ground station. By using this algorithm, the approach also considers stochastic transitions and observations, allowing effective decision-making in noisy and uncertain environments. In addition, implementing the system in an outer loop reduces computational time. In simulations, the proposed system, a switching topology between the Dec-POMDP policy and PID controllers, was compared with other methods in the literature and showed satisfactory performance for formation flight. CAPES

Repositório Institucional da Universidade de Brasília