39 research outputs found

    Securing multi-robot systems with inter-robot observations and accusations

    Multi-robot systems (MRSs) are gaining popularity in industries such as manufacturing, logistics, agriculture, defense, search and rescue, and transportation. These systems involve multiple robots working together towards a shared objective, either autonomously or under human supervision. However, because MRSs operate in uncertain or even adversarial environments, and the sensors and actuators of each robot may be error-prone, they are susceptible to faults and security threats unique to MRSs. Classical techniques from distributed systems cannot detect or mitigate these threats. In this dissertation, novel techniques are proposed to enhance the security and fault-tolerance of MRSs through inter-robot observations and accusations. A fundamental security property is proposed for MRSs, which ensures that the system supervisor detects forbidden deviations from a desired multi-robot motion plan. Relying solely on self-reported motion information from the robots for monitoring deviations can leave the system vulnerable to attacks from a single compromised robot. The concept of co-observations is introduced: additional data reported to the supervisor to supplement the self-reported motion information. Co-observation-based detection is formalized as a method of identifying deviations from the expected motion plan based on discrepancies in the sequence of co-observations reported. An optimal deviation-detecting motion planning problem is formulated that achieves all the original application objectives while ensuring that all forbidden plan-deviation attacks trigger co-observation-based detection by the supervisor. A secure motion planner based on constraint solving is proposed as a proof-of-concept to implement the deviation-detecting security property. The security and resilience of MRSs against plan-deviation attacks are further improved by limiting the information available to attackers. 
An efficient algorithm is proposed that verifies the inability of an attacker to stealthily perform forbidden plan-deviation attacks with a given motion plan and announcement scheme. Such announcement schemes are referred to as horizon-limiting. An optimal horizon-limiting planning problem is formulated that maximizes planning lookahead while keeping the announcement scheme horizon-limiting. Co-observations and horizon-limiting announcements are shown to be efficient and scalable in protecting MRSs, including systems with hundreds of robots, as evidenced by a case study in a warehouse setting. Lastly, the Decentralized Blocklist Protocol (DBP), a method for designing Byzantine-resilient decentralized MRSs, is introduced. DBP is based on inter-robot accusations and allows cooperative robots to identify misbehavior through co-observations and share this information through the network. The method adapts to the number of faulty robots and is widely applicable to various decentralized MRS applications. It also permits fast information propagation, requires fewer cooperative observers of application-specific variables, and reduces the worst-case connectivity requirement, making it more scalable than existing methods. Empirical results demonstrate the scalability and effectiveness of DBP in cooperative target tracking, time synchronization, and localization case studies with hundreds of robots. The techniques proposed in this dissertation enhance the security and fault-tolerance of MRSs operating in uncertain and adversarial environments, aiding in the development of secure MRSs for emerging applications.
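The idea of co-observation-based detection described in this abstract can be illustrated with a small sketch. It is not the dissertation's planner or detector. It assumes a grid world, that two robots observe each other whenever they are within one cell of each other at the same time step, and that the supervisor simply compares the co-observations implied by the plan with those reported. The names `expected_co_observations` and `detect_deviation` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]
CoObs = Tuple[int, str, str]  # (time step, observer, observed robot)


def expected_co_observations(plan: Dict[str, List[Cell]]) -> Set[CoObs]:
    """Co-observations implied by the plan.

    Assumption for this sketch: two robots see each other when their
    Chebyshev distance on the grid is at most 1 at the same time step.
    """
    expected: Set[CoObs] = set()
    robots = sorted(plan)
    horizon = min(len(path) for path in plan.values())
    for t in range(horizon):
        for a in robots:
            for b in robots:
                if a == b:
                    continue
                (ax, ay), (bx, by) = plan[a][t], plan[b][t]
                if max(abs(ax - bx), abs(ay - by)) <= 1:
                    expected.add((t, a, b))
    return expected


def detect_deviation(plan: Dict[str, List[Cell]], reported: Set[CoObs]) -> Set[CoObs]:
    """Return the discrepancies between expected and reported co-observations.

    A non-empty result means some robot was not where the plan placed it
    (or a report was missing or spurious), so the supervisor raises an alarm.
    """
    return expected_co_observations(plan) ^ reported


# Two robots are planned to pass next to each other at time step 2.
plan = {
    "r1": [(0, 0), (1, 0), (2, 0)],
    "r2": [(5, 0), (4, 0), (3, 0)],
}

# Honest run: both robots report seeing each other at t = 2.
honest = {(2, "r1", "r2"), (2, "r2", "r1")}

# r2 deviates but still claims to have seen r1; r1 never saw r2.
suspicious = {(2, "r2", "r1")}

print(detect_deviation(plan, honest))
print(detect_deviation(plan, suspicious))
```

In this toy setting a compromised robot can forge its own report, but it cannot make an honest robot report seeing it, which is why the second run leaves a discrepancy.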

    The Trust-Based Interactive Partially Observable Markov Decision Process

    Cooperative agent and robot systems are designed so that every member works toward the same common good. The problem is that the software systems are extremely complex and can be subverted by an adversary to either break the system or, potentially worse, create sneaky agents that cooperate when the stakes are low and take selfish, greedy actions when the rewards rise. This research focuses on the ability of a group of agents to reason about the trustworthiness of each other and make decisions about whether to cooperate. A trust-based interactive partially observable Markov decision process (TI-POMDP) is developed to model the trust interactions between agents, enabling the agents to select the best course of action from the current state. The TI-POMDP is a novel approach to multiagent cooperation based on an interactive partially observable Markov decision process (I-POMDP) augmented with trust relationships. Experiments using the Defender simulation demonstrate the TI-POMDP's ability to accurately track the trust levels of agents with hidden agendas. The TI-POMDP provides agents with the information needed to make decisions based on their level of trust and model of the environment. Testing demonstrates that agents quickly identify the hidden trust levels and mitigate the impact of a deceitful agent in comparison with a trust vector model. Agents using the TI-POMDP model achieved 3.8 times the average reward of agents using a trust vector model.

    Aprendizagem de coordenação em sistemas multi-agente (Learning coordination in multi-agent systems)

    The ability of an agent to coordinate with others within a system is a valuable property in multi-agent systems. Agents either cooperate as a team to accomplish a common goal, or adapt to opponents to complete different goals without being exploited. Research has shown that learning multi-agent coordination is significantly more complex than learning policies in single-agent environments, and requires a variety of techniques to deal with the properties of a system where agents learn concurrently. This thesis aims to determine how machine learning can be used to achieve coordination within a multi-agent system. It asks what techniques can be used to tackle the increased complexity of such systems and their credit assignment challenges, how to achieve coordination, and how to use communication to improve the behavior of a team. Many algorithms for competitive environments are tabular-based, preventing their use with high-dimensional or continuous state spaces, and may be biased against specific equilibrium strategies. This thesis proposes multiple deep learning extensions for competitive environments, allowing algorithms to reach equilibrium strategies in complex and partially observable environments, relying only on local information. A tabular algorithm is also extended with a new update rule that eliminates its bias against deterministic strategies. Current state-of-the-art approaches for cooperative environments rely on deep learning to handle the environment’s complexity and benefit from a centralized learning phase. Solutions that incorporate communication between agents often prevent agents from being executed in a distributed manner. This thesis proposes a multi-agent algorithm where agents learn communication protocols to compensate for local partial observability, and remain independently executed. 
A centralized learning phase can incorporate additional environment information to increase the robustness and speed with which a team converges to successful policies. The algorithm outperforms current state-of-the-art approaches in a wide variety of multi-agent environments. A permutation-invariant network architecture is also proposed to increase the scalability of the algorithm to large team sizes. Further research is needed to identify how the techniques proposed in this thesis, for cooperative and competitive environments, can be used in unison for mixed environments, and whether they are adequate for general artificial intelligence. Financial support from FCT and FSE under the III Quadro Comunitário de Apoio (Third Community Support Framework). Doctoral Programme in Informatics.

    A Dynamical System Approach for Resource-Constrained Mobile Robotics

    The revolution of autonomous vehicles has led to the development of robots with abundant sensors, actuators with many degrees of freedom, high-performance computing capabilities, and high-speed communication devices. These robots use a large volume of information from sensors to solve diverse problems. However, this usually leads to a significant modeling burden as well as excessive cost and computational requirements. Furthermore, in some scenarios, sophisticated sensors may not work precisely, the real-time processing power of a robot may be inadequate, the communication among robots may be impeded by natural or adversarial conditions, or the actuation control in a robot may be insubstantial. In these cases, we have to rely on simple robots with limited sensing and actuation, minimal onboard processing, moderate communication, and insufficient memory capacity. This reality motivates us to model simple robots, such as bouncing and underactuated robots, using dynamical-system techniques. In this dissertation, we propose a four-pronged approach for solving tasks in resource-constrained scenarios: 1) Combinatorial filters for bouncing robot localization; 2) Bouncing robot navigation and coverage; 3) Stochastic multi-robot patrolling; and 4) Deployment and planning of underactuated aquatic robots. First, we present a global localization method for a bouncing robot equipped with only a clock and contact sensors. Space-efficient and finite automata-based combinatorial filters are synthesized to solve the localization task by determining the robot’s pose (position and orientation) in its environment. Second, we propose a solution for navigation and coverage tasks using single or multiple bouncing robots. The proposed solution finds a navigation plan for a single bouncing robot from the robot’s initial pose to its goal pose with limited sensing. 
Probabilistic paths from several policies of the robot are combined so that the actual coverage distribution becomes as close as possible to a target coverage distribution. A joint trajectory for multiple bouncing robots to visit all the locations of an environment is incrementally generated. Third, a scalable method is proposed to find stochastic strategies for multi-robot patrolling in an adversarial and communication-constrained environment. Then, we evaluate the vulnerability of our patrolling policies by finding the probability of capturing an adversary at a location in our proposed patrolling scenarios. Finally, a data-driven deployment and planning approach is presented for underactuated aquatic robots called drifters; it creates a generalized flow pattern of the water, develops a Markov-chain-based motion model, and studies the long-term behavior of a marine environment from a flow point of view. In summary, our dynamical system approach is a unique solution to typical robotic tasks and opens a new paradigm for the modeling of simple robotic systems.
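As a rough illustration of what studying the long-term behavior of a Markov-chain motion model can mean, the sketch below computes the stationary distribution of a small chain by power iteration. The two-region transition matrix is made up for the example and is not data from the dissertation; the function name `stationary_distribution` is hypothetical, and the sketch assumes the chain is irreducible and aperiodic so that the iteration converges.

```python
from typing import List


def stationary_distribution(P: List[List[float]],
                            iterations: int = 1000,
                            tol: float = 1e-12) -> List[float]:
    """Approximate the stationary distribution of a row-stochastic matrix P.

    Starts from the uniform distribution and repeatedly applies
    dist <- dist * P until successive iterates differ by less than tol.
    """
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    return dist


# Hypothetical two-region model: a drifter in region 0 stays there with
# probability 0.9; a drifter in region 1 stays or leaves with equal probability.
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]

pi = stationary_distribution(P)
print(pi)  # close to [5/6, 1/6]
```

For this matrix the balance condition 0.1 * pi[0] = 0.5 * pi[1] gives the exact answer (5/6, 1/6), so in the long run a drifter would spend about five sixths of its time in region 0.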

    Multi-Robot Path Planning for Persistent Monitoring in Stochastic and Adversarial Environments

    In this thesis, we study multi-robot path planning problems for persistent monitoring tasks. The goal of such persistent monitoring tasks is to deploy a team of cooperating mobile robots in an environment to continually observe locations of interest in the environment. Robots patrol the environment in order to detect events arriving at the locations of the environment. The events stay at those locations for a certain amount of time before leaving and can only be detected if one of the robots visits the location of an event while the event is there. In order to detect all possible events arriving at a vertex, the maximum time spent by the robots between visits to that vertex should be less than the duration of the events arriving at that vertex. We consider the problem of finding the minimum number of robots to satisfy these revisit time constraints, also called latency constraints. The decision version of this problem is PSPACE-complete. We provide an O(log p) approximation algorithm for this problem where p is the ratio of the maximum and minimum latency constraints. We also present heuristic algorithms to solve the problem and show through simulations that a proposed orienteering-based heuristic algorithm gives better solutions than the approximation algorithm. We additionally provide an algorithm for the problem of minimizing the maximum weighted latency given a fixed number of robots. In case the event stay durations are not fixed but are drawn from a known distribution, we consider the problem of maximizing the expected number of detected events. We motivate randomized patrolling paths for such scenarios and use Markov chains to represent those random patrolling paths. We characterize the expected number of detected events as a function of the Markov chains used for patrolling and show that the objective function is submodular for randomly arriving events. 
We propose an approximation algorithm for the case where the event duration is the same constant for all vertices. We also propose a centralized and an online distributed algorithm to find the random patrolling policies for the robots. We also consider the case where the events are adversarial and can choose where and when to appear in order to maximize their chances of remaining undetected. The last problem we study in this thesis considers events triggered by a learning adversary. The adversary has a limited time to observe the patrolling policy before it decides when and where events should appear. We study the single-robot version of this problem and model it as a multi-stage two-player game. The adversary observes the patroller’s actions for a finite amount of time to learn the patroller’s strategy and then either chooses a location for the event to appear or reneges based on its confidence in the learned strategy. We characterize the expected payoffs for the players and propose a search algorithm to find a patrolling policy in such scenarios. We illustrate the trade-off between hard-to-learn and hard-to-attack strategies through simulations.
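The latency constraint stated in this abstract (the longest time between visits to a vertex must be less than the duration of events at that vertex) can be checked for a given patrol route with a few lines of code. The sketch below is only a checker for one robot following a fixed cyclic route, not any of the thesis's algorithms. The route, travel times, and function names are hypothetical.

```python
from typing import Dict, List


def max_revisit_gaps(route: List[str], travel_time: List[float]) -> Dict[str, float]:
    """Longest time between consecutive visits to each vertex on a cyclic route.

    route[i] is the i-th vertex visited; travel_time[i] is the time to go from
    route[i] to route[(i + 1) % len(route)]. The route repeats forever.
    """
    period = sum(travel_time)
    arrivals: Dict[str, List[float]] = {}
    t = 0.0
    for vertex, dt in zip(route, travel_time):
        arrivals.setdefault(vertex, []).append(t)
        t += dt
    gaps: Dict[str, float] = {}
    for vertex, times in arrivals.items():
        # Pair each visit with the next one, wrapping around to the next cycle.
        following = times[1:] + [times[0] + period]
        gaps[vertex] = max(b - a for a, b in zip(times, following))
    return gaps


def satisfies_latency(route: List[str], travel_time: List[float],
                      latency: Dict[str, float]) -> bool:
    """True if every constrained vertex is revisited strictly within its latency."""
    gaps = max_revisit_gaps(route, travel_time)
    return all(v in gaps and gaps[v] < latency[v] for v in latency)


# Hypothetical route: A is visited twice per cycle, B and C once.
route = ["A", "B", "A", "C"]
travel_time = [2.0, 2.0, 3.0, 3.0]

print(max_revisit_gaps(route, travel_time))
print(satisfies_latency(route, travel_time, {"A": 7.0, "B": 11.0, "C": 11.0}))
print(satisfies_latency(route, travel_time, {"A": 6.0, "B": 11.0, "C": 11.0}))
```

With these numbers the cycle takes 10 time units, the worst gap at A is 6, and the worst gap at B and C is 10, so the first set of constraints holds and the second fails because the gap at A is not strictly less than 6.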