2,241 research outputs found

    Utilising Assured Multi-Agent Reinforcement Learning within safety-critical scenarios

    Get PDF
    Multi-agent reinforcement learning allows a team of agents to learn how to work together to solve complex decision-making problems in a shared environment. However, this learning process utilises stochastic mechanisms, meaning that its use in safety-critical domains can be problematic. To overcome this issue, we propose an Assured Multi-Agent Reinforcement Learning (AMARL) approach that uses a model checking technique called quantitative verification to provide formal guarantees of agent compliance with safety, performance, and other non-functional requirements during and after the reinforcement learning process. We demonstrate the applicability of our AMARL approach in three different patrolling navigation domains in which multi-agent systems must learn to visit key areas by using different types of reinforcement learning algorithms (temporal difference learning, game theory, and direct policy search). Furthermore, we compare the effectiveness of these algorithms when used in combination with and without our approach. Our extensive experiments with both homogeneous and heterogeneous multi-agent systems of different sizes show that the use of AMARL leads to safety requirements being consistently satisfied and to better overall results than standard reinforcement learning

    SA-Net: Deep Neural Network for Robot Trajectory Recognition from RGB-D Streams

    Full text link
    Learning from demonstration (LfD) and imitation learning offer new paradigms for transferring task behavior to robots. A class of methods that enable such online learning require the robot to observe the task being performed and decompose the sensed streaming data into sequences of state-action pairs, which are then input to the methods. Thus, recognizing the state-action pairs correctly and quickly in sensed data is a crucial prerequisite for these methods. We present SA-Net a deep neural network architecture that recognizes state-action pairs from RGB-D data streams. SA-Net performed well in two diverse robotic applications of LfD -- one involving mobile ground robots and another involving a robotic manipulator -- which demonstrates that the architecture generalizes well to differing contexts. Comprehensive evaluations including deployment on a physical robot show that \sanet{} significantly improves on the accuracy of the previous method that utilizes traditional image processing and segmentation.Comment: (in press

    Game Theory Models for Multi-Robot Patrolling of Infraestructures

    Get PDF
    Abstract This work is focused on the problem of performing multi‐robot patrolling for infrastructure security applications in order to protect a known environment at critical facilities. Thus, given a set of robots and a set of points of interest, the patrolling task consists of constantly visiting these points at irregular time intervals for security purposes. Current existing solutions for these types of applications are predictable and inflexible. Moreover, most of the previous centralized and deterministic solutions and only few efforts have been made to integrate dynamic methods. Therefore, the development of new dynamic and decentralized collaborative approaches in order to solve the aforementioned problem by implementing learning models from Game Theory. The model selected in this work that includes belief‐based and reinforcement models as special cases is called Experience‐Weighted Attraction. The problem has been defined using concepts of Graph Theory to represent the environment in order to work with such Game Theory techniques. Finally, the proposed methods have been evaluated experimentally by using a patrolling simulator. The results obtained have been compared with previous availabl

    An Energy-aware, Fault-tolerant, and Robust Deep Reinforcement Learning based approach for Multi-agent Patrolling Problems

    Full text link
    Autonomous vehicles are suited for continuous area patrolling problems. However, finding an optimal patrolling strategy can be challenging for many reasons. Firstly, patrolling environments are often complex and can include unknown environmental factors. Secondly, autonomous vehicles can have failures or hardware constraints, such as limited battery life. Importantly, patrolling large areas often requires multiple agents that need to collectively coordinate their actions. In this work, we consider these limitations and propose an approach based on model-free, deep multi-agent reinforcement learning. In this approach, the agents are trained to automatically recharge themselves when required, to support continuous collective patrolling. A distributed homogeneous multi-agent architecture is proposed, where all patrolling agents execute identical policies locally based on their local observations and shared information. This architecture provides a fault-tolerant and robust patrolling system that can tolerate agent failures and allow supplementary agents to be added to replace failed agents or to increase the overall patrol performance. The solution is validated through simulation experiments from multiple perspectives, including the overall patrol performance, the efficiency of battery recharging strategies, and the overall fault tolerance and robustness

    A Multiagent Deep Reinforcement Learning Approach for Path Planning in Autonomous Surface Vehicles: The Ypacaraí Lake Patrolling Case

    Get PDF
    Article number 9330612Autonomous surfaces vehicles (ASVs) excel at monitoring and measuring aquatic nutrients due to their autonomy, mobility, and relatively low cost. When planning paths for such vehicles, the task of patrolling with multiple agents is usually addressed with heuristics approaches, such as Reinforcement Learning (RL), because of the complexity and high dimensionality of the problem. Not only do efficient paths have to be designed, but addressing disturbances in movement or the battery’s performance is mandatory. For this multiagent patrolling task, the proposed approach is based on a centralized Convolutional Deep Q-Network, designed with a final independent dense layer for every agent to deal with scalability, with the hypothesis/assumption that every agent has the same properties and capabilities. For this purpose, a tailored reward function is created which penalizes illegal actions (such as collisions) and rewards visiting idle cells (cells that remains unvisited for a long time). A comparison with various multiagent Reinforcement Learning (MARL) algorithms has been done (Independent Q-Learning, Dueling Q-Network and multiagent Double Deep Q-Learning) in a case-study scenario like the Ypacaraí lake in Asunción (Paraguay). The training results in multiagent policy leads to an average improvement of 15% compared to lawn mower trajectories and a 6% improvement over the IDQL for the case-study considered. When evaluating the training speed, the proposed approach runs three times faster than the independent algorithm.Ministerio de Ciencia, Innovación y Universidades (España) RTI2018-098964-B-I00Junta de Andalucía(España) PY18-RE000

    Stochastic Reinforcement Learning

    Full text link
    In reinforcement learning episodes, the rewards and punishments are often non-deterministic, and there are invariably stochastic elements governing the underlying situation. Such stochastic elements are often numerous and cannot be known in advance, and they have a tendency to obscure the underlying rewards and punishments patterns. Indeed, if stochastic elements were absent, the same outcome would occur every time and the learning problems involved could be greatly simplified. In addition, in most practical situations, the cost of an observation to receive either a reward or punishment can be significant, and one would wish to arrive at the correct learning conclusion by incurring minimum cost. In this paper, we present a stochastic approach to reinforcement learning which explicitly models the variability present in the learning environment and the cost of observation. Criteria and rules for learning success are quantitatively analyzed, and probabilities of exceeding the observation cost bounds are also obtained.Comment: AIKE 201

    Resilience of multi-robot systems to physical masquerade attacks

    Full text link
    The advent of autonomous mobile multi-robot systems has driven innovation in both the industrial and defense sectors. The integration of such systems in safety-and security-critical applications has raised concern over their resilience to attack. In this work, we investigate the security problem of a stealthy adversary masquerading as a properly functioning agent. We show that conventional multi-agent pathfinding solutions are vulnerable to these physical masquerade attacks. Furthermore, we provide a constraint-based formulation of multi-agent pathfinding that yields multi-agent plans that are provably resilient to physical masquerade attacks. This formalization leverages inter-agent observations to facilitate introspective monitoring to guarantee resilience.Accepted manuscrip
    corecore