Utilising Assured Multi-Agent Reinforcement Learning within safety-critical scenarios
Multi-agent reinforcement learning allows a team of agents to learn how to work together to solve complex decision-making problems in a shared environment. However, this learning process utilises stochastic mechanisms, meaning that its use in safety-critical domains can be problematic. To overcome this issue, we propose an Assured Multi-Agent Reinforcement Learning (AMARL) approach that uses a model checking technique called quantitative verification to provide formal guarantees of agent compliance with safety, performance, and other non-functional requirements during and after the reinforcement learning process. We demonstrate the applicability of our AMARL approach in three different patrolling navigation domains in which multi-agent systems must learn to visit key areas by using different types of reinforcement learning algorithms (temporal difference learning, game theory, and direct policy search). Furthermore, we compare the effectiveness of these algorithms when used with and without our approach. Our extensive experiments with both homogeneous and heterogeneous multi-agent systems of different sizes show that the use of AMARL leads to safety requirements being consistently satisfied and to better overall results than standard reinforcement learning.
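The core idea of constraining learning with verified requirements can be illustrated with a minimal sketch. Everything here is hypothetical: the abstract does not describe the mechanism in code, and the `safe_actions` whitelist merely stands in for the set of actions that quantitative verification would certify, which in AMARL comes from model checking rather than a hand-written set.

```python
import random

def assured_action(q_values, safe_actions, epsilon=0.1, rng=random):
    """Epsilon-greedy action selection restricted to a verified-safe set.

    q_values     : dict mapping action -> learned Q-value
    safe_actions : set of actions certified as requirement-compliant
                   (stand-in for a quantitative-verification result)
    """
    if not safe_actions:
        raise ValueError("no verified-safe action available")
    if rng.random() < epsilon:
        # Explore, but only among actions the verifier accepted.
        return rng.choice(sorted(safe_actions))
    # Exploit: greedy over Q-values within the safe set.
    return max(safe_actions, key=lambda a: q_values[a])

q = {"north": 0.9, "south": 0.2, "east": 0.5}
print(assured_action(q, safe_actions={"south", "east"}, epsilon=0.0))  # "east"
```

Because the greedy step never considers actions outside the safe set, the highest-valued unsafe action ("north" above) is excluded throughout learning, which is the essence of providing guarantees *during* training rather than only after it.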
SA-Net: Deep Neural Network for Robot Trajectory Recognition from RGB-D Streams
Learning from demonstration (LfD) and imitation learning offer new paradigms
for transferring task behavior to robots. A class of methods that enable such
online learning require the robot to observe the task being performed and
decompose the sensed streaming data into sequences of state-action pairs, which
are then input to the methods. Thus, recognizing the state-action pairs
correctly and quickly in sensed data is a crucial prerequisite for these
methods. We present SA-Net, a deep neural network architecture that recognizes
state-action pairs from RGB-D data streams. SA-Net performed well in two
diverse robotic applications of LfD -- one involving mobile ground robots and
another involving a robotic manipulator -- which demonstrates that the
architecture generalizes well to differing contexts. Comprehensive evaluations
including deployment on a physical robot show that SA-Net significantly
improves on the accuracy of the previous method, which utilizes traditional
image processing and segmentation.
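The state-action pair format that such LfD pipelines consume can be sketched with a toy geometric decoder. This is not SA-Net, which learns the mapping from raw RGB-D streams with a deep network; the grid size and the discretization below are illustrative assumptions only, showing what "decomposing sensed data into state-action sequences" produces.

```python
def to_state_action_pairs(positions, grid=1.0):
    """Discretize a tracked (x, y) trajectory into grid states and derive
    each action as the movement between consecutive states."""
    states = [(round(x / grid), round(y / grid)) for x, y in positions]
    pairs = []
    for (x0, y0), (x1, y1) in zip(states, states[1:]):
        action = (x1 - x0, y1 - y0)  # e.g. (1, 0) = one cell east
        pairs.append(((x0, y0), action))
    return pairs

print(to_state_action_pairs([(0.1, 0.0), (1.0, 0.0), (1.1, 1.0)]))
# [((0, 0), (1, 0)), ((1, 0), (0, 1))]
```

The downstream imitation-learning method then treats each `(state, action)` tuple as one demonstration step, which is why fast and accurate recognition of these pairs is the prerequisite the abstract emphasizes.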
Game Theory Models for Multi-Robot Patrolling of Infrastructures
Abstract This work is focused on the problem of performing multi-robot patrolling for infrastructure security applications, in order to protect a known environment at critical facilities. Thus, given a set of robots and a set of points of interest, the patrolling task consists of constantly visiting these points at irregular time intervals for security purposes. Current existing solutions for these types of applications are predictable and inflexible. Moreover, most of the previous solutions are centralized and deterministic, and only a few efforts have been made to integrate dynamic methods. Therefore, this work develops new dynamic and decentralized collaborative approaches that solve the aforementioned problem by implementing learning models from Game Theory. The model selected in this work, which includes belief-based and reinforcement models as special cases, is called Experience-Weighted Attraction. The problem has been defined using concepts of Graph Theory to represent the environment, in order to work with such Game Theory techniques. Finally, the proposed methods have been evaluated experimentally by using a patrolling simulator, and the results obtained have been compared with previously available approaches.
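The Experience-Weighted Attraction (EWA) update named in the abstract has a standard form (Camerer and Ho): attractions are decayed, reinforced by realized or foregone payoffs, and normalized by an experience weight. The sketch below follows that textbook form; the strategy names and payoffs are invented for illustration, and how the robots map patrol decisions onto game strategies is not specified by the abstract.

```python
def ewa_update(A, N, payoffs, chosen, phi=0.9, rho=0.9, delta=0.5):
    """One Experience-Weighted Attraction update for a single player.

    A       : dict strategy -> current attraction A(t-1)
    N       : experience weight N(t-1)
    payoffs : dict strategy -> payoff the strategy would have earned this round
    chosen  : strategy actually played
    phi     : decay of past attractions; rho : decay of experience weight
    delta   : weight on foregone payoffs (delta=0 -> pure reinforcement,
              delta=1 with rho=phi -> belief-based learning)
    """
    N_new = rho * N + 1.0
    A_new = {}
    for s, a in A.items():
        weight = 1.0 if s == chosen else delta  # delta + (1-delta)*I(s, chosen)
        A_new[s] = (phi * N * a + weight * payoffs[s]) / N_new
    return A_new, N_new

# With rho=0 and delta=0 the update reduces to a simple reinforcement rule:
A, N = ewa_update({"patrol": 0.0, "wait": 0.0}, N=1.0,
                  payoffs={"patrol": 2.0, "wait": 1.0}, chosen="patrol",
                  phi=1.0, rho=0.0, delta=0.0)
print(A)  # {'patrol': 2.0, 'wait': 0.0}
```

This parameterization is exactly why the abstract can claim that belief-based and reinforcement models arise as special cases: they correspond to corner settings of `delta`, `rho`, and `phi`.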
An Energy-aware, Fault-tolerant, and Robust Deep Reinforcement Learning based approach for Multi-agent Patrolling Problems
Autonomous vehicles are suited for continuous area patrolling problems.
However, finding an optimal patrolling strategy can be challenging for many
reasons. Firstly, patrolling environments are often complex and can include
unknown environmental factors. Secondly, autonomous vehicles can have failures
or hardware constraints, such as limited battery life. Importantly, patrolling
large areas often requires multiple agents that need to collectively coordinate
their actions. In this work, we consider these limitations and propose an
approach based on model-free, deep multi-agent reinforcement learning. In this
approach, the agents are trained to automatically recharge themselves when
required, to support continuous collective patrolling. A distributed
homogeneous multi-agent architecture is proposed, where all patrolling agents
execute identical policies locally based on their local observations and shared
information. This architecture provides a fault-tolerant and robust patrolling
system that can tolerate agent failures and allow supplementary agents to be
added to replace failed agents or to increase the overall patrol performance.
The solution is validated through simulation experiments from multiple
perspectives, including the overall patrol performance, the efficiency of
battery recharging strategies, and the overall fault tolerance and robustness.
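The recharge trade-off the agents must learn can be made concrete with a hand-coded threshold rule. To be clear, the paper trains this behavior end-to-end with deep multi-agent RL; the rule, its constants, and the action names below are illustrative assumptions, not the authors' method.

```python
def choose_action(policy_action, battery_level, distance_to_charger,
                  reserve=0.1, cost_per_step=0.01):
    """Override the patrol action with a recharge trip whenever the remaining
    battery would not safely cover the trip back to the charging station.

    battery_level is a fraction in [0, 1]; reserve is a safety margin.
    """
    needed = distance_to_charger * cost_per_step + reserve
    if battery_level <= needed:
        return "go_to_charger"
    return policy_action

print(choose_action("patrol_east", battery_level=0.12, distance_to_charger=5))
# "go_to_charger"  (0.12 <= 5 * 0.01 + 0.1 = 0.15)
```

Learning this decision instead of hard-coding it lets agents trade patrol coverage against charging opportunistically, which is what enables the continuous collective patrolling the abstract describes.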
A Multiagent Deep Reinforcement Learning Approach for Path Planning in Autonomous Surface Vehicles: The Ypacaraí Lake Patrolling Case
Autonomous surface vehicles (ASVs) excel at monitoring and measuring aquatic nutrients
due to their autonomy, mobility, and relatively low cost. When planning paths for such vehicles, the task
of patrolling with multiple agents is usually addressed with heuristic approaches, such as Reinforcement
Learning (RL), because of the complexity and high dimensionality of the problem. Not only do efficient
paths have to be designed, but disturbances in movement or in the battery’s performance must also be addressed.
For this multiagent patrolling task, the proposed approach is based on a centralized Convolutional Deep
Q-Network, designed with a final independent dense layer for every agent to deal with scalability, with the
assumption that every agent has the same properties and capabilities. For this purpose, a tailored
reward function is created which penalizes illegal actions (such as collisions) and rewards visiting idle
cells (cells that remain unvisited for a long time). A comparison with various multiagent Reinforcement
Learning (MARL) algorithms has been done (Independent Q-Learning, Dueling Q-Network and multiagent
Double Deep Q-Learning) in a case-study scenario, the Ypacaraí lake in Asunción (Paraguay). The
resulting multiagent policy leads to an average improvement of 15% compared to lawn-mower
trajectories and a 6% improvement over IDQL for the case study considered. When evaluating the
training speed, the proposed approach runs three times faster than the independent algorithm.
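The tailored reward function the abstract describes, penalizing illegal actions and rewarding visits to long-idle cells, can be sketched directly. The penalty magnitude and idleness scale below are assumptions; the paper does not state its constants in the abstract.

```python
def patrol_reward(cell_idleness, action_is_illegal,
                  illegal_penalty=-1.0, idleness_scale=0.1):
    """Reward for one agent step in a grid patrolling task.

    cell_idleness     : time steps since the visited cell was last covered
    action_is_illegal : True for e.g. a collision or an off-map move
    """
    if action_is_illegal:
        return illegal_penalty  # flat penalty for illegal actions
    # Visiting a long-neglected cell is worth more than a fresh one.
    return idleness_scale * cell_idleness

print(patrol_reward(cell_idleness=30, action_is_illegal=False))  # 3.0
print(patrol_reward(cell_idleness=30, action_is_illegal=True))   # -1.0
```

Scaling the reward by idleness is what pushes the joint policy toward even spatial coverage: an agent gains nothing by revisiting cells another agent has just covered.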
Stochastic Reinforcement Learning
In reinforcement learning episodes, the rewards and punishments are often
non-deterministic, and there are invariably stochastic elements governing the
underlying situation. Such stochastic elements are often numerous and cannot be
known in advance, and they have a tendency to obscure the underlying rewards
and punishments patterns. Indeed, if stochastic elements were absent, the same
outcome would occur every time and the learning problems involved could be
greatly simplified. In addition, in most practical situations, the cost of an
observation to receive either a reward or punishment can be significant, and
one would wish to arrive at the correct learning conclusion by incurring
minimum cost. In this paper, we present a stochastic approach to reinforcement
learning which explicitly models the variability present in the learning
environment and the cost of observation. Criteria and rules for learning
success are quantitatively analyzed, and probabilities of exceeding the
observation cost bounds are also obtained.
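One way to make the observation cost explicit is a sequential sampling rule that stops once the reward estimate is precise enough or the budget is spent. The paper's criteria are derived analytically; this Monte-Carlo-style sketch, including its tolerance and budget parameters, is only an assumed illustration of the cost-versus-confidence trade-off.

```python
import math

def estimate_reward(sample, budget, cost_per_obs=1.0, tol=0.05, min_n=30):
    """Draw stochastic reward observations until the standard error of the
    mean drops below tol, or the observation budget is exhausted.

    sample : zero-argument callable returning one noisy reward observation
    Returns (estimated mean reward, total observation cost spent).
    """
    values, spent = [], 0.0
    while spent + cost_per_obs <= budget:
        values.append(sample())
        spent += cost_per_obs
        n = len(values)
        if n >= min_n:
            mean = sum(values) / n
            var = sum((v - mean) ** 2 for v in values) / (n - 1)
            if math.sqrt(var / n) < tol:  # standard error small enough
                break
    return sum(values) / len(values), spent

# A noiseless reward stops as soon as the minimum sample size is reached:
mean, spent = estimate_reward(lambda: 1.0, budget=100.0)
print(mean, spent)  # 1.0 30.0
```

The more variable the underlying rewards, the more observations (and cost) the rule demands before it commits to a conclusion, which mirrors the paper's goal of reaching the correct learning conclusion at minimum cost.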
Resilience of multi-robot systems to physical masquerade attacks
The advent of autonomous mobile multi-robot systems has driven innovation in both the industrial and defense sectors. The integration of such systems in safety- and security-critical applications has raised concern over their resilience to attack. In this work, we investigate the security problem of a stealthy adversary masquerading as a properly functioning agent. We show that conventional multi-agent pathfinding solutions are vulnerable to these physical masquerade attacks. Furthermore, we provide a constraint-based formulation of multi-agent pathfinding that yields multi-agent plans that are provably resilient to physical masquerade attacks. This formalization leverages inter-agent observations to facilitate introspective monitoring and guarantee resilience.