Utilising Assured Multi-Agent Reinforcement Learning within safety-critical scenarios
Multi-agent reinforcement learning allows a team of agents to learn how to work together to solve complex decision-making problems in a shared environment. However, this learning process relies on stochastic mechanisms, which makes its use in safety-critical domains problematic. To overcome this issue, we propose an Assured Multi-Agent Reinforcement Learning (AMARL) approach that uses a model checking technique called quantitative verification to provide formal guarantees of agent compliance with safety, performance, and other non-functional requirements during and after the reinforcement learning process. We demonstrate the applicability of AMARL in three patrolling navigation domains in which multi-agent systems must learn to visit key areas, using different types of reinforcement learning algorithms (temporal difference learning, game theory, and direct policy search). Furthermore, we compare the effectiveness of these algorithms with and without our approach. Our extensive experiments with both homogeneous and heterogeneous multi-agent systems of different sizes show that AMARL leads to safety requirements being consistently satisfied and to better overall results than standard reinforcement learning.
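To make the shape of this idea concrete, here is a minimal Python sketch of verification-derived constraints applied during learning: each agent's action choice is masked by a per-state set of permitted actions, assumed to have been synthesised offline by a quantitative-verification tool such as PRISM. The environment, the `SAFE_ACTIONS` map, and the `step` dynamics are illustrative stand-ins, not the paper's actual models.

```python
# Hypothetical sketch of the AMARL idea: Q-learning in which an agent may
# only choose actions permitted by constraints assumed to have been
# synthesised offline via quantitative verification. Everything here is a
# toy stand-in for the paper's patrolling domains.
import random
from collections import defaultdict

N_STATES = 5
ACTIONS = ["patrol_left", "patrol_right", "wait"]

# Assumed output of verification: per-state sets of actions under which the
# abstract model satisfies the safety requirements.
SAFE_ACTIONS = {s: {"patrol_left", "wait"} if s == 0 else set(ACTIONS)
                for s in range(N_STATES)}

def step(state, action):
    """Toy patrolling dynamics with a reward for moving right."""
    if action == "patrol_right":
        return min(state + 1, N_STATES - 1), 1.0
    if action == "patrol_left":
        return max(state - 1, 0), 0.0
    return state, 0.0

Q = defaultdict(float)
alpha, gamma, eps = 0.1, 0.95, 0.1
for _ep in range(500):
    s = 0
    for _ in range(20):
        allowed = [a for a in ACTIONS if a in SAFE_ACTIONS[s]]  # constraint mask
        if random.random() < eps:
            a = random.choice(allowed)
        else:
            a = max(allowed, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        best_next = max(Q[(s2, a2)] for a2 in SAFE_ACTIONS[s2])
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2
```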
Multi-Agent Reinforcement Learning Guided by Signal Temporal Logic Specifications
Reward design is a key component of deep reinforcement learning, yet some tasks and designer objectives are unnatural to express as a scalar cost function. Among the various techniques, formal methods integrated with DRL have garnered considerable attention due to their expressiveness and flexibility in defining rewards and requirements over the agent's states and actions. However, how to leverage Signal Temporal Logic (STL) to guide reward design for multi-agent reinforcement learning remains unexplored. Complex interactions, heterogeneous goals, and critical safety requirements in multi-agent systems make this problem even more challenging. In this paper, we propose a novel STL-guided multi-agent reinforcement learning framework. The STL requirements are designed to include both task specifications, reflecting the objective of each agent, and safety specifications, and the robustness values of the STL specifications are used to generate rewards. We validate the advantages of our method through empirical studies. The experimental results demonstrate significant reward performance improvements compared to MARL without STL guidance, along with a marked increase in the overall safety rate of the multi-agent systems.
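The core recipe, turning STL robustness into a reward, can be sketched briefly. The following Python fragment is an assumed illustration, not the authors' framework: it computes the robustness of a safety formula (always keep a minimum separation) and a task formula (eventually reach the goal) over a trace, and takes their minimum, the robustness of the conjunction, as the episode reward.

```python
# Illustrative sketch of STL robustness as a reward signal. The formulas,
# trace format, and thresholds are assumptions for demonstration only.
def rob_always_gt(trace, key, threshold):
    """Robustness of G(x > threshold): worst-case margin over the trace."""
    return min(s[key] - threshold for s in trace)

def rob_eventually_lt(trace, key, threshold):
    """Robustness of F(x < threshold): best-case margin over the trace."""
    return max(threshold - s[key] for s in trace)

def stl_reward(trace):
    # Safety spec: always keep at least 2.0 m from the nearest other agent.
    safety = rob_always_gt(trace, "min_dist", 2.0)
    # Task spec: eventually come within 0.5 m of this agent's goal.
    task = rob_eventually_lt(trace, "goal_dist", 0.5)
    # Conjunction of STL formulas: robustness is the minimum of the parts.
    return min(safety, task)

# Toy trace for one agent: positive reward iff both specs are satisfied.
trace = [{"min_dist": 3.1, "goal_dist": 4.0},
         {"min_dist": 2.5, "goal_dist": 1.2},
         {"min_dist": 2.2, "goal_dist": 0.3}]
print(stl_reward(trace))  # ~0.2: both specs met with ~0.2 margin
```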
Safe Deep Reinforcement Learning: Enhancing the Reliability of Intelligent Systems
In the last few years, the impressive success of deep reinforcement learning (DRL) agents in a wide variety of applications has led to the adoption of these systems in safety-critical contexts (e.g., autonomous driving, robotics, and medical applications), where expensive hardware and human safety may be at stake. In such contexts, an intelligent learning agent must adhere to certain requirements that go beyond the simple accomplishment of the task and typically include constraints on the agent's behavior. Against this background, this thesis proposes a set of training and validation methodologies that constitute a unified pipeline to generate safe and reliable DRL agents. In the first part of this dissertation, we focus on the problem of constrained DRL, leaving the challenging problem of the formal verification of deep neural networks for the second part of this work. As humans grow, the help of a mentor is crucial for learning effective strategies to solve a problem, whereas a learning process driven only by trial and error usually leads to unsafe and inefficient solutions. Similarly, a pure end-to-end deep reinforcement learning approach often results in suboptimal policies, which typically translates into unpredictable, and thus unreliable, behaviors. Following this intuition, we propose to impose a set of constraints on the DRL loop to guide the training process. These requirements, which typically encode domain expert knowledge, can be seen as suggestions that the agent should follow but is allowed to occasionally ignore when useful for maximizing the reward signal. A foundational requirement for our work is finding a proper strategy to define and formally encode these constraints (which we refer to as "rules"). In this thesis, we propose to exploit a formal language inherited from the software engineering community: scenario-based programming (SBP). For the actual training, we rely on the constrained reinforcement learning paradigm, proposing an extended version of the Lagrangian PPO algorithm. Returning to the parallel with humans: before being authorized to perform safety-critical operations, we must obtain a certification (e.g., a license to drive a car or a degree to perform medical operations). In the second part of this dissertation, we apply this concept in a deep reinforcement learning context, where the intelligent agents are controlled by artificial neural networks. In particular, we propose to perform a model selection phase after training to find models that formally respect some given safety requirements before deployment. However, DNNs have long been considered unpredictable black boxes and thus unsuitable for safety-critical contexts. Against this background, we build upon the emerging field of formal verification for neural networks to extend state-of-the-art approaches to robotic decision-making contexts. We propose "ProVe", a verification tool for decision-making DNNs that quantifies the probability of violating the specified requirements. In the last chapter of this thesis, we provide a complete case study on a popular robotic problem, "mapless navigation". Here, we show a concrete example of the application of our pipeline, starting from the definition of the requirements, through training, to the final formal verification phase, to finally obtain a provably safe and effective agent.
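As one hedged illustration of the constrained-RL part of this pipeline, the Lagrangian scheme underlying Lagrangian PPO can be reduced to two coupled updates: the policy follows the objective reward minus lambda times cost, while the multiplier lambda grows whenever the measured cost exceeds its limit. The sketch below uses toy placeholder dynamics (a scalar `risk` knob instead of a neural policy) purely to show the mechanics; it is not the thesis' actual algorithm.

```python
# Minimal numeric sketch of the Lagrangian constrained-RL idea: the policy
# maximises reward - lambda * cost, while lambda rises when the measured
# constraint cost exceeds its limit. All quantities are toy placeholders.
import random

COST_LIMIT, LR_LAMBDA = 0.1, 0.05
lam = 0.0

def rollout(risk_appetite):
    """Stand-in for an episode: riskier behaviour earns more reward but
    incurs more constraint cost."""
    reward = risk_appetite + random.gauss(0, 0.05)
    cost = max(0.0, risk_appetite - 0.3 + random.gauss(0, 0.05))
    return reward, cost

risk = 1.0  # stand-in "policy parameter"
for _ in range(200):
    reward, cost = rollout(risk)
    # Primal step: follow the Lagrangian objective's preference.
    # (In PPO this is a clipped policy-gradient step; here a crude nudge.)
    risk += 0.01 if reward - lam * cost > 0 else -0.01
    risk = min(max(risk, 0.0), 1.0)
    # Dual step: tighten lambda while the cost constraint is violated.
    lam = max(0.0, lam + LR_LAMBDA * (cost - COST_LIMIT))
print(f"risk={risk:.2f}, lambda={lam:.2f}")
```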
Formal Methods for Autonomous Systems
Formal methods refer to rigorous, mathematical approaches to system development and have played a key role in establishing the correctness of safety-critical systems. The main building blocks of formal methods are models and specifications, which are analogous to behaviors and requirements in system design and give us the means to verify and synthesize system behaviors with formal guarantees.
This monograph provides a survey of the current state of the art on applications of formal methods in the autonomous systems domain. We consider correct-by-construction synthesis under various formulations, including closed, reactive, and probabilistic settings. Beyond synthesizing systems in known environments, we address the concept of uncertainty and use formal methods to bound the behavior of systems that employ learning. Further, we examine the synthesis of systems with monitoring, a mitigation technique ensuring that once a system deviates from expected behavior, it knows a way of returning to normalcy. We also show how learning can overcome some limitations of formal methods themselves. We conclude with future directions for formal methods in reinforcement learning, uncertainty, privacy, explainability of formal methods, and regulation and certification.
Using machine learning to learn from demonstration: application to the AR.Drone quadrotor control
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science. December 14, 2015. Developing a robot that can operate autonomously is an active area of robotics research. An autonomously operating robot has a tremendous number of applications, such as surveillance and inspection, search and rescue, and operating in hazardous environments. Reinforcement learning, a branch of machine learning, provides an attractive framework for developing robust control algorithms since it is less demanding in terms of both knowledge and programming effort. Given a reward function, reinforcement learning employs trial and error to make an agent learn. Because it is computationally intractable in practice for an agent to learn "de novo", it is important to provide the learning system with "a priori" knowledge, in the form of demonstrations performed by a teacher. However, prior knowledge does not necessarily guarantee that the agent will perform well: the performance of the agent usually depends on the reward function, since the reward function is the formal specification of the control task. Problems arise with complex reward functions that are difficult to specify manually. To address these problems, apprenticeship learning via inverse reinforcement learning is used, which extracts a reward function from a set of demonstrations so that the agent can optimise its performance with respect to that reward function. In this research, a flight controller for the AR.Drone quadrotor was created using a reinforcement learning algorithm and function approximators with some prior knowledge. The agent was able to perform a manoeuvre similar to the one demonstrated by the teacher.
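The apprenticeship-learning step can be sketched as feature-expectation matching in the style of Abbeel and Ng: recover reward weights w such that optimising w-dot-phi pulls the learner's discounted feature expectations towards the teacher's. The features, demonstrations, and projection-style update below are toy assumptions, not the dissertation's actual setup.

```python
# Hedged sketch of apprenticeship learning via inverse RL
# (feature-expectation matching). All features and demos are toy stand-ins.
import numpy as np

GAMMA = 0.95

def feature_expectations(trajectories, phi):
    """Discounted feature expectations mu = E[sum_t gamma^t phi(s_t)]."""
    mus = [sum((GAMMA ** t) * phi(s) for t, s in enumerate(traj))
           for traj in trajectories]
    return np.mean(mus, axis=0)

# Toy 1-D "altitude hold" demos: states are altitudes; the feature vector
# encodes altitude and squared distance from the 1.0 m target.
phi = lambda s: np.array([s, (s - 1.0) ** 2])
expert_demos = [[0.2, 0.6, 0.9, 1.0, 1.0], [0.1, 0.5, 0.8, 1.0, 1.0]]
agent_rollouts = [[0.2, 0.3, 0.4, 0.4, 0.5]]

mu_expert = feature_expectations(expert_demos, phi)
mu_agent = feature_expectations(agent_rollouts, phi)

# Projection-style step: the reward weights point from the agent's feature
# expectations towards the expert's, so optimising w . phi pulls the agent
# closer to the demonstrated behaviour.
w = mu_expert - mu_agent
w /= np.linalg.norm(w)
reward = lambda s: float(w @ phi(s))
print(reward(1.0), reward(0.3))  # target altitude scores higher than 0.3 m
```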
Safe Multi-Agent Reinforcement Learning with Quantitatively Verified Constraints
Multi-agent reinforcement learning is a machine learning technique in which multiple agents attempt to solve sequential decision-making problems. This learning is driven by objectives and failures, modelled as positive numerical rewards and negative numerical punishments, respectively. The multi-agent system explores a shared environment in order to find the highest cumulative reward for the sequential decision-making problem. Multi-agent reinforcement learning within autonomous systems has become a prominent research area with many examples of success and potential applications. However, the safety-critical nature of many of these potential applications is currently underexplored and under-supported. Reinforcement learning, being a stochastic process, is unpredictable, meaning there are no assurances that these systems will not harm themselves, other expensive equipment, or humans. This thesis introduces Assured Multi-Agent Reinforcement Learning (AMARL) to mitigate these issues. The approach constrains the actions of learning systems during and after the learning process. Unlike previous multi-agent reinforcement learning methods, AMARL synthesises constraints through the formal verification of abstracted multi-agent Markov decision processes that model the functional and safety aspects of the environment. Learned policies guided by these constraints are guaranteed to satisfy strict functional and safety requirements and are Pareto-optimal with respect to a set of optimisation objectives. Two AMARL extensions are also introduced in the thesis. First, the thesis presents a Partial Policy Reuse method that allows previously learned knowledge to reduce AMARL learning time significantly when initial models are inaccurate. Second, an Adaptive Constraints method is introduced that enables agents to adapt to environmental changes by constraining their learning through a runtime procedure in the style of the monitoring, analysis, planning, and execution (MAPE) loop. AMARL and its extensions are evaluated within three case studies from different navigation-based domains and shown to produce policies that meet strict safety and functional requirements.
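The Adaptive Constraints extension can be pictured as a small MAPE-style runtime loop; the Python sketch below is an assumed illustration in which `monitor` detects drift between observed and modelled transition probabilities and `reverify` stands in for re-running quantitative verification on the updated abstract model. None of these names or thresholds come from the thesis.

```python
# Assumed MAPE-style sketch of adaptive constraints: when the environment
# drifts from the verified model, rebuild the model and re-synthesise the
# constraint set. `reverify` is a placeholder for quantitative verification.
def monitor(observations, model):
    """Detect drift between observed transition frequencies and the model."""
    return abs(observations["p_obstacle"] - model["p_obstacle"]) > 0.1

def reverify(model):
    """Placeholder for synthesising fresh constraints from the new model."""
    return {"max_agents_in_zone": 1 if model["p_obstacle"] > 0.5 else 2}

model = {"p_obstacle": 0.2}
constraints = reverify(model)
for observations in [{"p_obstacle": 0.25}, {"p_obstacle": 0.6}]:
    if monitor(observations, model):                      # Monitor + Analyse
        model["p_obstacle"] = observations["p_obstacle"]  # Plan: update model
        constraints = reverify(model)                     # Execute: re-constrain
    print(constraints)
```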
Reinforcement Learning With Temporal Logic Rewards
Reinforcement learning (RL) depends critically on the choice of reward functions used to capture the desired behavior and constraints of a robot. Usually, these are handcrafted by an expert designer and represent heuristics for relatively simple tasks. Real-world applications typically involve more complex tasks with rich temporal and logical structure. In this paper we take advantage of the expressive power of temporal logic (TL) to specify complex rules the robot should follow, and to incorporate domain knowledge into learning. We propose Truncated Linear Temporal Logic (TLTL) as a specification language, arguably well suited to robotics applications, together with quantitative semantics, i.e., a robustness degree. We propose an RL approach to learn tasks expressed as TLTL formulae that uses their associated robustness degree as the reward function, instead of manually crafted heuristics that try to capture the same specifications. We show in simulated trials that learning is faster and that policies obtained using the proposed approach outperform those learned with heuristic rewards in terms of the robustness degree, i.e., how well the tasks are satisfied. Furthermore, we demonstrate the proposed RL approach in a toast-placing task learned by a Baxter robot.
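A minimal sketch of this robustness-as-reward recipe, under assumed (simplified) TLTL-like semantics: the return for an episode is the quantitative robustness of a sequenced formula, "eventually pick up, then eventually place", evaluated on the trajectory, rather than a hand-tuned heuristic. The predicates, distances, and thresholds below are illustrative, not the paper's.

```python
# Sketch of a robustness degree used as an episode reward, under assumed
# simplified semantics for a sequenced "pick up, then place" task.
def rob_eventually(trace, pred, start=0):
    """Max over suffixes from `start` of the predicate margin, plus where."""
    vals = [pred(s) for s in trace[start:]]
    best = max(vals)
    return best, start + vals.index(best)

def rob_sequenced(trace, pred_a, pred_b):
    """Robustness of F(a, then F b): do a, then later do b."""
    r_a, t_a = rob_eventually(trace, pred_a)
    r_b, _ = rob_eventually(trace, pred_b, start=min(t_a + 1, len(trace) - 1))
    return min(r_a, r_b)  # both subgoals must hold, in order

# Toy trajectory: distance to the pick-up pose, then to the place pose.
trace = [{"pick": 0.4, "place": 0.9}, {"pick": 0.05, "place": 0.7},
         {"pick": 0.3, "place": 0.1}]
reward = rob_sequenced(trace,
                       pred_a=lambda s: 0.1 - s["pick"],    # within 10 cm
                       pred_b=lambda s: 0.2 - s["place"])   # within 20 cm
print(reward)  # 0.05: the sequenced task is satisfied with a 5 cm margin
```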