Verifiably Safe Reinforcement Learning with Probabilistic Guarantees via Temporal Logic
Reinforcement Learning (RL) can solve complex tasks but does not
intrinsically provide any guarantees on system behavior. For real-world systems
that fulfill safety-critical tasks, such guarantees on safety specifications
are necessary. To bridge this gap, we propose a verifiably safe RL procedure
with probabilistic guarantees. First, our approach probabilistically verifies a
candidate controller with respect to a temporal logic specification, while
randomizing the controller's inputs within a bounded set. Then, we use RL to
improve the performance of this probabilistically verified (i.e., safe)
controller, exploring within the same bounded set around the controller's
inputs that was randomized over in the verification step. Finally, we calculate
probabilistic safety guarantees with respect to temporal logic specifications
for the learned agent. Our approach is efficient for continuous action and
state spaces and separates safety verification and performance improvement into
two independent steps. We evaluate our approach on a safe evasion task where a
robot has to evade a dynamic obstacle in a specific manner while trying to
reach a goal. The results show that our verifiably safe RL approach leads to
efficient learning and performance improvements while maintaining the safety
specifications.
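The paper's verification procedure is not spelled out in the abstract; as a rough, hypothetical illustration (toy 1-D dynamics, all names and the Hoeffding-style bound are assumptions, not the authors' method), sampling-based probabilistic verification of a candidate controller against an "always safe" temporal-logic property could look like:

```python
import math
import random

def simulate(controller, x0, noise, horizon=50):
    """Roll out the closed-loop system from x0; return the state trace."""
    trace, x = [x0], x0
    for _ in range(horizon):
        u = controller(x) + noise()   # randomize inputs within a bounded set
        x = x + 0.1 * u               # toy 1-D dynamics (assumption)
        trace.append(x)
    return trace

def satisfies_spec(trace, limit=1.0):
    """Toy temporal-logic check: G(|x| <= limit), i.e. 'always safe'."""
    return all(abs(x) <= limit for x in trace)

def verify(controller, n=10_000, delta=1e-3):
    """Estimate P(spec holds) by Monte Carlo sampling.

    Hoeffding's inequality gives |p_hat - p| <= eps with prob. 1 - delta.
    """
    hits = sum(
        satisfies_spec(simulate(controller, x0=0.0,
                                noise=lambda: random.uniform(-0.2, 0.2)))
        for _ in range(n)
    )
    eps = math.sqrt(math.log(2 / delta) / (2 * n))
    return hits / n, eps

controller = lambda x: -x             # stabilizing candidate controller
p_hat, eps = verify(controller)       # probabilistic safety certificate
```

The same bounded input set used for randomization here would then bound exploration during the subsequent RL step, so the certificate remains meaningful for the learned policy.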
Cautious Reinforcement Learning with Logical Constraints
This paper presents the concept of an adaptive safe padding that forces
Reinforcement Learning (RL) to synthesise optimal control policies while
ensuring safety during the learning process. Policies are synthesised to
satisfy a goal, expressed as a temporal logic formula, with maximal
probability. Forcing the RL agent to stay safe during learning may limit
exploration; however, we show that the proposed architecture automatically
handles the trade-off between efficient progress in exploration
(towards goal satisfaction) and ensuring safety. Theoretical guarantees are
available on the optimality of the synthesised policies and on the convergence
of the learning algorithm. Experimental results are provided to showcase the
performance of the proposed method.
Comment: Accepted to AAMAS 2020. arXiv admin note: text overlap with arXiv:1902.0077
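The abstract does not define "adaptive safe padding" precisely; as a minimal sketch under assumed semantics (a 1-D grid world, a padding margin around unsafe states that shrinks as a state is visited more often; all names are hypothetical), action masking with such a padding could look like:

```python
UNSAFE = {9}              # unsafe cell on a 1-D grid of states 0..9
ACTIONS = (-1, +1)        # move left / move right

def adaptive_pad(visits, base_pad=2, rate=10):
    """Padding shrinks as the agent gathers experience in a state,
    trading early caution for later, freer exploration."""
    return max(0, base_pad - visits // rate)

def safe_actions(state, visits):
    """Allow only actions whose successor keeps a padded distance
    from every unsafe state."""
    pad = adaptive_pad(visits)
    return [a for a in ACTIONS
            if all(abs((state + a) - u) > pad for u in UNSAFE)]
```

Early on (few visits, large pad), the agent near the unsafe cell may only retreat; once the state is well explored the pad reaches zero and only genuinely unsafe successors stay masked.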
Certified Reinforcement Learning with Logic Guidance
This paper proposes the first model-free Reinforcement Learning (RL)
framework to synthesise policies for unknown, continuous-state Markov
Decision Processes (MDPs), such that a given linear temporal property is
satisfied. We convert the given property into a Limit-Deterministic Büchi
Automaton (LDBA), namely a finite-state machine expressing the property.
Exploiting the structure of the LDBA, we shape a synchronous reward function
on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces
that probabilistically satisfy the linear temporal property. This probability
(certificate) is also calculated in parallel with policy learning when the
state space of the MDP is finite: as such, the RL algorithm produces a policy
that is certified with respect to the property. Under the assumption of finite
state space, theoretical guarantees are provided on the convergence of the RL
algorithm to an optimal policy, maximising the above probability. We also show
that our method produces "best available" control policies when the logical
property cannot be satisfied. In the general case of a continuous state space,
we propose a neural network architecture for RL and we empirically show that
the algorithm finds satisfying policies, if there exist such policies. The
performance of the proposed framework is evaluated via a set of numerical
examples and benchmarks, where we observe an improvement of one order of
magnitude in the number of iterations required for the policy synthesis,
compared to existing approaches whenever available.
Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
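The on-the-fly reward shaping described above can be illustrated with a toy automaton for the property "eventually goal" (F goal). This is a simplified sketch, not the paper's implementation: the transition table, labels, and reward convention below are assumptions chosen for clarity.

```python
# Toy automaton for "eventually goal": q0 --goal--> q1, q1 accepting/absorbing.
DELTA = {("q0", "goal"): "q1", ("q0", "other"): "q0",
         ("q1", "goal"): "q1", ("q1", "other"): "q1"}
ACCEPTING = {"q1"}

def shaped_reward(q, label):
    """Synchronous on-the-fly reward: the automaton reads the label of the
    current MDP state; a reward is paid exactly when the run newly enters
    an accepting automaton state."""
    q_next = DELTA[(q, label)]
    r = 1.0 if q_next in ACCEPTING and q not in ACCEPTING else 0.0
    return q_next, r
```

An RL agent trained on the product of MDP state and automaton state with this reward is steered toward traces that satisfy the property, since accumulated reward tracks acceptance of the automaton run.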
How to Learn from Risk: Explicit Risk-Utility Reinforcement Learning for Efficient and Safe Driving Strategies
Autonomous driving has the potential to revolutionize mobility and is hence
an active area of research. In practice, the behavior of autonomous vehicles
must be acceptable, i.e., efficient, safe, and interpretable. While vanilla
reinforcement learning (RL) finds performant behavioral strategies, these are
often unsafe and uninterpretable. Safety is introduced through Safe RL
approaches, but these still mostly remain uninterpretable, since the learned
behaviour is jointly optimized for safety and performance without modeling them
separately. Interpretable machine learning is rarely applied to RL. This paper
proposes SafeDQN, which makes the behavior of autonomous vehicles safe
and interpretable while remaining efficient. SafeDQN offers an
understandable, semantic trade-off between the expected risk and the utility of
actions while being algorithmically transparent. We show that SafeDQN finds
interpretable and safe driving policies for a variety of scenarios and
demonstrate how state-of-the-art saliency techniques can help to assess both
risk and utility.
Comment: 8 pages, 5 figures
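The explicit risk-utility trade-off described above can be sketched at the action-selection step. This is a minimal illustration, not the SafeDQN architecture itself: the two value arrays stand in for separate risk and utility network heads, and the weighting scheme is an assumption.

```python
import numpy as np

def safe_dqn_action(q_utility, q_risk, lam=1.0):
    """Select the action maximizing utility minus lam * expected risk.

    Because risk and utility are modeled separately, the trade-off
    (controlled by lam) is explicit and inspectable per action.
    """
    scores = q_utility - lam * q_risk
    return int(np.argmax(scores)), scores

# Per-action estimates from the two (hypothetical) value heads:
q_utility = np.array([1.0, 2.0, 1.5])
q_risk = np.array([0.1, 1.5, 0.2])

action, scores = safe_dqn_action(q_utility, q_risk, lam=1.0)
```

With `lam=0` the agent greedily maximizes utility (action 1 above); increasing `lam` shifts the choice toward low-risk actions, which is exactly the semantic knob the abstract describes.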