Search CORE

2 research outputs found

Probabilistic Guarantees for Safe Deep Reinforcement Learning

Author: E Ohn-Bar
EM Hahn
G Katz
J Garcia
J Kemeny
M Kattenbelt
M Kwiatkowska
M Lahijania
MC Machado
R Ehlers
S Junges
SEZ Soudjani
T Brázdil
V Mnih
X Huang
Publication venue
Publication date: 29/06/2020
Field of study

Deep reinforcement learning has been successfully applied to many control tasks, but the application of such agents in safety-critical scenarios has been limited due to safety concerns. Rigorous testing of these controllers is challenging, particularly when they operate in probabilistic environments due to, for example, hardware faults or noisy sensors. We propose MOSAIC, an algorithm for measuring the safety of deep reinforcement learning agents in stochastic settings. Our approach is based on the iterative construction of a formal abstraction of a controller's execution in an environment, and leverages probabilistic model checking of Markov decision processes to produce probabilistic guarantees on safe behaviour over a finite time horizon. It produces bounds on the probability of safe operation of the controller for different initial configurations and identifies regions where correct behaviour can be guaranteed. We implement and evaluate our approach on agents trained for several benchmark control problems

arXiv.org e-Print Archive

Crossref

University of Birmingham Research Portal

Adaptive Aggregation of Markov Chains: Quantitative Analysis of Chemical Reaction Networks

Author: A Abate
A Abate
A Angius
AM Kierzek
AP Moorsel van
B Munsky
BL Fox
C Baier
DT Gillespie
F Dannenberg
J Hasenauer
J Zhang
KG Larsen
L Bortolussi
L Ferm
M Hegland
M Kwiatkowska
M Mateescu
M Česka
P Buchholz
R Sidje
R Steuer
S Engblom
S Esmaeil Zadeh Soudjani
SEZ Soudjani
TA Henzinger
Publication venue
Publication date: 01/01/2015
Field of study

Quantitative analysis of Markov models typically proceeds through numerical methods or simulation-based evaluation. Since the state space of the models can often be large, exact or approximate state aggregation methods (such as lumping or bisimulation reduction) have been proposed to improve the scalability of the numerical schemes. However, none of the existing numerical techniques provides general, explicit bounds on the approximation error, a problem particularly relevant when the level of accuracy affects the soundness of verification results. We propose a novel numerical approach that combines the strengths of aggregation techniques (state-space reduction) with those of simulation-based approaches (automatic updates that adapt to the process dynamics). The key advantage of our scheme is that it provides rigorous precision guarantees under different measures. The new approach, which can be used in conjunction with time uniformisation techniques, is evaluated on two models of chemical reaction networks, a signalling pathway and a prokaryotic gene expression network: it demonstrates marked improvement in accuracy without performance degradation, particularly when compared to known state-space truncation techniques

Crossref

Oxford University Research Archive