Probabilistic Guarantees for Safe Deep Reinforcement Learning
Deep reinforcement learning has been successfully applied to many control
tasks, but the application of such agents in safety-critical scenarios has been
limited due to safety concerns. Rigorous testing of these controllers is
challenging, particularly when they operate in probabilistic environments due
to, for example, hardware faults or noisy sensors. We propose MOSAIC, an
algorithm for measuring the safety of deep reinforcement learning agents in
stochastic settings. Our approach is based on the iterative construction of a
formal abstraction of a controller's execution in an environment, and leverages
probabilistic model checking of Markov decision processes to produce
probabilistic guarantees on safe behaviour over a finite time horizon. It
produces bounds on the probability of safe operation of the controller for
different initial configurations and identifies regions where correct behaviour
can be guaranteed. We implement and evaluate our approach on agents trained for
several benchmark control problems.
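
The finite-horizon guarantee described above can be illustrated with a minimal sketch. It is not the MOSAIC implementation: the toy Markov decision process, its state names, its probabilities, and the function `safety_bounds` are all hypothetical. The sketch only shows how backward induction over a fixed number of steps gives a lower and an upper bound on the probability of staying safe, where the gap between the two comes from the nondeterministic choice of action.

```python
# Hypothetical toy MDP (not from the paper): state -> action -> [(next_state, probability)].
MDP = {
    "ok": {
        "a": [("ok", 0.9), ("fail", 0.1)],
        "b": [("ok", 0.8), ("risky", 0.2)],
    },
    "risky": {
        "a": [("ok", 0.5), ("fail", 0.5)],
    },
    "fail": {
        "a": [("fail", 1.0)],
    },
}
UNSAFE = {"fail"}


def safety_bounds(mdp, unsafe, horizon):
    """Return {state: (min_prob, max_prob)} of avoiding `unsafe` for `horizon` steps."""
    # With zero steps left, a state is safe exactly when it is not unsafe.
    lo = {s: 0.0 if s in unsafe else 1.0 for s in mdp}
    hi = dict(lo)
    for _ in range(horizon):
        new_lo, new_hi = {}, {}
        for s, actions in mdp.items():
            if s in unsafe:
                new_lo[s] = new_hi[s] = 0.0
                continue
            # Expected safety probability of each action, one step further back.
            lo_vals = [sum(p * lo[t] for t, p in succ) for succ in actions.values()]
            hi_vals = [sum(p * hi[t] for t, p in succ) for succ in actions.values()]
            new_lo[s] = min(lo_vals)  # worst-case resolution of nondeterminism
            new_hi[s] = max(hi_vals)  # best-case resolution of nondeterminism
        lo, hi = new_lo, new_hi
    return {s: (lo[s], hi[s]) for s in mdp}


bounds = safety_bounds(MDP, UNSAFE, horizon=2)
```

Here `bounds["ok"]` is the pair of worst-case and best-case probabilities of avoiding `fail` for two steps when starting in `ok`. A state whose lower bound equals 1 would be one where safe behaviour is guaranteed over that horizon.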
Adaptive Aggregation of Markov Chains: Quantitative Analysis of Chemical Reaction Networks
Quantitative analysis of Markov models typically proceeds through numerical methods or simulation-based evaluation. Since the state space of the models can often be large, exact or approximate state aggregation methods (such as lumping or bisimulation reduction) have been proposed to improve the scalability of the numerical schemes. However, none of the existing numerical techniques provides general, explicit bounds on the approximation error, a problem particularly relevant when the level of accuracy affects the soundness of verification results. We propose a novel numerical approach that combines the strengths of aggregation techniques (state-space reduction) with those of simulation-based approaches (automatic updates that adapt to the process dynamics). The key advantage of our scheme is that it provides rigorous precision guarantees under different measures. The new approach, which can be used in conjunction with time uniformisation techniques, is evaluated on two models of chemical reaction networks, a signalling pathway and a prokaryotic gene expression network: it demonstrates marked improvement in accuracy without performance degradation, particularly when compared to known state-space truncation techniques.
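
As a point of reference for the state aggregation mentioned above, the following is a minimal sketch of classical exact lumping for a discrete-time Markov chain. It illustrates the existing technique the abstract refers to, not the adaptive scheme the paper proposes. The transition matrix, the partition, and the function name `lump` are hypothetical.

```python
def lump(P, partition, tol=1e-12):
    """Aggregate the transition matrix P over `partition` (lists of state indices).

    Raises ValueError if the partition is not ordinarily lumpable, i.e. if two
    states in one block disagree on their total probability into some block.
    """
    k = len(partition)
    lumped = [[0.0] * k for _ in range(k)]
    for i, block in enumerate(partition):
        for j, target in enumerate(partition):
            # Total probability of jumping from each state in `block` into `target`.
            mass = [sum(P[s][t] for t in target) for s in block]
            if max(mass) - min(mass) > tol:
                raise ValueError(f"blocks {i}->{j} not lumpable: {mass}")
            lumped[i][j] = mass[0]
    return lumped


# Hypothetical 3-state chain in which states 1 and 2 can be merged exactly.
P = [
    [0.5, 0.25, 0.25],
    [0.2, 0.3, 0.5],
    [0.2, 0.6, 0.2],
]
Q = lump(P, [[0], [1, 2]])
```

The reduced chain `Q` has two states and reproduces the block-level behaviour of `P` exactly. When the lumpability check fails, aggregation introduces an approximation error, which is the situation where explicit error bounds become relevant.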