Search CORE

3,632 research outputs found

Certified Reinforcement Learning with Logic Guidance

Author: Abate Alessandro
Hasanbeig Mohammadhosein
Kroening Daniel
Publication venue
Publication date: 10/02/2020
Field of study

This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for unknown, and continuous-state Markov Decision Processes (MDPs), such that a given linear temporal property is satisfied. We convert the given property into a Limit Deterministic Buchi Automaton (LDBA), namely a finite-state machine expressing the property. Exploiting the structure of the LDBA, we shape a synchronous reward function on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces that probabilistically satisfy the linear temporal property. This probability (certificate) is also calculated in parallel with policy learning when the state space of the MDP is finite: as such, the RL algorithm produces a policy that is certified with respect to the property. Under the assumption of finite state space, theoretical guarantees are provided on the convergence of the RL algorithm to an optimal policy, maximising the above probability. We also show that our method produces ''best available'' control policies when the logical property cannot be satisfied. In the general case of a continuous state space, we propose a neural network architecture for RL and we empirically show that the algorithm finds satisfying policies, if there exist such policies. The performance of the proposed framework is evaluated via a set of numerical examples and benchmarks, where we observe an improvement of one order of magnitude in the number of iterations required for the policy synthesis, compared to existing approaches whenever available.Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782

arXiv.org e-Print Archive

Conversion and verification procedure for goal-based control programs

Author: Braman J. M. B.
Murray R. M.
Publication venue: 'California Institute of Technology Library'
Publication date: 01/01/2007
Field of study

Fault tolerance and safety verification of control systems are essential for the success of autonomous robotic systems. A control architecture called Mission Data System, developed at the Jet Propulsion Laboratory, takes a goal-based control approach. In this paper, a method for converting goal network control programs into linear hybrid systems is developed. The linear hybrid system can then be verified for safety in the presence of failures using existing symbolic model checkers. An example task is developed and successfully verified using HyTech, a symbolic model checking software for linear hybrid systems

CiteSeerX

Caltech Authors

Computing Probabilistic Bisimilarity Distances for Probabilistic Automata

Author: Bacci Giorgio
Bacci Giovanni
Larsen Kim G.
Mardare Radu
Tang Qiyi
van Breugel Franck
Publication venue
Publication date: 01/01/2021
Field of study

The probabilistic bisimilarity distance of Deng et al. has been proposed as a robust quantitative generalization of Segala and Lynch's probabilistic bisimilarity for probabilistic automata. In this paper, we present a characterization of the bisimilarity distance as the solution of a simple stochastic game. The characterization gives us an algorithm to compute the distances by applying Condon's simple policy iteration on these games. The correctness of Condon's approach, however, relies on the assumption that the games are stopping. Our games may be non-stopping in general, yet we are able to prove termination for this extended class of games. Already other algorithms have been proposed in the literature to compute these distances, with complexity in

\textbf{UP} \cap \textbf{coUP}

and \textbf{PPAD}. Despite the theoretical relevance, these algorithms are inefficient in practice. To the best of our knowledge, our algorithm is the first practical solution. The characterization of the probabilistic bisimilarity distance mentioned above crucially uses a dual presentation of the Hausdorff distance due to M\'emoli. As an additional contribution, in this paper we show that M\'emoli's result can be used also to prove that the bisimilarity distance bounds the difference in the maximal (or minimal) probability of two states to satisfying arbitrary

\omega

-regular properties, expressed, eg., as LTL formulas

arXiv.org e-Print Archive

University of Strathclyde Institutional Repository

Episciences.org

Oxford University Research Archive

VBN