12 research outputs found

    Cautious Reinforcement Learning with Logical Constraints

    This paper presents the concept of an adaptive safe padding that forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process. Policies are synthesised to satisfy a goal, expressed as a temporal logic formula, with maximal probability. Forcing the RL agent to stay safe during learning might limit exploration; however, we show that the proposed architecture is able to automatically handle the trade-off between efficient progress in exploration (towards goal satisfaction) and ensuring safety. Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm. Experimental results are provided to showcase the performance of the proposed method.
    Comment: Accepted to AAMAS 2020. arXiv admin note: text overlap with arXiv:1902.0077
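
    Below is a minimal, illustrative sketch of the cautious-exploration idea: a tabular Q-learning loop that masks actions whose estimated risk, inflated by an adaptive padding, exceeds a fixed safety budget, and shrinks the padding as estimates become more reliable. The environment interface (env.reset, env.step, env.actions), the risk estimator risk(s, a) and the padding schedule are hypothetical stand-ins, not the paper's construction, which works on a product of the MDP with an automaton derived from the temporal logic goal.

    import random
    from collections import defaultdict

    def cautious_q_learning(env, risk, episodes=500, alpha=0.1, gamma=0.99,
                            eps=0.1, budget=0.2, padding0=0.3, decay=0.995):
        """Q-learning with an adaptive safe padding (illustrative sketch only)."""
        Q = defaultdict(float)      # Q[(state, action)] -> estimated return
        padding = padding0          # extra safety margin on top of the risk budget
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # Only consider actions whose estimated risk plus padding fits the budget.
                safe = [a for a in env.actions(s) if risk(s, a) + padding <= budget]
                acts = safe or list(env.actions(s))   # fall back if nothing qualifies
                if random.random() < eps:
                    a = random.choice(acts)
                else:
                    a = max(acts, key=lambda b: Q[(s, b)])
                s2, r, done = env.step(a)
                best_next = 0.0 if done else max(Q[(s2, b)] for b in env.actions(s2))
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s2
            # Shrink the padding as the agent gathers evidence, enlarging the set of
            # admissible actions without raising the underlying risk budget.
            padding = max(0.0, padding * decay)
        return Q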

    An Anytime Algorithm for Reachability on Uncountable MDP

    We provide an algorithm for reachability on Markov decision processes with uncountable state and action spaces, which, under mild assumptions, approximates the optimal value to any desired precision. It is the first such anytime algorithm, meaning that at any point in time it can return the current approximation together with its precision. Moreover, it is the first algorithm able to utilize learning approaches without sacrificing guarantees, and it further allows for combination with existing heuristics.
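
    As a rough illustration of the anytime property, the sketch below runs interval-style value iteration on a finite MDP, maintaining lower and upper bounds on the maximal reachability probability so that the current approximation and its precision can be reported after every sweep. The finite transition representation is a hypothetical stand-in for the paper's uncountable setting, and the upper bound is only sound here under the simplifying assumption that the MDP has no end components outside the target and sink states.

    def anytime_reachability(states, actions, trans, targets, precision=1e-3,
                             max_sweeps=100_000):
        """states: iterable of states; actions(s): non-empty set of actions (sinks self-loop);
        trans[(s, a)]: list of (probability, next_state) pairs; targets: set of goal states.
        Returns (lower, upper) bounds on the maximal reachability probabilities."""
        lower = {s: 1.0 if s in targets else 0.0 for s in states}
        upper = {s: 1.0 for s in states}
        for _ in range(max_sweeps):
            for bound in (lower, upper):
                new = {
                    s: 1.0 if s in targets else max(
                        sum(p * bound[t] for p, t in trans[(s, a)])
                        for a in actions(s)
                    )
                    for s in states
                }
                bound.update(new)
            # Anytime property: after every sweep, [lower[s], upper[s]] brackets the
            # optimal value, so the loop can stop (or be interrupted) whenever the
            # reported precision is good enough.
            if max(upper[s] - lower[s] for s in states) < precision:
                break
        return lower, upper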

    Logically-constrained neural fitted Q-iteration

    We propose a method for efficient training of Q-functions for continuous-state Markov Decision Processes (MDPs) such that the traces of the resulting policies satisfy a given Linear Temporal Logic (LTL) property. LTL, a modal logic, can express a wide range of time-dependent logical properties (including "safety") that are quite similar to patterns in natural language. We convert the LTL property into a limit-deterministic Büchi automaton and construct an on-the-fly synchronised product MDP. The control policy is then synthesised by defining an adaptive reward function and by applying a modified neural fitted Q-iteration algorithm to the synchronised structure, assuming that no prior knowledge is available from the original MDP. The proposed method is evaluated in a numerical study to test the quality of the generated control policy and is compared with conventional methods for policy synthesis such as MDP abstraction (Voronoi quantizer) and approximate dynamic programming (fitted value iteration).
    Comment: AAMAS 201
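
    The sketch below illustrates the overall pipeline on a synchronised product MDP: roll out a random policy while tracking the automaton state, reward transitions that take an accepting automaton move, and fit a neural Q-function by batch (fitted) Q-iteration. The environment and automaton interfaces (env.step, env.label, automaton.delta, automaton.accepting), the feature encoding and the 0/1 reward are simplifying assumptions rather than the paper's construction, which uses a limit-deterministic Büchi automaton, an adaptive reward function and a modified neural fitted Q-iteration scheme.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def collect_transitions(env, automaton, n_steps=5000):
        """Roll out a random policy on the synchronised product MDP (hypothetical interfaces)."""
        data = []
        s, q = env.reset(), automaton.initial
        for _ in range(n_steps):
            a = env.sample_action()
            s2, done = env.step(a)
            q2 = automaton.delta(q, env.label(s2))          # sync automaton on state labels
            r = 1.0 if automaton.accepting(q, env.label(s2)) else 0.0
            data.append(((s, q), a, r, (s2, q2)))
            s, q = (env.reset(), automaton.initial) if done else (s2, q2)
        return data

    def fitted_q_iteration(data, encode, actions, gamma=0.99, iters=30):
        """encode((s, q), a) -> feature vector; actions: list of available actions."""
        X = np.array([encode(sq, a) for sq, a, _, _ in data])
        q_net = None
        for _ in range(iters):
            targets = []
            for _, _, r, sq2 in data:
                if q_net is None:
                    targets.append(r)                        # first sweep: immediate reward only
                else:
                    nxt = max(q_net.predict([encode(sq2, b)])[0] for b in actions)
                    targets.append(r + gamma * nxt)          # bootstrap from previous fit
            q_net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
            q_net.fit(X, np.array(targets))
        return q_net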