Search CORE

6,855 research outputs found

Certified Reinforcement Learning with Logic Guidance

Author: Abate Alessandro
Hasanbeig Mohammadhosein
Kroening Daniel
Publication venue
Publication date: 10/02/2020
Field of study

This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for unknown, and continuous-state Markov Decision Processes (MDPs), such that a given linear temporal property is satisfied. We convert the given property into a Limit Deterministic Buchi Automaton (LDBA), namely a finite-state machine expressing the property. Exploiting the structure of the LDBA, we shape a synchronous reward function on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces that probabilistically satisfy the linear temporal property. This probability (certificate) is also calculated in parallel with policy learning when the state space of the MDP is finite: as such, the RL algorithm produces a policy that is certified with respect to the property. Under the assumption of finite state space, theoretical guarantees are provided on the convergence of the RL algorithm to an optimal policy, maximising the above probability. We also show that our method produces ''best available'' control policies when the logical property cannot be satisfied. In the general case of a continuous state space, we propose a neural network architecture for RL and we empirically show that the algorithm finds satisfying policies, if there exist such policies. The performance of the proposed framework is evaluated via a set of numerical examples and benchmarks, where we observe an improvement of one order of magnitude in the number of iterations required for the policy synthesis, compared to existing approaches whenever available.Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782

arXiv.org e-Print Archive

Learning Task Specifications from Demonstrations

Author: Ho Mark K.
Jha Susmit
Seshia Sanjit A.
Tiwari Ashish
Vazquez-Chanlatte Marcell
Publication venue
Publication date: 01/01/2018
Field of study

Real world applications often naturally decompose into several sub-tasks. In many settings (e.g., robotics) demonstrations provide a natural way to specify the sub-tasks. However, most methods for learning from demonstrations either do not provide guarantees that the artifacts learned for the sub-tasks can be safely recombined or limit the types of composition available. Motivated by this deficit, we consider the problem of inferring Boolean non-Markovian rewards (also known as logical trace properties or specifications) from demonstrations provided by an agent operating in an uncertain, stochastic environment. Crucially, specifications admit well-defined composition rules that are typically easy to interpret. In this paper, we formulate the specification inference task as a maximum a posteriori (MAP) probability inference problem, apply the principle of maximum entropy to derive an analytic demonstration likelihood model and give an efficient approach to search for the most likely specification in a large candidate pool of specifications. In our experiments, we demonstrate how learning specifications can help avoid common problems that often arise due to ad-hoc reward composition.Comment: NIPS 201

arXiv.org e-Print Archive

eScholarship - University of California

Recommended from our members

Oracle-Guided Design and Analysis of Learning-Based Cyber-Physical Systems

Author: Ghosh Shromona
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

We are in world where autonomous systems, such as self-driving cars, surgical robots, robotic manipulators are becoming a reality. Such systems are considered \textit{safety-critical} since they interact with humans on a regular basis. Hence, before such systems can be integrated into our day to day life, we need to guarantee their safety. Recent success in machine learning (ML) and artificial intelligence (AI) has led to an increase in their use in real world robotic systems. For example, complex perception modules in self-driving cars and deep reinforcement learning controllers in robotic manipulators. Although powerful, they introduce an additional level of complexity when it comes to the formal analysis of autonomous systems. In this thesis, such systems are designated as Learning-Based Cyber-Physical Systems~(LB-CPS). In this thesis, we take inspiration from the Oracle-Guided Inductive Synthesis~(OGIS) paradigm to develop frameworks which can aid in achieving formal guarantees in different stages of an autonomous system design and analysis pipeline. Furthermore, we show that to guarantee the safety of LB-CPS, the design (synthesis) and analysis (verification) must consider feedback from the other. We consider five important parts of the design and analysis process and show a strong coupling among them, namely (i) Robust Control Synthesis from High Level Safety Specifications; (ii) Diagnosis and Repair of Safety Requirements for Control Synthesis; (iii) Counter-example Guided Data Augmentation for training high-accuracy ML models; (iv) Simulation-Guided Falsification and Verification against Adversarial Environments; and (v) Bridging Model and Real-World Gap. Finally, we introduce a software toolkit \verifai{} for the design and analysis of AI based systems, which was developed to provide a common formal platform to implement design and analysis frameworks for LB-CPS

eScholarship - University of California