
    Verifiable Reinforcement Learning via Policy Extraction

    While deep reinforcement learning has successfully solved many challenging control tasks, its real-world applicability has been limited by the inability to ensure the safety of learned policies. We propose an approach to verifiable reinforcement learning by training decision tree policies, which can represent complex policies (since they are nonparametric), yet can be efficiently verified using existing techniques (since they are highly structured). The challenge is that decision tree policies are difficult to train. We propose VIPER, an algorithm that combines ideas from model compression and imitation learning to learn decision tree policies guided by a DNN policy (called the oracle) and its Q-function, and show that it substantially outperforms two baselines. We use VIPER to (i) learn a provably robust decision tree policy for a variant of Atari Pong with a symbolic state space, (ii) learn a decision tree policy for a toy game based on Pong that provably never loses, and (iii) learn a provably stable decision tree policy for cart-pole. In each case, the decision tree policy achieves performance equal to that of the original DNN policy.
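
    A minimal sketch of the loop this abstract describes, under stated assumptions: a gymnasium-style environment with vector states, and an oracle object exposing act(state) and q_values(state). The student tree generates the rollouts (DAgger-style) while the oracle supplies the labels, and each sample is weighted by its Q-value gap so that states where a wrong action is costly dominate training. This is an illustration, not the authors' reference implementation.

    # VIPER-style distillation sketch (illustrative; assumes a gymnasium-style
    # env and an `oracle` with act(state) and q_values(state)).
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def viper_distill(env, oracle, n_iters=10, rollouts=20, max_depth=8):
        data, weights, policy = [], [], None
        for _ in range(n_iters):
            for _ in range(rollouts):
                state, _ = env.reset()
                done = False
                while not done:
                    q = oracle.q_values(state)        # shape: (n_actions,)
                    # The oracle always provides the label (imitation).
                    data.append((np.asarray(state), int(np.argmax(q))))
                    # Weight by how much a wrong action can cost here.
                    weights.append(float(np.max(q) - np.min(q)))
                    # Roll out the current student once one exists (DAgger).
                    if policy is None:
                        action = oracle.act(state)
                    else:
                        action = int(policy.predict(
                            np.asarray(state).reshape(1, -1))[0])
                    state, _, term, trunc, _ = env.step(action)
                    done = term or trunc
            X = np.stack([s for s, _ in data])
            y = np.array([a for _, a in data])
            policy = DecisionTreeClassifier(max_depth=max_depth)
            policy.fit(X, y, sample_weight=np.array(weights))
        return policy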

    MSVIPER: Improved Policy Distillation for Reinforcement-Learning-Based Robot Navigation

    We present Multiple Scenario Verifiable Reinforcement Learning via Policy Extraction (MSVIPER), a new method for policy distillation to decision trees for improved robot navigation. MSVIPER learns an "expert" policy using any Reinforcement Learning (RL) technique involving learning a state-action mapping, and then uses imitation learning to learn a decision-tree policy from it. We demonstrate that MSVIPER results in efficient decision trees and can accurately mimic the behavior of the expert policy. Moreover, we present efficient policy distillation and tree-modification techniques that take advantage of the decision tree structure to allow improvements to a policy without retraining. We use our approach to improve the performance of RL-based robot navigation algorithms for indoor and outdoor scenes. We demonstrate the benefits in terms of reduced freezing and oscillation behaviors (by up to 95% reduction) for mobile robots navigating among dynamic obstacles, and reduced vibrations and oscillation (by up to 17%) for outdoor robot navigation on complex, uneven terrains.
    Comment: 6 pages main paper, 2 pages of references, 5-page appendix (13 pages total); 5 tables, 9 algorithms, 4 figures
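
    The tree-modification step above invites a short illustration: because the distilled policy is an explicit tree, a misbehaving leaf (say, one that makes the robot freeze) can be relabeled directly, with no retraining. The helper below is hypothetical and pokes at scikit-learn tree internals; it is not MSVIPER's implementation, and the in-place writability of tree_.value is an assumption about the library version.

    # Hypothetical leaf-relabeling helper in the spirit of MSVIPER's
    # tree-modification step (not the paper's code).
    import numpy as np

    def relabel_leaf(tree, state, new_action):
        """Force the leaf that `state` falls into to output `new_action`."""
        leaf = tree.apply(np.asarray(state).reshape(1, -1))[0]
        values = tree.tree_.value          # shape: (n_nodes, 1, n_classes)
        values[leaf, 0, :] = 0.0
        values[leaf, 0, new_action] = 1.0  # argmax now picks new_action

    # Usage: if a state where the robot freezes is known, redirect it,
    # e.g. relabel_leaf(policy, freezing_state, new_action=FORWARD).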

    Probabilistic Guarantees for Safe Deep Reinforcement Learning

    Deep reinforcement learning has been successfully applied to many control tasks, but the application of such agents in safety-critical scenarios has been limited due to safety concerns. Rigorous testing of these controllers is challenging, particularly when they operate in probabilistic environments due to, for example, hardware faults or noisy sensors. We propose MOSAIC, an algorithm for measuring the safety of deep reinforcement learning agents in stochastic settings. Our approach is based on the iterative construction of a formal abstraction of a controller's execution in an environment, and leverages probabilistic model checking of Markov decision processes to produce probabilistic guarantees on safe behaviour over a finite time horizon. It produces bounds on the probability of safe operation of the controller for different initial configurations and identifies regions where correct behaviour can be guaranteed. We implement and evaluate our approach on agents trained for several benchmark control problems.
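
    The finite-horizon guarantee described here can be made concrete with a few lines of backward induction over an abstract MDP: v_k(s) is the maximum probability, over the abstraction's nondeterminism, of reaching an unsafe state within k steps, so 1 - v_horizon(s) lower-bounds the probability of safe operation from s. The toy transition structure below is an assumption for illustration; MOSAIC itself delegates this step to a probabilistic model checker.

    # Maximum probability of reaching `unsafe` within `horizon` steps in an
    # abstract MDP; P[s][a] maps successor states to probabilities.
    def max_unsafe_prob(states, actions, P, unsafe, horizon):
        v = {s: (1.0 if s in unsafe else 0.0) for s in states}
        for _ in range(horizon):
            v = {s: 1.0 if s in unsafe else
                 max(sum(p * v[s2] for s2, p in P[s][a].items())
                     for a in actions)
                 for s in states}
        return v

    # Toy example: staying in "ok" risks a fault with probability 0.01.
    P = {"ok": {"stay": {"ok": 0.99, "bad": 0.01}},
         "bad": {"stay": {"bad": 1.0}}}
    v = max_unsafe_prob(["ok", "bad"], ["stay"], P, {"bad"}, horizon=50)
    print(1 - v["ok"])   # lower bound on safe operation from "ok": ~0.605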

    RAVEN: Reinforcement Learning for Generating Verifiable Run-Time Requirement Enforcers for MPSoCs

    In embedded systems, applications frequently have to meet non-functional requirements regarding, e.g., real-time or energy-consumption constraints when executing on a given MPSoC target platform. Feedback-based controllers have been proposed that react to transient environmental factors by adapting the DVFS settings or the degree of parallelism following some predefined control strategy. However, it is in general not possible to give formal guarantees that the obtained controllers satisfy a given set of non-functional requirements. Run-time requirement enforcement has emerged as a field of research for the enforcement of non-functional requirements at run-time, allowing properties of the respective control strategies, specified by automata, to be defined and formally verified. However, techniques for the automatic generation of such controllers have not yet been established. In this paper, we propose a technique that uses reinforcement learning to automatically generate verifiable feedback-based enforcers. To this end, we train a control policy on a representative input sequence at design time. The learned control strategy is then transformed into a verifiable enforcement automaton, which constitutes our run-time control model and can handle unseen input data. As a case study, we apply the approach to generate controllers that increase the probability of satisfying a given set of requirement verification goals compared to multiple state-of-the-art approaches, as can be verified by model checkers.
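
    As a rough sketch of the final transformation step, the code below freezes a learned policy into an explicit, finite table over a discretized observation space, which is the kind of object (an input-deterministic automaton's transition labeling) a model checker can consume. The observation features, their discretization, and the stand-in policy are illustrative assumptions, not the paper's toolflow.

    # Tabulate a policy over a finite observation space to obtain a
    # verifiable enforcement automaton (hypothetical discretization).
    from itertools import product

    def policy_to_automaton(policy, load_levels, latency_levels):
        return {(load, lat): policy((load, lat))
                for load, lat in product(load_levels, latency_levels)}

    # A trivial hand-written policy stands in for the trained one:
    dvfs = lambda s: "high" if s[1] == "late" else "low"
    table = policy_to_automaton(dvfs, ["idle", "busy"], ["ontime", "late"])
    # `table` maps every reachable observation to a DVFS setting and can be
    # exported, e.g., as states and transitions of a checker's input model.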

    MARLeME: A Multi-Agent Reinforcement Learning Model Extraction Library.

    Multi-Agent Reinforcement Learning (MARL) encompasses a powerful class of methodologies that have been applied in a wide range of fields. An effective way to further empower these methodologies is to develop approaches and tools that expand their interpretability and explainability. In this work, we introduce MARLeME: a MARL model extraction library, designed to improve the explainability of MARL systems by approximating them with symbolic models. Symbolic models offer a high degree of interpretability, well-defined properties, and verifiable behaviour. Consequently, they can be used to inspect and better understand the underlying MARL systems and the corresponding MARL agents, as well as to replace some or all of the agents that are particularly safety- and security-critical. We demonstrate how MARLeME can be applied to two well-known case studies (Cooperative Navigation and RoboCup Takeaway), using extracted models based on Abstract Argumentation.
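
    A minimal sketch of the extraction pattern such a library automates: log (observation, action) pairs per agent and fit an interpretable surrogate to each. MARLeME's symbolic target is Abstract Argumentation; a decision tree is substituted here only to keep the sketch self-contained and runnable.

    # Per-agent surrogate extraction from logged trajectories (illustrative;
    # trajectories: dict mapping agent_id -> list of (obs_vector, action)).
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    def extract_symbolic_models(trajectories, max_depth=4):
        models = {}
        for agent_id, pairs in trajectories.items():
            X = np.stack([obs for obs, _ in pairs])
            y = np.array([act for _, act in pairs])
            models[agent_id] = DecisionTreeClassifier(
                max_depth=max_depth).fit(X, y)
            # Render the surrogate as human-readable rules for inspection.
            print(f"agent {agent_id}:\n{export_text(models[agent_id])}")
        return models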