
    Formal methods with a touch of magic

    Machine learning and formal methods have complementary benefits and drawbacks. In this work, we address the controller-design problem with a combination of techniques from both fields. The use of black-box neural networks in deep reinforcement learning (deep RL) poses a challenge for such a combination. Instead of reasoning formally about the output of deep RL, which we call the wizard, we extract from it a decision-tree-based model, which we refer to as the magic book. Using the extracted model as an intermediary, we are able to handle problems that are infeasible for either deep RL or formal methods by themselves. First, we suggest, for the first time, a synthesis procedure that is based on a magic book. We synthesize a stand-alone correct-by-design controller that enjoys the favorable performance of RL. Second, we incorporate a magic book in a bounded model checking (BMC) procedure. BMC allows us to find numerous traces of the plant under the control of the wizard, which a user can use to increase the trustworthiness of the wizard and direct further training.
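    As a rough illustration of the extraction step, one can sample state-action pairs from the trained wizard and fit a decision tree to them by behavioral cloning. The sketch below is only a minimal rendition of this idea, not the paper's procedure; `wizard_policy` and `env` are hypothetical stand-ins.

```python
# Minimal sketch: distill a trained deep-RL "wizard" into a decision-tree
# "magic book" by behavioral cloning. `wizard_policy` and `env` are
# hypothetical stand-ins, not the paper's API.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def extract_magic_book(wizard_policy, env, n_episodes=100, max_depth=8):
    """Fit a decision tree to state-action pairs sampled from the wizard."""
    states, actions = [], []
    for _ in range(n_episodes):
        state, done = env.reset(), False
        while not done:
            action = wizard_policy(state)      # query the black-box wizard
            states.append(state)
            actions.append(action)
            state, done = env.step(action)     # assumed to return (state, done)
    book = DecisionTreeClassifier(max_depth=max_depth)
    book.fit(np.array(states), np.array(actions))
    return book
```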

    MSVIPER: Improved Policy Distillation for Reinforcement-Learning-Based Robot Navigation

    We present Multiple Scenario Verifiable Reinforcement Learning via Policy Extraction (MSVIPER), a new method for policy distillation to decision trees for improved robot navigation. MSVIPER learns an "expert" policy using any Reinforcement Learning (RL) technique involving learning a state-action mapping, and then uses imitation learning to learn a decision-tree policy from it. We demonstrate that MSVIPER results in efficient decision trees and can accurately mimic the behavior of the expert policy. Moreover, we present efficient policy distillation and tree-modification techniques that take advantage of the decision-tree structure to allow improvements to a policy without retraining. We use our approach to improve the performance of RL-based robot navigation algorithms for indoor and outdoor scenes. We demonstrate the benefits in terms of reduced freezing and oscillation behaviors (by up to 95% reduction) for mobile robots navigating among dynamic obstacles, and reduced vibrations and oscillation (by up to 17%) for outdoor robot navigation on complex, uneven terrains.
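    One appeal of a decision-tree policy is that it can be patched without retraining. The toy sketch below (not MSVIPER's actual algorithm) stores a policy tree as nested dicts and overwrites the action at a misbehaving leaf, such as one that causes freezing:

```python
# Toy illustration of tree modification without retraining (not MSVIPER's
# actual algorithm): a policy tree stored as nested dicts, plus a leaf patch.

def predict(node, state):
    """Descend the tree; internal nodes test one feature against a threshold."""
    while "action" not in node:
        side = "left" if state[node["feature"]] <= node["threshold"] else "right"
        node = node[side]
    return node["action"]

# Feature 0 is (hypothetically) the distance to the nearest obstacle.
policy = {
    "feature": 0, "threshold": 0.5,
    "left":  {"action": "stop"},       # too close: the robot freezes here
    "right": {"action": "forward"},
}

# Patch the freezing behavior by overwriting the offending leaf in place.
policy["left"]["action"] = "turn_left"

assert predict(policy, [0.2]) == "turn_left"
```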

    Adversarial Robustness Verification and Attack Synthesis in Stochastic Systems

    Probabilistic model checking is a useful technique for specifying and verifying properties of stochastic systems, including randomized protocols and reinforcement learning models. Existing methods rely on the assumed structure and probabilities of certain system transitions. These assumptions may be incorrect, and may even be violated by an adversary who gains control of system components. In this paper, we develop a formal framework for adversarial robustness in systems modeled as discrete-time Markov chains (DTMCs). We base our framework on existing methods for verifying probabilistic temporal logic properties and extend it to include deterministic, memoryless policies acting in Markov decision processes (MDPs). Our framework includes a flexible approach for specifying structure-preserving and non-structure-preserving adversarial models. We outline a class of threat models under which adversaries can perturb system transitions, constrained by an ε-ball around the original transition probabilities. We define three main DTMC adversarial robustness problems: adversarial robustness verification, maximal δ synthesis, and worst-case attack synthesis. We present two optimization-based solutions to these three problems, leveraging traditional and parametric probabilistic model checking techniques. We then evaluate our solutions on two stochastic protocols and a collection of Grid World case studies, which model an agent acting in an environment described as an MDP. We find that the parametric solution results in fast computation for small parameter spaces. In the case of less restrictive (stronger) adversaries, the number of parameters increases, and directly computing property satisfaction probabilities is more scalable. We demonstrate the usefulness of our definitions and solutions by comparing system outcomes over various properties, threat models, and case studies.
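    The core question can be illustrated in a few lines of numpy, under strong simplifications: perturb a DTMC's transition matrix within an ε-ball while preserving its support (a structure-preserving adversary), and compare a bounded reachability probability before and after. The chain, the perturbation, and the property below are invented for illustration.

```python
# Sketch of an ε-ball, structure-preserving perturbation of a DTMC, with a
# bounded reachability probability computed before and after the attack.
import numpy as np

def reach_prob(P, target, horizon):
    """Probability of reaching `target` from state 0 within `horizon` steps."""
    x = np.zeros(P.shape[0])
    x[target] = 1.0
    for _ in range(horizon):
        x = P @ x
        x[target] = 1.0            # target states count as already reached
    return x[0]

P = np.array([[0.7, 0.2, 0.1],
              [0.0, 0.6, 0.4],
              [0.0, 0.0, 1.0]])    # state 2 is the bad absorbing state

# The adversary shifts at most ε of mass per entry; zero entries stay zero,
# and every row still sums to one.
eps = 0.05
P_adv = P + np.array([[-eps, 0.0,  eps],
                      [ 0.0, -eps,  eps],
                      [ 0.0,  0.0,  0.0]])

print(reach_prob(P, 2, 10), reach_prob(P_adv, 2, 10))
```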

    Interpreting Deep Learning-Based Networking Systems

    While many deep learning (DL)-based networking systems have demonstrated superior performance, the underlying Deep Neural Networks (DNNs) remain black boxes that are uninterpretable for network operators. This lack of interpretability makes DL-based networking systems difficult to deploy in practice. In this paper, we propose Metis, a framework that provides interpretability for two general categories of networking problems spanning local and global control. Accordingly, Metis introduces two different interpretation methods based on decision trees and hypergraphs: it converts DNN policies to interpretable rule-based controllers and highlights critical components based on analysis over the hypergraph. We evaluate Metis over several state-of-the-art DL-based networking systems and show that Metis provides human-readable interpretations with nearly no degradation in performance. We further present four concrete use cases of Metis, showcasing how Metis helps network operators design, debug, deploy, and ad-hoc adjust DL-based networking systems.
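    The decision-tree half of this idea can be sketched as follows: fit a small tree to (state, decision) pairs logged from a DNN policy, then print it as human-readable rules. The flow-scheduling feature names and the stand-in labels below are invented for illustration and are not Metis's actual pipeline.

```python
# Sketch of the decision-tree interpretation: fit a small tree to
# (state, decision) pairs logged from a DNN policy and print it as rules.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.random((500, 2))                           # [flow_size, queue_length]
y = (X[:, 0] + 0.5 * X[:, 1] > 0.8).astype(int)    # stand-in DNN decisions

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=["flow_size", "queue_length"]))
```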

    Towards neural-network routing with verified performance bounds

    When data-driven algorithms, especially ones based on deep neural networks (DNNs), replace classical ones, their superior performance often comes with difficulty in their analysis. To compensate for this drawback, formal verification techniques, which can provide reliable guarantees on program behavior, have been developed for DNNs. These techniques, however, usually consider DNNs alone, excluding the real-world environments in which they operate, and the applicability of techniques that do account for such environments is often limited. In this work, we consider the problem of formally verifying a neural controller for the routing problem in a conveyor network. Unlike in known problem statements, our DNNs are executed in a distributed context, and the performance of the routing algorithm, which we measure as the mean delivery time, depends on multiple executions of these DNNs. Under several assumptions, we reduce the problem to a number of DNN output reachability problems, which can be solved with existing tools. Our experiments indicate that sound-and-complete formal verification in such cases is feasible, although it is notably slower than gradient-based search for adversarial examples.

    The paper is structured as follows. Section 1 introduces basic concepts. Section 2 introduces the routing problem and DQN-Routing, the DNN-based algorithm that solves it. Section 3 proposes the contribution of this paper: a novel sound and complete approach to formally checking an upper bound on the mean delivery time of DNN-based routing. This approach is experimentally evaluated in Section 4. The paper concludes with a discussion of the results and an outline of possible future work.
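    The reduction targets output reachability queries of the form "given a box of inputs, how large can an output of the DNN get?". The sketch below answers such a query soundly (though, unlike the verification tools used in the paper, incompletely) via interval bound propagation through a ReLU network; the weights are random stand-ins.

```python
# Interval bound propagation through a ReLU network: a sound but incomplete
# way to bound the outputs reachable from a box of inputs.
import numpy as np

def ibp(layers, lo, hi):
    """Propagate the input box [lo, hi] through a list of (W, b) ReLU layers."""
    for i, (W, b) in enumerate(layers):
        Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
        lo, hi = Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b
        if i < len(layers) - 1:                    # ReLU on hidden layers only
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

rng = np.random.default_rng(1)
layers = [(rng.standard_normal((8, 4)), rng.standard_normal(8)),
          (rng.standard_normal((2, 8)), rng.standard_normal(2))]
lo, hi = ibp(layers, np.zeros(4), np.ones(4))
print("certified output bounds:", lo, hi)
```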

    Discounted-Sum Automata with Multiple Discount Factors

    Discounting the influence of future events is a key paradigm in economics, and it is widely used in computer-science models such as games, Markov decision processes (MDPs), reinforcement learning, and automata. While a single game or MDP may allow for several different discount factors, discounted-sum automata (NDAs) were only studied with respect to a single discount factor. For every integer λ ∈ ℕ∖{0,1}, as opposed to every λ ∈ ℚ∖ℕ, the class of NDAs with discount factor λ (λ-NDAs) has good computational properties: it is closed under determinization and under the algebraic operations min, max, addition, and subtraction, and there are algorithms for its basic decision problems, such as automata equivalence and containment. We define and analyze discounted-sum automata in which each transition can have a different integral discount factor (integral NMDAs). We show that integral NMDAs with an arbitrary choice of discount factors are not closed under determinization and under algebraic operations, and that their containment problem is undecidable. We then define and analyze a restricted class of integral NMDAs, which we call tidy NMDAs, in which the choice of discount factors depends on the prefix of the word read so far. Some of their special cases are NMDAs that correlate discount factors to actions (alphabet letters) or to the elapsed time. We show that for every function θ that defines the choice of discount factors, the class of θ-NMDAs enjoys all of the above good properties of integral NDAs, as well as the same complexity of the required decision problems. Tidy NMDAs are also as expressive as deterministic integral NMDAs with an arbitrary choice of discount factors. All of our results hold for both automata on finite words and automata on infinite words.
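    Under one standard reading of the discounted-sum semantics (an interpretation for illustration, not code from the paper), the weight of the i-th transition is divided by the product of the discount factors of the transitions before it; with per-transition factors this is a small loop:

```python
# Discounted sum of a finite run in which every transition carries its own
# integral discount factor: weight i is divided by the product of the
# discount factors of transitions 0..i-1.

def discounted_sum(weights, factors):
    total, acc = 0.0, 1.0
    for w, lam in zip(weights, factors):
        total += w / acc
        acc *= lam          # later transitions are discounted more heavily
    return total

# E.g., a "tidy" run where the factor depends on the letter just read:
print(discounted_sum([1, 2, 1], [2, 3, 2]))   # 1 + 2/2 + 1/6 ≈ 2.167
```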

    Proceedings of the 21st Conference on Formal Methods in Computer-Aided Design – FMCAD 2021

    The Conference on Formal Methods in Computer-Aided Design (FMCAD) is an annual conference on the theory and applications of formal methods in hardware and system verification. FMCAD provides a leading forum for researchers in academia and industry to present and discuss groundbreaking methods, technologies, theoretical results, and tools for reasoning formally about computing systems. FMCAD covers formal aspects of computer-aided system design, including verification, specification, synthesis, and testing.

    Formal verification of deep reinforcement learning agents

    Deep reinforcement learning has been successfully applied to many control tasks, but the application of such controllers in safety-critical scenarios has been limited due to safety concerns. Rigorous testing of these controllers is challenging, particularly when they operate in uncertain environments. In this thesis we develop novel verification techniques to give the user stronger guarantees over the performance of the trained agents than they would be able to obtain by testing, under different degrees and sources of uncertainty. In particular, we tackle three different sources of uncertainty to the agent and offer different algorithms to provide strong guarantees to the user. The first one is input noise: sensors in the real world always provide imperfect data. The second source of uncertainty comes from the actuators: once an agent decides to take a specific action, faulty actuators or hardware problems could still prevent the agent from acting upon the decisions given by the controller. The last source of uncertainty is the policy: the set of decisions the controller takes when operating in the environment. Agents may act probabilistically for a number of reasons, such as dealing with adversaries in a competitive environment or addressing partial observability of the environment. In this thesis, we develop formal models of controllers executing under uncertainty, and propose new verification techniques based on abstract interpretation for their analysis. We cover different horizon lengths, i.e., the number of steps into the future that we analyse, and present methods for both finite-horizon and infinite-horizon verification. We perform both probabilistic and non-probabilistic analysis of the models constructed, depending on the methodology adopted. We implement and evaluate our methods on controllers trained for several benchmark control problems.
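    A toy rendition of the finite-horizon, non-probabilistic flavour of such an analysis: propagate a box of states through controller-and-plant steps with interval arithmetic and assert that the box stays inside a safe region. The one-dimensional system, controller gain, noise bound, and safety threshold below are all invented for illustration; the thesis handles far richer models and agents.

```python
# Toy finite-horizon verification by interval (box) abstraction: a 1-D plant
# x' = 0.6x + u with a linear controller reading a noisy sensor.

def controller(obs_lo, obs_hi, k=0.3):
    """Interval version of u = -k * observation."""
    return -k * obs_hi, -k * obs_lo

def step(lo, hi, noise=0.05):
    """One closed-loop step; the controller sees x plus bounded sensor noise."""
    u_lo, u_hi = controller(lo - noise, hi + noise)
    return 0.6 * lo + u_lo, 0.6 * hi + u_hi        # plant: x' = 0.6x + u

lo, hi = -1.0, 1.0                                 # initial state uncertainty
for t in range(20):
    lo, hi = step(lo, hi)
    assert -2.0 <= lo and hi <= 2.0                # safety: |x| stays below 2
print(f"state box after 20 steps: [{lo:.3f}, {hi:.3f}]")
```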