14 research outputs found
Formal methods with a touch of magic
Machine learning and formal methods have complementary benefits and drawbacks. In this work, we address the controller-design problem with a combination of techniques from both fields. The use of black-box neural networks in deep reinforcement learning (deep RL) poses a challenge for such a combination. Instead of reasoning formally about the output of deep RL, which we call the wizard, we extract from it a decision-tree-based model, which we refer to as the magic book. Using the extracted model as an intermediary, we are able to handle problems that are infeasible for either deep RL or formal methods by themselves. First, we suggest, for the first time, a synthesis procedure that is based on a magic book. We synthesize a stand-alone correct-by-design controller that enjoys the favorable performance of RL. Second, we incorporate a magic book in a bounded model checking (BMC) procedure. BMC allows us to find numerous traces of the plant under the control of the wizard, which a user can use to increase the trustworthiness of the wizard and direct further training.
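As a rough sketch of the extraction step (not the authors' implementation), a decision-tree "magic book" can be distilled from a black-box policy by sampling states, querying the policy, and fitting a tree; here the wizard is a hand-written toy stand-in and scikit-learn's `DecisionTreeClassifier` plays the role of the tree learner:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def wizard(state):
    """Stand-in for a trained deep-RL policy: maps a state to an action."""
    return int(state[0] + state[1] > 1.0)  # toy 2-action policy

# Sample states, query the wizard, and fit a small decision tree ("magic book").
states = rng.uniform(0.0, 1.0, size=(5000, 2))
actions = np.array([wizard(s) for s in states])
book = DecisionTreeClassifier(max_depth=3).fit(states, actions)

# The surrogate's agreement with the wizard can now be measured, and the tree
# itself handed to a synthesis or BMC back end.
agreement = (book.predict(states) == actions).mean()
print(f"agreement on sampled states: {agreement:.3f}")
```

The shallow tree only approximates the wizard, which is exactly why it becomes tractable for formal back ends.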
MSVIPER: Improved Policy Distillation for Reinforcement-Learning-Based Robot Navigation
We present Multiple Scenario Verifiable Reinforcement Learning via Policy
Extraction (MSVIPER), a new method for policy distillation to decision trees
for improved robot navigation. MSVIPER learns an "expert" policy using any
Reinforcement Learning (RL) technique involving learning a state-action mapping
and then uses imitation learning to learn a decision-tree policy from it. We
demonstrate that MSVIPER results in efficient decision trees and can accurately
mimic the behavior of the expert policy. Moreover, we present efficient policy
distillation and tree-modification techniques that take advantage of the
decision tree structure to allow improvements to a policy without retraining.
We use our approach to improve the performance of RL-based robot navigation
algorithms for indoor and outdoor scenes. We demonstrate the benefits in terms
of reduced freezing and oscillation behaviors (by up to 95% reduction) for
mobile robots navigating among dynamic obstacles, and reduced vibrations and
oscillation (by up to 17%) for outdoor robot navigation on complex, uneven
terrains.
Comment: 6 pages main paper, 2 pages of references, 5-page appendix (13 pages total); 5 tables, 9 algorithms, 4 figures.
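The distill-then-modify idea can be illustrated in a few lines. This is a hedged sketch, not MSVIPER itself: the "expert" is a toy labeled dataset, and the tree modification shown (overwriting a leaf's class counts in a scikit-learn tree) is just one way such retraining-free edits could be realized:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

# Toy "expert": avoid an obstacle ahead (action 1 = turn, 0 = go straight).
X = rng.uniform(0, 1, size=(2000, 3))          # e.g. three range readings
y = (X[:, 1] < 0.3).astype(int)                # turn when centre reading is close

tree = DecisionTreeClassifier(max_depth=4).fit(X, y)  # distilled policy

# Tree modification without retraining: force a chosen leaf to a safer action
# by editing its class counts in place.
leaf = tree.apply(np.array([[0.5, 0.29, 0.5]]))[0]    # leaf for a near-obstacle state
tree.tree_.value[leaf][0] = [0.0, 1.0]                # make that leaf predict "turn"

print(tree.predict([[0.5, 0.29, 0.5]]))  # action 1 is now guaranteed in that leaf
```

Because each leaf covers an explicit region of the state space, an edit like this changes the policy's behavior on that region only, which is what makes post-hoc fixes auditable.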
Adversarial Robustness Verification and Attack Synthesis in Stochastic Systems
Probabilistic model checking is a useful technique for specifying and
verifying properties of stochastic systems including randomized protocols and
reinforcement learning models. Existing methods rely on the assumed structure
and probabilities of certain system transitions. These assumptions may be
incorrect, and may even be violated by an adversary who gains control of system
components.
In this paper, we develop a formal framework for adversarial robustness in
systems modeled as discrete time Markov chains (DTMCs). We base our framework
on existing methods for verifying probabilistic temporal logic properties and
extend it to include deterministic, memoryless policies acting in Markov
decision processes (MDPs). Our framework includes a flexible approach for
specifying structure-preserving and non-structure-preserving adversarial
models. We outline a class of threat models under which adversaries can perturb
system transitions, constrained by an ε-ball around the original
transition probabilities.
We define three main DTMC adversarial robustness problems: adversarial
robustness verification, maximal synthesis, and worst-case attack
synthesis. We present two optimization-based solutions to these three problems,
leveraging traditional and parametric probabilistic model checking techniques.
We then evaluate our solutions on two stochastic protocols and a collection of
Grid World case studies, which model an agent acting in an environment
described as an MDP. We find that the parametric solution results in fast
computation for small parameter spaces. In the case of less restrictive
(stronger) adversaries, the number of parameters increases, and directly
computing property satisfaction probabilities is more scalable. We demonstrate
the usefulness of our definitions and solutions by comparing system outcomes
over various properties, threat models, and case studies.
Comment: To appear, 35th IEEE Computer Security Foundations Symposium (2022).
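A minimal sketch of the underlying computation (a toy 3-state DTMC and a single hand-picked perturbation, not the paper's parametric model-checking solution): reachability probabilities come from the standard linear system over the transient states, and an ε-bounded, structure-preserving perturbation shows how a property can degrade under attack:

```python
import numpy as np

def reach_prob(P, target, transient):
    """Probability of eventually reaching `target` from each transient state
    of a DTMC with transition matrix P: solve (I - Q) x = b."""
    Q = P[np.ix_(transient, transient)]
    b = P[np.ix_(transient, [target])].ravel()
    return np.linalg.solve(np.eye(len(transient)) - Q, b)

# 3-state chain: 0 transient, 1 = goal (absorbing), 2 = fail (absorbing).
P = np.array([[0.5, 0.3, 0.2],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
base = reach_prob(P, target=1, transient=[0])[0]      # 0.3 / (1 - 0.5) = 0.6

# Structure-preserving adversary: shift up to eps of mass from goal to fail,
# keeping each row a probability distribution.
eps = 0.1
P_adv = P.copy()
P_adv[0, 1] -= eps
P_adv[0, 2] += eps
worst = reach_prob(P_adv, target=1, transient=[0])[0]
print(base, worst)  # robustness holds iff `worst` still satisfies the property
```

Verification then amounts to checking that the reachability probability stays above a threshold for every admissible perturbation, rather than for one sample as here.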
Interpreting Deep Learning-Based Networking Systems
While many deep learning (DL)-based networking systems have demonstrated
superior performance, the underlying Deep Neural Networks (DNNs) remain
black boxes that stay uninterpretable for network operators. The lack of
interpretability makes DL-based networking systems prohibitively difficult to
deploy in practice. In this paper, we propose Metis, a framework that provides
interpretability for two general categories of networking problems spanning
local and global control. Accordingly, Metis introduces two different
interpretation methods, based on decision trees and hypergraphs: it converts
DNN policies into interpretable rule-based controllers and highlights critical
components based on analysis over the hypergraph. We evaluate Metis over several
state-of-the-art DL-based networking systems and show that Metis provides
human-readable interpretations with nearly no degradation in performance. We
further present four concrete use cases of Metis, showcasing how Metis helps
network operators design, debug, deploy, and make ad-hoc adjustments to
DL-based networking systems.
Comment: To appear at ACM SIGCOMM 202
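The decision-tree side of such an interpretation can be sketched as follows (a toy flow-prioritization example, not Metis code): a fitted tree is walked once and each leaf is printed as a human-readable if-then rule, which is what makes the controller auditable by operators:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy flow-scheduling policy: prioritise short flows (illustrative data only).
rng = np.random.default_rng(2)
X = rng.uniform(0, 100, size=(1000, 2))        # [flow_size_kb, queue_len]
y = (X[:, 0] < 20).astype(int)                 # 1 = high priority

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

def rules(t, names, node=0, cond=()):
    """Walk the fitted tree and print one if-then rule per leaf."""
    if t.children_left[node] == -1:            # leaf node
        print("IF " + " AND ".join(cond or ("TRUE",)) +
              f" THEN action={int(np.argmax(t.value[node]))}")
        return
    f, thr = names[t.feature[node]], t.threshold[node]
    rules(t, names, t.children_left[node], cond + (f"{f} <= {thr:.1f}",))
    rules(t, names, t.children_right[node], cond + (f"{f} > {thr:.1f}",))

rules(tree.tree_, ["flow_size_kb", "queue_len"])
```

Rule lists like this can also be edited or sanity-checked directly, one of the use cases the paper describes for operators.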
Towards Neural-Network Routing with Verified Efficiency Bounds
When data-driven algorithms, especially those based on deep neural networks (DNNs), replace classical ones, their superior performance often comes with difficulty in their analysis. To compensate for this drawback, formal verification techniques, which can provide reliable guarantees on program behavior, have been developed for DNNs. These techniques, however, usually consider DNNs alone, excluding the real-world environments in which they operate, and the applicability of techniques that do account for such environments is often limited. In this work, we consider the problem of formally verifying a neural controller for the routing problem in a conveyor network. Unlike in known problem statements, our DNNs are executed in a distributed context, and the performance of the routing algorithm, which we measure as the mean delivery time, depends on multiple executions of these DNNs. Under several assumptions, we reduce the problem to a number of DNN output reachability problems, which can be solved with existing tools. Our experiments indicate that sound-and-complete formal verification in such cases is feasible, although it is notably slower than gradient-based search for adversarial examples.
The paper is structured as follows. Section 1 introduces basic concepts. Then, Section 2 introduces the routing problem and DQN-Routing, the DNN-based algorithm that solves it. Section 3 proposes the contribution of this paper: a novel sound and complete approach to formally check an upper bound on the mean delivery time of DNN-based routing. This approach is experimentally evaluated in Section 4. The paper concludes with a discussion of the results and an outline of possible future work.
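For intuition on what a DNN output reachability subproblem looks like, here is a minimal interval-bound-propagation sketch over a tiny hand-weighted ReLU network. Note the hedge: IBP as shown is sound but incomplete, whereas the paper targets sound-and-complete verification with existing reachability tools; the weights here are illustrative, not a trained router:

```python
import numpy as np

def ibp(lows, highs, layers):
    """Interval bound propagation through a feed-forward ReLU network:
    returns sound bounds on the output for any input in the box [lows, highs]."""
    lo, hi = np.asarray(lows, float), np.asarray(highs, float)
    for i, (W, b) in enumerate(layers):
        Wp, Wn = np.maximum(W, 0), np.minimum(W, 0)
        lo, hi = Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b
        if i < len(layers) - 1:          # ReLU on hidden layers only
            lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)
    return lo, hi

# Tiny 2-2-1 network with fixed weights (illustrative only).
layers = [(np.array([[1.0, -1.0], [0.5, 0.5]]), np.array([0.0, 0.0])),
          (np.array([[1.0, 1.0]]), np.array([0.0]))]
lo, hi = ibp([0.0, 0.0], [1.0, 1.0], layers)
print(lo, hi)  # any output outside [lo, hi] is unreachable from the input box
```

A complete verifier refines such bounds (e.g. by case-splitting on ReLU phases) until the reachability question is answered exactly.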
Discounted-Sum Automata with Multiple Discount Factors
Discounting the influence of future events is a key paradigm in economics and
it is widely used in computer-science models, such as games, Markov decision
processes (MDPs), reinforcement learning, and automata. While a single game or
MDP may allow for several different discount factors, discounted-sum automata
(NDAs) were only studied with respect to a single discount factor. For every
integer λ, as opposed to every rational λ, the class of NDAs with discount factor λ
(λ-NDAs) has good computational properties: it is closed
under determinization and under the algebraic operations min, max, addition,
and subtraction, and there are algorithms for its basic decision problems, such
as automata equivalence and containment.
We define and analyze discounted-sum automata in which each transition can
have a different integral discount factor (integral NMDAs). We show that
integral NMDAs with an arbitrary choice of discount factors are not closed
under determinization and under algebraic operations and that their containment
problem is undecidable. We then define and analyze a restricted class of
integral NMDAs, which we call tidy NMDAs, in which the choice of discount
factors depends on the prefix of the word read so far. Some of their special
cases are NMDAs that correlate discount factors to actions (alphabet letters)
or to the elapsed time. We show that for every function θ that defines
the choice of discount factors, the class of θ-NMDAs enjoys all of the
above good properties of integral NDAs, as well as the same complexity of the
required decision problems. Tidy NMDAs are also as expressive as deterministic
integral NMDAs with an arbitrary choice of discount factors.
All of our results hold for both automata on finite words and automata on
infinite words.
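The value of a run in such an automaton can be computed directly. This sketch assumes the usual convention that the i-th weight is scaled by the product of the inverses of the discount factors read before it; exact arithmetic via `fractions` keeps the toy example clean:

```python
from fractions import Fraction

def discounted_sum(transitions):
    """Value of a run in an NMDA sketch: each transition carries a weight and
    its own integral discount factor; weight i is divided by the product of
    the discount factors seen on earlier transitions."""
    total, scale = Fraction(0), Fraction(1)
    for weight, factor in transitions:
        total += Fraction(weight) * scale
        scale /= factor
    return total

# A run with per-transition discount factors 2, 3, 2 (a "tidy" choice could
# tie each factor to the letter read, or to the elapsed time).
run = [(1, 2), (6, 3), (4, 2)]
print(discounted_sum(run))  # 1 + 6/2 + 4/(2*3) = 14/3
```

With a single discount factor λ on every transition this reduces to the classical discounted sum of an NDA.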
Proceedings of the 21st Conference on Formal Methods in Computer-Aided Design – FMCAD 2021
The Conference on Formal Methods in Computer-Aided Design (FMCAD) is an annual conference on the theory and applications of formal methods in hardware and system verification. FMCAD provides a leading forum for researchers in academia and industry to present and discuss groundbreaking methods, technologies, theoretical results, and tools for reasoning formally about computing systems. FMCAD covers formal aspects of computer-aided system design, including verification, specification, synthesis, and testing.
Formal verification of deep reinforcement learning agents
Deep reinforcement learning has been successfully applied to many control tasks, but the application of such controllers in safety-critical scenarios has been limited due to safety concerns. Rigorous testing of these controllers is challenging, particularly when they operate in uncertain environments. In this thesis we develop novel verification techniques to give the user stronger guarantees on the performance of the trained agents than they would be able to obtain by testing, under different degrees and sources of uncertainty.
In particular, we tackle three different sources of uncertainty affecting the agent and offer different algorithms to provide strong guarantees to the user. The first one is input noise: sensors in the real world always provide imperfect data. The second source of uncertainty comes from the actuators: once an agent decides to take a specific action, faulty actuators and/or hardware problems could still prevent the agent from acting upon the decisions given by the controller. The last source of uncertainty is the policy: the set of decisions the controller takes when operating in the environment. Agents may act probabilistically for a number of reasons, such as dealing with adversaries in a competitive environment or addressing partial observability of the environment.
In this thesis, we develop formal models of controllers executing under uncertainty, and propose new verification techniques based on abstract interpretation for their analysis. We cover different horizon lengths, i.e., the number of steps into the future that we analyse, and present methods for both finite-horizon and infinite-horizon verification. We perform both probabilistic and non-probabilistic analysis of the models constructed, depending on the methodology adopted. We implement and evaluate our methods on controllers trained for several benchmark control problems.
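A tiny sketch of finite-horizon verification by abstract interpretation with intervals: a one-dimensional toy plant with a hand-written bang-bang controller standing in for a trained agent (not one of the thesis' benchmarks). The reachable set is over-approximated step by step and a safety invariant is checked at every step:

```python
def step_interval(lo, hi):
    """One closed-loop step in interval arithmetic. The toy controller pushes
    the state toward 0 (u = -1 when x > 0, u = +1 when x <= 0), and the plant
    is x' = x + 0.1 * u; the interval is split on the sign of the state and
    the two images are joined into one sound over-approximation."""
    parts = []
    if hi > 0:   # sub-interval where the controller pushes down
        parts.append((max(lo, 0.0) - 0.1, hi - 0.1))
    if lo <= 0:  # sub-interval where the controller pushes up
        parts.append((lo + 0.1, min(hi, 0.0) + 0.1))
    return min(p[0] for p in parts), max(p[1] for p in parts)

# Finite-horizon verification: propagate the initial set for k steps and check
# the invariant |x| <= 1 on the reachable over-approximation.
lo, hi = -0.5, 0.5
for _ in range(20):
    lo, hi = step_interval(lo, hi)
    assert -1.0 <= lo and hi <= 1.0  # safety invariant holds at every step
print(lo, hi)  # the interval contracts toward [-0.1, 0.1]
```

Infinite-horizon analysis would instead look for a fixed point of this interval map; here the iteration visibly converges to one.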