39 research outputs found

    Targeted Adversarial Attacks on Deep Reinforcement Learning Policies via Model Checking

    Full text link
    Deep Reinforcement Learning (RL) agents are susceptible to adversarial noise in their observations that can mislead their policies and decrease their performance. However, an adversary may be interested not only in decreasing the reward, but also in modifying specific temporal logic properties of the policy. This paper presents a metric that measures the exact impact of adversarial attacks against such properties. We use this metric to craft optimal adversarial attacks. Furthermore, we introduce a model checking method that allows us to verify the robustness of RL policies against adversarial attacks. Our empirical analysis confirms (1) the quality of our metric to craft adversarial attacks against temporal logic properties, and (2) that we are able to concisely assess a system's robustness against attacks.Comment: ICAART 2023 Paper (Technical Report

    Mobile agent path planning under uncertain environment using reinforcement learning and probabilistic model checking

    Get PDF
    The major challenge in mobile agent path planning, within an uncertain environment, is effectively determining an optimal control model to discover the target location as quickly as possible and evaluating the control system's reliability. To address this challenge, we introduce a learning-verification integrated mobile agent path planning method to achieve both the effectiveness and the reliability. More specifically, we first propose a modified Q-learning algorithm (a popular reinforcement learning algorithm), called Q EA−learning algorithm, to find the best Q-table in the environment. We then determine the location transition probability matrix, and establish a probability model using the assumption that the agent selects a location with a higher Q-value. Secondly, the learnt behaviour of the mobile agent based on Q EA−learning algorithm, is formalized as a Discrete-time Markov Chain (DTMC) model. Thirdly, the required reliability requirements of the mobile agent control system are specified using Probabilistic Computation Tree Logic (PCTL). In addition, the DTMC model and the specified properties are taken as the input of the Probabilistic Model Checker PRISM for automatic verification. This is preformed to evaluate and verify the control system's reliability. Finally, a case study of a mobile agent walking in a grids map is used to illustrate the proposed learning algorithm. Here we have a special focus on the modelling approach demonstrating how PRISM can be used to analyse and evaluate the reliability of the mobile agent control system learnt via the proposed algorithm. The results show that the path identified using the proposed integrated method yields the largest expected reward.</p

    Evolutionary-Guided Synthesis of Verified Pareto-Optimal MDP Policies

    Get PDF
    We present a new approach for synthesising Pareto- optimal Markov decision process (MDP) policies that satisfy complex combinations of quality-of-service (QoS) software requirements. These policies correspond to optimal designs or configurations of software systems, and are obtained by translating MDP models of these systems into parametric Markov chains, and using multi-objective genetic algorithms to synthesise Pareto-optimal parameter values that define the required MDP policies. We use case studies from the service-based systems and robotic control software domains to show that our MDP policy synthesis approach can handle a wide range of QoS requirement combinations unsupported by current probabilistic model checkers. Moreover, for requirement combinations supported by these model checkers, our approach generates better Pareto-optimal policy sets according to established quality metrics

    Robust Control for Dynamical Systems With Non-Gaussian Noise via Formal Abstractions

    Full text link
    Controllers for dynamical systems that operate in safety-critical settings must account for stochastic disturbances. Such disturbances are often modeled as process noise in a dynamical system, and common assumptions are that the underlying distributions are known and/or Gaussian. In practice, however, these assumptions may be unrealistic and can lead to poor approximations of the true noise distribution. We present a novel controller synthesis method that does not rely on any explicit representation of the noise distributions. In particular, we address the problem of computing a controller that provides probabilistic guarantees on safely reaching a target, while also avoiding unsafe regions of the state space. First, we abstract the continuous control system into a finite-state model that captures noise by probabilistic transitions between discrete states. As a key contribution, we adapt tools from the scenario approach to compute probably approximately correct (PAC) bounds on these transition probabilities, based on a finite number of samples of the noise. We capture these bounds in the transition probability intervals of a so-called interval Markov decision process (iMDP). This iMDP is, with a user-specified confidence probability, robust against uncertainty in the transition probabilities, and the tightness of the probability intervals can be controlled through the number of samples. We use state-of-the-art verification techniques to provide guarantees on the iMDP and compute a controller for which these guarantees carry over to the original control system. In addition, we develop a tailored computational scheme that reduces the complexity of the synthesis of these guarantees on the iMDP. Benchmarks on realistic control systems show the practical applicability of our method, even when the iMDP has hundreds of millions of transitions.Comment: To appear in the Journal of Artificial Intelligence Research (JAIR). arXiv admin note: text overlap with arXiv:2110.1266

    Formal methods paradigms for estimation and machine learning in dynamical systems

    Get PDF
    Formal methods are widely used in engineering to determine whether a system exhibits a certain property (verification) or to design controllers that are guaranteed to drive the system to achieve a certain property (synthesis). Most existing techniques require a large amount of accurate information about the system in order to be successful. The methods presented in this work can operate with significantly less prior information. In the domain of formal synthesis for robotics, the assumptions of perfect sensing and perfect knowledge of system dynamics are unrealistic. To address this issue, we present control algorithms that use active estimation and reinforcement learning to mitigate the effects of uncertainty. In the domain of cyber-physical system analysis, we relax the assumption that the system model is known and identify system properties automatically from execution data. First, we address the problem of planning the path of a robot under temporal logic constraints (e.g. "avoid obstacles and periodically visit a recharging station") while simultaneously minimizing the uncertainty about the state of an unknown feature of the environment (e.g. locations of fires after a natural disaster). We present synthesis algorithms and evaluate them via simulation and experiments with aerial robots. Second, we develop a new specification language for tasks that require gathering information about and interacting with a partially observable environment, e.g. "Maintain localization error below a certain level while also avoiding obstacles.'' Third, we consider learning temporal logic properties of a dynamical system from a finite set of system outputs. For example, given maritime surveillance data we wish to find the specification that corresponds only to those vessels that are deemed law-abiding. Algorithms for performing off-line supervised and unsupervised learning and on-line supervised learning are presented. Finally, we consider the case in which we want to steer a system with unknown dynamics to satisfy a given temporal logic specification. We present a novel reinforcement learning paradigm to solve this problem. Our procedure gives "partial credit'' for executions that almost satisfy the specification, which can lead to faster convergence rates and produce better solutions when the specification is not satisfiable

    Synthesis of Probabilistic Models for Quality-of-Service Software Engineering

    Get PDF
    An increasingly used method for the engineering of software systems with strict quality-of-service (QoS) requirements involves the synthesis and verification of probabilistic models for many alternative architectures and instantiations of system parameters. Using manual trial-and-error or simple heuristics for this task often produces suboptimal models, while the exhaustive synthesis of all possible models is typically intractable. The EvoChecker search-based software engineering approach presented in our paper addresses these limitations by employing evolutionary algorithms to automate the model synthesis process and to significantly improve its outcome. EvoChecker can be used to synthesise the Pareto-optimal set of probabilistic models associated with the QoS requirements of a system under design, and to support the selection of a suitable system architecture and configuration. EvoChecker can also be used at runtime, to drive the efficient reconfiguration of a self-adaptive software system. We evaluate EvoChecker on several variants of three systems from different application domains, and show its effectiveness and applicability
    corecore