
    Scalable Verification of Markov Decision Processes

    Markov decision processes (MDP) are useful to model concurrent process optimisation problems, but verifying them with numerical methods is often intractable. Existing approximative approaches do not scale well and are limited to memoryless schedulers. Here we present the basis of scalable verification for MDPs, using an O(1) memory representation of history-dependent schedulers. We thus facilitate scalable learning techniques and the use of massively parallel verification. Comment: V4: FMDS version, 12 pages, 4 figures.
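    The abstract's key idea, representing a history-dependent scheduler in O(1) memory, can be illustrated by identifying each scheduler with a single integer seed and hashing the seed together with the simulation trace so far to resolve each nondeterministic choice. The sketch below assumes a hypothetical mdp object with initial_state, actions(state) and step(state, action) members; the paper's actual hash construction and interfaces may differ.

        import hashlib

        def scheduled_action(seed, trace, actions):
            # The scheduler is identified by a single integer seed (O(1) memory),
            # yet its choice depends on the entire trace, making it
            # history-dependent. Illustrative sketch only; the paper's exact
            # hash construction may differ.
            digest = hashlib.sha256(repr((seed, trace)).encode()).digest()
            return actions[int.from_bytes(digest[:8], "big") % len(actions)]

        def simulate(seed, mdp, horizon):
            # One simulation of the MDP under the scheduler denoted by 'seed'.
            # The mdp object (initial_state, actions, step) is a placeholder,
            # not an API from the paper.
            state = mdp.initial_state
            trace = (state,)
            for _ in range(horizon):
                action = scheduled_action(seed, trace, mdp.actions(state))
                state = mdp.step(state, action)
                trace += (action, state)
            return trace

    Because each seed then denotes one deterministic scheduler, sampling schedulers reduces to sampling integers, which is what makes massively parallel verification feasible.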

    Smart Sampling for Lightweight Verification of Markov Decision Processes

    Markov decision processes (MDP) are useful to model optimisation problems in concurrent systems. Verifying MDPs with efficient Monte Carlo techniques requires that their nondeterminism be resolved by a scheduler. Recent work has introduced the elements of lightweight techniques to sample directly from scheduler space, but finding optimal schedulers by simple sampling may be inefficient. Here we describe "smart" sampling algorithms that can make substantial improvements in performance. Comment: IEEE conference style, 11 pages, 5 algorithms, 11 figures, 1 table.
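    A minimal sketch of the halving idea behind "smart" sampling: spend the simulation budget on a set of candidate scheduler seeds, discard the worse-scoring half, and repeat until one candidate survives. The function simulate_success below is a hypothetical stand-in for a simulator that returns True when a run satisfies the property; the paper's actual algorithms choose the initial candidate set and per-round budgets more carefully.

        import random

        def smart_sampling(simulate_success, num_seeds=1000, budget=1000):
            # Start from randomly drawn scheduler seeds (each seed denotes a
            # deterministic scheduler, as in the lightweight approach).
            candidates = [random.getrandbits(64) for _ in range(num_seeds)]
            while len(candidates) > 1:
                per_seed = max(1, budget // len(candidates))
                # Estimate each candidate's success probability from per_seed runs.
                scores = {s: sum(simulate_success(s) for _ in range(per_seed)) / per_seed
                          for s in candidates}
                # Keep the better half; survivors get roughly double the
                # per-scheduler budget in the next round.
                candidates.sort(key=scores.get, reverse=True)
                candidates = candidates[:max(1, len(candidates) // 2)]
            return candidates[0]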

    Certified Reinforcement Learning with Logic Guidance

    This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for unknown, continuous-state Markov Decision Processes (MDPs), such that a given linear temporal property is satisfied. We convert the given property into a Limit Deterministic Büchi Automaton (LDBA), namely a finite-state machine expressing the property. Exploiting the structure of the LDBA, we shape a synchronous reward function on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces that probabilistically satisfy the linear temporal property. This probability (certificate) is also calculated in parallel with policy learning when the state space of the MDP is finite: as such, the RL algorithm produces a policy that is certified with respect to the property. Under the assumption of finite state space, theoretical guarantees are provided on the convergence of the RL algorithm to an optimal policy, maximising the above probability. We also show that our method produces "best available" control policies when the logical property cannot be satisfied. In the general case of a continuous state space, we propose a neural network architecture for RL and we empirically show that the algorithm finds satisfying policies, if such policies exist. The performance of the proposed framework is evaluated via a set of numerical examples and benchmarks, where we observe an improvement of one order of magnitude in the number of iterations required for the policy synthesis, compared to existing approaches whenever available. Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
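    The reward shaping described above can be sketched as an on-the-fly product construction: the learner's state is a pair (MDP state, LDBA state), and a positive reward is issued exactly when the automaton moves into an accepting set. The mdp and ldba interfaces below (step, label, delta, accepting) are hypothetical placeholders for the paper's machinery, not its actual API.

        def synchronised_step(mdp, ldba, state, q, action):
            # Sample the MDP successor, then advance the automaton on the
            # atomic propositions labelling the new state.
            next_state = mdp.step(state, action)
            next_q = ldba.delta(q, mdp.label(next_state))
            # Shaped reward: pay out only on visits to an accepting set, so a
            # reward-maximising policy also drives up the probability of
            # satisfying the temporal property (a sketch; the paper's reward
            # scheme handles the LDBA acceptance condition more carefully).
            reward = 1.0 if ldba.accepting(next_q) else 0.0
            return (next_state, next_q), reward

    An off-the-shelf RL algorithm (tabular Q-learning for finite MDPs, or the proposed neural architecture for continuous state spaces) can then be run on these product states and shaped rewards unchanged.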

    Approximate planning and verification for large Markov decision processes


    Computer Aided Verification

    This open access two-volume set LNCS 11561 and 11562 constitutes the refereed proceedings of the 31st International Conference on Computer Aided Verification, CAV 2019, held in New York City, USA, in July 2019. The 52 full papers presented together with 13 tool papers and 2 case studies were carefully reviewed and selected from 258 submissions. The papers were organized in the following topical sections: Part I: automata and timed systems; security and hyperproperties; synthesis; model checking; cyber-physical systems and machine learning; probabilistic systems, runtime techniques; dynamical, hybrid, and reactive systems; Part II: logics, decision procedures, and solvers; numerical programs; verification; distributed systems and networks; verification and invariants; and concurrency.

    Tools and techniques for analysing the impact of information security

    PhD Thesis. The discipline of information security is employed by organisations to protect the confidentiality, integrity and availability of information, often communicated in the form of information security policies. A policy expresses rules, constraints and procedures to guard against adversarial threats and reduce risk by instigating desired and secure behaviour of those people interacting with information legitimately. To keep aligned with a dynamic threat landscape, evolving business requirements, regulation updates, and new technologies, a policy must undergo periodic review and change.

    Chief Information Security Officers (CISOs) are the main decision makers on information security policies within an organisation. Making informed policy modifications involves analysing, and therefore predicting, the impact of those changes on the success rate of business processes, often expressed as workflows. Security brings an added burden to completing a workflow. Adding a new security constraint may reduce the success rate, or even eliminate it if a workflow is always forced to terminate early; this can increase the chances of employees bypassing or violating a security policy. Removing an existing security constraint may increase the success rate but may also increase the risk to security.

    A lack of suitably aimed impact analysis tools and methodologies for CISOs means impact analysis is currently a somewhat manual and ambiguous procedure. Analysis can be overwhelming, time consuming, error prone, and can yield unclear results, especially when workflows are complex, involve a large workforce, and carry diverse security requirements.

    This thesis considers the provision of tools and more formal techniques specific to CISOs to help them analyse the impact that modifying a security policy has on the success rate of a workflow. More precisely, these tools and techniques have been designed to efficiently compare the impact between two versions of a security policy applied to the same workflow, one before, the other after a policy modification.

    This work focuses on two specific types of security impact analysis. The first is quantitative in nature, providing a measure of success rate for a security-constrained workflow which must be executed by employees who may be absent at runtime. This work considers quantifying workflow resiliency, which indicates a workflow's expected success rate assuming the availability of employees to be probabilistic. New aspects of quantitative resiliency are introduced in the form of workflow metrics, and risk management techniques to manage workflows that must operate with a resiliency below acceptable levels. Defining these risk management techniques has led to exploring the reduction of resiliency computation time and analysing resiliency in workflows with choice.

    The second area of focus is more qualitative, in terms of facilitating analysis of how people are likely to behave in response to security and how that behaviour can impact the success rate of a workflow at a task level. Large amounts of information from disparate sources exist on human behavioural factors in a security setting, which can be aligned with security standards and structured within a single ontology to form a knowledge base. Consultations with two CISOs have been conducted, whose responses have driven the implementation of two new tools, one graphical, the other Web-oriented, allowing CISOs and human factors experts to record and incorporate their knowledge directly within an ontology. The ontology can be used by CISOs to assess the potential impact of changes made to a security policy and help devise behavioural controls to manage that impact. The two consulted CISOs have also carried out an evaluation of the Web-oriented tool.
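    As a rough illustration of quantitative workflow resiliency, the sketch below computes the expected success rate of a sequential workflow in which each task needs at least one authorised user to be available, treating availabilities as independent probabilities. This is a deliberate simplification with hypothetical inputs; the thesis handles richer settings (e.g. separation-of-duty constraints and persistent absence) by encoding the workflow as a Markov decision process.

        def resiliency(tasks, authorised, availability):
            # tasks: ordered task names; authorised[t]: users allowed to do t;
            # availability[u]: probability that user u is present at runtime.
            prob = 1.0
            for t in tasks:
                p_none = 1.0
                for u in authorised[t]:
                    p_none *= 1.0 - availability[u]   # all authorised users absent
                prob *= 1.0 - p_none                  # task t can be performed
            return prob

        # Example: two tasks, three users.
        print(resiliency(["t1", "t2"],
                         {"t1": {"alice", "bob"}, "t2": {"carol"}},
                         {"alice": 0.9, "bob": 0.8, "carol": 0.95}))  # 0.931

    Comparing two versions of a policy, the before/after analysis the thesis targets, then amounts to evaluating such a measure under both authorisation relations and comparing the results.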