
    Probabilistic Bisimulations for PCTL Model Checking of Interval MDPs

    Verification of PCTL properties of MDPs with convex uncertainties has recently been investigated by Puggelli et al. However, model checking algorithms typically suffer from state space explosion. In this paper, we address probabilistic bisimulation to reduce the size of such MDPs while preserving the PCTL properties they satisfy. We discuss the different interpretations of uncertainty in these models that have been studied in the literature, which lead to two different definitions of bisimulation. We give algorithms to compute the quotients of these bisimulations in time polynomial in the size of the model and exponential in the uncertain branching. Finally, we show by a case study that, in practice, large models can have small branching and that a substantial state space reduction can be achieved by our approach. Comment: In Proceedings SynCoP 2014, arXiv:1403.784
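
    To make the quotienting idea concrete, below is a much-simplified, hypothetical Python sketch of signature-based partition refinement for an interval MDP. The paper's algorithms compare full uncertainty sets (with LP-based checks); aggregating interval bounds per block, as done here, is only a coarse illustration, and the toy model and names are invented.

```python
# Simplified, hypothetical sketch of signature-based partition refinement
# for an interval MDP. Aggregating interval bounds per block is a coarse
# stand-in for the exact bisimulation checks described in the paper.

# Illustrative IMDP: state -> action -> {successor: (lower, upper)}.
imdp = {
    "s0": {"a": {"s1": (0.3, 0.7), "s2": (0.3, 0.7)}},
    "s1": {"a": {"s1": (1.0, 1.0)}},
    "s2": {"a": {"s2": (1.0, 1.0)}},
}

def signature(state, block_of):
    """Per action, aggregate the successor intervals over partition blocks."""
    sig = []
    for action, succs in sorted(imdp[state].items()):
        agg = {}
        for t, (lo, hi) in succs.items():
            b = block_of[t]
            cur_lo, cur_hi = agg.get(b, (0.0, 0.0))
            agg[b] = (cur_lo + lo, min(1.0, cur_hi + hi))
        sig.append((action, tuple(sorted(agg.items()))))
    return tuple(sig)

def refine(block_of):
    """Split blocks by signature until the partition stabilises."""
    while True:
        groups = {}
        for s in imdp:
            groups.setdefault((block_of[s], signature(s, block_of)), []).append(s)
        if len(groups) == len(set(block_of.values())):
            return block_of  # no block was split: fixed point reached
        block_of = {s: i for i, members in enumerate(groups.values()) for s in members}

# Start from the trivial partition with a single block.
print(refine({s: 0 for s in imdp}))  # s1 and s2 end up in the same block
```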

    Multi-objective Robust Strategy Synthesis for Interval Markov Decision Processes

    Interval Markov decision processes (IMDPs) generalise classical MDPs by having interval-valued transition probabilities. They provide a powerful modelling tool for probabilistic systems with an additional variation or uncertainty that precludes knowledge of the exact transition probabilities. In this paper, we consider the problem of multi-objective robust strategy synthesis for interval MDPs, where the aim is to find a robust strategy that guarantees the satisfaction of multiple properties at the same time in the face of transition probability uncertainty. We first show that this problem is PSPACE-hard. Then, we provide a value-iteration-based decision algorithm to approximate the Pareto set of achievable points. We finally demonstrate the practical effectiveness of our proposed approaches by applying them to several case studies using a prototypical tool. Comment: This article is a full version of a paper accepted to the Conference on Quantitative Evaluation of SysTems (QEST) 201
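
    As a rough illustration of the value-iteration idea for a single robust reachability objective (the paper handles the harder multi-objective Pareto approximation), here is a hedged Python sketch in which the inner Bellman step lets "nature" pick the worst feasible distribution from the intervals via the standard greedy ordering of successors by value. The toy IMDP and function names are assumptions, not the paper's tool.

```python
# Hedged sketch of robust value iteration for maximal reachability on an
# interval MDP (single objective only).

def worst_case_value(succ_intervals, values):
    """Minimise sum_t p(t)*values[t] over distributions p respecting the
    per-successor [lo, hi] intervals (the row is assumed feasible)."""
    p = {t: lo for t, (lo, hi) in succ_intervals.items()}
    remaining = 1.0 - sum(p.values())
    # Push the remaining mass onto low-value successors first.
    for t in sorted(succ_intervals, key=lambda t: values[t]):
        lo, hi = succ_intervals[t]
        extra = min(remaining, hi - lo)
        p[t] += extra
        remaining -= extra
    return sum(p[t] * values[t] for t in p)

def robust_value_iteration(imdp, targets, eps=1e-8):
    """imdp maps state -> action -> {successor: (lo, hi)}."""
    values = {s: (1.0 if s in targets else 0.0) for s in imdp}
    while True:
        new = {}
        for s in imdp:
            if s in targets:
                new[s] = 1.0
            else:
                new[s] = max(worst_case_value(succ, values)
                             for succ in imdp[s].values())
        if max(abs(new[s] - values[s]) for s in imdp) < eps:
            return new
        values = new

# Toy IMDP: from s0, action "a" reaches the goal with probability in [0.4, 0.6].
imdp = {
    "s0": {"a": {"goal": (0.4, 0.6), "sink": (0.4, 0.6)}},
    "goal": {"a": {"goal": (1.0, 1.0)}},
    "sink": {"a": {"sink": (1.0, 1.0)}},
}
print(robust_value_iteration(imdp, {"goal"}))  # s0 -> 0.4 (worst case)
```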

    Decision algorithms for modelling, optimal control and verification of probabilistic systems

    Markov Decision Processes (MDPs) constitute a mathematical framework for modelling systems featuring both probabilistic and nondeterministic behaviour. They are widely used to solve sequential decision-making problems and are applied successfully in operations research, artificial intelligence, and stochastic control theory, and have been extended conservatively to the model of probabilistic automata in the context of concurrent probabilistic systems. However, when modelling a physical system, they suffer from several limitations. One of the most important is the inherent loss of precision that is introduced by measurement errors and discretization artifacts, which necessarily occur due to incomplete knowledge about the system behaviour. As a result, the true probability distribution for transitions is in most cases an uncertain value, determined by either external parameters or confidence intervals. Interval Markov decision processes (IMDPs) generalize classical MDPs by having interval-valued transition probabilities. They provide a powerful modelling tool for probabilistic systems with an additional variation or uncertainty that reflects the absence of precise knowledge concerning transition probabilities. In this dissertation, we focus on decision algorithms for modelling and performance evaluation of such probabilistic systems, leveraging techniques from mathematical optimization. From a modelling viewpoint, we address probabilistic bisimulations to reduce the size of the system models while preserving the logical properties they satisfy. We also discuss the key ingredients to construct systems by composing them out of smaller components running in parallel. Furthermore, we introduce a novel stochastic model, Uncertain weighted Markov Decision Processes (UwMDPs), so as to capture quantities like preferences or priorities in a nondeterministic scenario with uncertainties. This model is close to the model of IMDPs but more convenient to work with in the context of bisimulation minimization. From a performance evaluation perspective, we consider the problem of multi-objective robust strategy synthesis for IMDPs, where the aim is to find a robust strategy that guarantees the satisfaction of multiple properties at the same time in face of the transition probability uncertainty. In this respect, we discuss the computational complexity of the problem and present a value iteration-based decision algorithm to approximate the Pareto set of achievable optimal points. Moreover, we consider the problem of computing maximal/minimal reward-bounded reachability probabilities on UwMDPs, for which we present an efficient algorithm running in pseudo-polynomial time. We demonstrate the practical effectiveness of our proposed approaches by applying them to a collection of real-world case studies using several prototypical tools.

    Markov decision processes (MDPs) provide the framework for modelling systems that exhibit both stochastic and nondeterministic behaviour. This model class has a broad range of applications in solving sequential decision problems and is used successfully in operations research, artificial intelligence, and stochastic control theory. In the context of concurrent probabilistic systems, it has been conservatively extended to probabilistic automata. However, when MDPs are used to model physical systems, it turns out that they suffer from a number of limitations. One of the most serious problems is that the true behaviour of the system under consideration is usually not completely known. Owing to measurement errors and discretization artifacts, a loss of precision is unavoidable. The true transition probability distributions of the system are therefore in most cases not known exactly, but depend on external factors or can only be captured by confidence intervals. Interval Markov decision processes (IMDPs) generalize classical MDPs in that the possible transition probability distributions can be expressed by intervals. IMDPs are therefore a powerful modelling tool for probabilistic systems with uncertain behaviour arising from the fact that the exact behaviour of the real system is not known. In this dissertation, we focus on decision procedures for modelling such probabilistic systems and evaluating their properties by employing methods from mathematical optimization. On the modelling side, we consider probabilistic bisimulation to reduce the size of the system model while preserving its logical properties. We also consider the key techniques for compositionally generating models from smaller components running in parallel. Furthermore, we introduce a new kind of stochastic model, so-called uncertain weighted Markov decision processes (UwMDPs), to express properties such as implementation choices and user priorities in a nondeterministic scenario. This model is similar to IMDPs but better suited to bisimulation minimization. On the evaluation side, we consider the problem of generating strategies that resolve the nondeterminism so that several desired properties are satisfied at the same time, while respecting every possible choice of probability distributions from the transition intervals. We consider the complexity class of this problem and discuss a value iteration-based algorithm to approximate the Pareto set of achievable optimal points. Furthermore, we consider the problem of computing minimal and maximal reachability probabilities when an upper bound on the accumulated path cost must be respected. For this problem, we discuss an efficient algorithm running in pseudo-polynomial time. We demonstrate the efficiency of our approaches in practice by implementing them prototypically and applying them to a range of realistic case studies.
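
    As one concrete illustration of the pseudo-polynomial flavour of the reward-bounded reachability computation mentioned above, here is a hedged Python sketch for a plain MDP with strictly positive integer action rewards; UwMDPs add weight uncertainty that this sketch omits, and the toy model and names are invented. The runtime is polynomial in the model size and in the numeric value of the bound, i.e. pseudo-polynomial.

```python
# Hedged sketch of maximal reward-bounded reachability on a plain MDP with
# strictly positive integer rewards (an assumption that makes the recursion
# well-founded in the budget).

from functools import lru_cache

# mdp[state][action] = (reward, {successor: probability})
mdp = {
    "s0": {"go": (2, {"goal": 0.5, "s0": 0.5}),
           "wait": (1, {"s0": 1.0})},
    "goal": {"stay": (1, {"goal": 1.0})},
}
GOAL = {"goal"}

@lru_cache(maxsize=None)
def max_prob(state, budget):
    """Maximal probability of reaching GOAL with accumulated reward <= budget."""
    if state in GOAL:
        return 1.0
    best = 0.0
    for action, (reward, succ) in mdp[state].items():
        if reward > budget:
            continue  # taking this action would already exceed the bound
        best = max(best, sum(p * max_prob(t, budget - reward)
                             for t, p in succ.items()))
    return best

print(max_prob("s0", 6))  # 0.875 on this toy model
```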

    Perturbation analysis in verification of discrete-time Markov chains

    Perturbation analysis in probabilistic verification addresses the robustness and sensitivity problem for verification of stochastic models against qualitative and quantitative properties. We identify two types of perturbation bounds, namely non-asymptotic bounds and asymptotic bounds. Non-asymptotic bounds are exact, pointwise bounds that quantify the upper and lower bounds of the verification result subject to a given perturbation of the model, whereas asymptotic bounds are closed-form bounds that approximate non-asymptotic bounds by assuming that the given perturbation is sufficiently small. We perform perturbation analysis in the setting of discrete-time Markov chains. We consider three basic matrix norms to capture the perturbation distance, and focus on the computational aspect. Our main contributions include algorithms and tight complexity bounds for calculating both non-asymptotic bounds and asymptotic bounds with respect to the three perturbation distances. © 2014 Springer-Verlag
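
    The following hedged Python sketch (assuming NumPy) illustrates the kind of quantity being bounded: for one perturbation direction of a toy DTMC, it compares the exact perturbed reachability value with its first-order estimate, which is what an asymptotic bound approximates. The paper itself derives norm-based bounds and complexity results; the chain and numbers below are invented.

```python
# Hedged numeric sketch: exact vs first-order (asymptotic) sensitivity of
# DTMC reachability probabilities under a small perturbation.

import numpy as np

# Transient-to-transient matrix Q and one-step probabilities b into the target.
# Reachability probabilities x solve x = Q x + b, i.e. (I - Q) x = b.
Q = np.array([[0.0, 0.5],
              [0.2, 0.0]])
b = np.array([0.3, 0.6])   # remaining mass goes to an absorbing "fail" state

I = np.eye(2)
x = np.linalg.solve(I - Q, b)

# Perturbation direction: mass moved into b is taken out of Q, so each row
# keeps the same total outgoing probability.
dQ = np.array([[0.0, 0.01],
               [0.0, 0.0]])
db = np.array([-0.01, 0.0])

x_exact = np.linalg.solve(I - (Q + dQ), b + db)
x_linear = x + np.linalg.solve(I - Q, dQ @ x + db)  # first-order estimate

print("nominal:            ", x)
print("perturbed (exact):  ", x_exact)
print("asymptotic estimate:", x_linear)
```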

    Adversarial Robustness Verification and Attack Synthesis in Stochastic Systems

    Probabilistic model checking is a useful technique for specifying and verifying properties of stochastic systems, including randomized protocols and reinforcement learning models. Existing methods rely on the assumed structure and probabilities of certain system transitions. These assumptions may be incorrect, and may even be violated by an adversary who gains control of system components. In this paper, we develop a formal framework for adversarial robustness in systems modeled as discrete-time Markov chains (DTMCs). We base our framework on existing methods for verifying probabilistic temporal logic properties and extend it to include deterministic, memoryless policies acting in Markov decision processes (MDPs). Our framework includes a flexible approach for specifying structure-preserving and non-structure-preserving adversarial models. We outline a class of threat models under which adversaries can perturb system transitions, constrained by an ε-ball around the original transition probabilities. We define three main DTMC adversarial robustness problems: adversarial robustness verification, maximal δ synthesis, and worst-case attack synthesis. We present two optimization-based solutions to these three problems, leveraging traditional and parametric probabilistic model checking techniques. We then evaluate our solutions on two stochastic protocols and a collection of Grid World case studies, which model an agent acting in an environment described as an MDP. We find that the parametric solution results in fast computation for small parameter spaces. In the case of less restrictive (stronger) adversaries, the number of parameters increases, and directly computing property satisfaction probabilities is more scalable. We demonstrate the usefulness of our definitions and solutions by comparing system outcomes over various properties, threat models, and case studies. Comment: To Appear, 35th IEEE Computer Security Foundations Symposium (2022)
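
    To illustrate the threat model, here is a hedged Python sketch (assuming NumPy) of a crude worst-case attack search over a structure-preserving ε-ball around a DTMC's transition probabilities. The paper's solutions use optimization and parametric probabilistic model checking rather than random search; the toy chain, names, and the approximate projection back into the ball are all invented for illustration.

```python
# Hedged sketch: random search for a structure-preserving perturbation within
# an ε-ball that minimises the probability of reaching the goal state.

import numpy as np

def reach_prob(P, transient, target):
    """Probability of eventually reaching `target` from each transient state."""
    Q = P[np.ix_(transient, transient)]
    b = P[np.ix_(transient, target)].sum(axis=1)
    return np.linalg.solve(np.eye(len(transient)) - Q, b)

# Toy DTMC: states 0 and 1 are transient, 2 is the goal, 3 is a fail sink.
P = np.array([[0.1, 0.4, 0.3, 0.2],
              [0.3, 0.0, 0.5, 0.2],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
transient, target = [0, 1], [2]
nominal = reach_prob(P, transient, target)[0]

def random_attack(P, eps, tries=2000, seed=0):
    """Crude search of the ε-ball for a perturbation minimising goal reachability."""
    rng = np.random.default_rng(seed)
    worst, worst_P = nominal, P
    for _ in range(tries):
        A = P.copy()
        for i in transient:
            support = np.flatnonzero(P[i])              # structure-preserving
            delta = rng.uniform(-eps, eps, size=len(support))
            delta -= delta.mean()                       # keep the row sum at 1
            A[i, support] = np.clip(P[i, support] + delta, 0.0, 1.0)
            A[i, support] /= A[i, support].sum()        # crude projection back
        val = reach_prob(A, transient, target)[0]
        if val < worst:
            worst, worst_P = val, A
    return worst, worst_P

worst, _ = random_attack(P, eps=0.05)
print(f"nominal {nominal:.3f} -> worst found within the ball {worst:.3f}")
```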

    Safety-aware apprenticeship learning

    It is well acknowledged in the AI community that finding a good reward function for reinforcement learning is extremely challenging. Apprenticeship learning (AL) is a class of “learning from demonstration” techniques in which the reward function of a Markov Decision Process (MDP) is unknown to the learning agent, and the agent uses inverse reinforcement learning (IRL) methods to recover the expert policy from a set of expert demonstrations. However, because the agent learns exclusively from observations, there is no verification of, or guarantee for, the learnt policy with respect to a constraint on the probability of the agent running into unwanted situations. In this dissertation, we study the problem of how to guide AL to learn a policy that is inherently safe while still meeting its learning objective. By combining formal methods with imitation learning, a Counterexample-Guided Apprenticeship Learning algorithm is proposed. We consider a setting where the unknown reward function is assumed to be a linear combination of a set of state features, and the safety property is specified in Probabilistic Computation Tree Logic (PCTL). By embedding probabilistic model checking inside AL, we propose a novel counterexample-guided approach that can ensure both safety and performance of the learnt policy. This algorithm guarantees that, given a formal safety specification defined in probabilistic temporal logic, the learnt policy shall satisfy this specification. We demonstrate the effectiveness of our approach on several challenging AL scenarios where safety is essential.
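
    As an illustration of the verification step that such a counterexample-guided loop would run on each candidate policy, here is a hedged Python sketch that checks a PCTL-style bound on reaching unsafe states in the DTMC induced by a memoryless policy. The toy MDP, the bound, and the names are invented, and the surrounding IRL/learning loop from the work is omitted entirely.

```python
# Hedged sketch of the safety check only: build the DTMC induced by a
# memoryless policy and verify a bound P<=lambda [F unsafe] by value iteration.

def reach_prob(mdp, policy, unsafe, iters=1000):
    """Probability of eventually reaching an unsafe state under `policy`."""
    values = {s: (1.0 if s in unsafe else 0.0) for s in mdp}
    for _ in range(iters):  # value iteration on the induced DTMC
        values = {s: (1.0 if s in unsafe else
                      sum(p * values[t] for t, p in mdp[s][policy[s]].items()))
                  for s in mdp}
    return values

# mdp[state][action] = {successor: probability}; toy grid-world fragment.
mdp = {
    "start": {"risky": {"goal": 0.8, "bad": 0.2},
              "safe":  {"goal": 0.6, "start": 0.4}},
    "goal":  {"stay": {"goal": 1.0}},
    "bad":   {"stay": {"bad": 1.0}},
}

def is_safe(policy, bound=0.1):
    return reach_prob(mdp, policy, {"bad"})["start"] <= bound

print(is_safe({"start": "risky", "goal": "stay", "bad": "stay"}))  # False
print(is_safe({"start": "safe",  "goal": "stay", "bad": "stay"}))  # True
```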

    Parameter-Independent Strategies for pMDPs via POMDPs

    Markov Decision Processes (MDPs) are a popular class of models suitable for solving control decision problems in probabilistic reactive systems. We consider parametric MDPs (pMDPs) that include parameters in some of the transition probabilities to account for stochastic uncertainties of the environment such as noise or input disturbances. We study pMDPs with reachability objectives where the parameter values are unknown and impossible to measure directly during execution, but a probability distribution over the parameter values is known. We study, for the first time, the computation of parameter-independent strategies that are expectation optimal, i.e., that optimize the expected reachability probability under the probability distribution over the parameters. We present an encoding of our problem into partially observable MDPs (POMDPs), i.e., a reduction of our problem to computing optimal strategies in POMDPs. We evaluate our method experimentally on several benchmarks: a motivating (repeated) learner model; a series of benchmarks of varying configurations of a robot moving on a grid; and a consensus protocol. Comment: Extended version of a QEST 2018 paper
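
    The following hedged Python sketch illustrates the reduction idea for a pMDP with finitely many parameter valuations and a prior over them: the POMDP's hidden state carries the (fixed) valuation, while the observation reveals only the pMDP state, so observation-based strategies correspond to parameter-independent strategies. The datatypes and names below are assumptions, not the paper's encoding.

```python
# Hedged sketch of a pMDP -> POMDP product construction with a hidden
# parameter valuation sampled once from the prior.

# pmdp[state][action] = list of (successor, prob_expr), where prob_expr is a
# function of the parameter valuation.
pmdp = {
    "s0": {"try": [("ok", lambda v: v["p"]),
                   ("s0", lambda v: 1.0 - v["p"])]},
    "ok": {"done": [("ok", lambda v: 1.0)]},
}
valuations = [{"p": 0.3}, {"p": 0.9}]
prior = [0.5, 0.5]

def to_pomdp(pmdp, valuations, prior, initial="s0"):
    """Build (transitions, observation, initial_belief) of the product POMDP."""
    transitions = {}   # (state, valuation_index) -> action -> {succ: prob}
    for s, acts in pmdp.items():
        for i, v in enumerate(valuations):
            transitions[(s, i)] = {
                a: {(t, i): expr(v) for t, expr in succs}
                for a, succs in acts.items()
            }
    observation = {(s, i): s for (s, i) in transitions}   # valuation stays hidden
    initial_belief = {(initial, i): prior[i] for i in range(len(valuations))}
    return transitions, observation, initial_belief

transitions, observation, belief = to_pomdp(pmdp, valuations, prior)
for hidden_state, prob in sorted(belief.items()):
    print(hidden_state, prob)
```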

    Qualitative Reachability for Open Interval Markov Chains

    Interval Markov chains extend classical Markov chains with the possibility to describe transition probabilities using intervals, rather than exact values. While the standard formulation of interval Markov chains features closed intervals, previous work has also considered open interval Markov chains, in which the intervals can also be open or half-open. In this paper we focus on qualitative reachability problems for open interval Markov chains, which ask whether the optimal (maximum or minimum) probability with which a certain set of states can be reached is equal to 0 or 1. We present polynomial-time algorithms for these problems for both of the standard semantics of interval Markov chains. Our methods do not rely on the closure of open intervals, in contrast to previous approaches for open interval Markov chains, and can characterise situations in which probability 0 or 1 can be attained not exactly but arbitrarily closely. Comment: Full version of a paper published at RP 201
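
    As a hedged illustration of the graph-based flavour of such qualitative algorithms, the Python sketch below answers only the simplest question, whether the maximum reachability probability equals 0, by a fixed point over edges that can carry positive probability in some feasible resolution. The probability-1 cases and the "attainable only arbitrarily closely" subtleties of open intervals, which are the paper's main contribution, are not covered, and the interval encoding is invented.

```python
# Hedged sketch: is the maximum reachability probability 0 in an interval
# Markov chain? Only the numeric bounds matter for this particular question;
# open versus closed endpoints become relevant for the analyses not shown here.

# imc[state][successor] = (lower, upper)
imc = {
    "s0": {"s1": (0.0, 0.5), "s2": (0.5, 1.0)},
    "s1": {"goal": (0.0, 1.0), "s1": (0.0, 1.0)},
    "s2": {"s2": (1.0, 1.0)},
    "goal": {"goal": (1.0, 1.0)},
}

def can_carry_positive(succs, t):
    """Some feasible distribution over `succs` puts positive mass on `t`."""
    lo_rest = sum(lo for u, (lo, _) in succs.items() if u != t)
    return succs[t][1] > 0.0 and lo_rest < 1.0

def max_prob_is_zero(imc, source, targets):
    """True iff `targets` is reached with probability 0 from `source` under
    every resolution (backward fixed point over possibly-positive edges)."""
    reaches = set(targets)
    changed = True
    while changed:
        changed = False
        for s, succs in imc.items():
            if s not in reaches and any(t in reaches and can_carry_positive(succs, t)
                                        for t in succs):
                reaches.add(s)
                changed = True
    return source not in reaches

print(max_prob_is_zero(imc, "s0", {"goal"}))  # False: goal reachable via s1
print(max_prob_is_zero(imc, "s2", {"goal"}))  # True: s2 can only loop forever
```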