
    Multi-objective Robust Strategy Synthesis for Interval Markov Decision Processes

    Interval Markov decision processes (IMDPs) generalise classical MDPs by having interval-valued transition probabilities. They provide a powerful modelling tool for probabilistic systems with an additional variation or uncertainty that precludes knowledge of the exact transition probabilities. In this paper, we consider the problem of multi-objective robust strategy synthesis for interval MDPs, where the aim is to find a robust strategy that guarantees the satisfaction of multiple properties at the same time in face of the transition probability uncertainty. We first show that this problem is PSPACE-hard. Then, we provide a value iteration-based decision algorithm to approximate the Pareto set of achievable points. We finally demonstrate the practical effectiveness of our proposed approaches by applying them on several case studies using a prototypical tool. Comment: This article is a full version of a paper accepted to the Conference on Quantitative Evaluation of SysTems (QEST) 201
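    The multi-objective synthesis itself is beyond a short snippet, but the robust value-iteration step it builds on can be illustrated. The sketch below, assuming an invented toy IMDP and plain reachability as the single objective, computes the probability of reaching a goal state that a strategy can guarantee against any resolution of the transition intervals; it is a minimal illustration, not the paper's algorithm or tool.

        # Minimal sketch: pessimistic (robust) value iteration for reachability on a toy IMDP.
        # The states, actions and interval bounds are invented for illustration only.
        def worst_case_expectation(intervals, values):
            """Adversary picks a distribution within the [lower, upper] bounds
            (summing to 1) that minimises the expected value of `values`."""
            lo = {s: b[0] for s, b in intervals.items()}
            hi = {s: b[1] for s, b in intervals.items()}
            prob = dict(lo)                          # start from all lower bounds
            slack = 1.0 - sum(lo.values())           # probability mass left to place
            for s in sorted(intervals, key=lambda t: values[t]):  # cheapest successors first
                extra = min(hi[s] - lo[s], slack)
                prob[s] += extra
                slack -= extra
            return sum(prob[s] * values[s] for s in intervals)

        # transitions[state][action] = {successor: (lower, upper)}
        transitions = {
            "s0":   {"a": {"s1": (0.6, 0.8), "s2": (0.2, 0.4)},
                     "b": {"goal": (0.1, 0.3), "s2": (0.7, 0.9)}},
            "s1":   {"a": {"goal": (0.4, 0.7), "s2": (0.3, 0.6)}},
            "s2":   {"a": {"s2": (1.0, 1.0)}},       # losing sink
            "goal": {"a": {"goal": (1.0, 1.0)}},
        }

        V = {s: 0.0 for s in transitions}
        V["goal"] = 1.0
        for _ in range(100):                         # iterate towards the fixpoint
            for s in transitions:
                if s != "goal":
                    V[s] = max(worst_case_expectation(succ, V)
                               for succ in transitions[s].values())
        print({s: round(v, 4) for s, v in V.items()})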

    Formal Methods for Autonomous Systems

    Formal methods refer to rigorous, mathematical approaches to system development and have played a key role in establishing the correctness of safety-critical systems. The main building blocks of formal methods are models and specifications, which are analogous to behaviors and requirements in system design and give us the means to verify and synthesize system behaviors with formal guarantees. This monograph provides a survey of the current state of the art on applications of formal methods in the autonomous systems domain. We consider correct-by-construction synthesis under various formulations, including closed systems, reactive, and probabilistic settings. Beyond synthesizing systems in known environments, we address the concept of uncertainty and use formal methods to bound the behavior of systems that employ learning. Further, we examine the synthesis of systems with monitoring, a mitigation technique for ensuring that once a system deviates from expected behavior, it knows a way of returning to normalcy. We also show how to overcome some limitations of formal methods themselves with learning. We conclude with future directions for formal methods in reinforcement learning, uncertainty, privacy, explainability of formal methods, and regulation and certification.

    Percentile Queries in Multi-Dimensional Markov Decision Processes

    Markov decision processes (MDPs) with multi-dimensional weights are useful to analyze systems with multiple objectives that may be conflicting and require the analysis of trade-offs. We study the complexity of percentile queries in such MDPs and give algorithms to synthesize strategies that enforce such constraints. Given a multi-dimensional weighted MDP and a quantitative payoff function $f$, thresholds $v_i$ (one per dimension), and probability thresholds $\alpha_i$, we show how to compute a single strategy to enforce that for all dimensions $i$, the probability of outcomes $\rho$ satisfying $f_i(\rho) \geq v_i$ is at least $\alpha_i$. We consider classical quantitative payoffs from the literature (sup, inf, lim sup, lim inf, mean-payoff, truncated sum, discounted sum). Our work extends to the quantitative case the multi-objective model checking problem studied by Etessami et al. in unweighted MDPs. Comment: Extended version of CAV 2015 paper.
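    The synthesis algorithms in the paper are involved, but the constraint itself is easy to state operationally. The sketch below, assuming an invented two-dimensional weighted MDP, a fixed memoryless strategy and the truncated-sum payoff, estimates by simulation whether the strategy meets percentile constraints P[f_i(rho) >= v_i] >= alpha_i in every dimension; it is only an illustrative check, not the paper's synthesis procedure.

        # Minimal sketch: empirical percentile check for a fixed strategy on a toy
        # two-dimensional weighted MDP. All states, weights and thresholds are invented.
        import random

        # transitions[state][action] = [(probability, successor, (w1, w2)), ...]
        transitions = {
            "s0": {"a": [(0.5, "s1", (2, 0)), (0.5, "s2", (0, 3))],
                   "b": [(1.0, "s1", (1, 1))]},
            "s1": {"a": [(1.0, "target", (1, 1))]},
            "s2": {"a": [(0.7, "target", (3, 0)), (0.3, "s0", (0, 0))]},
        }
        strategy = {"s0": "b", "s1": "a", "s2": "a"}   # fixed memoryless strategy
        v = (2, 1)          # per-dimension thresholds v_i
        alpha = (0.8, 0.8)  # per-dimension probability thresholds alpha_i

        def estimate_percentiles(runs=20000, horizon=200):
            """Estimate P[truncated sum in dimension i >= v_i] under `strategy`."""
            hits = [0, 0]
            for _ in range(runs):
                state, total = "s0", [0, 0]
                for _ in range(horizon):
                    if state == "target":
                        break
                    r, acc = random.random(), 0.0
                    for p, succ, w in transitions[state][strategy[state]]:
                        acc += p
                        if r <= acc:
                            total[0] += w[0]
                            total[1] += w[1]
                            state = succ
                            break
                for i in range(2):
                    hits[i] += total[i] >= v[i]
            return [h / runs for h in hits]

        probs = estimate_percentiles()
        print(probs, all(p >= a for p, a in zip(probs, alpha)))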

    Verification and control of partially observable probabilistic systems

    We present automated techniques for the verification and control of partially observable, probabilistic systems for both discrete and dense models of time. For the discrete-time case, we formally model these systems using partially observable Markov decision processes; for dense time, we propose an extension of probabilistic timed automata in which local states are partially visible to an observer or controller. We give probabilistic temporal logics that can express a range of quantitative properties of these models, relating to the probability of an event’s occurrence or the expected value of a reward measure. We then propose techniques to either verify that such a property holds or synthesise a controller for the model which makes it true. Our approach is based on a grid-based abstraction of the uncountable belief space induced by partial observability and, for dense-time models, an integer discretisation of real-time behaviour. The former is necessarily approximate since the underlying problem is undecidable; however, we show how both lower and upper bounds on numerical results can be generated. We illustrate the effectiveness of the approach by implementing it in the PRISM model checker and applying it to several case studies from the domains of task and network scheduling, computer security and planning.
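    Two ingredients of the discrete-time approach can be shown concretely: the exact Bayesian belief update of a POMDP and the projection of a belief onto a finite grid over the belief simplex. The sketch below uses an invented two-state, one-action POMDP and a crude nearest-point projection in place of the interpolation the grid-based abstraction actually uses, so it illustrates the idea only, not the PRISM implementation.

        # Minimal sketch: POMDP belief update and a fixed-resolution belief grid.
        states = ["s0", "s1"]
        # T[s][a][s'] are transition probabilities, O[s'][a][o] observation probabilities
        T = {"s0": {"a": {"s0": 0.8, "s1": 0.2}},
             "s1": {"a": {"s0": 0.1, "s1": 0.9}}}
        O = {"s0": {"a": {"low": 0.7, "high": 0.3}},
             "s1": {"a": {"low": 0.2, "high": 0.8}}}

        def belief_update(belief, action, obs):
            """Exact Bayes update: b'(s') is proportional to O(o|s',a) * sum_s T(s'|s,a) * b(s)."""
            unnorm = {sp: O[sp][action][obs] *
                          sum(T[s][action][sp] * belief[s] for s in states)
                      for sp in states}
            z = sum(unnorm.values())
            return {sp: p / z for sp, p in unnorm.items()}

        def grid_points(resolution):
            """All beliefs over the two states whose entries are multiples of 1/resolution."""
            return [{"s0": i / resolution, "s1": (resolution - i) / resolution}
                    for i in range(resolution + 1)]

        def snap_to_grid(belief, resolution):
            """Nearest grid point (a crude stand-in for the interpolation used in practice)."""
            return min(grid_points(resolution),
                       key=lambda g: sum(abs(g[s] - belief[s]) for s in states))

        b = belief_update({"s0": 0.5, "s1": 0.5}, "a", "high")
        print(b, snap_to_grid(b, 8))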

    Decision algorithms for modelling, optimal control and verification of probabilistic systems

    Markov Decision Processes (MDPs) constitute a mathematical framework for modelling systems featuring both probabilistic and nondeterministic behaviour. They are widely used to solve sequential decision making problems and applied successfully in operations research, artificial intelligence, and stochastic control theory, and have been extended conservatively to the model of probabilistic automata in the context of concurrent probabilistic systems. However, when modelling a physical system they suffer from several limitations. One of the most important is the inherent loss of precision that is introduced by measurement errors and discretization artifacts which necessarily happen due to incomplete knowledge about the system behaviour. As a result, the true probability distribution for transitions is in most cases an uncertain value, determined by either external parameters or confidence intervals. Interval Markov decision processes (IMDPs) generalize classical MDPs by having interval-valued transition probabilities. They provide a powerful modelling tool for probabilistic systems with an additional variation or uncertainty that reflects the absence of precise knowledge concerning transition probabilities. In this dissertation, we focus on decision algorithms for modelling and performance evaluation of such probabilistic systems leveraging techniques from mathematical optimization. From a modelling viewpoint, we address probabilistic bisimulations to reduce the size of the system models while preserving the logical properties they satisfy. We also discuss the key ingredients to construct systems by composing them out of smaller components running in parallel. Furthermore, we introduce a novel stochastic model, Uncertain weighted Markov Decision Processes (UwMDPs), so as to capture quantities like preferences or priorities in a nondeterministic scenario with uncertainties. This model is close to the model of IMDPs but more convenient to work with in the context of bisimulation minimization. From a performance evaluation perspective, we consider the problem of multi-objective robust strategy synthesis for IMDPs, where the aim is to find a robust strategy that guarantees the satisfaction of multiple properties at the same time in face of the transition probability uncertainty. In this respect, we discuss the computational complexity of the problem and present a value iteration-based decision algorithm to approximate the Pareto set of achievable optimal points. Moreover, we consider the problem of computing maximal/minimal reward-bounded reachability probabilities on UwMDPs, for which we present an efficient algorithm running in pseudo-polynomial time. We demonstrate the practical effectiveness of our proposed approaches by applying them to a collection of real-world case studies using several prototypical tools.

    Markov decision processes (MDPs) provide a framework for modelling systems that exhibit both stochastic and nondeterministic behaviour. This model class has a broad range of applications in solving sequential decision problems and is used successfully in operations research, artificial intelligence and stochastic control theory. In the area of concurrent probabilistic systems it has been conservatively extended to probabilistic automata. When MDPs are used to model physical systems, however, they turn out to suffer from a number of limitations. One of the most serious problems is that the actual behaviour of the system under consideration is usually not completely known. Measurement errors and discretisation artifacts make a loss of precision unavoidable. The true transition probability distributions of the system are therefore in most cases not known exactly, but depend on external factors or can only be captured by confidence intervals. Interval Markov decision processes (IMDPs) generalise classical MDPs in that the possible transition probability distributions can be expressed as intervals. IMDPs are thus a powerful modelling tool for probabilistic systems with uncertain behaviour, which arises because the exact behaviour of the real system is not known. In this dissertation we concentrate on decision procedures for modelling such probabilistic systems and evaluating their properties, using methods from mathematical optimisation. On the modelling side we consider probabilistic bisimulation in order to reduce the size of the system model while preserving its logical properties. We also consider the key techniques for generating models compositionally from smaller components running in parallel. Furthermore, we introduce a new kind of stochastic model, so-called uncertain weighted Markov decision processes (UwMDPs), to express properties such as implementation decisions and user priorities in a nondeterministic setting. This model is similar to IMDPs but better suited to bisimulation minimisation. On the evaluation side we consider the problem of generating strategies that resolve the nondeterminism in such a way that several desired properties are satisfied simultaneously, while every possible choice of probability distributions from the transition intervals is respected. We examine the complexity class of this problem and discuss a value iteration-based algorithm for approximating the Pareto set of achievable optimal points. We further consider the problem of computing minimal and maximal reachability probabilities under an upper bound on the accumulated path cost, for which we discuss an efficient pseudo-polynomial-time algorithm. We demonstrate the practical efficiency of our approaches by implementing them prototypically and applying them to a number of realistic case studies.
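    One concrete piece of the evaluation side, the pseudo-polynomial flavour of reward-bounded reachability, can be sketched on an ordinary integer-weighted MDP: unfolding the model over the remaining reward budget makes the running time grow with the numeric value of the bound. The example below uses an invented model and ignores the interval uncertainty of UwMDPs, so it illustrates the unfolding idea only, not the dissertation's algorithm.

        # Minimal sketch: maximal probability of reaching 'goal' within a cost budget,
        # computed by memoising over (state, remaining budget). Model values are invented.
        from functools import lru_cache

        # transitions[state][action] = (cost, [(probability, successor), ...])
        transitions = {
            "s0": {"fast": (3, [(0.6, "goal"), (0.4, "s1")]),
                   "slow": (1, [(0.9, "s1"), (0.1, "s0")])},
            "s1": {"go": (2, [(0.8, "goal"), (0.2, "s1")])},
            "goal": {},
        }

        @lru_cache(maxsize=None)
        def max_reach(state, budget):
            """Maximal probability of reaching 'goal' with accumulated cost at most `budget`."""
            if state == "goal":
                return 1.0
            if budget <= 0 or not transitions[state]:
                return 0.0
            best = 0.0
            for cost, succs in transitions[state].values():
                if cost > budget:
                    continue   # action no longer affordable under the remaining budget
                best = max(best, sum(p * max_reach(nxt, budget - cost) for p, nxt in succs))
            return best

        # One cached value per (state, integer budget) pair: pseudo-polynomial in the bound.
        print(max_reach("s0", 6))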

    A Risk-Averse Preview-based Q-Learning Algorithm: Application to Highway Driving of Autonomous Vehicles

    A risk-averse preview-based Q-learning planner is presented for navigation of autonomous vehicles. To this end, the multi-lane road ahead of a vehicle is represented by a finite-state non-stationary Markov decision process (MDP). A risk assessment module is then presented that leverages the preview information provided by sensors along with a stochastic reachability module to assign reward values to the MDP states and update them as scenarios develop. A sampling-based risk-averse preview-based Q-learning algorithm is finally developed that generates samples using the preview information and reward function to learn risk-averse optimal planning strategies without actual interaction with the environment. The risk factor is imposed on the objective function to avoid fluctuation of the Q values, which can jeopardize the vehicle's safety and/or performance. The overall hybrid automaton model of the system is leveraged to develop a feasibility check module that detects infeasible plans and enables the planner system to proactively react to changes in the environment. Theoretical results are provided to bound the number of samples required to guarantee $\epsilon$-optimal planning with a high probability. Finally, to verify the efficiency of the presented algorithm, its implementation on highway driving of an autonomous vehicle in varying traffic density is considered.
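    The precise risk measure and preview mechanism are specific to the paper, but the overall shape of the method (tabular Q-learning driven by samples drawn from a known model rather than from live interaction, with a penalty that discourages high-variance outcomes) can be sketched. The toy two-lane model, the reward numbers and the simple variance-style penalty below are all invented stand-ins, not the paper's formulation.

        # Minimal sketch: model-sampled tabular Q-learning with a variance-style risk penalty.
        import random
        from collections import defaultdict

        actions = ["keep", "change"]
        # model[(lane, action)] = [(probability, next_lane, reward), ...]  (invented numbers)
        model = {
            (0, "keep"):   [(0.9, 0, 1.0), (0.1, 0, -5.0)],    # small chance of slow traffic
            (0, "change"): [(0.8, 1, 2.0), (0.2, 0, -10.0)],   # risky but faster lane
            (1, "keep"):   [(0.95, 1, 2.0), (0.05, 1, -10.0)],
            (1, "change"): [(0.9, 0, 1.0), (0.1, 1, -5.0)],
        }

        def sample(lane, action):
            """Draw a successor and reward from the known model (no real interaction)."""
            r, acc = random.random(), 0.0
            for p, nxt, rew in model[(lane, action)]:
                acc += p
                if r <= acc:
                    return nxt, rew
            return model[(lane, action)][-1][1:]

        Q = defaultdict(float)
        alpha, gamma, lam, eps = 0.1, 0.95, 0.2, 0.1
        lane = 0
        for _ in range(50000):
            greedy = max(actions, key=lambda a: Q[(lane, a)])
            act = random.choice(actions) if random.random() < eps else greedy
            nxt, rew = sample(lane, act)
            mean_rew = sum(p * r for p, _, r in model[(lane, act)])   # known from the model
            risk_adjusted = rew - lam * (rew - mean_rew) ** 2         # penalise fluctuation
            target = risk_adjusted + gamma * max(Q[(nxt, b)] for b in actions)
            Q[(lane, act)] += alpha * (target - Q[(lane, act)])
            lane = nxt
        print({k: round(v, 2) for k, v in sorted(Q.items())})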