147 research outputs found

    How to Play in Infinite MDPs (Invited Talk)

    Markov decision processes (MDPs) are a standard model for dynamic systems that exhibit both stochastic and nondeterministic behavior. For MDPs with finite state space it is known that for a wide range of objectives there exist optimal strategies that are memoryless and deterministic. In contrast, if the state space is infinite, optimal strategies may not exist, and optimal or ε-optimal strategies may require (possibly infinite) memory. In this paper we consider qualitative objectives: reachability, safety, (co-)Büchi, and other parity objectives. We aim at giving an introduction to a collection of techniques that allow for the construction of strategies with little or no memory in countably infinite MDPs.

    Non-Zero Sum Games for Reactive Synthesis

    In this invited contribution, we summarize new solution concepts useful for the synthesis of reactive systems that we have introduced in several recent publications. These solution concepts are developed in the context of non-zero sum games played on graphs. They are part of the contributions obtained in the inVEST project funded by the European Research Council. Comment: LATA'16 invited paper

    Editors' Introduction to [Algorithmic Learning Theory: 21st International Conference, ALT 2010, Canberra, Australia, October 6-8, 2010. Proceedings]

    Learning theory is an active research area that incorporates ideas, problems, and techniques from a wide range of disciplines including statistics, artificial intelligence, information theory, pattern recognition, and theoretical computer science. The research reported at the 21st International Conference on Algorithmic Learning Theory (ALT 2010) ranges over areas such as query models, online learning, inductive inference, boosting, kernel methods, complexity and learning, reinforcement learning, unsupervised learning, grammatical inference, and algorithmic forecasting. In this introduction we give an overview of the five invited talks and the regular contributions of ALT 2010

    Strategy Complexity of Reachability in Countable Stochastic 2-Player Games

    We study countably infinite stochastic 2-player games with reachability objectives. Our results provide a complete picture of the memory requirements of $\varepsilon$-optimal (resp. optimal) strategies. These results depend on the size of the players' action sets and on whether one requires strategies that are uniform (i.e., independent of the start state). Our main result is that $\varepsilon$-optimal (resp. optimal) Maximizer strategies require infinite memory if Minimizer is allowed infinite action sets. This lower bound holds even under very strong restrictions. Even in the special case of infinitely branching turn-based reachability games, even if all states allow an almost surely winning Maximizer strategy, strategies with a step counter plus finite private memory are still useless. Regarding uniformity, we show that for Maximizer there need not exist positional (i.e., memoryless) uniformly $\varepsilon$-optimal strategies even in the special case of finite action sets or in finitely branching turn-based games. On the other hand, in games with finite action sets, there always exists a uniformly $\varepsilon$-optimal Maximizer strategy that uses just one bit of public memory.

    Reinforcement Learning with Non-Markovian Rewards

    The standard RL world model is that of a Markov Decision Process (MDP). A basic premise of MDPs is that the rewards depend on the last state and action only. Yet, many real-world rewards are non-Markovian. For example, a reward for bringing coffee only if requested earlier and not yet served is non-Markovian if the state only records current requests and deliveries. Past work considered the problem of modeling and solving MDPs with non-Markovian rewards (NMR), but we know of no principled approaches for RL with NMR. Here, we address the problem of policy learning from experience with such rewards. We describe and evaluate empirically four combinations of the classical RL algorithms Q-learning and R-max with automata learning algorithms to obtain new RL algorithms for domains with NMR. We also prove that some of these variants converge to an optimal policy in the limit. Comment: To Appear in AAAI 202
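The coffee example in this abstract can be made concrete with a small sketch. It is not the paper's method: the two-state automaton, the event names (`request`, `deliver`, `wait`) and the reward of 1.0 are assumptions chosen for illustration. The point it shows is that the reward depends on history, and that tracking a small automaton state alongside the environment state makes the reward a function of the current (augmented) state and event only.

```python
# Hypothetical two-state reward automaton for the coffee example:
# "idle" (no pending request) and "pending" (requested, not yet served).
def step_automaton(q, event):
    """Advance the automaton on one observed event; return (next_state, reward)."""
    if q == "idle" and event == "request":
        return "pending", 0.0
    if q == "pending" and event == "deliver":
        return "idle", 1.0  # reward only for a delivery that answers a request
    return q, 0.0  # every other event leaves the automaton state unchanged


def total_reward(events):
    """Sum the rewards along a trace, starting with no pending request."""
    q, total = "idle", 0.0
    for event in events:
        q, r = step_automaton(q, event)
        total += r
    return total
```

The same `deliver` event earns different rewards depending on what came before it, which is why a learner that sees only the latest event cannot predict the reward without the extra automaton state.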

    On the connection of probabilistic model checking, planning, and learning for system verification

    This thesis presents approaches using techniques from the model checking, planning, and learning community to make systems more reliable and perspicuous. First, two heuristic search and dynamic programming algorithms are adapted to be able to check extremal reachability probabilities, expected accumulated rewards, and their bounded versions, on general Markov decision processes (MDPs). Thereby, the problem space originally solvable by these algorithms is enlarged considerably. Correctness and optimality proofs for the adapted algorithms are given, and in a comprehensive case study on established benchmarks it is shown that the implementation, called Modysh, is competitive with state-of-the-art model checkers and even outperforms them on very large state spaces. Second, Deep Statistical Model Checking (DSMC) is introduced, usable for quality assessment and learning pipeline analysis of systems incorporating trained decision-making agents, like neural networks (NNs). The idea of DSMC is to use statistical model checking to assess NNs resolving nondeterminism in systems modeled as MDPs. The versatility of DSMC is exemplified in a number of case studies on Racetrack, an MDP benchmark designed for this purpose, flexibly modeling the autonomous driving challenge. In a comprehensive scalability study it is demonstrated that DSMC is a lightweight technique tackling the complexity of NN analysis in combination with the state space explosion problem.
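The idea of letting a fixed decision-maker resolve the nondeterminism of an MDP and then estimating a property by sampling can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's DSMC implementation: the toy MDP, the state and action names, and the hand-written `policy` standing in for a trained network are all hypothetical.

```python
import random

# Hypothetical toy MDP: each (state, action) maps to a list of (probability, next_state).
TRANSITIONS = {
    ("start", "fast"): [(0.5, "goal"), (0.5, "crash")],
    ("start", "slow"): [(0.5, "goal"), (0.5, "start")],
}


def policy(state):
    """Stand-in for a trained agent; it always picks the "slow" action."""
    return "slow"


def simulate(rng, horizon):
    """Run one episode of at most `horizon` steps; report whether the goal was reached."""
    state = "start"
    for _ in range(horizon):
        if state != "start":
            break  # "goal" and "crash" are terminal
        u, acc = rng.random(), 0.0
        for p, nxt in TRANSITIONS[(state, policy(state))]:
            acc += p
            if u < acc:
                state = nxt
                break
    return state == "goal"


def estimate_reachability(runs, horizon, seed=0):
    """Monte Carlo estimate of the probability of reaching the goal within `horizon` steps."""
    rng = random.Random(seed)
    return sum(simulate(rng, horizon) for _ in range(runs)) / runs
```

For this toy model and policy, the exact probability of reaching the goal within three steps is 1 - 0.5^3 = 0.875, so `estimate_reachability(20000, 3)` should land close to that value; the sampling error shrinks as `runs` grows.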

    Foundations of probability-raising causality in Markov decision processes

    This work introduces a novel cause-effect relation in Markov decision processes using the probability-raising principle. Initially, sets of states as causes and effects are considered, which is subsequently extended to regular path properties as effects and then as causes. The paper lays the mathematical foundations and analyzes the algorithmic properties of these cause-effect relations. This includes algorithms for checking cause conditions given an effect and deciding the existence of probability-raising causes. As the definition allows for sub-optimal coverage properties, quality measures for causes inspired by concepts of statistical analysis are studied. These include recall, coverage ratio and f-score. The computational complexity for finding optimal causes with respect to these measures is analyzed. Comment: Submission for Logical Methods in Computer Science (special issue FoSSaCS 2022). arXiv admin note: substantial text overlap with arXiv:2201.0876