
    Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints

    We formalize the problem of maximizing the mean-payoff value with high probability while satisfying a parity objective in a Markov decision process (MDP) with unknown probabilistic transition function and unknown reward function. Assuming that the support of the unknown transition function and a lower bound on the minimal transition probability are known in advance, we show that in MDPs consisting of a single end component, two combinations of guarantees on the parity and mean-payoff objectives can be achieved, depending on how much memory one is willing to use. (i) For all ε and γ, we can construct an online-learning finite-memory strategy that almost surely satisfies the parity objective and achieves an ε-optimal mean payoff with probability at least 1 - γ. (ii) Alternatively, for all ε and γ, there exists an online-learning infinite-memory strategy that satisfies the parity objective surely and achieves an ε-optimal mean payoff with probability at least 1 - γ. We extend the above results to MDPs consisting of more than one end component in a natural way. Finally, we show that the aforementioned guarantees are tight, i.e., there are MDPs for which stronger combinations of the guarantees cannot be ensured.
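
    To make the two guarantee combinations easier to compare, they can be written out as follows. This is a sketch in notation of our own choosing (none of these symbols are fixed by the abstract): M is the single-end-component MDP, σ the learned strategy, Φ the parity objective, MP the long-run mean payoff, and val* the optimal mean-payoff value.

        % (i) finite memory: parity almost surely, ε-optimal mean payoff with high probability
        \Pr^{M}_{\sigma}[\Phi] = 1
        \quad\text{and}\quad
        \Pr^{M}_{\sigma}\bigl[\mathrm{MP} \ge \mathrm{val}^{*} - \epsilon\bigr] \ge 1 - \gamma

        % (ii) infinite memory: parity surely on every play consistent with σ, same mean-payoff guarantee
        \forall \rho \text{ consistent with } \sigma : \rho \models \Phi
        \quad\text{and}\quad
        \Pr^{M}_{\sigma}\bigl[\mathrm{MP} \ge \mathrm{val}^{*} - \epsilon\bigr] \ge 1 - \gamma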

    Model-free reinforcement learning for stochastic parity games

    This paper investigates the use of model-free reinforcement learning to compute the optimal value in two-player stochastic games with parity objectives. In this setting, two decision makers, player Min and player Max, compete on a finite game arena (a stochastic game graph with unknown but fixed probability distributions) to minimize and maximize, respectively, the probability of satisfying a parity objective. We give a reduction from stochastic parity games to a family of stochastic reachability games with a parameter ε, such that the value of a stochastic parity game equals the limit of the values of the corresponding simple stochastic games as the parameter ε tends to 0. Since this reduction does not require knowledge of the probabilistic transition structure of the underlying game arena, model-free reinforcement learning algorithms, such as minimax Q-learning, can be used to approximate the value and mutual best-response strategies for both players in the underlying stochastic parity game. We also present a streamlined reduction from 1½-player parity games to reachability games that avoids recourse to nondeterminism. Finally, we report on the experimental evaluations of both reductions.
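
    Because the reduction removes the need to know the transition probabilities, an off-the-shelf minimax Q-learning loop can be run directly on the (turn-based) game arena. The sketch below is a minimal illustration rather than the paper's implementation; the arena interface (states, actions(s), owner(s), step(s, a)) and the 0/1 reward encoding the reachability objective are our assumptions.

        import random
        from collections import defaultdict

        # Minimal minimax Q-learning sketch for a turn-based stochastic game arena.
        # Hypothetical interface: actions(s) lists the moves available in s,
        # owner(s) returns "max" or "min", and step(s, a) samples a successor,
        # a reward (1 when the target of the reachability game is reached) and
        # a done flag from the unknown arena.
        def minimax_q(states, actions, owner, step,
                      episodes=10_000, horizon=200,
                      alpha=0.1, discount=0.99, eps_greedy=0.1):
            Q = defaultdict(float)                     # Q[(state, action)]

            def value(s):
                vals = [Q[(s, a)] for a in actions(s)]
                return max(vals) if owner(s) == "max" else min(vals)

            for _ in range(episodes):
                s = random.choice(states)              # random restart for exploration
                for _ in range(horizon):
                    acts = actions(s)
                    if random.random() < eps_greedy:   # epsilon-greedy exploration
                        a = random.choice(acts)
                    else:
                        pick = max if owner(s) == "max" else min
                        a = pick(acts, key=lambda b: Q[(s, b)])
                    s_next, reward, done = step(s, a)  # sample the unknown dynamics
                    target = reward + (0.0 if done else discount * value(s_next))
                    Q[(s, a)] += alpha * (target - Q[(s, a)])
                    if done:
                        break
                    s = s_next
            return Q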

    Variations on the Stochastic Shortest Path Problem

    In this invited contribution, we revisit the stochastic shortest path problem and show how recent results allow one to improve on the classical solutions: we present algorithms to synthesize strategies with multiple guarantees on the distribution of the length of paths reaching a given target, rather than simply minimizing its expected value. The concepts and algorithms that we propose here are applications of more general results that have been obtained recently for Markov decision processes and that are described in a series of recent papers.
    Comment: Invited paper for VMCAI 201
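
    A representative instance of such a multi-guarantee query, in notation of our own choosing (target set T, length threshold ℓ, probability threshold α, none of which are taken from the paper), is:

        \text{find a strategy } \sigma \text{ minimizing } \mathbb{E}_{\sigma}\bigl[\mathsf{len}_{T}\bigr]
        \quad\text{subject to}\quad
        \Pr_{\sigma}\bigl[\mathsf{len}_{T} \le \ell\bigr] \ge \alpha

    That is, the expected length to the target is optimized while the probability of exceeding a user-chosen bound ℓ is kept at most 1 - α.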

    Computer Aided Verification

    This open access two-volume set LNCS 13371 and 13372 constitutes the refereed proceedings of the 34th International Conference on Computer Aided Verification, CAV 2022, which was held in Haifa, Israel, in August 2022. The 40 full papers presented together with 9 tool papers and 2 case studies were carefully reviewed and selected from 209 submissions. The papers were organized in the following topical sections: Part I: invited papers; formal methods for probabilistic programs; formal methods for neural networks; software verification and model checking; hyperproperties and security; formal methods for hardware, cyber-physical, and hybrid systems. Part II: probabilistic techniques; automata and logic; deductive verification and decision procedures; machine learning; synthesis and concurrency. This is an open access book.

    Formal methods with a touch of magic

    Machine learning and formal methods have complementary benefits and drawbacks. In this work, we address the controller-design problem with a combination of techniques from both fields. The use of black-box neural networks in deep reinforcement learning (deep RL) poses a challenge for such a combination. Instead of reasoning formally about the output of deep RL, which we call the wizard, we extract from it a decision-tree-based model, which we refer to as the magic book. Using the extracted model as an intermediary, we are able to handle problems that are infeasible for either deep RL or formal methods by themselves. First, we suggest, for the first time, a synthesis procedure that is based on a magic book. We synthesize a stand-alone correct-by-design controller that enjoys the favorable performance of RL. Second, we incorporate a magic book in a bounded model checking (BMC) procedure. BMC allows us to find numerous traces of the plant under the control of the wizard, which a user can use to increase the trustworthiness of the wizard and direct further training.
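
    As a rough sketch of the extraction step (not the paper's exact procedure), a magic book can be obtained by imitation: roll out the trained wizard, record its state-action choices, and fit a decision tree to them. The environment interface, the wizard.act call and the use of scikit-learn below are all our assumptions.

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        # Sketch: distil a black-box RL policy (the "wizard") into a decision tree
        # (the "magic book") by imitating it on the states it visits.
        # Hypothetical interfaces: env.reset() -> observation,
        # env.step(a) -> (observation, reward, done, info), wizard.act(obs) -> action.
        def extract_magic_book(env, wizard, episodes=200, max_depth=8):
            observations, chosen_actions = [], []
            for _ in range(episodes):
                obs, done = env.reset(), False
                while not done:
                    a = wizard.act(obs)                # query the neural policy
                    observations.append(obs)
                    chosen_actions.append(a)
                    obs, _, done, _ = env.step(a)
            tree = DecisionTreeClassifier(max_depth=max_depth)
            tree.fit(np.array(observations), np.array(chosen_actions))
            return tree                                # interpretable surrogate policy

    The surrogate is then queried with tree.predict, which is what makes it compact enough to serve as a stand-alone controller or to be examined inside a BMC run.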

    Safe Learning for Near Optimal Scheduling

    In this paper, we investigate the combination of synthesis, model-based learning, and online sampling techniques to obtain safe and near-optimal schedulers for a preemptible task scheduling problem. Our algorithms can handle Markov decision processes (MDPs) with 10^20 states and beyond, which cannot be handled with state-of-the-art probabilistic model checkers. We provide probably approximately correct (PAC) guarantees for learning the model. Additionally, we extend Monte Carlo tree search with advice, computed using safety games or obtained using the earliest-deadline-first scheduler, to safely explore the learned model online. Finally, we implemented and compared our algorithms empirically against shielded deep Q-learning on large task systems.
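
    The PAC flavor of the model-learning step can be illustrated with a standard Hoeffding-style bound (our illustration, not the paper's exact statement): to estimate a single transition probability within ε of its true value with confidence at least 1 - δ, it suffices to take

        n \;\ge\; \frac{1}{2\epsilon^{2}} \ln\frac{2}{\delta}

    samples of the corresponding state-action pair; a union bound over all state-action pairs (and successors) then turns this into a guarantee for the whole learned model.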