
    On the Succinctness of Good-for-MDPs Automata

    Good-for-MDPs and good-for-games automata are two recent classes of nondeterministic automata that reside between general nondeterministic and deterministic automata. Deterministic automata are good-for-games, and good-for-games automata are good-for-MDPs, but not vice versa. One of the questions this raises is how these classes relate in terms of succinctness. Good-for-games automata are known to be exponentially more succinct than deterministic automata, but the gap between good-for-MDPs and good-for-games automata, as well as the gap between ordinary nondeterministic automata and those that are good-for-MDPs, have been open. We establish that these gaps are exponential, and sharpen this result by showing that the latter gap remains exponential when restricting the nondeterministic automata to separating safety or unambiguous reachability automata.
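    For orientation, the following is a rough sketch of the standard definitions of the two classes as they are usually stated in the good-for-MDPs literature; the paper's exact formalizations may differ in detail.

```latex
% Rough sketch of the standard definitions (details vary slightly across papers).
% Good-for-games: nondeterminism can be resolved on the fly, using only the
% input read so far, without losing any word of the language.
\[
  A \text{ is GFG} \iff \exists\,\sigma : \Sigma^{*} \to Q \ \text{such that}\
  \forall w \in L(A):\ \text{the run of } A \text{ on } w \text{ built by } \sigma \text{ is accepting.}
\]
% Good-for-MDPs: taking the syntactic product with any finite MDP preserves
% the optimal satisfaction probability.
\[
  A \text{ is GFM} \iff \forall\,\text{finite MDPs } \mathcal{M}:\quad
  \sup_{\pi}\Pr^{\pi}_{\mathcal{M}}\!\bigl[L(A)\bigr]
  \;=\; \sup_{\pi'}\Pr^{\pi'}_{\mathcal{M}\times A}\!\bigl[\mathrm{Acc}(A)\bigr].
\]
```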

    Good-for-MDPs Automata for Probabilistic Analysis and Reinforcement Learning

    We characterize the class of nondeterministic ω-automata that can be used for the analysis of finite Markov decision processes (MDPs). We call these automata 'good-for-MDPs' (GFM). We show that GFM automata are closed under classic simulation as well as under more powerful simulation relations that leverage properties of optimal control strategies for MDPs. This closure enables us to exploit state-space reduction techniques, such as those based on direct and delayed simulation, that guarantee simulation equivalence. We demonstrate the promise of GFM automata by defining a new class of automata with favorable properties (they are Büchi automata with a low branching degree obtained through a simple construction) and show that going beyond limit-deterministic automata may significantly benefit reinforcement learning.
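    To illustrate why GFM automata are convenient for MDP analysis, the sketch below builds the usual product of an MDP with a nondeterministic Büchi automaton, folding the automaton's nondeterminism into the controller's choices. The data structures and names are hypothetical, not the paper's code, and this is one of several equivalent product constructions (here the automaton reads the label of the current MDP state).

```python
# Illustrative sketch (hypothetical data structures, not the paper's code).
def product_mdp(mdp_trans, labels, aut_delta, mdp_init, aut_init):
    """Product of an MDP and a nondeterministic Buchi automaton.

    mdp_trans: dict (state, action) -> list of (prob, next_state)
    labels:    dict state -> label read by the automaton
    aut_delta: dict (aut_state, label) -> set of successor automaton states
    Product actions are pairs (mdp_action, chosen_automaton_successor),
    so resolving the automaton's nondeterminism is part of the strategy.
    """
    def actions_of(s):
        return [(a, succs) for (s2, a), succs in mdp_trans.items() if s2 == s]

    trans = {}
    frontier = [(mdp_init, aut_init)]
    seen = {(mdp_init, aut_init)}
    while frontier:
        s, q = frontier.pop()
        for a, succs in actions_of(s):
            for q_next in aut_delta.get((q, labels[s]), set()):
                # Controller picks the MDP action and the automaton successor;
                # the environment resolves the probabilistic outcome.
                trans[((s, q), (a, q_next))] = [(p, (t, q_next)) for p, t in succs]
                for _, t in succs:
                    if (t, q_next) not in seen:
                        seen.add((t, q_next))
                        frontier.append((t, q_next))
    return trans
```

    Acceptance is then evaluated on the automaton component of the product states; for a GFM automaton, optimizing over product strategies yields the same value as optimizing directly against the automaton's language.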

    Reward Shaping for Reinforcement Learning with Omega-Regular Objectives

    Recent approaches have successfully exploited good-for-MDPs automata (Büchi automata with a restricted form of nondeterminism) for model-free reinforcement learning; this class of automata subsumes good-for-games automata and the most widespread class of limit-deterministic automata. The foundation of using these Büchi automata is that the Büchi condition can, for good-for-MDPs automata, be translated to reachability. The drawback of this translation is that the rewards are, on average, reaped very late, which requires long episodes during the learning process. We devise a new reward shaping approach that overcomes this issue. We show that the resulting model is equivalent to a discounted payoff objective with a biased discount that simplifies and improves on prior work in this direction.
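    A minimal sketch of the baseline Büchi-to-reachability reward translation that the abstract refers to is given below (hypothetical interface, not the paper's implementation): each time an accepting edge of the product is taken, the run jumps to an absorbing, rewarding "won" state with probability 1 − ζ and continues otherwise. The paper's contribution is a shaped variant of this reward, which is not reproduced here.

```python
import random

WON = "won"  # absorbing goal state of the induced reachability objective

def step_with_reachability_reward(step, is_accepting, state, action, zeta=0.99):
    """Wrap one step of the product MDP with the reachability reward.

    step:         function (state, action) -> next_state in the product MDP
    is_accepting: function (state, action, next_state) -> bool (Buchi edge?)
    Returns (next_state, reward, done).  Reward 1 is paid only on reaching
    WON, which is why, for zeta close to 1, reward tends to arrive late.
    """
    if state == WON:
        return WON, 0.0, True
    next_state = step(state, action)
    if is_accepting(state, action, next_state) and random.random() > zeta:
        return WON, 1.0, True          # jump to the goal with prob. 1 - zeta
    return next_state, 0.0, False      # otherwise continue with no reward
```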

    Mungojerrie: Linear-Time Objectives in Model-Free Reinforcement Learning

    Mungojerrie is an extensible tool that provides a framework to translate linear-time objectives into reward for reinforcement learning (RL). The tool provides convergent RL algorithms for stochastic games, reference implementations of existing reward translations for ω-regular objectives, and an internal probabilistic model checker for ω-regular objectives. This functionality is modular and operates on shared data structures, which enables fast development of new translation techniques. Mungojerrie supports finite models specified in PRISM and ω-automata specified in the HOA format, with an integrated command line interface to external linear temporal logic translators. Mungojerrie is distributed with a set of benchmarks for ω-regular objectives in RL.
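    The sketch below is a generic Python illustration of the modular design the abstract describes: reward translations as interchangeable components over a shared product structure. It is emphatically not Mungojerrie's actual API (the tool is implemented in C++); every name and interface here is invented for illustration.

```python
from abc import ABC, abstractmethod

class RewardTranslation(ABC):
    """A reward translation maps an omega-regular objective, represented by
    the product of a model and an automaton, to a scalar-reward process."""

    @abstractmethod
    def reward(self, state, action, next_state):
        """Reward for one transition of the shared product structure."""

class ReachabilityTranslation(RewardTranslation):
    """Example plug-in: pay 1 on entering a designated goal region."""
    def __init__(self, goal_states):
        self.goal_states = set(goal_states)

    def reward(self, state, action, next_state):
        return 1.0 if next_state in self.goal_states else 0.0

# Because every translation exposes the same interface over the shared
# product, an RL algorithm or a model checker can run against any of them
# without modification, which is what makes adding new translations cheap.
```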

    A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs

    Linear temporal logic (LTL) and omega-regular objectives (a superset of LTL) have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes. Unlike prior approaches, our algorithm learns from sampled trajectories of the system and does not require prior knowledge of the system's topology.
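    A rough sketch of the generic model-based, trajectory-driven estimation step that such an approach relies on is shown below; the names are hypothetical and this is not the paper's PAC algorithm, only an illustration of learning the model (and hence the topology) from samples.

```python
from collections import defaultdict

def estimate_model(trajectories):
    """Estimate transition probabilities from sampled trajectories.

    trajectories: iterable of lists of (state, action, next_state) triples.
    Returns dict (state, action) -> {next_state: empirical probability}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for trajectory in trajectories:
        for s, a, s2 in trajectory:
            counts[(s, a)][s2] += 1
            totals[(s, a)] += 1
    # Empirical transition probabilities; a PAC analysis would additionally
    # track confidence intervals that shrink with the number of samples.
    return {
        sa: {s2: n / totals[sa] for s2, n in succ.items()}
        for sa, succ in counts.items()
    }
```

    An actual PAC algorithm would couple such estimates with confidence bounds and with the omega-regular objective on the product MDP; the sketch only shows the sampling and estimation step.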