    Omega-Regular Objectives in Model-Free Reinforcement Learning

    We provide the first solution for model-free reinforcement learning of ω-regular objectives for Markov decision processes (MDPs). We present a constructive reduction from the almost-sure satisfaction of ω-regular objectives to an almost-sure reachability problem, and extend this technique to learning how to control an unknown model so that the chance of satisfying the objective is maximized. We compile ω-regular properties into limit-deterministic Büchi automata instead of the traditional Rabin automata; this choice sidesteps difficulties that have marred previous proposals. Our approach allows us to apply model-free, off-the-shelf reinforcement learning algorithms to compute optimal strategies from the observations of the MDP. We present an experimental evaluation of our technique on benchmark learning problems.
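    The core of the reduction can be pictured as follows: the MDP is composed with a limit-deterministic Büchi automaton, and each accepting transition of the automaton is, with some small probability, redirected to an absorbing sink that pays reward 1. Maximizing the probability of reaching the sink then approximates maximizing the probability of satisfying the objective. The sketch below illustrates this idea only; the `mdp` and `ldba` objects, their `initial`, `step`, `label`, and `accepting` methods, and the parameter `zeta` are hypothetical placeholders, not the authors' implementation.

    ```python
    import random
    from collections import defaultdict

    class ProductMDP:
        """Sketch of the product of an MDP with a limit-deterministic
        Buechi automaton (LDBA), with accepting transitions redirected
        to an absorbing goal state ("sink") with probability 1 - zeta."""

        def __init__(self, mdp, ldba, zeta=0.99):
            self.mdp, self.ldba, self.zeta = mdp, ldba, zeta
            self.SINK = "sink"  # goal state of the reachability reduction

        def reset(self):
            return (self.mdp.initial(), self.ldba.initial())

        def step(self, state, action):
            s, q = state
            s2 = self.mdp.step(s, action)                # sample MDP successor
            q2 = self.ldba.step(q, self.mdp.label(s2))   # automaton move on the label
            if self.ldba.accepting(q, q2) and random.random() > self.zeta:
                # An accepting transition jumps to the sink with
                # probability 1 - zeta and pays reward 1; reaching the
                # sink stands in for satisfying the Buechi condition.
                return self.SINK, 1.0, True
            return (s2, q2), 0.0, False

    def q_learning(env, actions, episodes=10000, alpha=0.1, gamma=0.999, eps=0.1):
        """Off-the-shelf tabular Q-learning applied to the product,
        as the abstract suggests: no model of the MDP is needed."""
        Q = defaultdict(float)
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                a = (random.choice(actions) if random.random() < eps
                     else max(actions, key=lambda b: Q[(state, b)]))
                s2, r, done = env.step(state, a)
                best = 0.0 if done else max(Q[(s2, b)] for b in actions)
                Q[(state, a)] += alpha * (r + gamma * best - Q[(state, a)])
                state = s2
        return Q
    ```

    As zeta approaches 1, an optimal policy for this reachability reward also maximizes the chance of satisfying the original ω-regular objective, which is the content of the reduction described above.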

    Mungojerrie: Linear-Time Objectives in Model-Free Reinforcement Learning

    Mungojerrie is an extensible tool that provides a framework to translate linear-time objectives into reward for reinforcement learning (RL). The tool provides convergent RL algorithms for stochastic games, reference implementations of existing reward translations for ω-regular objectives, and an internal probabilistic model checker for ω-regular objectives. This functionality is modular and operates on shared data structures, which enables fast development of new translation techniques. Mungojerrie supports finite models specified in PRISM and ω-automata specified in the HOA format, with an integrated command line interface to external linear temporal logic translators. Mungojerrie is distributed with a set of benchmarks for ω-regular objectives in RL.
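    The modular design described here, where reward translations are interchangeable components operating on a shared product structure, can be pictured with a small plug-in registry. This is only an illustration of the architecture the abstract describes, not Mungojerrie's actual API; `RewardTranslation`, `register`, and `ReachabilityReward` are hypothetical names.

    ```python
    from abc import ABC, abstractmethod

    class RewardTranslation(ABC):
        """Hypothetical plug-in interface: maps transitions of the
        product of a PRISM model with an HOA automaton to scalar
        rewards that an off-the-shelf RL algorithm can maximize."""

        name: str

        @abstractmethod
        def reward(self, product_state, action, next_state) -> float:
            ...

    TRANSLATIONS = {}

    def register(cls):
        """Register a translation by name so the RL algorithms and the
        model checker, which share the product data structure, can all
        use it; adding a new translation is then a local change."""
        TRANSLATIONS[cls.name] = cls
        return cls

    @register
    class ReachabilityReward(RewardTranslation):
        """Sketch of a reachability-style translation in the spirit of
        the paper above: reward 1 exactly on entering the sink."""
        name = "reachability"

        def reward(self, product_state, action, next_state) -> float:
            return 1.0 if next_state == "sink" else 0.0
    ```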