On the Succinctness of Good-for-MDPs Automata
Good-for-MDPs and good-for-games automata are two recent classes of
nondeterministic automata that reside between general nondeterministic and
deterministic automata. Deterministic automata are good-for-games, and
good-for-games automata are good-for-MDPs, but not vice versa. One of the
questions this raises is how these classes relate in terms of succinctness.
Good-for-games automata are known to be exponentially more succinct than
deterministic automata, but the gap between good-for-MDPs and good-for-games
automata as well as the gap between ordinary nondeterministic automata and
those that are good-for-MDPs have been open. We establish that these gaps are
exponential, and sharpen this result by showing that the latter gap remains
exponential when restricting the nondeterministic automata to separating safety
or unambiguous reachability automata.
Comment: 18 pages
Good-for-MDPs Automata for Probabilistic Analysis and Reinforcement Learning
We characterize the class of nondeterministic ω-automata that can be
used for the analysis of finite Markov decision processes (MDPs). We call these
automata 'good-for-MDPs' (GFM). We show that GFM automata are closed under
classic simulation as well as under more powerful simulation relations that
leverage properties of optimal control strategies for MDPs. This closure
enables us to exploit state-space reduction techniques, such as those based on
direct and delayed simulation, that guarantee simulation equivalence. We
demonstrate the promise of GFM automata by defining a new class of automata
with favorable properties (they are Büchi automata with low branching degree
obtained through a simple construction) and show that going beyond
limit-deterministic automata may significantly benefit reinforcement learning.
Reward Shaping for Reinforcement Learning with Omega-Regular Objectives
Recent approaches to model-free reinforcement learning have successfully exploited good-for-MDPs automata: Büchi automata with a restricted form of nondeterminism, a class that subsumes good-for-games automata and the most widespread class of limit-deterministic automata. The foundation for using these Büchi automata is that, for good-for-MDPs automata, the Büchi condition can be translated to reachability. The drawback of this translation is that the rewards are, on average, reaped very late, which requires long episodes during the learning process. We devise a new reward shaping approach that overcomes this issue. We show that the resulting model is equivalent to a discounted payoff objective with a biased discount that simplifies and improves on prior work in this direction.
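To illustrate why rewards can arrive late under a Büchi-to-reachability translation, here is a minimal sketch. It assumes one common form of such a translation, in which each accepting transition is redirected to an absorbing target with probability 1 − ζ and a reward of 1 is paid on arrival. The abstract does not spell out the construction, so the parameter `zeta` and all function names are hypothetical and chosen only for illustration.

```python
import random


def reachability_step(accepting: bool, zeta: float, rng: random.Random):
    """One transition of the (assumed) reachability translation.

    On an accepting transition, jump to the absorbing target with
    probability 1 - zeta and collect reward 1; otherwise continue
    with reward 0. Returns (reward, done).
    """
    if accepting and rng.random() >= zeta:
        return 1.0, True
    return 0.0, False


def reach_probability(num_accepting: int, zeta: float) -> float:
    """Probability of having reached the target after seeing
    num_accepting accepting transitions: 1 - zeta ** num_accepting."""
    return 1.0 - zeta ** num_accepting


def episode_return(accepting_flags, zeta: float, rng: random.Random) -> float:
    """Total reward of a finite episode given as a list of booleans,
    one per transition, marking which transitions are accepting."""
    for flag in accepting_flags:
        reward, done = reachability_step(flag, zeta, rng)
        if done:
            return reward
    return 0.0  # the episode ended before the target was reached
```

Under this assumption, a run needs about 1 / (1 − ζ) accepting transitions on average before it collects any reward. With ζ close to 1 that means long episodes, which is the drawback the abstract describes.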
Mungojerrie: Linear-Time Objectives in Model-Free Reinforcement Learning
Mungojerrie is an extensible tool that provides a framework to translate linear-time objectives into reward for reinforcement learning (RL). The tool provides convergent RL algorithms for stochastic games, reference implementations of existing reward translations for ω-regular objectives, and an internal probabilistic model checker for ω-regular objectives. This functionality is modular and operates on shared data structures, which enables fast development of new translation techniques. Mungojerrie supports finite models specified in PRISM and ω-automata specified in the HOA format, with an integrated command line interface to external linear temporal logic translators. Mungojerrie is distributed with a set of benchmarks for ω-regular objectives in RL.
A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs
Linear temporal logic (LTL) and omega-regular objectives, a superset of LTL,
have seen recent use as a way to express non-Markovian objectives in
reinforcement learning. We introduce a model-based probably approximately
correct (PAC) learning algorithm for omega-regular objectives in Markov
decision processes. Unlike prior approaches, our algorithm learns from sampled
trajectories of the system and does not require prior knowledge of the system's
topology.