6 research outputs found

Life Is Random, Time Is Not: Markov Decision Processes with Window Objectives

Author: Brihaye Thomas
Delgrange Florent
Oualhadj Youssouf
Randour Mickael
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th International Conference on Concurrency Theory (CONCUR 2019)
Publication date: 01/01/2019

The window mechanism was introduced by Chatterjee et al. (2015) to strengthen classical game objectives with time bounds. It allows the synthesis of system controllers that exhibit acceptable behaviors within a configurable time frame, all along their infinite execution, in contrast to traditional objectives, which only require correctness of behaviors in the limit. The window concept has proved useful in a variety of two-player zero-sum games, thanks to the ability to reason about such time bounds in system specifications and to the increased tractability that it usually yields. In this work, we extend the window framework to stochastic environments by considering the fundamental threshold probability problem in Markov decision processes for window objectives. That is, given such an objective, we want to synthesize strategies that guarantee satisfying runs with a given probability. We solve this problem for the usual variants of window objectives, where either the time frame is set as a parameter, or we ask whether such a time frame exists. We develop a generic approach for window-based objectives and instantiate it for the classical mean-payoff and parity objectives, already considered in games. Our work paves the way to a wide use of the window mechanism in stochastic models.
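As a rough illustration of the window idea for mean-payoff, the sketch below checks a finite prefix of integer weights. It assumes the standard fixed-window reading with threshold 0 (from every position, some window of length at most `max_len` has a non-negative sum); the abstract itself does not spell this definition out, and the function names are hypothetical. It does not address the probabilistic synthesis problem the paper solves.

```python
from typing import Sequence


def good_window_closes(weights: Sequence[float], start: int, max_len: int) -> bool:
    """Return True if some window of length 1..max_len starting at `start`
    has a non-negative total weight (assumed threshold: 0)."""
    total = 0.0
    for w in weights[start:start + max_len]:
        total += w
        if total >= 0:
            return True
    return False


def direct_fixed_window_on_prefix(weights: Sequence[float], max_len: int) -> bool:
    """Check the window condition at every position of a finite prefix.

    Only positions with a full window of `max_len` steps available are
    checked, since later positions cannot be decided from the prefix alone.
    """
    last_start = len(weights) - max_len
    return all(good_window_closes(weights, i, max_len) for i in range(last_start + 1))


if __name__ == "__main__":
    prefix = [1, -1, 2, -2, 3]
    print(direct_fixed_window_on_prefix(prefix, max_len=2))
    print(direct_fixed_window_on_prefix(prefix, max_len=1))
```

With a window of length 2, every negative weight in this prefix is compensated one step later; with a window of length 1, the negative weights violate the condition immediately.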

arXiv.org e-Print Archive

Episciences.org

Dagstuhl Research Online Publication Server

Publikationsserver der RWTH Aachen University

The Wasserstein Believer: Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models

Author: Avalos Raphael
Delgrange Florent
Nowé Ann
Pérez Guillermo A.
Roijers Diederik M.
Publication date: 26/10/2023

Partially Observable Markov Decision Processes (POMDPs) are used to model environments where the full state cannot be perceived by an agent. The agent must therefore reason over its past observations and actions. However, simply remembering the full history is generally intractable due to the exponential growth of the history space. Maintaining a probability distribution that models the belief over the true state can serve as a sufficient statistic of the history, but its computation requires access to the model of the environment and is often intractable. While state-of-the-art algorithms use Recurrent Neural Networks to compress the observation-action history, aiming to learn a sufficient statistic, they lack guarantees of success and can lead to sub-optimal policies. To overcome this, we propose the Wasserstein Belief Updater, an RL algorithm that learns a latent model of the POMDP and an approximation of the belief update. Our approach comes with theoretical guarantees on the quality of our approximation, ensuring that the beliefs it outputs allow for learning the optimal value function.
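For context, the exact belief update that the abstract describes as requiring access to the environment model can be sketched for a small discrete POMDP as follows. This is the textbook Bayes filter, not the Wasserstein Belief Updater itself; the table layout (`transition[a][s][s2]`, `observation[a][s2][o]`) and all names are assumptions made for illustration.

```python
from typing import List

Belief = List[float]


def belief_update(
    belief: Belief,
    transition: List[List[List[float]]],   # transition[a][s][s2] = P(s2 | s, a)
    observation: List[List[List[float]]],  # observation[a][s2][o] = P(o | s2, a)
    action: int,
    obs: int,
) -> Belief:
    """Exact Bayesian belief update: b'(s2) is proportional to
    P(o | s2, a) * sum_s P(s2 | s, a) * b(s)."""
    n = len(belief)
    # Prediction step: push the current belief through the transition model.
    predicted = [
        sum(belief[s] * transition[action][s][s2] for s in range(n))
        for s2 in range(n)
    ]
    # Correction step: weight each state by the likelihood of the observation.
    unnormalized = [observation[action][s2][obs] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under the current belief")
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    # Two states, one action that leaves the state unchanged.
    T = [[[1.0, 0.0], [0.0, 1.0]]]
    # State 0 emits observation 0 with probability 0.8, state 1 with 0.2.
    O = [[[0.8, 0.2], [0.2, 0.8]]]
    print(belief_update([0.5, 0.5], T, O, action=0, obs=0))
```

The update needs the full tables `T` and `O`, which is exactly the model access that is unavailable in the learning setting the abstract targets.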

arXiv.org e-Print Archive

Life is Random, Time is Not: Markov Decision Processes with Window Objectives

Author: Florent Delgrange
Mickael Randour
Thomas Brihaye
Youssouf Oualhadj
Publication venue: Logical Methods in Computer Science e.V.
Publication date: 01/12/2020

The window mechanism was introduced by Chatterjee et al. to strengthen classical game objectives with time bounds. It allows the synthesis of system controllers that exhibit acceptable behaviors within a configurable time frame, all along their infinite execution, in contrast to traditional objectives, which only require correctness of behaviors in the limit. The window concept has proved useful in a variety of two-player zero-sum games, both because it enables reasoning about such time bounds in system specifications and because of the increased tractability that it usually yields. In this work, we extend the window framework to stochastic environments by considering Markov decision processes. A fundamental problem in this context is the threshold probability problem: given an objective, the aim is to synthesize strategies that guarantee satisfying runs with a given probability. We solve it for the usual variants of window objectives, where either the time frame is set as a parameter, or we ask whether such a time frame exists. We develop a generic approach for window-based objectives and instantiate it for the classical mean-payoff and parity objectives, already considered in games. Our work paves the way to a wide use of the window mechanism in stochastic models.

Directory of Open Access Journals

Wasserstein auto-encoded MDPs : formal verification of efficiently distilled RL policies with many-sided guarantees

Author: Delgrange Florent
Nowé Ann
Pérez Guillermo Alberto
Publication date: 01/01/2023

Institutional Repository Universiteit Antwerpen

Distillation of RL policies with formal guarantees via variational abstraction of Markov decision processes

Author: Delgrange Florent
Nowé Ann
Pérez Guillermo Alberto
Publication date: 01/01/2022

We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep-RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework introduced by Gelada et al. to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. Additionally, we obtain a distilled version of the policy for the latent model. Comment: AAAI 2022, technical report including supplementary material (10 pages main text, 14 pages appendix)

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen