Trading Performance for Stability in Markov Decision Processes
We study the complexity of central controller synthesis problems for
finite-state Markov decision processes, where the objective is to optimize both
the expected mean-payoff performance of the system and its stability.
We argue that the basic theoretical notion of expressing the stability in
terms of the variance of the mean-payoff (called global variance in our paper)
is not always sufficient, since it ignores possible instabilities on individual
runs. For this reason we propose alternative definitions of stability, which we
call local and hybrid variance, and which express how rewards on each run
deviate from the run's own mean-payoff and from the expected mean-payoff,
respectively.
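To make the three notions concrete, here is a sketch in our own notation, consistent with the description above; the paper's exact definitions may differ in technical details such as the choice of liminf:

```latex
% mp(w): mean payoff of a single run w with rewards r_0(w), r_1(w), ...
\[
  \mathrm{mp}(\omega) \;=\; \liminf_{n\to\infty} \tfrac{1}{n}\textstyle\sum_{i=0}^{n-1} r_i(\omega)
\]
% Global variance: variance of the mean payoff across runs.
\[
  \mathbb{V}_{\mathrm{glob}} \;=\; \mathbb{E}\big[\big(\mathrm{mp}(\omega)-\mathbb{E}[\mathrm{mp}]\big)^2\big]
\]
% Local variance: long-run average squared deviation of rewards
% from the run's own mean payoff.
\[
  \mathrm{lv}(\omega) \;=\; \liminf_{n\to\infty} \tfrac{1}{n}\textstyle\sum_{i=0}^{n-1}\big(r_i(\omega)-\mathrm{mp}(\omega)\big)^2
\]
% Hybrid variance: deviation measured against the expected mean payoff.
\[
  \mathrm{hv}(\omega) \;=\; \liminf_{n\to\infty} \tfrac{1}{n}\textstyle\sum_{i=0}^{n-1}\big(r_i(\omega)-\mathbb{E}[\mathrm{mp}]\big)^2
\]
```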
We show that a strategy ensuring both the expected mean-payoff and the
variance below given bounds requires randomization and memory, under all the
above semantics of variance. We then look at the problem of determining whether
there is such a strategy. For the global variance, we show that the problem
is in PSPACE, and that the answer can be approximated in pseudo-polynomial
time. For the hybrid variance, the analogous decision problem is in NP, and a
polynomial-time approximating algorithm also exists. For local variance, we
show that the decision problem is in NP. Since the overall performance can be
traded for stability (and vice versa), we also present algorithms for
approximating the associated Pareto curve in all three cases.
Finally, we study a special case of the decision problems, where we require a
given expected mean-payoff together with zero variance. Here we show that the
problems can all be solved in polynomial time.
Comment: Extended version of a paper presented at LICS 2013.
Value Iteration for Long-run Average Reward in Markov Decision Processes
Markov decision processes (MDPs) are standard models for probabilistic
systems with non-deterministic behaviours. Long-run average rewards provide a
mathematically elegant formalism for expressing long term performance. Value
iteration (VI) is one of the simplest and most efficient algorithmic approaches
to MDPs with other properties, such as reachability objectives. Unfortunately,
a naive extension of VI does not work for MDPs with long-run average rewards,
as there is no known stopping criterion. In this work our contributions are
threefold. (1) We refute a conjecture related to stopping criteria for MDPs
with long-run average rewards. (2) We present two practical algorithms for MDPs
with long-run average rewards based on VI. First, we show that a combination of
applying VI locally for each maximal end-component (MEC) and VI for
reachability objectives can provide approximation guarantees. Second, extending
the above approach with a simulation-guided on-demand variant of VI, we present
an anytime algorithm that is able to deal with very large models. (3) Finally,
we present experimental results showing that our methods significantly
outperform the standard approaches on several benchmarks.
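As a toy illustration of the underlying difficulty (our own sketch, not the paper's algorithm; the function name and array shapes below are assumptions): naive value iteration for long-run average reward performs Bellman updates and reads the gain off the one-step value differences, but this gives no sound stopping guarantee in general MDPs.

```python
import numpy as np

def naive_average_reward_vi(P, R, iters=1000):
    """Naive value iteration for long-run average reward.

    P: transitions, shape (S, A, S); R: rewards, shape (S, A).
    Returns a gain estimate from the last one-step value differences.
    Toy sketch only: as the paper points out, this naive scheme has
    no known sound stopping criterion for general MDPs.
    """
    S, A, _ = P.shape
    v = np.zeros(S)
    for _ in range(iters):
        q = R + P @ v              # q[s, a] = R[s, a] + sum_t P[s, a, t] * v[t]
        v_next = q.max(axis=1)     # greedy Bellman update
        diff = v_next - v
        v = v_next
    # midpoint of the span of one-step differences approximates the gain
    return (diff.max() + diff.min()) / 2.0
```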
Non-Zero Sum Games for Reactive Synthesis
In this invited contribution, we summarize new solution concepts useful for
the synthesis of reactive systems that we have introduced in several recent
publications. These solution concepts are developed in the context of non-zero
sum games played on graphs. They are part of the contributions obtained in the
inVEST project funded by the European Research Council.
Comment: LATA'16 invited paper.
Conditional Value-at-Risk for Reachability and Mean Payoff in Markov Decision Processes
We present the conditional value-at-risk (CVaR) in the context of Markov
chains and Markov decision processes with reachability and mean-payoff
objectives. CVaR quantifies risk by means of the expectation of the worst
p-quantile. As such it can be used to design risk-averse systems. We consider
not only CVaR constraints, but also introduce their conjunction with
expectation constraints and quantile constraints (value-at-risk, VaR). We
derive lower and upper bounds on the computational complexity of the respective
decision problems and characterize the structure of the strategies in terms of
memory and randomization.
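For reference, one standard convention for VaR and CVaR at level p (notation ours; the paper may resolve ties on point masses slightly differently):

```latex
% Value-at-risk: the worst p-quantile of the payoff X.
\[
  \mathrm{VaR}_p(X) \;=\; \inf\{\, x \in \mathbb{R} \;:\; \mathbb{P}[X \le x] \ge p \,\}
\]
% Conditional value-at-risk: the expectation of the worst p-quantile,
% i.e., the average of all quantiles at levels below p.
\[
  \mathrm{CVaR}_p(X) \;=\; \frac{1}{p} \int_0^p \mathrm{VaR}_q(X)\,\mathrm{d}q
\]
```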
Unifying Two Views on Multiple Mean-Payoff Objectives in Markov Decision Processes
We consider Markov decision processes (MDPs) with multiple limit-average (or
mean-payoff) objectives. There exist two different views: (i) the expectation
semantics, where the goal is to optimize the expected mean-payoff objective,
and (ii) the satisfaction semantics, where the goal is to maximize the
probability of runs such that the mean-payoff value stays above a given vector.
We consider optimization with respect to both objectives at once, thus unifying
the existing semantics. Precisely, the goal is to optimize the expectation
while ensuring the satisfaction constraint. Our problem captures the notion of
optimization with respect to strategies that are risk-averse (i.e., ensure
certain probabilistic guarantee). Our main results are as follows: First, we
present algorithms for the decision problems which are always polynomial in the
size of the MDP. We also show that an approximation of the Pareto-curve can be
computed in time polynomial in the size of the MDP and the approximation
factor, but exponential in the number of dimensions. Second, we present a
complete characterization of the strategy complexity (in terms of memory bounds
and randomization) required to solve our problem.
Comment: Extended journal version of the LICS'15 paper.
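In symbols, the unified problem can be sketched as follows (notation ours): over strategies sigma, with mp the vector of mean payoffs, v the satisfaction threshold vector, and alpha the probability bound, where maximizing a vector expectation is understood in the Pareto sense,

```latex
\[
  \text{maximize}\;\; \mathbb{E}^{\sigma}\!\left[\mathrm{mp}\right]
  \quad\text{subject to}\quad
  \mathbb{P}^{\sigma}\!\left[\mathrm{mp} \ge \mathbf{v}\right] \;\ge\; \alpha
\]
```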
Approximating values of generalized-reachability stochastic games
Simple stochastic games are turn-based 2½-player games with a reachability objective. The basic question asks whether one player can ensure reaching a given target with at least a given probability. A natural extension is games with a conjunction of such conditions as objective. Despite a plethora of recent results on the analysis of systems with multiple objectives, the decidability of this basic problem remains open. In this paper, we present an algorithm approximating the Pareto frontier of the achievable values to a given precision. Moreover, it is an anytime algorithm, meaning it can be stopped at any time, returning the current approximation and its error bound.
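A hedged toy sketch of the anytime idea for a single reachability objective (the paper handles conjunctions of objectives and whole Pareto frontiers; every name below is our own invention):

```python
def anytime_reachability_vi(states, kind, succ, targets, eps=1e-4):
    """Toy anytime value iteration for a simple stochastic game with a
    single reachability objective.

    kind[s] in {'max', 'min', 'avg'}: player 1, player 2, or a random
    state (uniform over successors, for simplicity).
    Maintains lower bounds L and upper bounds U; stopping at any time
    yields a valid approximation with error max(U - L).
    Caveat (our simplification): the upper bounds may fail to converge
    without the end-component treatment used by bounded/interval VI.
    """
    L = {s: 1.0 if s in targets else 0.0 for s in states}
    U = {s: 1.0 for s in states}

    def update(V, s):
        vals = [V[t] for t in succ[s]]
        if kind[s] == 'max':
            return max(vals)
        if kind[s] == 'min':
            return min(vals)
        return sum(vals) / len(vals)

    err = 1.0
    while err > eps:
        L = {s: 1.0 if s in targets else update(L, s) for s in states}
        U = {s: 1.0 if s in targets else update(U, s) for s in states}
        err = max(U[s] - L[s] for s in states)
    return L, U, err
```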
Dark Control: The Default Mode Network as a Reinforcement Learning Agent
The default mode network (DMN) is believed to subserve the baseline mental activity in humans. Its higher energy consumption compared to other brain networks and its intimate coupling with conscious awareness both point to an unknown overarching function. Many research streams speak in favor of an evolutionarily adaptive role in envisioning experience to anticipate the future. In the present work, we propose a process model that tries to explain how the DMN may implement continuous evaluation and prediction of the environment to guide behavior. The main purpose of DMN activity, we argue, may be described by Markov Decision Processes that optimize action policies via value estimates obtained through vicarious trial and error. Our formal perspective on DMN function naturally accommodates as special cases previous interpretations based on (1) predictive coding, (2) semantic associations, and (3) a sentinel role. Moreover, this process model for the neural optimization of complex behavior in the DMN offers parsimonious explanations for recent experimental findings in animals and humans.