Value Iteration for Long-run Average Reward in Markov Decision Processes

Author: A Komuravelli
A McIver
AF Veinott
AK McIver
C Baier
C Courcoubetis
J Filar
K Chatterjee
M Duflot
M Kwiatkowska
ML Puterman
O Michael
RA Howard
S Giro
S Haddad
T Brázdil
Publication date: 31/08/2017

Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Long-run average rewards provide a mathematically elegant formalism for expressing long-term performance. Value iteration (VI) is one of the simplest and most efficient algorithmic approaches to MDPs with other properties, such as reachability objectives. Unfortunately, a naive extension of VI does not work for MDPs with long-run average rewards, as there is no known stopping criterion. In this work our contributions are threefold. (1) We refute a conjecture related to stopping criteria for MDPs with long-run average rewards. (2) We present two practical algorithms for MDPs with long-run average rewards based on VI. First, we show that a combination of applying VI locally for each maximal end-component (MEC) and VI for reachability objectives can provide approximation guarantees. Second, extending the above approach with a simulation-guided on-demand variant of VI, we present an anytime algorithm that is able to deal with very large models. (3) Finally, we present experimental results showing that our methods significantly outperform the standard approaches on several benchmarks.
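To make the "VI locally for each MEC" idea concrete, here is a minimal Python sketch of value iteration on a single end-component. It is not the authors' implementation. It assumes the model is one communicating, aperiodic MDP, where the textbook bounds apply: the smallest and largest one-step increase of the value vector bracket the optimal long-run average reward. The dictionary layout, the function name `average_reward_bounds`, and the toy model are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: mdp[state][action] = (reward, [(next_state, prob), ...])
MDP = Dict[int, Dict[str, Tuple[float, List[Tuple[int, float]]]]]


def average_reward_bounds(mdp: MDP, eps: float = 1e-6,
                          max_iter: int = 100_000) -> Tuple[float, float]:
    """Bracket the optimal long-run average reward of a single MEC.

    Assumes the MDP is communicating and aperiodic; without these
    assumptions the loop below need not terminate.
    """
    v = {s: 0.0 for s in mdp}
    for _ in range(max_iter):
        # One Bellman update: best immediate reward plus expected next value.
        new_v = {
            s: max(r + sum(p * v[t] for t, p in succ)
                   for r, succ in actions.values())
            for s, actions in mdp.items()
        }
        deltas = [new_v[s] - v[s] for s in mdp]
        lower, upper = min(deltas), max(deltas)
        if upper - lower < eps:
            return lower, upper
        # Shift by a reference state's value so the numbers stay bounded.
        ref = new_v[next(iter(mdp))]
        v = {s: new_v[s] - ref for s in mdp}
    raise RuntimeError("no convergence; model may be periodic or not a single MEC")


# Toy model: state 1 has a self-loop paying 2, state 0 a self-loop paying 1.
toy: MDP = {
    0: {"stay": (1.0, [(0, 1.0)]), "go": (0.0, [(1, 1.0)])},
    1: {"stay": (2.0, [(1, 1.0)]), "back": (0.0, [(0, 1.0)])},
}
lower, upper = average_reward_bounds(toy)
print(lower, upper)
```

On the toy model the best policy moves to state 1 and stays there, so both bounds settle at 2. For a model with several MECs this local step would have to be combined with a reachability analysis, as the abstract describes.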

arXiv.org e-Print Archive

Crossref

Qualitative Analysis of Concurrent Mean-payoff Games

Author: Chatterjee Krishnendu
Ibsen-Jensen Rasmus
Publication date: 18/09/2014

We consider concurrent games played by two players on a finite-state graph, where in every round the players simultaneously choose a move, and the current state along with the joint moves determine the successor state. We study a fundamental objective, namely, the mean-payoff objective, where a reward is associated with each transition, and the goal of player 1 is to maximize the long-run average of the rewards, and the objective of player 2 is strictly the opposite. The path constraint for player 1 could be qualitative, i.e., the mean-payoff is the maximal reward, or arbitrarily close to it; or quantitative, i.e., a given threshold between the minimal and maximal reward. We consider the computation of the almost-sure (resp. positive) winning sets, where player 1 can ensure that the path constraint is satisfied with probability 1 (resp. positive probability). Our main results for qualitative path constraints are as follows: (1) we establish qualitative determinacy results that show that for every state either player 1 has a strategy to ensure almost-sure (resp. positive) winning against all player-2 strategies, or player 2 has a spoiling strategy to falsify almost-sure (resp. positive) winning against all player-1 strategies; (2) we present optimal strategy complexity results that precisely characterize the classes of strategies required for almost-sure and positive winning for both players; and (3) we present quadratic time algorithms to compute the almost-sure and the positive winning sets, matching the best known bound of algorithms for much simpler problems (such as reachability objectives). For quantitative constraints we show that a polynomial time solution for the almost-sure or the positive winning set would imply a solution to a long-standing open problem (the value problem for turn-based deterministic mean-payoff games) that is not known to be solvable in polynomial time.

arXiv.org e-Print Archive

CiteSeerX

University of Liverpool Repository

IST PubRep

IST Austria: PubRep (Institute of Science and Technology)

Minimizing Expected Cost Under Hard Boolean Constraints, with Applications to Quantitative Synthesis

Author: Almagor Shaull
Kupferman Orna
Velner Yaron
Publication date: 01/01/2016

In Boolean synthesis, we are given an LTL specification, and the goal is to construct a transducer that realizes it against an adversarial environment. Often, a specification contains both Boolean requirements that should be satisfied against an adversarial environment, and multi-valued components that refer to the quality of the satisfaction and whose expected cost we would like to minimize with respect to a probabilistic environment. In this work we study, for the first time, mean-payoff games in which the system aims at minimizing the expected cost against a probabilistic environment, while surely satisfying an $\omega$-regular condition against an adversarial environment. We consider the case the $\omega$-regular condition is given as a parity objective or by an LTL formula. We show that in general, optimal strategies need not exist, and moreover, the limit value cannot be approximated by finite-memory strategies. We thus focus on computing the limit-value, and give tight complexity bounds for synthesizing $\epsilon$-optimal strategies for both finite-memory and infinite-memory strategies. We show that our game naturally arises in various contexts of synthesis with Boolean and multi-valued objectives. Beyond direct applications, in synthesis with costs and rewards to certain behaviors, it allows us to compute the minimal sensing cost of $\omega$-regular specifications -- a measure of quality in which we look for a transducer that minimizes the expected number of signals that are read from the input.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Oxford University Research Archive

Approximating the Termination Value of One-Counter MDPs and Stochastic Games

Author: G.R. Grimmett
J. Lambert
K. Etessami
L.B. White
M.L. Puterman
T. Brázdil
Publication date: 01/01/2011

One-counter MDPs (OC-MDPs) and one-counter simple stochastic games (OC-SSGs) are 1-player, and 2-player turn-based zero-sum, stochastic games played on the transition graph of classic one-counter automata (equivalently, pushdown automata with a 1-letter stack alphabet). A key objective for the analysis and verification of these games is the termination objective, where the players aim to maximize (minimize, respectively) the probability of hitting counter value 0, starting at a given control state and given counter value. Recently, we studied qualitative decision problems ("is the optimal termination value = 1?") for OC-MDPs (and OC-SSGs) and showed them to be decidable in P-time (in NP and coNP, respectively). However, quantitative decision and approximation problems ("is the optimal termination value ≥ p", or "approximate the termination value within epsilon") are far more challenging. This is so in part because optimal strategies may not exist, and because even when they do exist they can have a highly non-trivial structure. It thus remained open even whether any of these quantitative termination problems are computable. In this paper we show that all quantitative approximation problems for the termination value for OC-MDPs and OC-SSGs are computable. Specifically, given an OC-SSG, and given epsilon > 0, we can compute a value v that approximates the value of the OC-SSG termination game within additive error epsilon, and furthermore we can compute epsilon-optimal strategies for both players in the game. A key ingredient in our proofs is a subtle martingale, derived from solving certain LPs that we can associate with a maximizing OC-MDP. An application of Azuma's inequality on these martingales yields a computable bound for the "wealth" at which a "rich person's strategy" becomes epsilon-optimal for OC-MDPs. Comment: 35 pages, 1 figure, full version of a paper presented at ICALP 2011, invited for submission to Information and Computation.
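As a rough illustration of what a "termination value" is, the Python sketch below treats the simplest possible case: a one-counter Markov chain with no players and a single control state, where the counter decreases by 1 with probability `p_down` and increases by 1 otherwise. This is not the LP and martingale technique of the paper. Under these assumptions the probability x of ever reaching counter value 0 from counter value 1 is the least solution of x = p_down + (1 - p_down) * x^2, which can be approached from below by repeated substitution starting at 0. The function name and parameters are hypothetical.

```python
def termination_probability(p_down: float, tol: float = 1e-12,
                            max_iter: int = 1_000_000) -> float:
    """Approximate the least fixed point of x = p_down + (1 - p_down) * x**2.

    Starting from 0, the iterates increase monotonically towards the
    probability of hitting counter value 0 from counter value 1.
    """
    x = 0.0
    for _ in range(max_iter):
        nxt = p_down + (1.0 - p_down) * x * x
        if nxt - x < tol:
            return nxt
        x = nxt
    return x  # slow convergence is expected when p_down is close to 0.5


print(termination_probability(0.25))  # close to 1/3
print(termination_probability(0.75))  # close to 1
```

For `p_down = 0.25` the equation has roots 1/3 and 1, and the iteration converges to the smaller one, so termination is not almost sure. Once a player chooses among several such transition rules the equations acquire max or min operators, which is where the difficulty described in the abstract begins.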

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

New Deterministic Algorithms for Solving Parity Games

Author: C Stirling
D Berwanger
H Björklund
J Fearnley
J Gajarský
J Obdržálek
J Vöge
M Jurdziński
R McNaughton
S Schewe
W Zielonka
Publication date: 10/12/2015

We study parity games in which one of the two players controls only a small number $k$ of nodes and the other player controls the $n-k$ other nodes of the game. Our main result is a fixed-parameter algorithm that solves bipartite parity games in time $k^{O(\sqrt{k})} \cdot O(n^3)$, and general parity games in time $(p+k)^{O(\sqrt{k})} \cdot O(pnm)$, where $p$ is the number of distinct priorities and $m$ is the number of edges. For all games with $k = o(n)$ this improves the previously fastest algorithm by Jurdziński, Paterson, and Zwick (SICOMP 2008). We also obtain novel kernelization results and an improved deterministic algorithm for graphs with small average degree.
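For readers unfamiliar with what "solving" a parity game means, the following Python sketch implements the classic recursive algorithm usually attributed to Zielonka. It is not the fixed-parameter algorithm of this paper, and it runs in exponential time in the worst case. The assumptions are: the game uses the max-parity convention (player 0 wins a play when the highest priority seen infinitely often is even), every node has at least one successor, and the game is encoded in a hypothetical dictionary format `node -> (owner, priority, successors)`.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: node -> (owner in {0, 1}, priority, successor list)
Game = Dict[str, Tuple[int, int, List[str]]]


def attractor(game: Game, nodes: Set[str], target: Set[str], player: int) -> Set[str]:
    """Nodes of the subgame `nodes` from which `player` can force a visit to `target`."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v in nodes - attr:
            owner, _, succs = game[v]
            inside = [w for w in succs if w in nodes]
            if owner == player:
                ok = any(w in attr for w in inside)
            else:
                ok = bool(inside) and all(w in attr for w in inside)
            if ok:
                attr.add(v)
                changed = True
    return attr


def zielonka(game: Game, nodes: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (winning region of player 0, winning region of player 1)."""
    if not nodes:
        return set(), set()
    top = max(game[v][1] for v in nodes)
    player, opponent = top % 2, 1 - top % 2
    top_nodes = {v for v in nodes if game[v][1] == top}
    a = attractor(game, nodes, top_nodes, player)
    win = list(zielonka(game, nodes - a))
    if not win[opponent]:
        # The opponent wins nowhere in the rest, so `player` wins everywhere.
        win[player] = set(nodes)
        return win[0], win[1]
    b = attractor(game, nodes, win[opponent], opponent)
    win = list(zielonka(game, nodes - b))
    win[opponent] = win[opponent] | b
    return win[0], win[1]


# Toy game: player 1 owns "b" and can loop there forever on odd priority 1.
toy: Game = {"a": (0, 2, ["b"]), "b": (1, 1, ["a", "b"])}
print(zielonka(toy, set(toy)))
```

In the toy game player 1 wins from both nodes, because node "a" can only move to "b" and player 1 then stays on the self-loop. Handing ownership of "b" to player 0 flips the result, since player 0 can then keep cycling through priority 2.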

arXiv.org e-Print Archive

Maastricht University Research Portal

Crossref

On (Subgame Perfect) Secure Equilibrium in Quantitative Reachability Games

Author: Erich Grädel
Hugo Gimbert
Julie De Pril
Thomas Brihaye
Véronique Bruyère
Publication venue: 'Logical Methods in Computer Science e.V.'
Publication date: 26/02/2013

We study turn-based quantitative multiplayer non zero-sum games played on finite graphs with reachability objectives. In such games, each player aims at reaching his own goal set of states as soon as possible. A previous work on this model showed that Nash equilibria (resp. secure equilibria) are guaranteed to exist in the multiplayer (resp. two-player) case. The existence of secure equilibria in the multiplayer case remained and is still an open problem. In this paper, we focus our study on the concept of subgame perfect equilibrium, a refinement of Nash equilibrium well-suited in the framework of games played on graphs. We also introduce the new concept of subgame perfect secure equilibrium. We prove the existence of subgame perfect equilibria (resp. subgame perfect secure equilibria) in multiplayer (resp. two-player) quantitative reachability games. Moreover, we provide an algorithm deciding the existence of secure equilibria in the multiplayer case. Comment: 32 pages. Full version of the FoSSaCS 2012 proceedings paper.

arXiv.org e-Print Archive

Crossref

Episciences.org

Coordinated Defense Allocation in Reach-Avoid Scenarios with Efficient Online Optimization

Author: Chen Hua
Liu Junwei
Lu Haibo
Ouyang Zikai
Yang Jiahui
Zhang Wei
Publication date: 02/06/2023

In this paper, we present a dual-layer online optimization strategy for defender robots operating in multiplayer reach-avoid games within general convex environments. Our goal is to intercept as many attacker robots as possible without prior knowledge of their strategies. To balance optimality and efficiency, our approach alternates between coordinating defender coalitions against individual attackers and allocating coalitions to attackers based on predicted single-attack coordination outcomes. We develop an online convex programming technique for single-attack defense coordination, which not only allows adaptability to joint states but also identifies the maximal region of initial joint states that guarantees successful attack interception. Our defense allocation algorithm utilizes a hierarchical iterative method to approximate integer linear programs with a monotonicity constraint, reducing computational burden while ensuring enhanced defense performance over time. Extensive simulations conducted in 2D and 3D environments validate the efficacy of our approach in comparison to state-of-the-art approaches, and show its applicability in wheeled mobile robots and quadcopters.

arXiv.org e-Print Archive