Value Iteration for Long-run Average Reward in Markov Decision Processes
Markov decision processes (MDPs) are standard models for probabilistic
systems with non-deterministic behaviours. Long-run average rewards provide a
mathematically elegant formalism for expressing long term performance. Value
iteration (VI) is one of the simplest and most efficient algorithmic approaches
to MDPs with other properties, such as reachability objectives. Unfortunately,
a naive extension of VI does not work for MDPs with long-run average rewards,
as there is no known stopping criterion. In this work our contributions are
threefold. (1) We refute a conjecture related to stopping criteria for MDPs
with long-run average rewards. (2) We present two practical algorithms for MDPs
with long-run average rewards based on VI. First, we show that a combination of
applying VI locally for each maximal end-component (MEC) and VI for
reachability objectives can provide approximation guarantees. Second, extending
the above approach with a simulation-guided on-demand variant of VI, we present
an anytime algorithm that is able to deal with very large models. (3) Finally,
we present experimental results showing that our methods significantly
outperform the standard approaches on several benchmarks.
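As a rough illustration of the value iteration the abstract above refers to, the following is a minimal sketch for maximal reachability probabilities. It is not the paper's algorithm: the MDP encoding, the function name `reachability_vi`, the toy model, and the fixed iteration count are all assumptions made for this example. The fixed count stands in for a real stopping criterion, which is exactly the part the abstract identifies as hard.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: state -> action -> list of (probability, successor).
MDP = Dict[str, Dict[str, List[Tuple[float, str]]]]


def reachability_vi(mdp: MDP, targets: Set[str], iterations: int = 100) -> Dict[str, float]:
    """Approximate from below the maximal probability of reaching `targets`.

    Runs a fixed number of Bellman updates; this gives no error guarantee.
    """
    values = {s: (1.0 if s in targets else 0.0) for s in mdp}
    for _ in range(iterations):
        new_values = {}
        for state, actions in mdp.items():
            if state in targets:
                new_values[state] = 1.0
                continue
            # Maximise the expected value of the successor over all actions.
            new_values[state] = max(
                (sum(p * values[t] for p, t in succ) for succ in actions.values()),
                default=0.0,
            )
        values = new_values
    return values


# Toy model (assumed): "risky" retries until it reaches the goal.
toy: MDP = {
    "s0": {"safe": [(1.0, "sink")], "risky": [(0.5, "goal"), (0.5, "s0")]},
    "goal": {"stay": [(1.0, "goal")]},
    "sink": {"stay": [(1.0, "sink")]},
}
result = reachability_vi(toy, {"goal"})
```

In this toy model the iterates for `s0` approach 1 from below, while `sink` stays at 0.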
Qualitative Analysis of Concurrent Mean-payoff Games
We consider concurrent games played by two players on a finite-state graph,
where in every round the players simultaneously choose a move, and the current
state along with the joint moves determine the successor state. We study a
fundamental objective, namely, mean-payoff objective, where a reward is
associated to each transition, and the goal of player 1 is to maximize the
long-run average of the rewards, and the objective of player 2 is strictly the
opposite. The path constraint for player 1 could be qualitative, i.e., the
mean-payoff is the maximal reward, or arbitrarily close to it; or quantitative,
i.e., a given threshold between the minimal and maximal reward. We consider the
computation of the almost-sure (resp. positive) winning sets, where player 1
can ensure that the path constraint is satisfied with probability 1 (resp.
positive probability). Our main results for qualitative path constraints are as
follows: (1) we establish qualitative determinacy results that show that for
every state either player 1 has a strategy to ensure almost-sure (resp.
positive) winning against all player-2 strategies, or player 2 has a spoiling
strategy to falsify almost-sure (resp. positive) winning against all player-1
strategies; (2) we present optimal strategy complexity results that precisely
characterize the classes of strategies required for almost-sure and positive
winning for both players; and (3) we present quadratic time algorithms to
compute the almost-sure and the positive winning sets, matching the best known
bound of algorithms for much simpler problems (such as reachability
objectives). For quantitative constraints we show that a polynomial time
solution for the almost-sure or the positive winning set would imply a solution
to a long-standing open problem (the value problem for turn-based deterministic
mean-payoff games) that is not known to be solvable in polynomial time.
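To make the mean-payoff objective in the abstract above concrete, here is a small sketch that evaluates the long-run average reward of a single ultimately periodic play (a finite prefix followed by a cycle repeated forever). It only illustrates the objective, not the paper's algorithms for concurrent games; the function names and the reward sequences are assumptions made for this example.

```python
from fractions import Fraction
from itertools import cycle as repeat_forever, islice
from typing import Sequence


def lasso_mean_payoff(prefix: Sequence[int], cycle: Sequence[int]) -> Fraction:
    """Long-run average reward of the play prefix . cycle^omega.

    The finite prefix does not affect the limit, so only the cycle matters.
    """
    if not cycle:
        raise ValueError("the cycle must contain at least one reward")
    return Fraction(sum(cycle), len(cycle))


def finite_average(prefix: Sequence[int], cycle: Sequence[int], steps: int) -> float:
    """Average of the first `steps` rewards, for comparison with the limit."""
    rewards = list(prefix) + list(islice(repeat_forever(cycle), max(0, steps - len(prefix))))
    rewards = rewards[:steps]
    return sum(rewards) / steps


limit = lasso_mean_payoff([5, 5], [0, 2])
approx = finite_average([5, 5], [0, 2], 10_000)
```

The finite averages drift toward the cycle average as the number of steps grows, which is why the prefix rewards have no influence on the mean-payoff.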
Minimizing Expected Cost Under Hard Boolean Constraints, with Applications to Quantitative Synthesis
In Boolean synthesis, we are given an LTL specification, and the goal is to
construct a transducer that realizes it against an adversarial environment.
Often, a specification contains both Boolean requirements that should be
satisfied against an adversarial environment, and multi-valued components that
refer to the quality of the satisfaction and whose expected cost we would like
to minimize with respect to a probabilistic environment.
In this work we study, for the first time, mean-payoff games in which the
system aims at minimizing the expected cost against a probabilistic
environment, while surely satisfying an ω-regular condition against an
adversarial environment. We consider the case where the ω-regular condition is
given as a parity objective or by an LTL formula. We show that in general,
optimal strategies need not exist, and moreover, the limit value cannot be
approximated by finite-memory strategies. We thus focus on computing the
limit value, and give tight complexity bounds for synthesizing
ε-optimal strategies for both finite-memory and infinite-memory
strategies.
We show that our game naturally arises in various contexts of synthesis with
Boolean and multi-valued objectives. Beyond direct applications, in synthesis
with costs and rewards to certain behaviors, it allows us to compute the
minimal sensing cost of ω-regular specifications -- a measure of quality
in which we look for a transducer that minimizes the expected number of signals
that are read from the input.
Approximating the Termination Value of One-Counter MDPs and Stochastic Games
One-counter MDPs (OC-MDPs) and one-counter simple stochastic games (OC-SSGs)
are 1-player, and 2-player turn-based zero-sum, stochastic games played on the
transition graph of classic one-counter automata (equivalently, pushdown
automata with a 1-letter stack alphabet). A key objective for the analysis and
verification of these games is the termination objective, where the players aim
to maximize (minimize, respectively) the probability of hitting counter value
0, starting at a given control state and given counter value. Recently, we
studied qualitative decision problems ("is the optimal termination value = 1?")
for OC-MDPs (and OC-SSGs) and showed them to be decidable in P-time (in NP and
coNP, respectively). However, quantitative decision and approximation problems
("is the optimal termination value ≥ p", or "approximate the termination value
within epsilon") are far more challenging. This is so in part because optimal
strategies may not exist, and because even when they do exist they can have a
highly non-trivial structure. It thus remained open even whether any of these
quantitative termination problems are computable. In this paper we show that
all quantitative approximation problems for the termination value for OC-MDPs
and OC-SSGs are computable. Specifically, given an OC-SSG, and given epsilon >
0, we can compute a value v that approximates the value of the OC-SSG
termination game within additive error epsilon, and furthermore we can compute
epsilon-optimal strategies for both players in the game. A key ingredient in
our proofs is a subtle martingale, derived from solving certain LPs that we can
associate with a maximizing OC-MDP. An application of Azuma's inequality on
these martingales yields a computable bound for the "wealth" at which a "rich
person's strategy" becomes epsilon-optimal for OC-MDPs.
Comment: 35 pages, 1 figure, full version of a paper presented at ICALP 2011,
invited for submission to Information and Computation.
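For intuition about the termination value, consider the degenerate case with no players at all: a one-counter Markov chain whose counter goes up by 1 with probability p and down by 1 with probability 1 - p. This is the classical gambler's-ruin setting, not the paper's construction. Starting from counter value 1, the probability of ever hitting 0 is the least solution of x = (1 - p) + p x^2, which the sketch below approximates by iterating from 0. The function name and the iteration count are assumptions made for this example.

```python
def termination_probability(p_up: float, iterations: int = 500) -> float:
    """Approximate the probability of reaching counter 0 from counter 1.

    Iterates x <- (1 - p_up) + p_up * x**2 starting from 0, which converges
    monotonically from below to the least fixed point of the equation.
    Convergence is slow when p_up is close to 0.5.
    """
    if not 0.0 <= p_up <= 1.0:
        raise ValueError("p_up must be a probability")
    x = 0.0
    for _ in range(iterations):
        x = (1.0 - p_up) + p_up * x * x
    return x


upward_drift = termination_probability(2 / 3)    # least root of 2x^2 - 3x + 1 is 1/2
downward_drift = termination_probability(0.25)   # termination is almost sure
```

With an upward drift the chain may escape to infinity, so the termination value is strictly below 1. From counter value n the value is the n-th power of the value from counter 1, since the counter must drop one level at a time.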
New Deterministic Algorithms for Solving Parity Games
We study parity games in which one of the two players controls only a small
number of nodes and the other player controls the other nodes of the
game. Our main result is a fixed-parameter algorithm that solves bipartite
parity games in time , and general parity games in
time , where is the number of distinct
priorities and is the number of edges. For all games with this
improves the previously fastest algorithm by Jurdziński, Paterson, and
Zwick (SICOMP 2008). We also obtain novel kernelization results and an improved
deterministic algorithm for graphs with small average degree.
On (Subgame Perfect) Secure Equilibrium in Quantitative Reachability Games
We study turn-based quantitative multiplayer non zero-sum games played on
finite graphs with reachability objectives. In such games, each player aims at
reaching his own goal set of states as soon as possible. A previous work on
this model showed that Nash equilibria (resp. secure equilibria) are guaranteed
to exist in the multiplayer (resp. two-player) case. The existence of secure
equilibria in the multiplayer case remained and is still an open problem. In
this paper, we focus our study on the concept of subgame perfect equilibrium, a
refinement of Nash equilibrium well-suited in the framework of games played on
graphs. We also introduce the new concept of subgame perfect secure
equilibrium. We prove the existence of subgame perfect equilibria (resp.
subgame perfect secure equilibria) in multiplayer (resp. two-player)
quantitative reachability games. Moreover, we provide an algorithm deciding the
existence of secure equilibria in the multiplayer case.
Comment: 32 pages. Full version of the FoSSaCS 2012 proceedings paper.
Coordinated Defense Allocation in Reach-Avoid Scenarios with Efficient Online Optimization
In this paper, we present a dual-layer online optimization strategy for
defender robots operating in multiplayer reach-avoid games within general
convex environments. Our goal is to intercept as many attacker robots as
possible without prior knowledge of their strategies. To balance optimality and
efficiency, our approach alternates between coordinating defender coalitions
against individual attackers and allocating coalitions to attackers based on
predicted single-attack coordination outcomes. We develop an online convex
programming technique for single-attack defense coordination, which not only
allows adaptability to joint states but also identifies the maximal region of
initial joint states that guarantees successful attack interception. Our
defense allocation algorithm utilizes a hierarchical iterative method to
approximate integer linear programs with a monotonicity constraint, reducing
computational burden while ensuring enhanced defense performance over time.
Extensive simulations conducted in 2D and 3D environments validate the efficacy
of our approach in comparison to state-of-the-art approaches, and show its
applicability in wheeled mobile robots and quadcopters.