16 research outputs found
Optimal Strategies in Infinite-state Stochastic Reachability Games
We consider perfect-information reachability stochastic games for 2 players
on infinite graphs. We identify a subclass of such games, and prove two
interesting properties of it: first, Player Max always has optimal strategies
in games from this subclass, and second, these games are strongly determined.
The subclass is defined by the property that the set of all values can only
have one accumulation point -- 0. Our results nicely mirror recent results for
finitely-branching games, where, on the contrary, Player Min always has optimal
strategies. However, our proof methods are substantially different, because the
roles of the players are not symmetric. We also do not restrict the branching
of the games. Finally, we apply our results in the context of recently studied
One-Counter stochastic games
Recursive Concurrent Stochastic Games
We study Recursive Concurrent Stochastic Games (RCSGs), extending our recent
analysis of recursive simple stochastic games to a concurrent setting where the
two players choose moves simultaneously and independently at each state. For
multi-exit games, our earlier work already showed undecidability for basic
questions like termination, thus we focus on the important case of single-exit
RCSGs (1-RCSGs).
We first characterize the value of a 1-RCSG termination game as the least
fixed point solution of a system of nonlinear minimax functional equations, and
use it to show PSPACE decidability for the quantitative termination problem. We
then give a strategy improvement technique, which we use to show that player 1
(maximizer) has \epsilon-optimal randomized Stackless & Memoryless (r-SM)
strategies for all \epsilon > 0, while player 2 (minimizer) has optimal r-SM
strategies. Thus, such games are r-SM-determined. These results mirror and
generalize in a strong sense the randomized memoryless determinacy results for
finite stochastic games, and extend the classic Hoffman-Karp strategy
improvement approach from the finite to an infinite state setting. The proofs
in our infinite-state setting are very different however, relying on subtle
analytic properties of certain power series that arise from studying 1-RCSGs.
We show that our upper bounds, even for qualitative (probability 1)
termination, can not be improved, even to NP, without a major breakthrough, by
giving two reductions: first a P-time reduction from the long-standing
square-root sum problem to the quantitative termination decision problem for
finite concurrent stochastic games, and then a P-time reduction from the latter
problem to the qualitative termination problem for 1-RCSGs.Comment: 21 pages, 2 figure
Approximating the Termination Value of One-Counter MDPs and Stochastic Games
One-counter MDPs (OC-MDPs) and one-counter simple stochastic games (OC-SSGs)
are 1-player, and 2-player turn-based zero-sum, stochastic games played on the
transition graph of classic one-counter automata (equivalently, pushdown
automata with a 1-letter stack alphabet). A key objective for the analysis and
verification of these games is the termination objective, where the players aim
to maximize (minimize, respectively) the probability of hitting counter value
0, starting at a given control state and given counter value. Recently, we
studied qualitative decision problems ("is the optimal termination value = 1?")
for OC-MDPs (and OC-SSGs) and showed them to be decidable in P-time (in NP and
coNP, respectively). However, quantitative decision and approximation problems
("is the optimal termination value ? p", or "approximate the termination value
within epsilon") are far more challenging. This is so in part because optimal
strategies may not exist, and because even when they do exist they can have a
highly non-trivial structure. It thus remained open even whether any of these
quantitative termination problems are computable. In this paper we show that
all quantitative approximation problems for the termination value for OC-MDPs
and OC-SSGs are computable. Specifically, given a OC-SSG, and given epsilon >
0, we can compute a value v that approximates the value of the OC-SSG
termination game within additive error epsilon, and furthermore we can compute
epsilon-optimal strategies for both players in the game. A key ingredient in
our proofs is a subtle martingale, derived from solving certain LPs that we can
associate with a maximizing OC-MDP. An application of Azuma's inequality on
these martingales yields a computable bound for the "wealth" at which a "rich
person's strategy" becomes epsilon-optimal for OC-MDPs.Comment: 35 pages, 1 figure, full version of a paper presented at ICALP 2011,
invited for submission to Information and Computatio
Markov Decision Processes with Multiple Long-run Average Objectives
We study Markov decision processes (MDPs) with multiple limit-average (or
mean-payoff) functions. We consider two different objectives, namely,
expectation and satisfaction objectives. Given an MDP with k limit-average
functions, in the expectation objective the goal is to maximize the expected
limit-average value, and in the satisfaction objective the goal is to maximize
the probability of runs such that the limit-average value stays above a given
vector. We show that under the expectation objective, in contrast to the case
of one limit-average function, both randomization and memory are necessary for
strategies even for epsilon-approximation, and that finite-memory randomized
strategies are sufficient for achieving Pareto optimal values. Under the
satisfaction objective, in contrast to the case of one limit-average function,
infinite memory is necessary for strategies achieving a specific value (i.e.
randomized finite-memory strategies are not sufficient), whereas memoryless
randomized strategies are sufficient for epsilon-approximation, for all
epsilon>0. We further prove that the decision problems for both expectation and
satisfaction objectives can be solved in polynomial time and the trade-off
curve (Pareto curve) can be epsilon-approximated in time polynomial in the size
of the MDP and 1/epsilon, and exponential in the number of limit-average
functions, for all epsilon>0. Our analysis also reveals flaws in previous work
for MDPs with multiple mean-payoff functions under the expectation objective,
corrects the flaws, and allows us to obtain improved results
Qualitative Reachability in Stochastic BPA Games
We consider a class of infinite-state stochastic games generated by stateless
pushdown automata (or, equivalently, 1-exit recursive state machines), where
the winning objective is specified by a regular set of target configurations
and a qualitative probability constraint `>0' or `=1'. The goal of one player
is to maximize the probability of reaching the target set so that the
constraint is satisfied, while the other player aims at the opposite. We show
that the winner in such games can be determined in PTIME for the `>0'
constraint, and both in NP and coNP for the `=1' constraint. Further, we prove
that the winning regions for both players are regular, and we design algorithms
which compute the associated finite-state automata. Finally, we show that
winning strategies can be synthesized effectively.Comment: Submitted to Information and Computation. 48 pages, 3 figure
Hyperplane Separation Technique for Multidimensional Mean-Payoff Games
We consider both finite-state game graphs and recursive game graphs (or
pushdown game graphs), that can model the control flow of sequential programs
with recursion, with multi-dimensional mean-payoff objectives. In pushdown
games two types of strategies are relevant: global strategies, that depend on
the entire global history; and modular strategies, that have only local memory
and thus do not depend on the context of invocation. We present solutions to
several fundamental algorithmic questions and our main contributions are as
follows: (1) We show that finite-state multi-dimensional mean-payoff games can
be solved in polynomial time if the number of dimensions and the maximal
absolute value of the weight is fixed; whereas if the number of dimensions is
arbitrary, then problem is already known to be coNP-complete. (2) We show that
pushdown graphs with multi-dimensional mean-payoff objectives can be solved in
polynomial time. (3) For pushdown games under global strategies both single and
multi-dimensional mean-payoff objectives problems are known to be undecidable,
and we show that under modular strategies the multi-dimensional problem is also
undecidable (whereas under modular strategies the single dimensional problem is
NP-complete). We show that if the number of modules, the number of exits, and
the maximal absolute value of the weight is fixed, then pushdown games under
modular strategies with single dimensional mean-payoff objectives can be solved
in polynomial time, and if either of the number of exits or the number of
modules is not bounded, then the problem is NP-hard. (4) Finally we show that a
fixed parameter tractable algorithm for finite-state multi-dimensional
mean-payoff games or pushdown games under modular strategies with
single-dimensional mean-payoff objectives would imply the solution of the
long-standing open problem of fixed parameter tractability of parity games.Comment: arXiv admin note: text overlap with arXiv:1201.282
Polynomial Time Algorithms for Branching Markov Decision Processes and Probabilistic Min(Max) Polynomial Bellman Equations
We show that one can approximate the least fixed point solution for a
multivariate system of monotone probabilistic max(min) polynomial equations,
referred to as maxPPSs (and minPPSs, respectively), in time polynomial in both
the encoding size of the system of equations and in log(1/epsilon), where
epsilon > 0 is the desired additive error bound of the solution. (The model of
computation is the standard Turing machine model.) We establish this result
using a generalization of Newton's method which applies to maxPPSs and minPPSs,
even though the underlying functions are only piecewise-differentiable. This
generalizes our recent work which provided a P-time algorithm for purely
probabilistic PPSs.
These equations form the Bellman optimality equations for several important
classes of infinite-state Markov Decision Processes (MDPs). Thus, as a
corollary, we obtain the first polynomial time algorithms for computing to
within arbitrary desired precision the optimal value vector for several classes
of infinite-state MDPs which arise as extensions of classic, and heavily
studied, purely stochastic processes. These include both the problem of
maximizing and mininizing the termination (extinction) probability of
multi-type branching MDPs, stochastic context-free MDPs, and 1-exit Recursive
MDPs.
Furthermore, we also show that we can compute in P-time an epsilon-optimal
policy for both maximizing and minimizing branching, context-free, and
1-exit-Recursive MDPs, for any given desired epsilon > 0. This is despite the
fact that actually computing optimal strategies is Sqrt-Sum-hard and
PosSLP-hard in this setting.
We also derive, as an easy consequence of these results, an FNP upper bound
on the complexity of computing the value (within arbitrary desired precision)
of branching simple stochastic games (BSSGs)