Parity Objectives in Countable MDPs
We study countably infinite MDPs with parity objectives, and special cases with a bounded number of colors in the Mostowski hierarchy (including reachability, safety, BĂŒchi and co-BĂŒchi). In finite MDPs there always exist optimal memoryless deterministic (MD) strategies for parity objectives, but this does not generally hold for countably infinite MDPs. In particular, optimal strategies need not exist. For countably infinite MDPs, we provide a complete picture of the memory requirements of optimal (resp., Δ-optimal) strategies for all objectives in the Mostowski hierarchy. In particular, there is a strong dichotomy between two different types of objectives. For the first type, optimal strategies, if they exist, can be chosen MD, while for the second type optimal strategies require infinite memory. (I.e., for all objectives in the Mostowski hierarchy, if finite-memory randomized strategies suffice then MD-strategies also suffice.) Similarly, some objectives admit Δ-optimal MD-strategies, while for others Δ-optimal strategies require infinite memory. Such a dichotomy also holds for the subclass of countably infinite MDPs that are finitely branching, though more objectives admit MD-strategies here.
Strategy Complexity of Parity Objectives in Countable MDPs
We study countably infinite MDPs with parity objectives. Unlike in finite
MDPs, optimal strategies need not exist, and may require infinite memory if
they do. We provide a complete picture of the exact strategy complexity of
Δ-optimal strategies (and optimal strategies, where they exist) for
all subclasses of parity objectives in the Mostowski hierarchy. Either
MD-strategies, Markov strategies, or 1-bit Markov strategies are necessary and
sufficient, depending on the number of colors, the branching degree of the MDP,
and whether one considers Δ-optimal or optimal strategies. In
particular, 1-bit Markov strategies are necessary and sufficient for
Δ-optimal (resp. optimal) strategies for general parity objectives. Comment: This is the full version of a paper presented at CONCUR 202
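To make the strategy classes named above concrete, here is a minimal type sketch (the names and example instances are ours, purely for illustration; in the papers a strategy maps play histories to actions, and each class restricts what the strategy may observe):

```python
from typing import Callable, Tuple

State, Action = int, int   # placeholder types for a countable MDP

# MD (memoryless deterministic): the action depends only on the current state.
MDStrategy = Callable[[State], Action]

# Markov: the action may also depend on the number of steps taken so far.
MarkovStrategy = Callable[[State, int], Action]

# 1-bit Markov: one bit of memory on top of the step counter;
# the strategy returns the chosen action and the updated bit.
OneBitMarkovStrategy = Callable[[State, int, int], Tuple[Action, int]]

# Artificial example instances of each class:
md: MDStrategy = lambda s: s % 2
markov: MarkovStrategy = lambda s, step: (s + step) % 2
one_bit: OneBitMarkovStrategy = lambda s, step, bit: ((s + bit) % 2, 1 - bit)
```

Each class is strictly more permissive than the previous one: an MD strategy ignores both the step counter and the bit, a Markov strategy reads the counter, and a 1-bit Markov strategy additionally reads and rewrites its single memory bit.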
Strategy Complexity of Threshold Payoff with Applications to Optimal Expected Payoff
We study countably infinite Markov decision processes (MDPs) with transition
rewards. The lim sup (resp. lim inf) threshold objective is to maximize the
probability that the lim sup (resp. lim inf) of the infinite sequence of
transition rewards is non-negative. We establish the complete picture of the
transition rewards is non-negative. We establish the complete picture of the
strategy complexity of these objectives, i.e., the upper and lower bounds on
the memory required by Δ-optimal (resp. optimal) strategies. We
then apply these results to solve two open problems from [Sudderth, Decisions
in Economics and Finance, 2020] about the strategy complexity of optimal
strategies for the expected lim sup (resp. lim inf) payoff. Comment: 53 pages
Taming denumerable Markov decision processes with decisiveness
Decisiveness has proven to be an elegant concept for denumerable Markov
chains: it is general enough to encompass several natural classes of
denumerable Markov chains, and is a sufficient condition for simple qualitative
and approximate quantitative model checking algorithms to exist. In this paper,
we explore how to extend the notion of decisiveness to Markov decision
processes. Compared to Markov chains, the extra non-determinism can be resolved
in an adversarial or cooperative way, yielding two natural notions of
decisiveness. We then explore whether these notions yield model checking
procedures concerning the infimum and supremum probabilities of reachability
properties.
Strategy Complexity of Point Payoff, Mean Payoff and Total Payoff Objectives in Countable MDPs
We study countably infinite Markov decision processes (MDPs) with real-valued
transition rewards. Every infinite run induces the following sequences of
payoffs: 1. Point payoff (the sequence of directly seen transition rewards), 2.
Mean payoff (the sequence of the sums of all rewards so far, divided by the
number of steps), and 3. Total payoff (the sequence of the sums of all rewards
so far). For each payoff type, the objective is to maximize the probability
that the lim inf is non-negative. We establish the complete picture of the
strategy complexity of these objectives, i.e., how much memory is necessary and
sufficient for Δ-optimal (resp. optimal) strategies. Some cases can
be won with memoryless deterministic strategies, while others require a step
counter, a reward counter, or both. Comment: Revised and extended journal version of results presented at the
CONCUR 2021 conference. For a special issue in the arXiv overlay journal LMCS
(https://lmcs.episciences.org). This is not a duplicate of arXiv:2107.03287
(the conference version), but the significantly changed journal version for
LMCS (which uses arXiv as a backend).
Strategy Complexity of Reachability in Countable Stochastic 2-Player Games
We study countably infinite stochastic 2-player games with reachability
objectives. Our results provide a complete picture of the memory requirements
of Δ-optimal (resp. optimal) strategies. These results depend on
the size of the players' action sets and on whether one requires strategies
that are uniform (i.e., independent of the start state).
Our main result is that Δ-optimal (resp. optimal) Maximizer
strategies require infinite memory if Minimizer is allowed infinite action
sets. This lower bound holds even under very strong restrictions. Even in the
special case of infinitely branching turn-based reachability games, even if all
states allow an almost surely winning Maximizer strategy, strategies with a
step counter plus finite private memory are still useless.
Regarding uniformity, we show that for Maximizer there need not exist
positional (i.e., memoryless) uniformly Δ-optimal strategies even
in the special case of finite action sets or in finitely branching turn-based
games. On the other hand, in games with finite action sets, there always exists
a uniformly Δ-optimal Maximizer strategy that uses just one bit of
public memory.
Strategy complexity of lim sup and lim inf payoff objectives in countable MDPs
We study countably infinite Markov decision processes (MDPs) with real-valued
transition rewards. A strategy is a function which decides how plays proceed within the
MDP. Every strategy induces a set of infinite runs in the MDP and each infinite run
induces the following sequences of payoffs:
1. Point payoff (the sequence of directly seen transition rewards),
2. Mean payoff (the sequence of the sums of all rewards so far, divided by the number
of steps), and
3. Total payoff (the sequence of the sums of all rewards so far).
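For concreteness, the three payoff sequences induced by a run prefix can be sketched as follows (a minimal illustration; the function name and the example rewards are ours, not from the thesis):

```python
from itertools import accumulate

def payoff_sequences(rewards):
    """Given a finite prefix of a run's transition rewards, return the
    corresponding prefixes of the point, mean and total payoff sequences."""
    point = list(rewards)                    # 1. rewards as directly seen
    total = list(accumulate(point))          # 3. running sums of all rewards
    mean = [s / (n + 1) for n, s in enumerate(total)]  # 2. sums / steps
    return point, mean, total

point, mean, total = payoff_sequences([2.0, -1.0, -1.0, 4.0])
print(point)  # [2.0, -1.0, -1.0, 4.0]
print(total)  # [2.0, 1.0, 0.0, 4.0]
print(mean)   # [2.0, 0.5, 0.0, 1.0]
```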
For each payoff type, the threshold objective is to maximise the probability that the
lim sup/lim inf is non-negative. We are interested in the strategy complexity of the
above objectives, i.e. the amount of memory and/or randomisation that a strategy
needs access to in order to play well (optimally resp. Δ-optimally). Our results seek
not only to decide whether an objective requires finite or infinite memory, but in the
case of infinite memory, what kind of infinite memory is necessary and sufficient. For
example, a step counter which acts as a clock, or a reward counter which sums up the
seen rewards may be sufficient.
We compare the lim sup/lim inf point payoff objectives to the BĂŒchi/co-BĂŒchi
objectives which, given a set of states or transitions, seek to maximise the probability
that this set is visited infinitely/finitely often. Convergence effects are what
differentiate lim sup/lim inf point payoff objectives from BĂŒchi/co-BĂŒchi. For example, the
sequence −1/2, −1/3, −1/4, . . . does satisfy lim sup ≄ 0 and lim inf ≄ 0 despite all of
the rewards being negative. It is in dealing with these effects that we make our main
technical contributions. We establish a complete picture of the strategy complexity for
both the lim sup and lim inf point payoff objectives. In particular we show that
optimal lim sup requires either randomisation or access to a step counter and that lim inf
of point payoff requires a step counter (but not more) when the underlying MDP is
infinitely branching.
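The convergence effect described above can be checked numerically (an illustrative sketch; the tail index and tolerance are our choices):

```python
# The reward sequence -1/2, -1/3, -1/4, ... has only negative point payoffs,
# yet its lim sup and lim inf are both 0, so it satisfies the thresholds
# "lim sup >= 0" and "lim inf >= 0" -- the convergence effect that
# separates these objectives from Buechi/co-Buechi conditions.
rewards = [-1.0 / n for n in range(2, 10_000)]
tail = rewards[5000:]               # a late tail of the sequence
approx_limsup = max(tail)           # sup of a late tail approximates lim sup
approx_liminf = min(tail)           # inf of a late tail approximates lim inf
assert all(r < 0 for r in rewards)  # every individual reward is negative
assert -1e-3 < approx_limsup <= 0   # yet both limits are (numerically) 0
assert -1e-3 < approx_liminf <= 0
```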
We also comprehensively pin down the strategy complexity for the lim inf total
and mean payoff objectives. This result requires a novel counterexample involving
unboundedly growing rewards as well as finely tuned transition probabilities which
force the player to use memory in order to mimic what occurred in past random events.
This allows us to show that both of these objectives require the use of both a step
counter as well as a reward counter.
We apply our results to solve two open problems from Sudderth [35] about the
strategy complexity of optimal strategies for the expected lim sup/lim inf point
payoff. We achieve this by reducing each objective to its respective optimal threshold
lim sup/lim inf point payoff counterpart. Thus we are able to conclude that they share
the same optimal strategy complexity.