Orderfield property of mixtures of stochastic games
We consider certain mixtures, Γ, of classes of stochastic games and provide sufficient conditions for these mixtures to possess the orderfield property. For 2-player zero-sum and non-zero-sum stochastic games, we prove that if we mix a set of states S1, where the transitions are controlled by one player, with a set of states S2 constituting a sub-game having the orderfield property (where S1 ∩ S2 = ∅), the resulting mixture Γ with states S = S1 ∪ S2 has the orderfield property provided there are no transitions from S2 to S1. This holds for discounted as well as undiscounted games. This condition on the transitions is sufficient when S1 is perfect information, SC (Switching Control), or ARAT (Additive Reward Additive Transition). In the zero-sum case, S1 can be a mixture of SC and ARAT as well. On the other hand, when S1 is SER-SIT (Separable Reward - State Independent Transition), we provide a counterexample showing that this condition alone is not sufficient for the mixture Γ to possess the orderfield property. If, in addition to there being no transitions from S2 to S1, the sum of all transition probabilities from S1 to S2 is independent of the actions of the players, then Γ has the orderfield property even when S1 is SER-SIT. When S1 and S2 are both SER-SIT, their mixture Γ has the orderfield property even if we allow transitions from S2 to S1. We also extend these results to some multi-player games, namely mixtures with one-player-control Polystochastic games. In all the above cases, we can inductively mix many such games and continue to retain the orderfield property.
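A minimal formalization of the structural condition above, assuming a transition law p(s' | s, a, b) on the combined state space S = S1 ∪ S2 (the kernel notation p is ours, not the abstract's):

\[
\text{no transitions from } S_2 \text{ to } S_1:\qquad p(s' \mid s, a, b) = 0 \quad \text{for all } s \in S_2,\; s' \in S_1,\; \text{and all action pairs } (a,b).
\]

The additional condition used to recover the orderfield property when S1 is SER-SIT requires the total probability mass sent from S1 into S2 to be action-independent:

\[
\sum_{s' \in S_2} p(s' \mid s, a, b) = c_s \quad \text{for all } (a,b) \text{ and each } s \in S_1,
\]

for some constant c_s depending only on the state s.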
Recursive Concurrent Stochastic Games
We study Recursive Concurrent Stochastic Games (RCSGs), extending our recent
analysis of recursive simple stochastic games to a concurrent setting where the
two players choose moves simultaneously and independently at each state. For
multi-exit games, our earlier work already showed undecidability for basic
questions like termination, thus we focus on the important case of single-exit
RCSGs (1-RCSGs).
We first characterize the value of a 1-RCSG termination game as the least
fixed point solution of a system of nonlinear minimax functional equations, and
use it to show PSPACE decidability for the quantitative termination problem. We
then give a strategy improvement technique, which we use to show that player 1
(maximizer) has \epsilon-optimal randomized Stackless & Memoryless (r-SM)
strategies for all \epsilon > 0, while player 2 (minimizer) has optimal r-SM
strategies. Thus, such games are r-SM-determined. These results mirror and
generalize in a strong sense the randomized memoryless determinacy results for
finite stochastic games, and extend the classic Hoffman-Karp strategy
improvement approach from the finite to an infinite-state setting. The proofs
in our infinite-state setting are, however, very different, relying on subtle
analytic properties of certain power series that arise from studying 1-RCSGs.
We show that our upper bounds, even for qualitative (probability 1)
termination, cannot be improved, even to NP, without a major breakthrough, by
giving two reductions: first a P-time reduction from the long-standing
square-root sum problem to the quantitative termination decision problem for
finite concurrent stochastic games, and then a P-time reduction from the latter
problem to the qualitative termination problem for 1-RCSGs.
Comment: 21 pages, 2 figures
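As a schematic illustration in our own notation (the paper's precise equation system is more detailed), the termination values v_u of a 1-RCSG are characterized as the least fixed point, over [0,1]^n, of a monotone system of nonlinear minimax equations of roughly the following shape, where Val denotes the minimax value of a one-shot zero-sum matrix game:

\[
v_u =
\begin{cases}
1 & u \text{ is the exit},\\
\sum_{u'} p_{u,u'}\, v_{u'} & u \text{ is a probabilistic vertex},\\
v_{en}\cdot v_{ret} & u \text{ is a call port entering a component at } en \text{ and returning at } ret,\\
\mathrm{Val}\big((v_{u'(a,b)})_{a,b}\big) & u \text{ is a concurrent play vertex with successors } u'(a,b).
\end{cases}
\]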
On the dynamics of social conflicts: looking for the Black Swan
This paper deals with the modeling of social competition, possibly resulting
in the onset of extreme conflicts. More precisely, we discuss models describing
the interplay between individual competition for wealth distribution that, when
coupled with political stances coming from support or opposition to a
government, may give rise to strongly self-enhanced effects. The latter may be
thought of as the early stages of massive, unpredictable events known as Black
Swans, although no analysis of any fully-developed Black Swan is provided here.
Our approach makes use of the framework of the kinetic theory for active
particles, where nonlinear interactions among subjects are modeled according to
game-theoretical tools.
Comment: 26 pages, 7 figures
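For orientation, the kinetic theory of active particles referenced above typically evolves distribution functions f_i(t, u) over a microscopic activity variable u through gain-loss balance equations; a schematic form (our rendering, not an equation taken from the paper) is

\[
\partial_t f_i(t,u) = \sum_{h,k} \iint \eta_{hk}(u_*, u^*)\, \mathcal{A}^i_{hk}(u_* \to u \mid u_*, u^*)\, f_h(t,u_*)\, f_k(t,u^*)\, du_*\, du^* \;-\; f_i(t,u) \sum_k \int \eta_{ik}(u, u^*)\, f_k(t,u^*)\, du^*,
\]

where η denotes an encounter rate and the kernel A encodes the game-theoretical transition probabilities of binary interactions.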
Log-Distributional Approach for Learning Covariate Shift Ratios
Distributional Reinforcement Learning theory suggests that distributional fixed points could play a fundamental role in learning non-additive value functions. In particular, we propose a distributional approach for learning Covariate Shift Ratios, whose update rule is originally multiplicative.
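As a generic, hedged illustration of the log-domain idea (this is standard density-ratio estimation, not the paper's update rule; all names below are ours): a covariate shift ratio r(x) = p_test(x)/p_train(x) can be learned additively in log space by fitting a probabilistic classifier that separates the two sample sets, since the classifier's logit equals log r(x) up to a constant.

import numpy as np

def fit_log_ratio(x_train, x_test, lr=0.1, steps=2000):
    # Label train samples 0 and test samples 1; the logit of the fitted
    # classifier equals log p_test(x)/p_train(x) + log(n_test/n_train).
    X = np.vstack([x_train, x_test])
    y = np.concatenate([np.zeros(len(x_train)), np.ones(len(x_test))])
    Xb = np.hstack([X, np.ones((len(X), 1))])        # append bias feature
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))            # sigmoid predictions
        w -= lr * Xb.T @ (p - y) / len(y)            # gradient step on log-loss
    log_prior = np.log(len(x_test) / len(x_train))
    def log_ratio(x):
        xb = np.hstack([x, np.ones((len(x), 1))])
        return xb @ w - log_prior                    # additive, log-domain estimate
    return log_ratio

# Toy usage: train ~ N(0,1), test ~ N(1,1); true log ratio at x is x - 0.5.
rng = np.random.default_rng(0)
x_tr = rng.normal(0.0, 1.0, size=(5000, 1))
x_te = rng.normal(1.0, 1.0, size=(5000, 1))
log_r = fit_log_ratio(x_tr, x_te)
print(log_r(np.array([[0.0], [1.0]])))               # roughly [-0.5, 0.5]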
Approximating the Termination Value of One-Counter MDPs and Stochastic Games
One-counter MDPs (OC-MDPs) and one-counter simple stochastic games (OC-SSGs)
are 1-player, and 2-player turn-based zero-sum, stochastic games played on the
transition graph of classic one-counter automata (equivalently, pushdown
automata with a 1-letter stack alphabet). A key objective for the analysis and
verification of these games is the termination objective, where the players aim
to maximize (minimize, respectively) the probability of hitting counter value
0, starting at a given control state and given counter value. Recently, we
studied qualitative decision problems ("is the optimal termination value = 1?")
for OC-MDPs (and OC-SSGs) and showed them to be decidable in P-time (in NP and
coNP, respectively). However, quantitative decision and approximation problems
("is the optimal termination value ? p", or "approximate the termination value
within epsilon") are far more challenging. This is so in part because optimal
strategies may not exist, and because even when they do exist they can have a
highly non-trivial structure. It thus remained open even whether any of these
quantitative termination problems are computable. In this paper we show that
all quantitative approximation problems for the termination value for OC-MDPs
and OC-SSGs are computable. Specifically, given an OC-SSG, and given epsilon >
0, we can compute a value v that approximates the value of the OC-SSG
termination game within additive error epsilon, and furthermore we can compute
epsilon-optimal strategies for both players in the game. A key ingredient in
our proofs is a subtle martingale, derived from solving certain LPs that we can
associate with a maximizing OC-MDP. An application of Azuma's inequality on
these martingales yields a computable bound for the "wealth" at which a "rich
person's strategy" becomes epsilon-optimal for OC-MDPs.Comment: 35 pages, 1 figure, full version of a paper presented at ICALP 2011,
invited for submission to Information and Computatio
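For reference, the Azuma-Hoeffding inequality used in such arguments states that for a martingale (X_i) with bounded differences |X_i - X_{i-1}| ≤ c_i and any t > 0,

\[
\Pr\big(X_n - X_0 \ge t\big) \;\le\; \exp\!\left(\frac{-t^2}{2\sum_{i=1}^{n} c_i^2}\right),
\]

and symmetrically for the lower tail; applied to the martingale derived from the LPs above, it yields the computable counter-value threshold beyond which the "rich person's strategy" is epsilon-optimal.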