228 research outputs found
General limit value in Dynamic Programming
We consider a dynamic programming problem with arbitrary state space and
bounded rewards. Is it possible to define in a unique way a limit value for
the problem, where the "patience" of the decision-maker tends to infinity? We
consider, for each evaluation $\theta$ (a probability distribution over
positive integers), the value function $v_\theta$ of the problem where the
weight of stage $t$ is given by $\theta_t$, and we investigate the uniform
convergence of a sequence $(v_{\theta^k})_k$ when the "impatience" of the
evaluations vanishes, in the sense that
$\sum_{t \geq 1} |\theta^k_{t+1} - \theta^k_t| \to 0$ as $k \to \infty$.
We prove that this uniform convergence happens
if and only if the metric space $\{v_{\theta^k}, k \geq 1\}$ is totally bounded.
Moreover, there exists a particular function $v^*$, independent of the
particular chosen sequence $(\theta^k)_k$, such that any limit point of such a
sequence of value functions is precisely $v^*$. Consequently, in terms of
uniform convergence of the value functions, $v^*$ may be considered as the
unique possible limit when the patience of the decision-maker tends to
infinity. The result applies in particular to discounted payoffs when the
discount factor vanishes, as well as to average payoffs when the number of
stages goes to infinity, and also to models with stochastic transitions. We
present tractable corollaries, and we discuss counterexamples and a conjecture.
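The two special cases mentioned in the abstract can be checked numerically. The sketch below (the helper names are illustrative, not from the paper) computes the impatience $I(\theta) = \sum_t |\theta_{t+1} - \theta_t|$ for the Cesàro evaluation $\theta_t = 1/n$ and the discounted evaluation $\theta_t = \lambda(1-\lambda)^{t-1}$; it comes out to $1/n$ and $\lambda$ respectively, so both vanish as the decision-maker grows more patient.

```python
# Illustrative sketch (helper names are assumptions, not from the paper):
# impatience I(theta) = sum_t |theta_{t+1} - theta_t| of an evaluation theta,
# i.e. a probability distribution over positive integers with finite support.

def impatience(theta):
    """Total variation of the stage weights, including the final drop to 0."""
    theta = list(theta) + [0.0]  # pad so the tail term |0 - theta_n| is counted
    return sum(abs(theta[t + 1] - theta[t]) for t in range(len(theta) - 1))

def cesaro(n):
    """Uniform weights over the first n stages (average payoff)."""
    return [1.0 / n] * n

def discounted(lam, horizon):
    """theta_t = lam * (1 - lam)^(t-1), truncated after `horizon` stages."""
    return [lam * (1 - lam) ** t for t in range(horizon)]

# Cesàro evaluation: impatience is exactly 1/n.
print(impatience(cesaro(100)))                      # 0.01
# Discounted evaluation: the differences telescope to theta_1 = lam.
print(round(impatience(discounted(0.05, 2000)), 6))  # 0.05
```

For any nonincreasing evaluation the differences telescope, so the impatience is just the first weight; this is why vanishing discount factors and long Cesàro averages are both covered by the theorem.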
The value of Repeated Games with an informed controller
We consider the general model of zero-sum repeated games (or stochastic games
with signals), and assume that one of the players is fully informed and
controls the transitions of the state variable. We prove the existence of the
uniform value, generalizing several results of the literature. A preliminary
existence result is obtained for a certain class of stochastic games played
with pure strategies.
A distance for probability spaces, and long-term values in Markov Decision Processes and Repeated Games
Given a finite set $K$, we denote by $\Delta(K)$ the set of probabilities
on $K$ and by $\Delta_f(\Delta(K))$ the set of Borel probabilities on $\Delta(K)$ with finite
support. Studying a Markov Decision Process with partial information on $K$
naturally leads to a Markov Decision Process with full information on $\Delta(K)$. We
introduce a new metric $d_*$ on $\Delta_f(\Delta(K))$ such that the transitions become
1-Lipschitz from $(\Delta(K), \|\cdot\|_1)$ to $(\Delta_f(\Delta(K)), d_*)$. In the first part of the article,
we define and prove several properties of the metric $d_*$. In particular, $d_*$
satisfies a Kantorovich-Rubinstein type duality formula and can be
characterized using disintegrations. In the second part, we characterize the
limit values in several classes of "compact non-expansive" Markov Decision
Processes. In particular, we use the metric $d_*$ to characterize the limit
value in Partial Observation MDPs with finitely many states and in Repeated
Games with an informed controller with finite sets of states and actions.
Moreover, in each case we can prove the existence of a generalized notion of
uniform value, where we consider not only the Cesàro mean when the number of
stages is large enough but any evaluation function $\theta$
when the impatience $I(\theta) = \sum_{t \geq 1} |\theta_{t+1} - \theta_t|$ is small
enough.
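To make the notion of a value under an arbitrary evaluation $\theta$ concrete, here is a minimal sketch (the function name and the toy model are invented for illustration, and transitions are deterministic for brevity): backward induction computing $v_\theta = \max \sum_t \theta_t r_t$ over a finite-horizon evaluation for a small MDP.

```python
# Minimal sketch (names and toy MDP are assumptions, not from the paper):
# backward induction for the value of a finite MDP under an arbitrary
# evaluation theta = (theta_1, ..., theta_n), maximizing sum_t theta_t * r_t.

def value_under_evaluation(states, actions, transition, reward, theta, s0):
    """transition(s, a) -> next state (deterministic here), reward(s, a) in
    [0, 1], theta a list of nonnegative stage weights summing to 1."""
    n = len(theta)
    V = {s: 0.0 for s in states}      # continuation value after stage n
    for t in range(n - 1, -1, -1):    # stages n, ..., 1
        V = {s: max(theta[t] * reward(s, a) + V[transition(s, a)]
                    for a in actions)
             for s in states}
    return V[s0]

# Toy example: staying in state 1 pays 1, state 0 pays 0,
# and action 'go' moves from 0 to 1 (earning nothing at that stage).
states, actions = [0, 1], ['stay', 'go']
transition = lambda s, a: 1 if a == 'go' else s
reward = lambda s, a: 1.0 if (s == 1 and a == 'stay') else 0.0

cesaro = [1.0 / 4] * 4                # uniform evaluation over 4 stages
print(value_under_evaluation(states, actions, transition, reward, cesaro, 0))
# 0.75: one stage is spent moving to state 1, three weighted stages pay 1
```

Replacing `cesaro` by any other list of stage weights gives $v_\theta$ for that evaluation, which is the object whose limit the abstract characterizes.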
Probabilistic Reliability and Privacy of Communication Using Multicast in General Neighbor Networks