228 research outputs found
General limit value in Dynamic Programming
We consider a dynamic programming problem with arbitrary state space and
bounded rewards. Is it possible to define in a unique way a limit value for
the problem, where the "patience" of the decision-maker tends to infinity? We
consider, for each evaluation $\theta$ (a probability distribution over
positive integers), the value function $v_\theta$ of the problem where the
weight of stage $t$ is given by $\theta_t$, and we investigate the uniform
convergence of a sequence $(v_{\theta^k})_k$ when the "impatience" of the
evaluations vanishes, in the sense that
$\sum_{t \geq 1} |\theta^k_{t+1} - \theta^k_t| \to 0$ as $k \to \infty$.
We prove that this uniform convergence happens
if and only if the metric space $\{v_{\theta^k}, k \geq 1\}$ is totally bounded.
Moreover, there exists a particular function $v^*$, independent of the
particular chosen sequence $(\theta^k)_k$, such that any limit point of such a
sequence of value functions is precisely $v^*$. Consequently, in terms of
uniform convergence of the value functions, $v^*$ may be considered as the
unique possible limit when the patience of the decision-maker tends to
infinity. The result applies in particular to discounted payoffs when the
discount factor vanishes, as well as to average payoffs when the number of
stages goes to infinity, and also to models with stochastic transitions. We
present tractable corollaries, and we discuss counterexamples and a conjecture.
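The two special cases mentioned in the abstract can be checked numerically. The sketch below (the helper names are illustrative, not from the paper) computes the impatience $I(\theta) = \sum_t |\theta_{t+1} - \theta_t|$ for the Cesàro evaluation $\theta_t = 1/n$ and the discounted evaluation $\theta_t = \lambda(1-\lambda)^{t-1}$; it comes out to $1/n$ and $\lambda$ respectively, so both vanish as the decision-maker grows more patient.

```python
# Illustrative sketch (helper names are assumptions, not from the paper):
# impatience I(theta) = sum_t |theta_{t+1} - theta_t| of an evaluation theta,
# i.e. a probability distribution over positive integers with finite support.

def impatience(theta):
    """Total variation of the stage weights, including the final drop to 0."""
    theta = list(theta) + [0.0]  # pad so the tail term |0 - theta_n| is counted
    return sum(abs(theta[t + 1] - theta[t]) for t in range(len(theta) - 1))

def cesaro(n):
    """Uniform weights over the first n stages (average payoff)."""
    return [1.0 / n] * n

def discounted(lam, horizon):
    """theta_t = lam * (1 - lam)^(t-1), truncated after `horizon` stages."""
    return [lam * (1 - lam) ** t for t in range(horizon)]

# Cesàro evaluation: impatience is exactly 1/n.
print(impatience(cesaro(100)))                      # 0.01
# Discounted evaluation: the differences telescope to theta_1 = lam.
print(round(impatience(discounted(0.05, 2000)), 6))  # 0.05
```

For any nonincreasing evaluation the differences telescope, so the impatience is just the first weight; this is why vanishing discount factors and long Cesàro averages are both covered by the theorem.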
The value of Repeated Games with an informed controller
We consider the general model of zero-sum repeated games (or stochastic games
with signals), and assume that one of the players is fully informed and
controls the transitions of the state variable. We prove the existence of the
uniform value, generalizing several results of the literature. A preliminary
existence result is obtained for a certain class of stochastic games played
with pure strategies.
A distance for probability spaces, and long-term values in Markov Decision Processes and Repeated Games
Given a finite set $K$, we denote by $\Delta(K)$ the set of probabilities
on $K$ and by $\Delta_f(\Delta(K))$ the set of Borel probabilities on $\Delta(K)$ with finite
support. Studying a Markov Decision Process with partial information on $K$
naturally leads to a Markov Decision Process with full information on $\Delta(K)$. We
introduce a new metric $d_*$ on $\Delta_f(\Delta(K))$ such that the transitions become
1-Lipschitz from $(\Delta(K), \|\cdot\|_1)$ to $(\Delta_f(\Delta(K)), d_*)$. In the first part of the article,
we define and prove several properties of the metric $d_*$. In particular, $d_*$
satisfies a Kantorovich-Rubinstein type duality formula and can be
characterized using disintegrations. In the second part, we characterize the
limit values in several classes of "compact non-expansive" Markov Decision
Processes. In particular, we use the metric $d_*$ to characterize the limit
value in Partial Observation MDPs with finitely many states and in Repeated
Games with an informed controller with finite sets of states and actions.
Moreover, in each case we can prove the existence of a generalized notion of
uniform value, where we consider not only the Cesàro mean when the number of
stages is large enough but any evaluation function $\theta$
when the impatience $I(\theta) = \sum_{t \geq 1} |\theta_{t+1} - \theta_t|$ is small
enough.
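To make the notion of a value under an arbitrary evaluation $\theta$ concrete, here is a minimal sketch (the function name and the toy model are invented for illustration, and transitions are deterministic for brevity): backward induction computing $v_\theta = \max \sum_t \theta_t r_t$ over a finite-horizon evaluation for a small MDP.

```python
# Minimal sketch (names and toy MDP are assumptions, not from the paper):
# backward induction for the value of a finite MDP under an arbitrary
# evaluation theta = (theta_1, ..., theta_n), maximizing sum_t theta_t * r_t.

def value_under_evaluation(states, actions, transition, reward, theta, s0):
    """transition(s, a) -> next state (deterministic here), reward(s, a) in
    [0, 1], theta a list of nonnegative stage weights summing to 1."""
    n = len(theta)
    V = {s: 0.0 for s in states}      # continuation value after stage n
    for t in range(n - 1, -1, -1):    # stages n, ..., 1
        V = {s: max(theta[t] * reward(s, a) + V[transition(s, a)]
                    for a in actions)
             for s in states}
    return V[s0]

# Toy example: staying in state 1 pays 1, state 0 pays 0,
# and action 'go' moves from 0 to 1 (earning nothing at that stage).
states, actions = [0, 1], ['stay', 'go']
transition = lambda s, a: 1 if a == 'go' else s
reward = lambda s, a: 1.0 if (s == 1 and a == 'stay') else 0.0

cesaro = [1.0 / 4] * 4                # uniform evaluation over 4 stages
print(value_under_evaluation(states, actions, transition, reward, cesaro, 0))
# 0.75: one stage is spent moving to state 1, three weighted stages pay 1
```

Replacing `cesaro` by any other list of stage weights gives $v_\theta$ for that evaluation, which is the object whose limit the abstract characterizes.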
Probabilistic Reliability and Privacy of Communication Using Multicast in General Neighbor Networks