9 research outputs found

    A further remark on dynamic programming for partially observed Markov processes

    In (Stochastic Process. Appl. 103 (2003) 293), a pair of dynamic programming inequalities was derived for the 'separated' ergodic control problem for partially observed Markov processes, using the 'vanishing discount' argument. In this note, we strengthen these results to derive a single dynamic programming equation for the same…
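    For orientation, here is the vanishing-discount passage in its simplest, fully observed form, as a LaTeX sketch; the paper itself works with the 'separated', measure-valued problem, and the cost $c$, kernel $P$, and reference state $x_0$ below are generic placeholders rather than the paper's notation.

    % Discounted dynamic programming equation, discount factor 0 < \alpha < 1:
    \[ V_\alpha(x) \;=\; \min_{u}\Big[\, c(x,u) + \alpha \int V_\alpha(y)\, P(dy \mid x,u) \Big] \]
    % Vanishing-discount limit \alpha \uparrow 1: setting
    %   \rho  = \lim_{\alpha \uparrow 1} (1-\alpha)\, V_\alpha(x_0),
    %   h(x)  = \lim_{\alpha \uparrow 1} \bigl( V_\alpha(x) - V_\alpha(x_0) \bigr),
    % one arrives (under suitable conditions) at a single average-cost
    % dynamic programming equation:
    \[ \rho + h(x) \;=\; \min_{u}\Big[\, c(x,u) + \int h(y)\, P(dy \mid x,u) \Big] \]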

    Average cost dynamic programming equations for controlled Markov chains with partial observations

    The value function for the average cost control of a class of partially observed Markov chains is derived as the "vanishing discount limit," in a suitable sense, of the value functions for the corresponding discounted cost problems. The limiting procedure is justified by bounds derived using a simple coupling argument.
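    As a concrete illustration of this "vanishing discount limit," a minimal numerical sketch in Python, assuming a toy fully observed 3-state, 2-action chain in place of the paper's partially observed setting (the model and all numbers are invented for the demo):

    import numpy as np

    # Hypothetical controlled chain: P[a, x, y] is the transition
    # probability from x to y under action a; c[x, a] is the stage cost.
    rng = np.random.default_rng(0)
    P = rng.random((2, 3, 3))
    P /= P.sum(axis=2, keepdims=True)
    c = rng.random((3, 2))

    def discounted_value(beta, iters=20_000):
        """Value iteration for the beta-discounted cost problem."""
        V = np.zeros(3)
        for _ in range(iters):
            V = (c + beta * np.einsum('axy,y->xa', P, V)).min(axis=1)
        return V

    for beta in (0.9, 0.99, 0.999):
        V = discounted_value(beta)
        # (1 - beta) * V approaches the optimal average cost, and
        # V - V[0] approaches the relative value function, whose
        # boundedness is what a coupling argument delivers.
        print(beta, (1 - beta) * V, V - V[0])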

    Long-Term Values in Markov Decision Processes and Repeated Games, and a New Distance for Probability Spaces

    We study long-term Markov decision processes (MDPs) and gambling houses, with applications to partial observation MDPs with finitely many states and to zero-sum repeated games with an informed controller. We consider a decision maker who is maximizing the weighted sum $\sum_{t\geq 1}\theta_t r_t$, where $r_t$ is the expected reward of the $t$-th stage. We prove the existence of a very strong notion of long-term value called the general uniform value, representing the fact that the decision maker can play well independently of the evaluations $(\theta_t)_{t\geq 1}$ over stages, provided the total variation (or impatience) $\sum_{t\geq 1}|\theta_{t+1}-\theta_t|$ is small enough. This result generalizes previous results in the literature that focus on arithmetic means and discounted evaluations. Moreover, we give a variational characterization of the general uniform value via the introduction of appropriate invariant measures for the decision problems, generalizing the fundamental theorem of gambling and the Aumann–Maschler cav(u) formula for repeated games with incomplete information. Apart from the introduction of appropriate invariant measures, the main innovation in our proofs is a new metric, $d^*$, such that partial observation MDPs and repeated games with an informed controller may be associated with auxiliary problems that are nonexpansive with respect to $d^*$. Given two Borel probabilities $u$, $v$ over a compact subset $X$ of a normed vector space, we define $d^*(u,v)=\sup_{f\in D_1}|u(f)-v(f)|$, where $D_1$ is the set of functions $f$ satisfying, for all $x,y\in X$ and all $a,b\geq 0$, $af(x)-bf(y)\leq\|ax-by\|$. The particular case where $X$ is a simplex endowed with the $L^1$-norm is particularly interesting: $d^*$ is the largest distance over the probabilities with finite support over $X$ that makes every disintegration nonexpansive. Moreover, we obtain a Kantorovich–Rubinstein-type duality formula for $d^*(u,v)$, involving couples of measures $(\alpha,\beta)$ over $X\times X$ such that the first marginal of $\alpha$ is $u$ and the second marginal of $\beta$ is $v$.
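    A minimal numerical sketch of $d^*$ in Python: any linear functional $f(x)=\langle w,x\rangle$ with $\|w\|_\infty\leq 1$ satisfies $af(x)-bf(y)=\langle w,ax-by\rangle\leq\|ax-by\|_1$ and hence belongs to $D_1$, so maximizing $|u(f)-v(f)|$ over such $f$ yields a lower bound on $d^*(u,v)$ in the simplex/$L^1$ case (the finitely supported measures below are random placeholders, not taken from the paper):

    import numpy as np

    rng = np.random.default_rng(1)
    m = 4
    # Hypothetical finitely supported measures u and v on the simplex:
    # support points (rows of xs, ys) and their weights (pu, pv).
    xs = rng.dirichlet(np.ones(m), size=5)
    pu = rng.dirichlet(np.ones(5))
    ys = rng.dirichlet(np.ones(m), size=6)
    pv = rng.dirichlet(np.ones(6))

    best = 0.0
    for _ in range(20_000):
        w = rng.uniform(-1.0, 1.0, size=m)   # ||w||_inf <= 1
        # f(x) = <w, x> lies in D1, so |u(f) - v(f)| <= d*(u, v).
        best = max(best, abs(pu @ (xs @ w) - pv @ (ys @ w)))
    print('lower bound on d*(u, v):', best)

    Since $D_1$ contains more than these linear functionals, this only bounds $d^*$ from below; the exact value requires optimizing over all of $D_1$.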

    Existence of the uniform value in repeated games (Existence de la valeur uniforme dans les jeux répétés)

    In this dissertation, we consider a general model of two-player zero-sum repeated games and, in particular, the problem of the existence of a uniform value. A repeated game has a uniform value if there is a payoff that both players can guarantee in every game that starts today and is sufficiently long, independently of the length of the game. In the first chapter, we focus on the one-player case, known as partial observation Markov decision processes, and on repeated games where one player is perfectly informed and controls the transitions. It is known that these games have a uniform value. By introducing a new metric on the probabilities over a simplex in $\mathbb{R}^m$, we show the existence of a stronger notion, in which the players guarantee the same payoff on every sufficiently long interval of stages and not only on those starting today. In the next two chapters, we show the existence of the uniform value in two special models of repeated games: commutative repeated games in the dark, where the players do not observe the state but the state is independent of the order in which the actions are played, and repeated games with a more informed controller, where one player controls the transitions and has more information than the other player. In the last chapter, we study the link between the uniform convergence of the values of the n-stage games and the asymptotic behavior of the optimal strategies in these n-stage games. For each n, we consider the n-stage optimal strategies and the payoff they guarantee during the first ln stages, with 0 < l < 1, and we study the asymptotics of this payoff as n goes to infinity.
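    To make the last chapter's objects concrete, a minimal Python sketch, assuming a toy fully observed 3-state, 2-action MDP (invented for illustration; the thesis treats far more general partially observed and repeated-game models). It computes the n-stage values by backward induction; their stabilization as n grows is the "uniform convergence of the values" discussed above.

    import numpy as np

    rng = np.random.default_rng(2)
    P = rng.random((2, 3, 3))
    P /= P.sum(axis=2, keepdims=True)   # P[a, x, y]: transition kernel
    r = rng.random((3, 2))              # r[x, a]: stage reward

    def n_stage_value(n):
        """v_n(x): optimal expected average reward over n stages from x."""
        V = np.zeros(3)                 # value-to-go with 0 stages left
        for _ in range(n):
            V = (r + np.einsum('axy,y->xa', P, V)).max(axis=1)
        return V / n

    for n in (10, 100, 1000):
        print(n, n_stage_value(n))      # v_n stabilizes as n grows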