252 research outputs found

    A distance for probability spaces, and long-term values in Markov Decision Processes and Repeated Games

    Given a finite set $K$, we denote by $X=\Delta(K)$ the set of probabilities on $K$ and by $Z=\Delta_f(X)$ the set of Borel probabilities on $X$ with finite support. Studying a Markov Decision Process with partial information on $K$ naturally leads to a Markov Decision Process with full information on $X$. We introduce a new metric $d_*$ on $Z$ such that the transitions become 1-Lipschitz from $(X, \|\cdot\|_1)$ to $(Z,d_*)$. In the first part of the article, we define the metric $d_*$ and prove several of its properties. In particular, $d_*$ satisfies a Kantorovich-Rubinstein type duality formula and can be characterized by using disintegrations. In the second part, we characterize the limit values in several classes of "compact non expansive" Markov Decision Processes. In particular, we use the metric $d_*$ to characterize the limit value in Partial Observation MDPs with finitely many states and in Repeated Games with an informed controller with finite sets of states and actions. Moreover, in each case we prove the existence of a generalized notion of uniform value, where we consider not only the Cesàro mean when the number of stages is large enough but any evaluation function $\theta \in \Delta(\mathbb{N}^*)$ when the impatience $I(\theta)=\sum_{t\geq 1} |\theta_{t+1}-\theta_t|$ is small enough.
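
To make the impatience $I(\theta)$ from the abstract above concrete, here is a minimal Python sketch that evaluates it for two standard evaluations: the Cesàro mean over n stages and a λ-discounted evaluation. The function names and the finite truncation horizon are assumptions made for illustration only and do not come from the paper. Note that the truncated discounted weights no longer sum exactly to 1.

```python
def impatience(theta):
    """I(theta) = sum over t >= 1 of |theta_{t+1} - theta_t|.

    theta is a finite list (theta_1, ..., theta_T); all later weights are taken to be 0.
    """
    padded = list(theta) + [0.0]
    return sum(abs(padded[t + 1] - padded[t]) for t in range(len(theta)))


def cesaro(n):
    """Uniform weights 1/n on the first n stages (hypothetical helper)."""
    return [1.0 / n] * n


def discounted(lam, horizon):
    """Weights lam * (1 - lam)**(t - 1), truncated at `horizon` stages (hypothetical helper)."""
    return [lam * (1.0 - lam) ** t for t in range(horizon)]


cesaro_impatience = impatience(cesaro(100))
discounted_impatience = impatience(discounted(0.05, 2000))
```

Under these assumptions the Cesàro evaluation has impatience 1/n, since its only jump is the drop from 1/n to 0. The discounted weights decrease monotonically to 0, so their impatience is the first weight, λ. Both tend to 0 as n grows or λ shrinks, which is the regime the abstract refers to.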

    Recursive games: Uniform value, Tauberian theorem and the Mertens conjecture "$Maxmin=\lim v_n=\lim v_\lambda$"

    We study two-player zero-sum recursive games with a countable state space and finite action spaces at each state. When the family of $n$-stage values $\{v_n, n\geq 1\}$ is totally bounded for the uniform norm, we prove the existence of the uniform value. Together with a result in Rosenberg and Vieille (2000), we obtain a uniform Tauberian theorem for recursive games: $(v_n)$ converges uniformly if and only if $(v_\lambda)$ converges uniformly. We apply our main result to finite recursive games with signals (where players observe only signals on the state and on past actions). When the maximizer is more informed than the minimizer, we prove the Mertens conjecture $Maxmin=\lim_{n\to\infty} v_n=\lim_{\lambda\to 0}v_\lambda$. Finally, we deduce the existence of the uniform value in finite recursive games with symmetric information. Comment: 32 pages.

    A posteriori error estimates for a finite element approximation of transmission problems with sign changing coefficients

    We perform a residual-type a posteriori error analysis of a transmission problem with sign-changing coefficients. According to [6], if the contrast is large enough, the continuous problem can be transformed into a coercive one. We further show that a similar property holds for the discrete problem on any regular mesh, extending the framework of [6]. The reliability and efficiency of the proposed estimator are confirmed by numerical tests. Comment: 15 pages.

    Sweeping process by prox-regular sets in Riemannian Hilbert manifolds

    In this paper, we deal with sweeping processes on (possibly infinite-dimensional) Riemannian Hilbert manifolds. We extend the useful notions (proximal normal cone, prox-regularity) already defined in the setting of a Hilbert space to the framework of such manifolds. In particular, we introduce the concept of local prox-regularity of a closed subset in accordance with the geometrical features of the ambient manifold, and we check that this regularity implies a hypomonotonicity property for the proximal normal cone. Moreover, we show that the metric projection onto a locally prox-regular set is single-valued in a neighborhood of the set. Then, under some assumptions, we prove the well-posedness of perturbed sweeping processes by locally prox-regular sets. Comment: 27 pages.

    A discrete contact model for crowd motion

    The aim of this paper is to develop a crowd motion model designed to handle highly packed situations. The model we propose rests on two principles: we first define a spontaneous velocity, which corresponds to the velocity each individual would like to have in the absence of other people; the actual velocity is then computed as the projection of the spontaneous velocity onto the set of admissible velocities (i.e. velocities which do not violate the non-overlapping constraint). We describe the underlying mathematical framework, and we explain how recent results by J.F. Edmond and L. Thibault on the sweeping process by uniformly prox-regular sets can be adapted to establish well-posedness in this setting. We propose a numerical scheme for this contact dynamics model, based on a prediction-correction algorithm. Finally, numerical illustrations are presented and discussed. Comment: 22 pages.
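
The projection principle described in the abstract above can be sketched in the simplest possible setting. The Python example below assumes two individuals of equal radius on a line, which is a toy reduction chosen here for illustration and not the paper's model or its numerical scheme. The function name, the contact tolerance, and the one-dimensional constraint are all assumptions. When the two individuals touch, the admissible velocities are those that do not bring them closer, and the actual velocity is the Euclidean projection of the spontaneous velocity onto that half-space.

```python
def project_velocity(x1, x2, u1, u2, radius, tol=1e-12):
    """Project spontaneous velocities (u1, u2) onto the admissible set.

    Toy 1D setting (an assumption): person 1 sits at x1, person 2 at x2 > x1,
    both with the same radius. In contact, admissible velocities satisfy v2 - v1 >= 0.
    """
    gap = x2 - x1 - 2.0 * radius
    in_contact = gap <= tol
    # No active constraint, or the desired velocities already separate the pair.
    if not in_contact or u2 - u1 >= 0.0:
        return u1, u2
    # Projection onto the half-space {v2 - v1 >= 0}: both move at the mean velocity.
    mean = 0.5 * (u1 + u2)
    return mean, mean


# Two people in contact who want to walk into each other: they end up blocked.
blocked = project_velocity(0.0, 1.0, 1.0, -1.0, 0.5)
# The same desired velocities with room between them are left unchanged.
free = project_velocity(0.0, 3.0, 1.0, -1.0, 0.5)
```

In this reduced setting the projection has a closed form. With many individuals in the plane, the projection becomes a constrained minimization problem, which is where a numerical scheme such as the one mentioned in the abstract is needed.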

    Existence of solutions for second-order differential inclusions involving proximal normal cones

    In this work, we prove global existence of solutions for second-order differential problems in a general framework. More precisely, we consider second-order differential inclusions involving the proximal normal cone to a set-valued map. This set-valued map is assumed to take admissible values (in particular, uniformly prox-regular values, which may be non-smooth and non-convex). Moreover, we require the solution to satisfy an impact law, which appears in the description of mechanical systems with inelastic shocks. Comment: 37 pages.

    Asymptotic Properties of Optimal Trajectories in Dynamic Programming

    We prove, in a dynamic programming framework, that uniform convergence of the finite-horizon values implies that, asymptotically, the average accumulated payoff is constant on optimal trajectories. We analyze and discuss several possible extensions to two-person games. Comment: 9 pages.
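
For readers unfamiliar with finite-horizon values, the following Python sketch computes them by backward induction on a tiny deterministic problem. The conventions used here are assumptions chosen for illustration and may differ from the paper's exact setup: the state space is finite, the reward is collected at the state moved to, and the n-stage value is the best average reward over n stages. The function name and the two-state example are hypothetical.

```python
def n_stage_values(successors, reward, n):
    """Return {state: v_n(state)}, the best average reward over n stages.

    successors[s] lists the states reachable from s in one step;
    reward[t] is the payoff collected on arriving at state t (assumed convention).
    """
    # total[s] holds the best total reward over the stages processed so far.
    total = {s: 0.0 for s in successors}
    for _ in range(n):
        total = {
            s: max(reward[t] + total[t] for t in successors[s])
            for s in successors
        }
    return {s: total[s] / n for s in successors}


# State "a" pays 1 and can be kept forever; state "b" is absorbing and pays 0.
successors = {"a": ["a", "b"], "b": ["b"]}
reward = {"a": 1.0, "b": 0.0}
values = n_stage_values(successors, reward, 50)
```

In this example the values do not depend on n: staying in "a" earns 1 at every stage, and "b" earns 0 forever, so the average payoff along an optimal trajectory is trivially constant.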

    Existence of the uniform value in repeated games (Existence de la valeur uniforme dans les jeux répétés)

    In this dissertation, we consider a general model of two-player zero-sum repeated games and, in particular, the problem of the existence of a uniform value. A repeated game has a uniform value if there exists a payoff that both players can guarantee in all sufficiently long games beginning today, independently of the length of the game. In the first chapter, we focus on the one-player case, called Partial Observation Markov Decision Processes, and on repeated games where one player is perfectly informed and controls the transitions. It is known that these games have a uniform value. By introducing a new metric on the probabilities over the simplex of $\mathbb{R}^m$, we show the existence of a stronger notion, where the players guarantee the same payoff on all sufficiently long intervals of stages and not only on those starting today. In the next two chapters, we show the existence of the uniform value in two special models of repeated games: commutative repeated games in the dark, where the players do not observe the state but the state is independent of the order in which the actions are played, and repeated games with a more informed controller, where one player controls the transitions and has more information than the other player. In the last chapter, we study the link between the uniform convergence of the values of the n-stage games and the asymptotic behavior of the optimal strategies in these n-stage games. For each n, we consider the payoff guaranteed by n-stage optimal strategies during the first ln stages, with 0 < l < 1, and we study the asymptotic behavior of this payoff as n goes to infinity.
