
    Long-Term Values in Markov Decision Processes and Repeated Games, and a New Distance for Probability Spaces

    We study long-term Markov decision processes (MDPs) and gambling houses, with applications to partial observation MDPs with finitely many states and to zero-sum repeated games with an informed controller. We consider a decision maker who maximizes the weighted sum $\sum_{t\geq 1}\theta_t r_t$, where $r_t$ is the expected reward of the $t$-th stage. We prove the existence of a very strong notion of long-term value, called the general uniform value, which expresses that the decision maker can play well independently of the evaluation $(\theta_t)_{t\geq 1}$ over stages, provided the total variation (or impatience) $\sum_{t\geq 1}|\theta_{t+1}-\theta_t|$ is small enough. This result generalizes previous results in the literature that focus on arithmetic means and discounted evaluations. Moreover, we give a variational characterization of the general uniform value via the introduction of appropriate invariant measures for the decision problems, generalizing the fundamental theorem of gambling and the Aumann–Maschler cav(u) formula for repeated games with incomplete information. Apart from these invariant measures, the main innovation in our proofs is the introduction of a new metric, $d_*$, such that partial observation MDPs and repeated games with an informed controller may be associated with auxiliary problems that are nonexpansive with respect to $d_*$. Given two Borel probabilities $u$ and $v$ over a compact subset $X$ of a normed vector space, we define $d_*(u,v)=\sup_{f\in D_1}|u(f)-v(f)|$, where $D_1$ is the set of functions $f$ satisfying $af(x)-bf(y)\leq\|ax-by\|$ for all $x,y\in X$ and all $a,b\geq 0$. The case where $X$ is a simplex endowed with the $L^1$-norm is particularly interesting: $d_*$ is the largest distance over the probabilities with finite support over $X$ for which every disintegration is nonexpansive.
Moreover, we obtain a Kantorovich–Rubinstein-type duality formula for $d_*(u,v)$, involving pairs of measures $(\alpha,\beta)$ over $X\times X$ such that the first marginal of $\alpha$ is $u$ and the second marginal of $\beta$ is $v$.
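As a small illustration of the two quantities in the abstract, the weighted payoff $\sum_t \theta_t r_t$ and the impatience $\sum_t |\theta_{t+1}-\theta_t|$, the following is a minimal sketch, assuming evaluations with finitely many nonzero weights (the discounted one is truncated). The function names `weighted_payoff`, `impatience`, `cesaro` and `discounted` are hypothetical, not taken from the paper. It shows why both classical evaluations are covered: for nonincreasing weights ending at 0 the sum telescopes to $\theta_1$, so the arithmetic mean of $n$ stages has impatience $1/n$ and the $\lambda$-discounted evaluation has impatience $\lambda$, both small when $n$ is large or $\lambda$ is small.

```python
def weighted_payoff(theta, rewards):
    """Weighted sum sum_t theta_t * r_t over a finite stream of expected stage rewards."""
    return sum(w * r for w, r in zip(theta, rewards))


def impatience(theta):
    """Total variation sum_{t>=1} |theta_{t+1} - theta_t|.

    theta[0] plays the role of theta_1; all weights after the list are treated as 0.
    """
    padded = list(theta) + [0.0]
    return sum(abs(padded[t + 1] - padded[t]) for t in range(len(theta)))


def cesaro(n):
    """Arithmetic mean of the first n stages: theta_t = 1/n for t <= n, 0 afterwards."""
    return [1.0 / n] * n


def discounted(lam, horizon):
    """Discounted evaluation theta_t = lam * (1 - lam)**(t - 1), truncated after `horizon` stages.

    Truncation drops the tail mass (1 - lam)**horizon, which is negligible for long horizons.
    """
    return [lam * (1 - lam) ** (t - 1) for t in range(1, horizon + 1)]


if __name__ == "__main__":
    print(impatience(cesaro(100)))              # equals 1/100
    print(impatience(discounted(0.05, 2000)))   # telescopes to theta_1 = 0.05 (up to rounding)
    print(weighted_payoff(cesaro(4), [1, 2, 3, 4]))
```

The general uniform value asks for strategies that are good simultaneously for every evaluation whose impatience is below some threshold, not only for these two families.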

    The Complexity of POMDPs with Long-run Average Objectives

    We study the problem of approximating optimal values in partially observable Markov decision processes (POMDPs) with long-run average objectives. POMDPs are a standard model for dynamic systems with probabilistic and nondeterministic behavior in uncertain environments. In long-run average objectives, rewards are associated with every transition of the POMDP, and the payoff is the long-run average of the rewards along the executions of the POMDP. We establish strategy complexity and computational complexity results. Our main result shows that finite-memory strategies suffice for approximating optimal values, and that the related decision problem is recursively enumerable (r.e.)-complete.
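To make the long-run average payoff concrete: once a finite-memory strategy is fixed, the pair (POMDP state, memory state) evolves as a finite Markov chain, so the payoff of that one strategy can be estimated by averaging transition rewards over a long horizon. The sketch below is hypothetical: the helper name `average_reward`, the matrix encoding, and the use of the expected finite-horizon average (rather than the average along individual executions, as in the abstract) are assumptions. It evaluates a single fixed chain and does not search for optimal strategies.

```python
from typing import List

Matrix = List[List[float]]


def average_reward(P: Matrix, R: Matrix, init: List[float], horizon: int) -> float:
    """Expected average transition reward over the first `horizon` steps of a finite chain.

    Hypothetical helper. P[i][j] is the probability of moving from state i to state j,
    R[i][j] is the reward collected on that transition, and init is the initial distribution.
    """
    n = len(P)
    dist = list(init)
    total = 0.0
    for _ in range(horizon):
        # Expected reward of the next transition under the current state distribution.
        total += sum(dist[i] * P[i][j] * R[i][j] for i in range(n) for j in range(n))
        # Advance the distribution one step: dist <- dist * P.
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return total / horizon


if __name__ == "__main__":
    # A periodic two-state chain: the reward 1 is collected only on the transition 0 -> 1.
    P = [[0.0, 1.0], [1.0, 0.0]]
    R = [[0.0, 1.0], [0.0, 0.0]]
    print(average_reward(P, R, [1.0, 0.0], 1000))
```

In the periodic example the per-step expected reward oscillates between 1 and 0 and never converges, while its running average tends to 1/2; this is why the payoff is defined through long-run averages rather than stage rewards.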

    Long information design

    We analyze information design games between two designers with opposite preferences and a single agent. Before the agent makes a decision, designers repeatedly disclose public information about persistent state parameters. Disclosure continues until no designer wishes to reveal further information. We consider environments with general constraints on feasible information disclosure policies. Our main results characterize equilibrium payoffs and strategies of this long information design game and compare them with the equilibrium outcomes of games where designers move only at a single predetermined period. When information disclosure policies are unconstrained, we show that in equilibrium of the long game, information is revealed right away in a single period; otherwise, the number of periods in which information is disclosed might be unbounded. As an application, we study a competition in product demonstration and show that more information is revealed if each designer can disclose information only at a predetermined period. The format that provides the buyer with the most information is the sequential game where the last mover is the ex-ante favorite seller.