
    Dynamic Non-Bayesian Decision Making

    The model of a non-Bayesian agent who faces a repeated game with incomplete information against Nature is an appropriate tool for modeling general agent-environment interactions. In such a model the environment state (controlled by Nature) may change arbitrarily, and the feedback/reward function is initially unknown. The agent is not Bayesian: it forms a prior probability neither over Nature's state-selection strategy nor over its own reward function. A policy for the agent is a function that assigns an action to every history of observations and actions. Two basic feedback structures are considered. In the first -- the perfect monitoring case -- the agent observes the previous environment state as part of its feedback; in the second -- the imperfect monitoring case -- all the agent observes is the reward obtained. Both settings are partially observable processes in which the current environment state is unknown. Our main result concerns the competitive ratio criterion in the perfect monitoring case. We prove the existence of an efficient stochastic policy ensuring that the competitive ratio is attained at almost all stages with arbitrarily high probability, where efficiency is measured by the rate of convergence. We further show that no such optimal policy exists in the imperfect monitoring case. Moreover, in the perfect monitoring case no deterministic policy satisfies our long-run optimality criterion. In addition, we discuss the maxmin criterion and prove that a deterministic efficient optimal strategy does exist in the imperfect monitoring case under this criterion. Finally, we show that our approach to long-run optimality can be viewed as qualitative, which distinguishes it from previous work in this area. Comment: see http://www.jair.org/ for any accompanying file.
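    The perfect-monitoring setting above can be illustrated with a toy sketch: a policy maps the history of observed states to a (possibly randomized) action, and the realized competitive ratio compares the policy's cumulative reward to the best fixed action in hindsight. Everything below is a hypothetical illustration (the reward matrix, the epsilon-exploration rule, and the function names are assumptions, not the paper's construction).

```python
import random

# Hypothetical toy instance: states and actions are {0, 1, 2}; REWARD stands
# in for the initially unknown feedback function, REWARD[action][state].
REWARD = [[1.0, 0.2, 0.0],
          [0.0, 1.0, 0.3],
          [0.5, 0.0, 1.0]]

def stochastic_policy(history, n_actions=3, eps=0.1):
    """Map a history of observed states (perfect monitoring) to an action:
    best response to the empirical state frequencies, mixed with uniform
    exploration with probability eps -- a stochastic, history-based rule."""
    if not history or random.random() < eps:
        return random.randrange(n_actions)
    counts = [history.count(s) for s in range(n_actions)]
    # expected reward of each action against the empirical distribution
    values = [sum(REWARD[a][s] * counts[s] for s in range(n_actions))
              for a in range(n_actions)]
    return max(range(n_actions), key=lambda a: values[a])

def realized_competitive_ratio(states, policy):
    """Cumulative reward earned by the policy divided by the reward the best
    single fixed action would have earned in hindsight."""
    earned, history = 0.0, []
    for s in states:
        a = policy(history)
        earned += REWARD[a][s]
        history.append(s)  # the previous state is observed (perfect monitoring)
    best_fixed = max(sum(REWARD[a][s] for s in states) for a in range(3))
    return earned / best_fixed
```

    An adaptive policy can exceed ratio 1 on a changing state sequence, which is why the competitive ratio (rather than regret against a fixed action alone) is a meaningful yardstick here.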

    Stationarity and Chaos in Infinitely Repeated Games of Incomplete Information

    Consider an incomplete information game in which the players first learn their own types, and then infinitely often play the same normal form game with the same opponents. After each play, the players observe their own payoff and the actions of their opponents. The payoff for a strategy n-tuple in the infinitely repeated game is the discounted present value of the infinite stream of payoffs generated by the strategy. This paper studies Bayesian learning in such a setting. Kalai and Lehrer [1991] and Jordan [1991] have shown that Bayesian equilibria of such games exist and eventually look like Nash equilibria of the infinitely repeated full information game with the correct types. However, due to folk theorems for complete information games, the class of equilibria for such games remains quite large. In order to refine the set of equilibria, we impose a restriction on the equilibrium strategies of the players which requires stationarity with respect to the profile of current beliefs: if the same profile of beliefs is reached at two different points in time, the players must choose the same behavioral strategy at both points in time. This set, called the belief stationary equilibria, is a subset of the Bayesian Nash equilibria. We compute a belief stationary equilibrium in an example. The equilibria that result can have elements of chaotic behavior: the equilibrium path of beliefs when types are not revealed can be chaotic, and small changes in initial beliefs can result in large changes in equilibrium actions.
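    The two ingredients of the refinement above -- Bayesian updating of beliefs over opponent types, and stationarity of the strategy as a function of the belief -- can be sketched minimally. The two-type example, the likelihoods, and the memoized strategy rule below are all hypothetical illustrations, not the paper's computed equilibrium.

```python
from fractions import Fraction

# Hypothetical two-type example: the opponent is type "tough" or "weak", and
# each type plays "fight" with a different known probability.
LIKELIHOOD = {"tough": {"fight": Fraction(3, 4), "yield": Fraction(1, 4)},
              "weak":  {"fight": Fraction(1, 4), "yield": Fraction(3, 4)}}

def bayes_update(belief, action):
    """Posterior over opponent types after observing the opponent's action."""
    joint = {t: belief[t] * LIKELIHOOD[t][action] for t in belief}
    total = sum(joint.values())
    return {t: p / total for t, p in joint.items()}

def belief_stationary_strategy(belief, _table={}):
    """Belief stationarity: if the same belief profile recurs at two different
    stages, the same behavioral strategy must be chosen. Memoizing the choice
    on the belief itself enforces this by construction."""
    key = tuple(sorted(belief.items()))
    if key not in _table:
        # any deterministic rule of the belief works for the illustration
        _table[key] = "fight" if belief["tough"] < Fraction(1, 2) else "yield"
    return _table[key]
```

    Exact rational arithmetic (`Fraction`) makes "the same belief is reached twice" a well-defined event; with floating point, two histories generating the same posterior could hash differently.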

    Learning an Unknown Network State in Routing Games

    We study learning dynamics induced by myopic travelers who repeatedly play a routing game on a transportation network with an unknown state. The state impacts the cost functions of one or more edges of the network. In each stage, travelers choose their routes according to a Wardrop equilibrium based on the public belief of the state. This belief is broadcast by an information system that observes the edge loads and realized costs on the used edges, and performs a Bayesian update of the prior stage's belief. We show that the sequence of public beliefs and edge load vectors generated by the repeated play converges almost surely. At any rest point, travelers have no incentive to deviate from the chosen routes and accurately learn the true costs on the used edges. However, the costs on edges that are not used may not be accurately learned. Thus, learning can be incomplete, in that the edge load vector at a rest point and that of the complete-information equilibrium can differ. We present some conditions for complete learning and illustrate situations when such an outcome is not guaranteed.
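    The incomplete-learning mechanism above reduces, in the smallest case, to two parallel routes where only one has a state-dependent cost: the public belief updates only when travelers actually use the informative route. The two-route network, cost values, and function names below are hypothetical illustrations under the assumption that costs on the used edge reveal the state exactly.

```python
# Hypothetical two-route network: route A has a state-dependent cost
# (state in {"low", "high"}); route B has a known constant cost 1.0.
# The information system broadcasts p = public belief that state == "high".
COST_A = {"low": 0.5, "high": 1.5}
COST_B = 1.0

def wardrop_route(p):
    """Myopic travelers take the route with the lower expected cost under
    the public belief p (a degenerate Wardrop equilibrium with one class)."""
    expected_a = p * COST_A["high"] + (1 - p) * COST_A["low"]
    return "A" if expected_a <= COST_B else "B"

def update_belief(p, route, realized_cost):
    """Bayesian update: realized costs on route A reveal the state exactly,
    so the belief jumps to 0 or 1; route B is uninformative, so the belief
    about the unused edge never moves -- learning can stay incomplete."""
    if route != "A":
        return p  # unused edge: no observation, no update
    return 1.0 if realized_cost == COST_A["high"] else 0.0
```

    Starting from a pessimistic prior (say p = 0.9), everyone takes route B forever and the true state of route A is never learned, even if it is actually "low": a rest point that differs from the complete-information equilibrium.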

    Correlated equilibria and communication in games.

    Bayesian analysis; Game theory; Private information;

    Reputation and perfection in repeated common interest games

    We consider a wide class of repeated common interest games perturbed with one-sided incomplete information: one player (the informed player) might be a commitment type playing the Pareto dominant action. We show that, as the discount factor (assumed symmetric across players) and the prior probability of the commitment type go to zero, the informed player can be held close to her minmax payoff even when perfection is imposed on the equilibrium.