Dynamic Non-Bayesian Decision Making
The model of a non-Bayesian agent who faces a repeated game with incomplete
information against Nature is an appropriate tool for modeling general
agent-environment interactions. In such a model the environment state
(controlled by Nature) may change arbitrarily, and the feedback/reward function
is initially unknown. The agent is not Bayesian; that is, he forms no prior
probability over either Nature's state-selection strategy or his own reward
function. A policy for the agent is a function that assigns an action
to every history of observations and actions. Two basic feedback structures are
considered. In one of them -- the perfect monitoring case -- the agent is able
to observe the previous environment state as part of his feedback, while in the
other -- the imperfect monitoring case -- all that is available to the agent is
the reward obtained. Both of these settings refer to partially observable
processes, where the current environment state is unknown. Our main result
refers to the competitive ratio criterion in the perfect monitoring case. We
prove the existence of an efficient stochastic policy that ensures that the
competitive ratio is obtained at almost all stages with an arbitrarily high
probability, where efficiency is measured in terms of rate of convergence. It
is further shown that such an optimal policy does not exist in the imperfect
monitoring case. Moreover, it is proved that in the perfect monitoring case
there does not exist a deterministic policy that satisfies our long run
optimality criterion. In addition, we discuss the maxmin criterion and prove
that a deterministic efficient optimal strategy does exist in the imperfect
monitoring case under this criterion. Finally we show that our approach to
long-run optimality can be viewed as qualitative, which distinguishes it from
previous work in this area.
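The competitive-ratio criterion above compares a policy's accumulated reward against the best fixed action in hindsight. A minimal sketch of that comparison, assuming a toy two-action setting with an arbitrarily chosen reward sequence and a naive uniformly random policy (illustrative only, not the paper's construction):

```python
import random

def competitive_ratio(rewards, actions_taken):
    # rewards: one dict per stage, mapping action -> reward at that stage.
    earned = sum(r[a] for r, a in zip(rewards, actions_taken))
    # Best fixed action in hindsight: the single action with highest total reward.
    best_fixed = max(sum(r[a] for r in rewards) for a in rewards[0])
    return earned / best_fixed if best_fixed else 1.0

random.seed(0)
actions = ["a", "b"]
# Nature may pick rewards arbitrarily; here an alternating sequence.
rewards = [{"a": t % 2, "b": 1 - (t % 2)} for t in range(100)]
# A stochastic policy: choose uniformly at random at every stage.
policy_choices = [random.choice(actions) for _ in rewards]
ratio = competitive_ratio(rewards, policy_choices)
print(round(ratio, 2))
```

With this alternating sequence either fixed action earns 50 of the 100 stages, so the random policy's ratio fluctuates around 1; the paper's result concerns policies that guarantee the competitive ratio at almost all stages with arbitrarily high probability.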
Stationarity and Chaos in Infinitely Repeated Games of Incomplete Information
Consider an incomplete information game in which the players first learn their own types, and then infinitely often play the same normal form game with the same opponents. After each play, the players observe their own payoff and the action of their opponents. The payoff for a strategy n-tuple in the infinitely repeated game is the discounted present value of the infinite stream of payoffs generated by the strategy. This paper studies Bayesian learning in such a setting. Kalai and Lehrer [1991] and Jordan [1991] have shown that Bayesian equilibria to such games exist and eventually look like Nash equilibria to the infinitely repeated full information game with the correct types. However, due to folk theorems for complete information games, this still leaves the class of equilibria for such games to be quite large.
In order to refine the set of equilibria, we impose a restriction on the equilibrium strategies of the players which requires stationarity with respect to the profile of current beliefs: if the same profile of beliefs is reached at two different points in time, the players must choose the same behavioral strategy at both points in time. This set, called the belief stationary equilibria, is a subset of the Bayesian Nash equilibria. We compute a belief stationary equilibrium in an example. The equilibria that result can have elements of chaotic behavior. The equilibrium path of beliefs when types are not revealed can be chaotic, and small changes in initial beliefs can result in large changes in equilibrium actions.
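The Bayesian learning described above hinges on players updating beliefs about their opponents' types from observed actions. A minimal sketch of one such belief update; the type names and likelihood numbers are illustrative assumptions of mine, not taken from the paper:

```python
def bayes_update(belief, action, likelihood):
    # belief: dict type -> probability; likelihood: dict (type, action) -> probability.
    post = {t: belief[t] * likelihood[(t, action)] for t in belief}
    z = sum(post.values())  # normalizing constant
    return {t: p / z for t, p in post.items()}

# Two hypothetical opponent types with different mixed strategies.
likelihood = {("hawk", "attack"): 0.8, ("hawk", "wait"): 0.2,
              ("dove", "attack"): 0.3, ("dove", "wait"): 0.7}
belief = {"hawk": 0.5, "dove": 0.5}
for obs in ["attack", "attack", "wait"]:
    belief = bayes_update(belief, obs, likelihood)
print({t: round(p, 3) for t, p in belief.items()})
```

Belief stationarity then requires that the behavioral strategy depend only on the resulting belief profile; the paper's chaos result concerns how such belief paths can be highly sensitive to the initial prior.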
Learning an Unknown Network State in Routing Games
We study learning dynamics induced by myopic travelers who repeatedly play a
routing game on a transportation network with an unknown state. The state
impacts cost functions of one or more edges of the network. In each stage,
travelers choose their routes according to Wardrop equilibrium based on public
belief of the state. This belief is broadcast by an information system that
observes the edge loads and realized costs on the used edges, and performs a
Bayesian update to the prior stage's belief. We show that the sequence of
public beliefs and edge load vectors generated by the repeated play converges
almost surely. At any rest point, travelers have no incentive to deviate from
the chosen routes and accurately learn the true costs on the used edges.
However, the costs on edges that are not used may not be accurately learned.
Thus, learning can be incomplete, in that the edge load vector at a rest
point can differ from the one in the complete information equilibrium. We
present some conditions for complete learning and illustrate situations in
which such an outcome is not guaranteed.
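These dynamics can be illustrated on the smallest possible network: two parallel routes, where the unknown state shifts the cost of route 1. The cost functions, state values, and noiseless observations below are my own illustrative assumptions, not the paper's model:

```python
def wardrop_split(theta_hat, demand=1.0):
    # Two parallel routes with costs c1(x) = theta + x and c2(x) = 2 + x.
    # Wardrop equilibrium equalizes costs: theta_hat + x1 = 2 + (demand - x1).
    x1 = (2 + demand - theta_hat) / 2
    return min(max(x1, 0.0), demand)  # clip to a feasible flow

def update_belief(belief, x1, observed_cost, tol=1e-9):
    # Keep only states consistent with the realized cost on route 1.
    post = {th: p for th, p in belief.items()
            if abs(th + x1 - observed_cost) < tol}
    z = sum(post.values())
    return {th: p / z for th, p in post.items()} if z else belief

true_theta = 3.0
belief = {1.0: 0.5, 3.0: 0.5}                        # public prior over the state
theta_hat = sum(th * p for th, p in belief.items())  # expected state under prior
x1 = wardrop_split(theta_hat)                        # flow on route 1 under prior
cost1 = true_theta + x1                              # realized cost on route 1
belief = update_belief(belief, x1, cost1)            # information system's update
print(x1, belief)
```

If the prior made route 1 unused (x1 = 0), its realized cost would never be observed and the belief would not change, which is exactly the sense in which learning on unused edges can be incomplete.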
Correlated equilibria and communication in games.
Bayesian analysis; Game theory; Private information.
Reputation and perfection in repeated common interest games
We consider a wide class of repeated common interest games perturbed with one-sided incomplete information: one player (the informed player) might be a commitment type playing the Pareto-dominant action. As discounting, which is assumed to be symmetric, and the prior probability of the commitment type go to zero, it is shown that the informed player can be held close to her minmax payoff even when perfection is imposed on the equilibrium.