Stationary Anonymous Sequential Games with Undiscounted Rewards
Stationary anonymous sequential games with undiscounted rewards are a special class of games that combines features from both population games (infinitely many players) and stochastic games. We extend the theory for these games to the cases of total expected reward as well as expected average reward. We show that equilibria in the anonymous sequential game correspond to the limits of equilibria of related finite-population games as the number of players grows to infinity. We provide examples to illustrate our results.
Applications of Stationary Anonymous Sequential Games to Multiple Access Control in Wireless Communications
We consider in this paper dynamic Multiple Access (MAC) games between a random number of players competing over collision channels. Each of several mobiles involved in an interaction determines whether to transmit at high or at low power. High power decreases the lifetime of the battery but results in a smaller collision probability. We formulate this game as an anonymous sequential game with undiscounted reward, which we recently introduced and which combines features from both population games (infinitely many players) and stochastic games. We briefly present this class of games and basic equilibrium existence results for the total expected reward as well as for the expected average reward. We then apply the theory to the MAC game.
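As a hedged illustration of the power-control tradeoff described above, the following sketch computes the pure equilibria of a static two-player collision game. The High/Low costs and the capture rule are invented for the example; this is not the paper's anonymous sequential model, which involves a random population and undiscounted dynamic rewards.

```python
import itertools

# Toy sketch (NOT the paper's model): a one-shot symmetric collision-channel
# game in which each of two mobiles picks High or Low transmit power.
# The payoff numbers below are illustrative assumptions.
SUCCESS = 1.0                   # reward for getting the packet through
COST = {"H": 0.6, "L": 0.2}     # assumed battery costs of each power level

def payoff(own, other):
    """Payoff to the player choosing `own` against an opponent choosing `other`."""
    if own == other:
        success = 0.0                             # equal power levels collide
    else:
        success = SUCCESS if own == "H" else 0.0  # High captures the channel
    return success - COST[own]

actions = ["H", "L"]

def best_response(other):
    return max(actions, key=lambda a: payoff(a, other))

# Pure Nash equilibria: profiles where each action is a best response to the other.
pure_eq = [(a, b) for a, b in itertools.product(actions, actions)
           if a == best_response(b) and b == best_response(a)]
print(pure_eq)  # -> [('H', 'L'), ('L', 'H')]
```

With these assumed costs the game is anti-coordination: each mobile prefers High only when the other transmits Low, which is the qualitative tension the MAC game formalizes at population scale.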
Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word ``reinforcement.'' The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
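The exploration/exploitation tradeoff the survey highlights can be sketched with a minimal epsilon-greedy two-armed bandit. The arm reward probabilities and the epsilon value below are arbitrary assumptions for the sketch, not taken from the survey.

```python
import random

# Minimal epsilon-greedy sketch: mostly exploit the arm with the best running
# estimate, but explore a random arm with probability EPSILON.
random.seed(0)

TRUE_MEANS = [0.3, 0.7]   # assumed hidden reward probabilities of the two arms
EPSILON = 0.1

counts = [0, 0]
values = [0.0, 0.0]       # incremental estimate of each arm's mean reward

def select_arm():
    if random.random() < EPSILON:
        return random.randrange(len(values))                # explore
    return max(range(len(values)), key=values.__getitem__)  # exploit

def update(arm, reward):
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]     # running average

for _ in range(5000):
    arm = select_arm()
    reward = 1.0 if random.random() < TRUE_MEANS[arm] else 0.0
    update(arm, reward)

print(values)  # estimates should approach the true means after enough pulls
```

Even this tiny example shows the tradeoff: with epsilon set to zero the agent can lock onto the inferior arm forever, while persistent exploration keeps both estimates accurate at the cost of occasional suboptimal pulls.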
Efficient methods for near-optimal sequential decision making under uncertainty
This chapter discusses decision making under uncertainty. More specifically, it offers an overview of efficient Bayesian and distribution-free algorithms for making near-optimal sequential decisions under uncertainty about the environment. Due to the uncertainty, such algorithms must not only learn from their interaction with the environment but also perform as well as possible while learning is taking place. © 2010 Springer-Verlag Berlin Heidelberg
Private Information in Sequential Common-Value Auctions
We study an infinitely-repeated first-price auction with common values. Initially, bidders receive independent private signals about the objects' value, which itself does not change over time. Learning occurs only through observation of the bids. Under one-sided incomplete information, this information is eventually revealed and the seller extracts essentially the entire rent (for large discount factors). Both players' payoffs tend to zero as the discount factor tends to one. However, the uninformed bidder does relatively better than the informed bidder. We discuss the case of two-sided incomplete information and argue that, under a Markovian refinement, the outcome is pooling: information is revealed only insofar as it does not affect prices. Bidders submit a common, low bid in the tradition of collusion without conspiracy.
Keywords: repeated game with incomplete information; private information; ratchet effect; first-price auction; dynamic auctions
Discrete-time controlled markov processes with average cost criterion: a survey
This work is a survey of the average cost control problem for discrete-time Markov processes. The authors have attempted to put together a comprehensive account of the considerable research on this problem over the past three decades. The exposition ranges from finite to Borel state and action spaces and includes a variety of methodologies to find and characterize optimal policies. The authors have included a brief historical perspective of the research efforts in this area and have compiled a substantial yet not exhaustive bibliography. The authors have also identified several important questions that are still open to investigation.
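As a small illustration of the average-cost methodology surveyed here, the following sketch runs relative value iteration on a hand-made two-state, two-action MDP. The costs and transition probabilities are invented for the example; the survey itself covers far more general Borel state and action spaces.

```python
# Relative value iteration for the average-cost criterion on a tiny finite MDP.
# cost[s][a] is the one-step cost; trans[s][a][s'] the transition probability.
# All numbers below are illustrative assumptions.
cost = [[1.0, 1.5],
        [0.0, 0.5]]
trans = [[[0.8, 0.2], [0.1, 0.9]],
         [[0.5, 0.5], [0.9, 0.1]]]

def relative_value_iteration(cost, trans, iters=500):
    n = len(cost)
    h = [0.0] * n           # relative value (bias) estimates
    gain = 0.0              # average cost estimate
    for _ in range(iters):
        q = [[cost[s][a] + sum(p * h[t] for t, p in enumerate(trans[s][a]))
              for a in range(len(cost[s]))] for s in range(n)]
        new_h = [min(row) for row in q]
        gain = new_h[0]                   # state 0 used as the reference state
        h = [v - gain for v in new_h]     # subtract to keep iterates bounded
    policy = [min(range(len(q[s])), key=q[s].__getitem__) for s in range(n)]
    return gain, h, policy

gain, h, policy = relative_value_iteration(cost, trans)
print(round(gain, 3), policy)
```

Subtracting the reference-state value each iteration is what distinguishes this scheme from discounted value iteration: the iterates would otherwise grow linearly at rate equal to the optimal average cost.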
A Minimum Relative Entropy Principle for Learning and Acting
This paper proposes a method to construct an adaptive agent that is universal
with respect to a given class of experts, where each expert is an agent that
has been designed specifically for a particular environment. This adaptive
control problem is formalized as the problem of minimizing the relative entropy
of the adaptive agent from the expert that is most suitable for the unknown
environment. If the agent is a passive observer, then the optimal solution is
the well-known Bayesian predictor. However, if the agent is active, then its
past actions need to be treated as causal interventions on the I/O stream
rather than normal probability conditions. Here it is shown that the solution
to this new variational problem is given by a stochastic controller called the
Bayesian control rule, which implements adaptive behavior as a mixture of
experts. Furthermore, it is shown that under mild assumptions, the Bayesian
control rule converges to the control law of the most suitable expert.
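The passive-observer case described above, where the optimal solution is the Bayesian predictor, can be sketched as a posterior update over a small expert class. The two Bernoulli experts and the uniform prior are assumptions of the sketch; the full Bayesian control rule additionally samples an expert to act with and treats past actions as interventions rather than ordinary conditioning.

```python
import random

# Bayesian mixture over a class of "experts": here, two candidate Bernoulli
# models of an observation stream. Parameter values are illustrative.
random.seed(1)

THETAS = [0.2, 0.8]       # each expert's predicted P(observation = 1)
posterior = [0.5, 0.5]    # assumed uniform prior over the two experts

def update(posterior, obs):
    """One Bayes step: reweight each expert by its likelihood of `obs`."""
    w = [p * (t if obs == 1 else 1 - t) for p, t in zip(posterior, THETAS)]
    z = sum(w)
    return [x / z for x in w]

# Observations actually drawn from the second expert's model (theta = 0.8).
for _ in range(100):
    obs = 1 if random.random() < 0.8 else 0
    posterior = update(posterior, obs)

print(posterior)  # mass concentrates on the expert matching the environment
```

The concentration of posterior mass on the matching expert mirrors the paper's convergence result: in the limit, behavior is governed by the most suitable expert in the class.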