On the Impossibility of Regret Minimization in Repeated Games
Regret minimizing strategies for repeated games have been receiving increasing attention in the literature. These are simple adaptive rules of behavior with attractive convergence properties. If all players follow regret minimizing strategies, their average joint play converges to the set of correlated equilibria or to the Hannan set (depending on the notion of regret in use), or even to Nash equilibrium in certain classes of games. In this note we question the validity of the regret minimization objective. Through an example, we show that regret minimization can lead to unrealistic behavior, since it fails to take into account the effect of one's actions on the subsequent behavior of the opponents. An amended notion of regret that corrects this defect is not very useful either, since the no-regret objective is no longer guaranteed to be achievable under it.
Keywords: Repeated games, Regret minimization, No-regret strategy
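The two notions of regret mentioned above can be made concrete for a finite history of play. The following is a minimal sketch, not taken from the note: the function names and the payoff-matrix representation (`payoff[a][b]` is the row player's payoff when the row player plays `a` and the opponent plays `b`) are our own hypothetical choices. External (Hannan) regret compares realized play with the best fixed action; internal (conditional) regret compares it with replacing one action `j` by another action `k` every time `j` was played.

```python
def external_regret(payoff, own, opp):
    """Average external (Hannan) regret of the row player.

    own, opp: equal-length, non-empty lists of action indices played so far.
    Returns max_k avg_t(payoff[k][b_t] - payoff[a_t][b_t]); this can be negative.
    """
    T = len(own)
    realized = sum(payoff[a][b] for a, b in zip(own, opp)) / T
    # Counterfactual: the opponent's realized sequence is held fixed.
    best_fixed = max(sum(payoff[k][b] for b in opp) / T for k in range(len(payoff)))
    return best_fixed - realized


def internal_regret(payoff, own, opp):
    """Largest average gain from playing k instead of j whenever j was played."""
    T = len(own)
    n = len(payoff)
    worst = 0.0  # the pair j == k always yields zero gain
    for j in range(n):
        for k in range(n):
            gain = sum(payoff[k][b] - payoff[j][b]
                       for a, b in zip(own, opp) if a == j) / T
            worst = max(worst, gain)
    return worst
```

Both functions evaluate deviations against the opponent's realized sequence `opp`, held fixed. This is exactly the counterfactual the note argues can mislead when opponents react to one's own earlier actions.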
Regret Matching with Finite Memory
We consider the regret matching process with finite memory. For general games in normal form, we show that the action profiles appearing in any recurrent class of the dynamics must constitute a set closed under the “same or better reply” correspondence (a CUSOBR set) that contains no smaller product set closed under “same or better replies,” i.e., no smaller PCUSOBR set. Two characterizations of the recurrent classes are offered. First, for the class of games that are weakly acyclic under better replies, each recurrent class is monomorphic and corresponds to a pure Nash equilibrium. Second, for a modified process with random sampling, if the sample size is sufficiently small relative to the memory bound, the recurrent classes consist of action profiles that form minimal PCUSOBR sets. Our results are used in a robust example showing that the limiting empirical distribution of play can be arbitrarily far from the set of correlated equilibria for any large but finite choice of the memory bound.
Keywords: Regret Matching; Nash Equilibria; Closed Sets under Same or Better Replies; Correlated Equilibria.
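The abstract does not spell out the update rule. The following is a minimal sketch that assumes the process is Hart and Mas-Colell style (conditional) regret matching, with regrets computed only over a sliding window of the last `memory` periods. The function names and the inertia constant `mu` are our own hypothetical choices, and the paper's exact process (including its random-sampling variant) may differ.

```python
import random


def switch_probabilities(payoff, own_hist, opp_hist, memory, mu):
    """Regret-matching probabilities for the row player's next action.

    payoff[a][b] is the row player's payoff; own_hist and opp_hist are the
    (non-empty, equal-length) action histories; mu is the inertia constant.
    """
    own = own_hist[-memory:]          # finite memory: older periods are dropped
    opp = opp_hist[-memory:]
    window = len(own)
    j = own[-1]                       # the action played in the last period
    n = len(payoff)
    probs = [0.0] * n
    for k in range(n):
        if k == j:
            continue
        # Average gain, within the window, from having played k whenever j was played.
        regret = sum(payoff[k][b] - payoff[j][b]
                     for a, b in zip(own, opp) if a == j) / window
        probs[k] = max(regret, 0.0) / mu
    probs[j] = 1.0 - sum(probs)       # remaining probability: repeat j
    if probs[j] < 0.0:
        raise ValueError("mu is too small for these payoffs")
    return probs


def next_action(payoff, own_hist, opp_hist, memory, mu, rng=random):
    """Sample the next action from the regret-matching probabilities."""
    probs = switch_probabilities(payoff, own_hist, opp_hist, memory, mu)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With matching-pennies payoffs `[[1, -1], [-1, 1]]`, own history `[0, 0, 0, 0]` and opponent history `[0, 0, 1, 1]`, a memory of 2 sees only the two recent periods in which switching would have paid, so the player switches with probability 0.5. A memory of 4 also sees the two earlier periods in which switching would have lost, the regret nets to zero, and the player repeats action 0.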
Decision Making in Uncertain and Changing Environments
We consider an agent who has to repeatedly make choices in an uncertain and changing environment, who has full information about the past, who discounts future payoffs, but who has no prior. We provide a learning algorithm that performs almost as well as the best of a given finite number of experts or benchmark strategies, and does so at any point in time, provided the agent is sufficiently patient. The key is to find the appropriate degree of forgetting the distant past. Standard learning algorithms that treat the recent and distant past equally do not have the sequential epsilon-optimality property.
Keywords: Adaptive learning, experts, distribution-free, epsilon-optimality, Hannan regret
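The abstract does not give the algorithm itself. To illustrate what "forgetting the distant past" can mean, here is a minimal sketch of a standard exponentially weighted forecaster run on discounted cumulative gains. This is a swapped-in textbook technique, not the authors' algorithm, and the names `delta` (forgetting factor) and `eta` (learning rate) are our own assumptions.

```python
import math


def discounted_expert_weights(gain_history, n_experts, delta, eta):
    """Exponentially weighted mixture over experts using discounted cumulative gains.

    gain_history[t][i] is expert i's payoff in period t (oldest first).
    delta in (0, 1] controls forgetting: delta = 1 weights every period equally,
    a smaller delta forgets the distant past faster. eta > 0 is the learning rate.
    """
    scores = [0.0] * n_experts
    for gains in gain_history:
        # Discount everything seen so far by one period, then add the new gains.
        scores = [delta * s + g for s, g in zip(scores, gains)]
    top = max(scores)                 # subtract the max for numerical stability
    weights = [math.exp(eta * (s - top)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]
```

With ten early periods favoring expert 0 followed by three recent periods favoring expert 1, `delta = 0.5` shifts most of the weight to expert 1, while `delta = 1` (treating all past periods equally) still favors the stale expert 0.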
Approachability with Discounting
We establish a version of Blackwell’s (1956) approachability result with discounting. Our main result shows that, for convex sets, our notion of approachability with discounting is equivalent to Blackwell’s (1956) approachability. Our proofs are based on a concentration result for probability measures and on the minmax theorem for two-person, zero-sum games.
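For reference, Blackwell's (1956) characterization for convex sets, the benchmark to which the discounted notion is shown to be equivalent, can be stated as follows. The notation is our own assumption: player 1 chooses a ∈ A, player 2 chooses b ∈ B, the vector payoff is r(a, b) ∈ ℝ^d, extended bilinearly to mixed strategies p ∈ Δ(A) and q ∈ Δ(B). The undiscounted average payoff is used here; the paper's discounted definition is not reproduced.

```latex
\bar{x}_T = \frac{1}{T}\sum_{t=1}^{T} r(a_t, b_t),
\qquad
r(p, q) = \sum_{a \in A}\sum_{b \in B} p(a)\, q(b)\, r(a, b),

C \subseteq \mathbb{R}^d \text{ closed, convex, approachable by player 1 (i.e. } \operatorname{dist}(\bar{x}_T, C) \to 0 \text{ a.s.)}
\iff
\forall q \in \Delta(B)\ \exists p \in \Delta(A):\ r(p, q) \in C .
```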