Learning to Play Games in Extensive Form by Valuation
A valuation for a player in a game in extensive form is an assignment of
numeric values to the player's moves. The valuation reflects the desirability
of the moves. We assume a myopic player, who chooses a move with the highest
valuation. Valuations can also be revised, and hopefully improved, after each
play of the game. Here, a very simple valuation revision is considered, in
which the moves made in a play are assigned the payoff obtained in the play. We
show that by adopting such a learning process a player who has a winning
strategy in a win-lose game can almost surely guarantee a win in a repeated
game. When a player has more than two payoffs, a more elaborate learning
procedure is required. We consider one that associates with each move the
average payoff in the rounds in which this move was made. When all players
adopt this learning procedure, with some perturbations, then, with probability
1, strategies that are close to subgame perfect equilibrium are played after
some time. A single player who adopts this procedure can guarantee only her
individually rational payoff.
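The simple revision rule can be sketched in a few lines. The two-stage, one-player win-lose tree below (win only on the path L then R) is an illustrative assumption, not an example from the paper; it just shows how assigning each played move the realized payoff locks a myopic player onto a winning strategy once one is found.

```python
import random

# Hypothetical two-stage, one-player win-lose tree (an illustrative
# assumption): the player wins (payoff 1) only by playing "L" then "R".
def payoff(path):
    return 1 if path == ("L", "R") else 0

# One valuation per (decision point, move), all starting at 0.
valuation = {(stage, move): 0.0 for stage in (0, 1) for move in ("L", "R")}

def play_once():
    """Myopic play: at each stage pick a highest-valued move (ties broken
    at random), then revise: every move made is assigned the play's payoff."""
    path = []
    for stage in (0, 1):
        moves = ("L", "R")
        best = max(valuation[(stage, m)] for m in moves)
        path.append(random.choice([m for m in moves
                                   if valuation[(stage, m)] == best]))
    p = payoff(tuple(path))
    for stage, m in enumerate(path):
        valuation[(stage, m)] = p  # the simple revision rule
    return p

random.seed(0)
results = [play_once() for _ in range(200)]
# Once the winning path is hit, its moves are valued 1 while all others
# stay at 0, so the myopic player repeats the win from then on.
print(results[-10:])
```

Before the first win every valuation is 0, so play is uniformly random over the four paths; after the first win the two winning moves dominate forever, which is the almost-sure guarantee in miniature.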
Cognition and framing in sequential bargaining for gains and losses
Noncooperative game-theoretic models of sequential bargaining give an
underpinning to cooperative solution concepts derived from axioms, and
have proved useful in applications (see Osborne and Rubinstein 1990). But
experimental studies of sequential bargaining with discounting have generally
found systematic deviations between the offers people make and perfect
equilibrium offers derived from backward induction (e.g., Ochs and
Roth 1989).
We have extended this experimental literature in two ways. First,
we used a novel software system to record the information subjects
looked at while they bargained. Measuring patterns of information search
helped us draw inferences about how people think, testing as directly
as possible whether people use backward induction to compute offers.
Second, we compared bargaining over gains that shrink over time (because
of discounting) to equivalent bargaining over losses that expand over
time.
In the games we studied, two players bargain by making a finite number
of alternating offers. A unique subgame-perfect equilibrium can be computed
by backward induction. The induction begins in the last period and
works forward. Our experiments use a three-round game with a pie of
$5.00 that shrinks to $2.50 and then $1.25 in later rounds; in the
subgame-perfect equilibrium the first player offers $1.25 and keeps $3.75.
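The backward-induction computation for such a game is short enough to state directly. This is a sketch of the textbook calculation, assuming the shrinking-pie amounts given above ($5.00, $2.50, $1.25), not the experimental software itself.

```python
# Backward induction for a finite alternating-offer bargaining game.
# Assumed round pies (matching the three-round shrinking-pie design
# described above): $5.00, $2.50, $1.25.
def bargaining_equilibrium(pies):
    """Return the round-1 (proposer_share, responder_share).
    pies[t] is the pie in round t; a rejected offer moves play to the
    next round with proposer and responder roles swapped."""
    # Last round: the responder accepts anything, so the proposer keeps all.
    proposer, responder = pies[-1], 0.0
    # Work backward: the current responder is next round's proposer, so she
    # must be offered exactly what she would earn by rejecting.
    for pie in reversed(pies[:-1]):
        proposer, responder = pie - proposer, proposer
    return proposer, responder

print(bargaining_equilibrium([5.00, 2.50, 1.25]))  # (3.75, 1.25)
```

The induction runs exactly as the text describes: the round-3 proposer keeps $1.25, so the round-2 proposer offers $1.25 of $2.50, so the round-1 proposer offers $1.25 of $5.00 and keeps $3.75.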
Learning backward induction: a neural network agent approach
This paper addresses the question of whether neural networks (NNs), a realistic cognitive model of human information processing, can learn to backward induce in a two-stage game with a unique subgame-perfect Nash equilibrium. The NNs were found to predict the Nash equilibrium approximately 70% of the time in new games. Similarly to humans, the neural network agents were also found to suffer from subgame and truncation inconsistency, supporting the contention that they are appropriate models of general learning in humans. The agents were found to behave in a boundedly rational manner as a result of the endogenous emergence of decision heuristics. In particular, a very simple heuristic, socialmax, which chooses the cell with the highest social payoff, explains their behavior approximately 60% of the time, whereas the ownmax heuristic, which simply chooses the cell with the maximum payoff for that agent, fares worse, explaining behavior roughly 38% of the time, albeit still significantly better than chance. These two heuristics were found to be ecologically valid for the backward induction problem, as they predicted the Nash equilibrium in 67% and 50% of the games respectively. Compared to various standard classification algorithms, the NNs were found to be only slightly more accurate than standard discriminant analyses. However, the latter do not model the dynamic learning process and have an ad hoc postulated functional form. In contrast, a NN agent's behavior evolves with experience and is capable of taking on any functional form according to the universal approximation theorem.
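The two heuristics named in the abstract can be stated directly. The payoff cells below are hypothetical values chosen for illustration, not data from the paper; each cell is an (own payoff, other's payoff) pair.

```python
# The socialmax and ownmax heuristics described above, applied to a
# hypothetical set of payoff cells (illustrative values only).
# Each cell is (own_payoff, other_payoff).
cells = [(4, 1), (3, 3), (2, 5), (1, 2)]

def ownmax(cells):
    """Choose the cell with the maximum payoff for the agent itself."""
    return max(cells, key=lambda c: c[0])

def socialmax(cells):
    """Choose the cell with the highest social (joint) payoff."""
    return max(cells, key=lambda c: c[0] + c[1])

print(ownmax(cells))     # (4, 1): highest own payoff
print(socialmax(cells))  # (2, 5): highest joint payoff, 2 + 5 = 7
```

The example makes the contrast concrete: the two rules disagree here, which is why they can differ both in how often they match the agents' behavior and in how often they land on the Nash equilibrium.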
Towards Minimax Online Learning with Unknown Time Horizon
We consider online learning when the time horizon is unknown. We apply a
minimax analysis, beginning with the fixed horizon case, and then moving on to
two unknown-horizon settings, one that assumes the horizon is chosen randomly
according to some known distribution, and the other which allows the adversary
full control over the horizon. For the random horizon setting with restricted
losses, we derive a fully optimal minimax algorithm. For the adversarial
horizon setting, we prove a nontrivial lower bound showing that the
adversary obtains strictly more power than when the horizon is fixed and known.
Based on the minimax solution of the random horizon setting, we then propose a
new adaptive algorithm which "pretends" that the horizon is drawn from a
distribution from a special family, but no matter how the actual horizon is
chosen, the worst-case regret is of the optimal rate. Furthermore, our
algorithm can be applied in many settings, including online convex
optimization, follow-the-perturbed-leader, the exponential weights algorithm,
and first-order bounds. Experiments show that our algorithm outperforms many
existing algorithms in an online linear optimization setting.
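As a point of reference for the unknown-horizon problem, a standard "anytime" workaround for the exponential weights algorithm sets the learning rate per round rather than from the horizon. The sketch below is this generic textbook baseline under illustrative random losses, not the minimax algorithm proposed in the paper.

```python
import math, random

def anytime_hedge(loss_stream, n_experts):
    """Exponential weights with a per-round learning rate
    eta_t = sqrt(ln N / t), so no time horizon is needed in advance."""
    cum_loss = [0.0] * n_experts
    total = 0.0
    for t, losses in enumerate(loss_stream, start=1):
        eta = math.sqrt(math.log(n_experts) / t)
        weights = [math.exp(-eta * L) for L in cum_loss]
        z = sum(weights)
        # Play the mixture: suffer the probability-weighted loss.
        total += sum(w / z * l for w, l in zip(weights, losses))
        for j in range(n_experts):
            cum_loss[j] += losses[j]
    return total, min(cum_loss)

# Illustrative experiment: 1000 rounds, 5 experts, losses in [0, 1].
rng = random.Random(1)
stream = [[rng.random() for _ in range(5)] for _ in range(1000)]
alg_loss, best_loss = anytime_hedge(stream, 5)
regret = alg_loss - best_loss
print(regret)
```

With this schedule the regret against the best expert stays of order sqrt(T log N) for every T simultaneously, which is the behavior an unknown-horizon algorithm must match; the paper's contribution is to achieve the optimal such rate via the random-horizon minimax solution.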
Towards a Descriptive Model of Agent Strategy Search
It is argued that due to the complexity of most economic phenomena, the chances of deriving correct models from a priori principles are small. Instead, a more descriptive approach to modelling should be pursued. Agent-based modelling is characterised as a step in this direction. However, many agent-based models use off-the-shelf algorithms from computer science without regard to their descriptive accuracy. This paper attempts an agent model that describes the behaviour of subjects reported by Joep Sonnemans as accurately as possible. It takes a structure that is compatible with current thinking in cognitive science and explores the nature of the agent processes that then match the behaviour of the subjects. This suggests further modelling improvements and experiments.