
    Learning to Play Games in Extensive Form by Valuation

    Full text link
    A valuation for a player in a game in extensive form is an assignment of numeric values to the player's moves. The valuation reflects the desirability of the moves. We assume a myopic player, who chooses a move with the highest valuation. Valuations can also be revised, and hopefully improved, after each play of the game. Here, a very simple valuation revision is considered, in which the moves made in a play are assigned the payoff obtained in the play. We show that by adopting such a learning process a player who has a winning strategy in a win-lose game can almost surely guarantee a win in the repeated game. When a player has more than two payoffs, a more elaborate learning procedure is required. We consider one that associates with each move the average payoff in the rounds in which this move was made. When all players adopt this learning procedure, with some perturbations, then, with probability 1, strategies that are close to subgame perfect equilibrium are played after some time. A single player who adopts this procedure can guarantee only her individually rational payoff.
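
    Both revision rules are simple enough to sketch directly. Below is a minimal Python sketch under assumed representations (moves as hashable labels, one payoff per play); the class names, the tie-breaking rule, and valuing unseen moves at 0 are illustrative choices, not taken from the paper.

    ```python
    import random

    class Valuation:
        """Sketch of the simple revision rule: every move made in a play
        is assigned the payoff obtained in that play."""

        def __init__(self):
            self.value = {}  # move -> current numeric valuation

        def choose(self, moves):
            # Myopic choice: a move with the highest current valuation;
            # unseen moves are valued at 0 and ties are broken at random.
            best = max(self.value.get(m, 0.0) for m in moves)
            return random.choice(
                [m for m in moves if self.value.get(m, 0.0) == best])

        def revise(self, moves_made, payoff):
            for m in moves_made:
                self.value[m] = payoff

    class AveragingValuation(Valuation):
        """Sketch of the more elaborate rule: each move's valuation is the
        average payoff over the rounds in which that move was made."""

        def __init__(self):
            super().__init__()
            self.count = {}  # move -> number of rounds it was played

        def revise(self, moves_made, payoff):
            for m in moves_made:
                n = self.count.get(m, 0)
                self.value[m] = (self.value.get(m, 0.0) * n + payoff) / (n + 1)
                self.count[m] = n + 1
    ```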

    Cognition and framing in sequential bargaining for gains and losses

    Get PDF
    Noncooperative game-theoretic models of sequential bargaining give an underpinning to cooperative solution concepts derived from axioms, and have proved useful in applications (see Osborne and Rubinstein 1990). But experimental studies of sequential bargaining with discounting have generally found systematic deviations between the offers people make and perfect equilibrium offers derived from backward induction (e.g., Ochs and Roth 1989). We have extended this experimental literature in two ways. First, we used a novel software system to record the information subjects looked at while they bargained. Measuring patterns of information search helped us draw inferences about how people think, testing as directly as possible whether people use backward induction to compute offers. Second, we compared bargaining over gains that shrink over time (because of discounting) to equivalent bargaining over losses that expand over time. In the games we studied, two players bargain by making a finite number of alternating offers. A unique subgame-perfect equilibrium can be computed by backward induction. The induction begins in the last period and works forward. Our experiments use a three-round game with a pie of $5.00 and a 50-percent discount factor (so the pie shrinks to $2.50 and $1.25 in the second and third rounds). In the perfect equilibrium the first player offers the second player $1.25 and keeps $3.75.
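
    The backward-induction computation for this design is short enough to spell out. A minimal Python sketch, under the usual textbook assumptions (the responder accepts when indifferent, and the last responder accepts any offer), reproduces the $3.75/$1.25 split; the function name and the list encoding of round-by-round pie sizes are illustrative.

    ```python
    def spe_split(pies):
        """Subgame-perfect split for a finite alternating-offers game.
        pies[t] is the amount available in round t; the proposer role
        alternates each round. Returns (proposer, responder) shares in
        round 0."""
        continuation = 0.0  # value of rejecting in the final round
        for pie in reversed(pies):
            # The proposer offers the responder exactly her continuation
            # value and keeps the rest.
            proposer_share = pie - continuation
            # One round earlier, roles swap: this round's proposer is the
            # previous round's responder.
            continuation = proposer_share
        return proposer_share, pies[0] - proposer_share

    # Three rounds, $5.00 pie, 50-percent discount factor:
    print(spe_split([5.00, 2.50, 1.25]))  # -> (3.75, 1.25)
    ```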

    Learning backward induction: a neural network agent approach

    Get PDF
    This paper addresses the question of whether neural networks (NNs), a realistic cognitive model of human information processing, can learn to backward induce in a two-stage game with a unique subgame-perfect Nash equilibrium. The NNs were found to predict the Nash equilibrium approximately 70% of the time in new games. Similarly to humans, the neural network agents were also found to suffer from subgame and truncation inconsistency, supporting the contention that they are appropriate models of general learning in humans. The agents were found to behave in a boundedly rational manner as a result of the endogenous emergence of decision heuristics. In particular, a very simple heuristic, socialmax, which chooses the cell with the highest social payoff, explains their behavior approximately 60% of the time, whereas the ownmax heuristic, which simply chooses the cell with the maximum payoff for that agent, fares worse, explaining behavior roughly 38% of the time, albeit still significantly better than chance. These two heuristics were found to be ecologically valid for the backward induction problem, as they predicted the Nash equilibrium in 67% and 50% of the games respectively. Compared to various standard classification algorithms, the NNs were found to be only slightly more accurate than standard discriminant analyses. However, the latter do not model the dynamic learning process and have an ad hoc postulated functional form. In contrast, a NN agent's behavior evolves with experience and is capable of taking on any functional form according to the universal approximation theorem.
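
    A hypothetical encoding makes the two heuristics concrete: if each cell is a pair (own payoff, other payoff), socialmax and ownmax reduce to one-line rules. The pair encoding and the example cells below are assumptions for illustration, not taken from the paper.

    ```python
    def socialmax(cells):
        # Choose the cell with the highest social (joint) payoff.
        return max(cells, key=lambda c: c[0] + c[1])

    def ownmax(cells):
        # Choose the cell with the maximum payoff for the agent itself.
        return max(cells, key=lambda c: c[0])

    # Cells as (own_payoff, other_payoff) pairs -- an assumed encoding:
    cells = [(4, 1), (3, 3), (2, 5)]
    print(socialmax(cells))  # (2, 5): joint payoff 7
    print(ownmax(cells))     # (4, 1): own payoff 4
    ```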

    Towards Minimax Online Learning with Unknown Time Horizon

    Full text link
    We consider online learning when the time horizon is unknown. We apply a minimax analysis, beginning with the fixed horizon case, and then moving on to two unknown-horizon settings: one that assumes the horizon is chosen randomly according to some known distribution, and one that allows the adversary full control over the horizon. For the random horizon setting with restricted losses, we derive a fully optimal minimax algorithm. For the adversarial horizon setting, we prove a nontrivial lower bound showing that the adversary obtains strictly more power than when the horizon is fixed and known. Based on the minimax solution of the random horizon setting, we then propose a new adaptive algorithm which "pretends" that the horizon is drawn from a distribution from a special family; no matter how the actual horizon is chosen, the worst-case regret is of the optimal rate. Furthermore, our algorithm can be combined and applied in many ways, for instance to online convex optimization, follow-the-perturbed-leader, the exponential weights algorithm, and first-order bounds. Experiments show that our algorithm outperforms many other existing algorithms in an online linear optimization setting.
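
    The paper's own algorithm pretends the horizon is drawn from a special family of distributions; as a simpler stand-in that illustrates horizon-independence, here is the standard anytime exponential weights variant with time-varying learning rate eta_t = sqrt(ln K / t). This sketch is not the paper's method, and the generator interface is an assumed convenience.

    ```python
    import math
    import random

    def anytime_hedge(n_experts, loss_stream):
        """Exponential weights with a horizon-free learning rate. Each
        round: sample an expert from the exponential-weights distribution,
        yield the choice, then observe the full loss vector."""
        cum_loss = [0.0] * n_experts
        for t, losses in enumerate(loss_stream, start=1):
            eta = math.sqrt(math.log(n_experts) / t)
            # Shift by the min cumulative loss for numerical stability;
            # this does not change the sampling distribution.
            lo = min(cum_loss)
            probs = [math.exp(-eta * (L - lo)) for L in cum_loss]
            choice = random.choices(range(n_experts), weights=probs)[0]
            yield choice
            for i in range(n_experts):
                cum_loss[i] += losses[i]
    ```

    This variant attains O(sqrt(T ln K)) regret for every horizon T simultaneously, which is the optimal rate up to constants, without any horizon input.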

    Towards a Descriptive Model of Agent Strategy Search

    Get PDF
    It is argued that, due to the complexity of most economic phenomena, the chances of deriving correct models from a priori principles are small. Instead, a more descriptive approach to modelling should be pursued. Agent-based modelling is characterised as a step in this direction. However, many agent-based models use off-the-shelf algorithms from computer science without regard to their descriptive accuracy. This paper attempts an agent model that describes the behaviour of subjects reported by Joep Sonnemans as accurately as possible. It takes a structure that is compatible with current thinking in cognitive science and explores the nature of the agent processes that then match the behaviour of the subjects. This suggests further modelling improvements and experiments.

    Economic man - or straw man?

    Get PDF