24 research outputs found

    Learning to Play Games in Extensive Form by Valuation

    A valuation for a player in a game in extensive form is an assignment of numeric values to the player's moves. The valuation reflects the desirability of the moves. We assume a myopic player, who chooses a move with the highest valuation. Valuations can also be revised, and hopefully improved, after each play of the game. Here, a very simple valuation revision is considered, in which the moves made in a play are assigned the payoff obtained in the play. We show that by adopting such a learning process, a player who has a winning strategy in a win-lose game can almost surely guarantee a win in the repeated game. When a player has more than two payoffs, a more elaborate learning procedure is required. We consider one that associates with each move the average payoff in the rounds in which this move was made. When all players adopt this learning procedure, with some perturbations, then, with probability 1, strategies that are close to subgame perfect equilibrium are played after some time. A single player who adopts this procedure can guarantee only her individually rational payoff.
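
    The simple revision rule lends itself to a short simulation. The following is a minimal sketch of that rule on a toy win-lose game; the tree, move labels, and tie-breaking are illustrative assumptions of ours, not the paper's construction. The myopic player picks a highest-valued move, and every move made in a play is then assigned that play's payoff.

    ```python
    import random

    # Toy one-player win-lose game tree (an assumption for illustration):
    # internal nodes map move labels to children; integer leaves are payoffs
    # (1 = win, 0 = lose). The winning strategy is L at the root, then R.
    TREE = {
        "root": {"L": "a", "R": "b"},
        "a":    {"L": 0, "R": 1},
        "b":    {"L": 0, "R": 0},
    }

    valuation = {}  # (node, move) -> value; unseen moves default to 0

    def play_once():
        node, made = "root", []
        while isinstance(node, str):            # descend until a payoff leaf
            moves = list(TREE[node])
            best = max(valuation.get((node, m), 0.0) for m in moves)
            m = random.choice([m for m in moves
                               if valuation.get((node, m), 0.0) == best])
            made.append((node, m))
            node = TREE[node][m]
        for mv in made:                         # the simple revision rule:
            valuation[mv] = node                # moves made get the play's payoff
        return node

    wins = sum(play_once() for _ in range(100))
    print(f"wins in 100 plays: {wins}")
    ```

    In this toy, a losing play leaves all values at 0 and the tie-breaking stays random, while the first winning play raises the winning moves' values to 1 and locks them in, which is the almost-sure-win mechanism in miniature.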

    Learning to play games in extensive form by valuation

    A valuation for a board game is an assignment of numeric values to different states of the board. The valuation reflects the desirability of the states for the player. It can be used by a player to decide on her next move during the play. We assume a myopic player, who chooses a move with the highest valuation. Valuations can also be revised, and hopefully improved, after each play of the game. Here, a very simple valuation revision is considered, in which the states of the board visited in a play are assigned the payoff obtained in the play. We show that by adopting such a learning process, a player who has a winning strategy in a win-lose game can almost surely guarantee a win in a repeated game. When a player has more than two payoffs, a more elaborate learning procedure is required. We consider one that associates with each state the average payoff in the rounds in which that state was reached. When all players adopt this learning procedure, with some perturbations, then, with probability 1, strategies that are close to subgame perfect equilibrium are played after some time. A single player who adopts this procedure can guarantee only her individually rational payoff.
    Keywords: reinforcement learning
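
    For games with more than two payoffs, the richer rule described above keeps, for each state, the average payoff over the rounds in which that state was visited, and perturbs play so every part of the tree keeps being sampled. Below is a minimal sketch of that rule under assumptions of ours: a toy one-player tree, a uniform ε-perturbation, and payoffs chosen only for illustration.

    ```python
    import random
    from collections import defaultdict

    # Toy one-player tree (an illustrative assumption): leaves carry payoffs.
    TREE = {
        "root": {"L": "a", "R": "b"},
        "a":    {"L": "a0", "R": "a1"},
        "b":    {"L": "b0", "R": "b1"},
    }
    PAYOFF = {"a0": 1, "a1": 3, "b0": 2, "b1": 2}

    sums, counts = defaultdict(float), defaultdict(int)

    def value(state):
        """Average payoff over the rounds in which `state` was reached."""
        return sums[state] / counts[state] if counts[state] else 0.0

    def play_once(eps=0.1):
        node, visited = "root", ["root"]
        while node in TREE:
            moves = list(TREE[node])
            if random.random() < eps:        # perturbation: keep exploring
                m = random.choice(moves)
            else:                            # myopic: highest-valued successor
                best = max(value(TREE[node][m]) for m in moves)
                m = random.choice([m for m in moves
                                   if value(TREE[node][m]) == best])
            node = TREE[node][m]
            visited.append(node)
        payoff = PAYOFF[node]
        for st in visited:                   # running-average revision
            sums[st] += payoff
            counts[st] += 1
        return payoff

    random.seed(1)
    for _ in range(5000):
        play_once()
    print({st: round(value(st), 2) for st in ("a", "b")})
    ```

    With the perturbation on, the values of states on the optimal path separate from the rest and play concentrates near the subgame-perfect outcome (payoff 3 in this toy).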

    Valuation Equilibria

    We introduce a new solution concept for games in extensive form with perfect information: the valuation equilibrium. The moves of each player are partitioned into similarity classes. A valuation of the player is a real-valued function on the set of her similarity classes. At each node, a player chooses a move that belongs to a class with maximum valuation. The valuation of each player is consistent with the strategy profile in the sense that the valuation of a similarity class is the player's expected payoff given that the path (induced by the strategy profile) intersects the similarity class. The solution concept is applied to decision problems and multi-player extensive form games. It is contrasted with existing solution concepts. An aspiration-based approach is also proposed, in which the similarity partitions are determined endogenously. The corresponding equilibrium is called the aspiration-based valuation equilibrium (ASVE). While the subgame perfect Nash equilibrium is always an ASVE, there are, in general, other ASVEs. However, in zero-sum two-player games without chance moves, every player must get her value in any ASVE.
    Keywords: bounded rationality, valuation, similarity, aspiration
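
    To make the consistency condition concrete, here is a minimal sketch of checking it on a toy one-player decision problem with a chance move; the tree, the two similarity classes, and the candidate strategy are assumptions of ours, not an example from the paper. Each class's valuation is computed as the expected payoff conditional on the induced path intersecting the class.

    ```python
    # Toy decision problem (an illustrative assumption): "a" leads to a fair
    # gamble between 4 and 0, "b" to a sure 3. Nodes are ("player", moves),
    # ("chance", [(label, prob, child), ...]), or ("leaf", payoff).
    TREE = {
        "root": ("player", {"a": "n1", "b": "n2"}),
        "n1":   ("chance", [("h", 0.5, "w1"), ("t", 0.5, "w2")]),
        "w1":   ("leaf", 4),
        "w2":   ("leaf", 0),
        "n2":   ("leaf", 3),
    }
    CLASSES = {("root", "a"): "risky", ("root", "b"): "safe"}  # similarity classes
    STRATEGY = {"root": "a"}                                   # candidate profile

    def paths(node="root", prob=1.0, moves=()):
        """All (probability, moves used, payoff) paths induced by STRATEGY."""
        kind, data = TREE[node]
        if kind == "leaf":
            yield prob, moves, data
        elif kind == "chance":
            for _lbl, p, child in data:
                yield from paths(child, prob * p, moves)
        else:
            m = STRATEGY[node]
            yield from paths(data[m], prob, moves + ((node, m),))

    def class_valuation(cls):
        """Expected payoff conditional on the path intersecting class `cls`."""
        hits = [(p, pay) for p, mvs, pay in paths()
                if any(CLASSES[mv] == cls for mv in mvs)]
        total = sum(p for p, _ in hits)
        return sum(p * pay for p, pay in hits) / total if total else None

    for cls in ("risky", "safe"):
        print(cls, class_valuation(cls))   # risky: 2.0; safe: None (off-path)
    ```

    In this toy, the class "safe" is never intersected, so consistency places no constraint on its valuation: the profile is a valuation equilibrium whenever the free value assigned to "safe" does not exceed 2.0, even though the sure 3 would be optimal. This illustrates how valuation equilibria can diverge from subgame perfection.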

    Valuation equilibrium

    We introduce a new solution concept for games in extensive form with perfect information, the valuation equilibrium, which is based on a partition of each player's moves into similarity classes. A valuation of a player is a real-valued function on the set of her similarity classes. In this equilibrium, each player's strategy is optimal in the sense that, at each of her nodes, she chooses a move that belongs to a class with maximum valuation. The valuation of each player is consistent with the strategy profile in the sense that the valuation of a similarity class is the player's expected payoff, given that the path (induced by the strategy profile) intersects the similarity class. The solution concept is applied to decision problems and multi-player extensive form games. It is contrasted with existing solution concepts. The valuation approach is next applied to stopping games, in which non-terminal moves form a single similarity class, and we note that the behaviors obtained echo some biases observed experimentally. Finally, we tentatively suggest a way of endogenizing the similarity partitions, in which moves are categorized according to how well they perform relative to the expected equilibrium value, interpreted as the aspiration level.
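
    The stopping-game application pools every non-terminal (continue) move into one similarity class, so a single number summarizes continuation. The short sketch below computes the on-path class valuations in a toy three-stage stopping problem of our own devising; the payoffs, bust probability, and per-stage stop classes are assumptions, not the paper's example.

    ```python
    # Toy stopping problem (an assumption for illustration): at stage t = 1..3
    # the player may stop for s[t] or continue; each continue busts (payoff 0)
    # with probability q, otherwise play moves on; surviving past stage 3 pays
    # the jackpot J. All continue moves share the single class "go"; each stop
    # is assumed to be its own class.
    q = 0.2
    s = {1: 1.0, 2: 2.0, 3: 4.0}
    J = 8.0

    def class_values(k):
        """On-path valuations under the cutoff strategy 'stop at stage k'
        (k = 4 means never stop): expected payoff conditional on the induced
        path intersecting the class."""
        if k <= 3:
            v_go = (1 - q) ** (k - 1) * s[k]   # every path uses a continue move
            v_stop_k = s[k]                    # conditional on reaching stage k
            return v_go, v_stop_k
        return (1 - q) ** 3 * J, None          # never stop

    for k in (2, 3, 4):
        v_go, v_stop_k = class_values(k)
        print(f"stop at {k}: v(go) = {v_go:.2f}, v(stop_{k}) = {v_stop_k}")
    ```

    Because v(go) blends all stages into one number, and off-path stop classes carry unconstrained valuations, every one of these cutoffs is consistent in this toy; that coarse pooled continuation value is one way behaviors can drift from the standard backward-induction prescription.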

    Steady State Learning and the Code of Hammurabi

    The Code of Hammurabi specified a “trial by surviving in the river” as a way of deciding whether an accusation was true. This system is puzzling for two reasons. First, it is based on a superstition: we do not believe that the guilty are any more likely to drown than the innocent. Second, if people can be easily persuaded to hold a superstitious belief, why such an elaborate mechanism? Why not simply assert that those who are guilty will be struck dead by lightning? We attack these puzzles from the perspective of the theory of learning in games. We give a partial characterization of patiently stable outcomes that arise as the limit of steady states with rational learning as players become more patient. These “subgame-confirmed Nash equilibria” have self-confirming beliefs at certain information sets reachable by a single deviation. We analyze this refinement and use it as a tool to study the broader issue of the survival of superstition. According to this theory, Hammurabi had it exactly right: his law uses the greatest amount of superstition consistent with patient rational learning.