
    Strategic Power Revisited

    Traditional power indices ignore preferences and strategic interaction. Equilibrium analysis of particular non-cooperative decision procedures is unsuitable for normative analysis and assumes information that is typically unavailable. These points drive a lingering debate about the right approach to power analysis. A unified framework that works both sides of the street is developed here. It rests on a notion of a posteriori power which formalizes players' marginal impact on outcomes in cooperative and non-cooperative games, for strategic interaction and purely random behaviour. Taking expectations with respect to preferences, actions, and procedures then defines a meaningful a priori measure. Established indices turn out to be special cases.
    Keywords: power indices, spatial voting, equilibrium analysis, decision procedures
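To make "marginal impact" and "established indices" concrete, here is a minimal sketch of one classical index, the Penrose-Banzhaf measure, for a weighted voting game. It is not the unified framework the abstract describes. The function name `banzhaf`, the weights, and the quota are illustrative assumptions. A player's power is the share of coalitions of the other players that the player turns from losing to winning, which is the expected marginal impact when every other player votes yes or no with equal probability.

```python
from itertools import combinations


def banzhaf(weights, quota):
    """Absolute Penrose-Banzhaf measure of each player in a weighted voting game.

    A coalition wins when its total weight is at least `quota`.
    (Hypothetical helper for illustration; names are not from the source.)
    """
    n = len(weights)
    power = []
    for i in range(n):
        others = [w for j, w in enumerate(weights) if j != i]
        swings = 0
        # Enumerate every coalition of the other players (sizes 0 .. n-1).
        for size in range(n):
            for coalition in combinations(others, size):
                total = sum(coalition)
                # Player i is pivotal: coalition loses without i, wins with i.
                if total < quota <= total + weights[i]:
                    swings += 1
        # Each of the 2^(n-1) coalitions of the others is equally likely.
        power.append(swings / 2 ** (n - 1))
    return power


print(banzhaf([4, 2, 1], quota=5))  # [0.75, 0.25, 0.25]
```

The enumeration is exponential in the number of players, so this sketch only suits small games.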

    Reinforcement Learning from Self-Play in Imperfect-Information Games

    This thesis investigates artificial agents learning to make strategic decisions in imperfect-information games. In particular, we introduce a novel approach to reinforcement learning from self-play. We introduce Smooth UCT, which combines the game-theoretic notion of fictitious play with Monte Carlo Tree Search (MCTS). Smooth UCT outperformed a classic MCTS method in several imperfect-information poker games and won three silver medals in the 2014 Annual Computer Poker Competition. We develop Extensive-Form Fictitious Play (XFP), which is implemented entirely in sequential strategies, thus extending this prominent game-theoretic model of learning to sequential games. XFP provides a principled foundation for self-play reinforcement learning in imperfect-information games. We introduce Fictitious Self-Play (FSP), a class of sample-based reinforcement learning algorithms that approximate XFP. We instantiate FSP with neural network function approximation and deep learning techniques, producing Neural FSP (NFSP). We demonstrate that (approximate) Nash equilibria and their representations (abstractions) can be learned using NFSP end to end, i.e. interfacing with the raw inputs and outputs of the domain. NFSP approached the performance of state-of-the-art, superhuman algorithms in Limit Texas Hold’em, an imperfect-information game at the absolute limit of tractability using massive computational resources. This is the first time that any reinforcement learning algorithm, learning solely from game outcomes without prior domain knowledge, achieved such a feat.
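The methods above all build on fictitious play. As a minimal sketch, the code below runs classical fictitious play in a two-player zero-sum matrix game. It is not XFP, FSP, NFSP, or Smooth UCT. The function name `fictitious_play` and the matching-pennies payoff matrix are assumptions chosen for illustration. Each player repeatedly best-responds to the opponent's empirical action frequencies, and in zero-sum games those frequencies are known to converge to a Nash equilibrium.

```python
def fictitious_play(payoff, steps):
    """Classical fictitious play in a zero-sum matrix game.

    `payoff[i][j]` is the row player's payoff; the column player receives
    its negative. Returns the empirical action frequencies of both players.
    (Hypothetical helper for illustration; names are not from the source.)
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    # Arbitrary initial actions so that the empirical counts are non-empty.
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(steps - 1):
        # Value of each action against the opponent's action counts so far.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        col_values = [
            -sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Both players best-respond simultaneously (ties go to the lowest index).
        best_row = max(range(n_rows), key=row_values.__getitem__)
        best_col = max(range(n_cols), key=col_values.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    return ([c / steps for c in row_counts], [c / steps for c in col_counts])


matching_pennies = [[1, -1], [-1, 1]]
row_freq, col_freq = fictitious_play(matching_pennies, steps=10000)
print(row_freq, col_freq)  # both approach the uniform equilibrium strategy
```

The actions played keep cycling. Only the time-averaged frequencies settle near the equilibrium, which is why the sketch returns averages, not the last action.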

    Wardrop Equilibrium Can Be Boundedly Rational: A New Behavioral Theory of Route Choice

    As one of the most fundamental concepts in transportation science, Wardrop equilibrium (WE) has always had a relatively weak behavioral underpinning. To strengthen this foundation, one must reckon with bounded rationality in human decision-making processes, such as the lack of accurate information, limited computing power, and sub-optimal choices. This retreat from behavioral perfectionism in the literature, however, was typically accompanied by a conceptual modification of WE. Here we show that giving up perfect rationality need not force a departure from WE. On the contrary, WE can be reached with global stability in a routing game played by boundedly rational travelers. We achieve this result by developing a day-to-day (DTD) dynamical model that mimics how travelers gradually adjust their route valuations, and hence their choice probabilities, based on past experiences. Our model, called cumulative logit (CULO), resembles the classical DTD models but makes a crucial change: whereas the classical models assume routes are valued based on the cost averaged over historical data, ours values the routes based on the accumulated cost. To describe route choice behaviors, the CULO model uses only two parameters, one accounting for the rate at which the future route cost is discounted in the valuation relative to the past ones and the other describing the sensitivity of route choice probabilities to valuation differences. We prove that the CULO model always converges to WE, regardless of the initial point, as long as the behavioral parameters satisfy certain mild conditions. Our theory thus upholds WE's role as a benchmark in transportation systems analysis. It also resolves the theoretical challenge posed by Harsanyi's instability problem by explaining why equally good routes at WE are selected with different probabilities.
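The abstract gives no update equations, so the following is only a rough sketch of a cumulative-logit style day-to-day dynamic, not the authors' exact formulation. Every specific is an assumption made for illustration: the two-route network, the linear cost functions, the weight schedule `(day + 1) ** (-discount)`, and the names `culo_two_routes`, `sensitivity`, and `discount`. Each day, travelers split between routes by a logit rule on accumulated valuations, then add that day's discounted route cost to each valuation.

```python
import math


def culo_two_routes(sensitivity=1.0, discount=0.5, days=2000):
    """Toy cumulative-logit dynamic on two parallel routes with unit demand.

    Assumed costs: c1 = 1 + 2*f1 and c2 = 2 + f2, so Wardrop equilibrium
    puts 2/3 of the flow on route 1, with both costs equal to 7/3.
    (Hypothetical sketch; the schedule and names are not from the source.)
    """
    valuations = [0.0, 0.0]  # accumulated (not averaged) route costs
    share_route1 = 0.5
    costs = [0.0, 0.0]
    for day in range(days):
        # Logit choice: the route with lower accumulated cost is chosen more often.
        gap = valuations[0] - valuations[1]
        share_route1 = 1.0 / (1.0 + math.exp(sensitivity * gap))
        flows = [share_route1, 1.0 - share_route1]
        costs = [1.0 + 2.0 * flows[0], 2.0 + flows[1]]
        # Assumed schedule: later days receive smaller weight in the valuation.
        weight = (day + 1) ** (-discount)
        valuations[0] += weight * costs[0]
        valuations[1] += weight * costs[1]
    return share_route1, costs


share, final_costs = culo_two_routes()
print(share, final_costs)  # share approaches 2/3, costs approach 7/3 each
```

In this toy setting the valuations themselves keep growing, but their difference settles at the value where the two route costs are equal, so the choice probabilities stabilize at the equilibrium split.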