    Approximating n-player behavioural strategy Nash equilibria using coevolution

    Coevolutionary algorithms are plagued by a set of problems related to intransitivity that make it questionable what the end product of a coevolutionary run can achieve. The introduction of solution concepts into coevolution alleviated part of the issue; however, efficiently representing and achieving game-theoretic solution concepts is still not a trivial task. In this paper we propose a coevolutionary algorithm that approximates behavioural strategy Nash equilibria in n-player zero-sum games by exploiting the minimax solution concept. To support our case we provide a set of experiments on games with both known and unknown equilibria. In the case of known equilibria, we confirm that our algorithm converges to the known solution, while in the case of unknown equilibria we observe steady progress towards Nash. Copyright 2011 ACM
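    Below is a minimal, hypothetical sketch of the general idea: coevolving mixed strategies for a two-player zero-sum matrix game with a worst-case (minimax-style) fitness. The game, population sizes, and selection scheme are illustrative assumptions, not the paper's n-player behavioural-strategy algorithm.

```python
# Illustrative toy sketch only: coevolution with a worst-case (minimax-style)
# fitness on rock-paper-scissors, whose known Nash equilibrium is the uniform mix.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[ 0., -1.,  1.],   # row player's payoff matrix
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def mutate(p, scale=0.1):
    q = np.clip(p + rng.normal(0.0, scale, p.shape), 1e-6, None)
    return q / q.sum()                       # stay on the probability simplex

pop_row = [rng.dirichlet(np.ones(3)) for _ in range(20)]
pop_col = [rng.dirichlet(np.ones(3)) for _ in range(20)]

for gen in range(300):
    # Fitness of a strategy is its payoff against the worst case in the opposing population.
    fit_row = [min(x @ A @ y for y in pop_col) for x in pop_row]
    fit_col = [min(-(x @ A @ y) for x in pop_row) for y in pop_col]
    keep = len(pop_row) // 2                 # truncation selection, then refill with mutants
    pop_row = [p for _, p in sorted(zip(fit_row, pop_row), key=lambda t: -t[0])[:keep]]
    pop_col = [p for _, p in sorted(zip(fit_col, pop_col), key=lambda t: -t[0])[:keep]]
    pop_row += [mutate(pop_row[rng.integers(keep)]) for _ in range(keep)]
    pop_col += [mutate(pop_col[rng.integers(keep)]) for _ in range(keep)]

best = max(pop_row, key=lambda x: min(x @ A @ y for y in pop_col))
print("approximate equilibrium strategy:", best)   # should drift towards [1/3, 1/3, 1/3]
```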

    Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey

    Wireless sensor networks (WSNs) consist of autonomous and resource-limited devices. The devices cooperate to monitor one or more physical phenomena within an area of interest. WSNs operate as stochastic systems because of randomness in the monitored environments. For long service time and low maintenance cost, WSNs require adaptive and robust methods to address data exchange, topology formulation, resource and power optimization, sensing coverage and object detection, and security challenges. In these problems, sensor nodes must make optimized decisions from a set of accessible strategies to achieve design goals. This survey reviews numerous applications of the Markov decision process (MDP) framework, a powerful decision-making tool for developing adaptive algorithms and protocols for WSNs. Furthermore, various solution methods are discussed and compared to serve as a guide for using MDPs in WSNs.
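    As a rough illustration of how an MDP can model a WSN decision problem, the sketch below runs value iteration for a hypothetical node choosing between sleeping and transmitting based on its battery level. The states, rewards, and transition probabilities are invented for the example and are not taken from the survey.

```python
# Hypothetical toy example: value iteration for an MDP in which a sensor node
# chooses between sleeping and transmitting depending on its battery level.
# States, rewards, and transition probabilities are invented for illustration.
import numpy as np

states = ["low", "high"]                     # battery level
actions = ["sleep", "transmit"]
gamma = 0.95                                 # discount factor

# P[a][s] is the distribution over next states [low, high]; R[a][s] is the reward.
P = {"sleep":    np.array([[0.7, 0.3],       # sleeping on low battery may allow recharging
                           [0.1, 0.9]]),
     "transmit": np.array([[0.95, 0.05],     # transmitting drains the battery
                           [0.5,  0.5]])}
R = {"sleep":    np.array([0.0, 0.0]),
     "transmit": np.array([-1.0, 2.0])}      # transmitting on low battery is penalised

V = np.zeros(len(states))
for _ in range(500):                         # repeated Bellman backups
    Q = np.array([R[a] + gamma * P[a] @ V for a in actions])
    V = Q.max(axis=0)

policy = {s: actions[i] for s, i in zip(states, Q.argmax(axis=0))}
print(policy, V)                             # expected: sleep when low, transmit when high
```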

    Monte Carlo Planning method estimates planning horizons during interactive social exchange

    Reciprocating interactions represent a central feature of all human exchanges. They have been the target of various recent experiments, with healthy participants and psychiatric populations engaging as dyads in multi-round exchanges such as a repeated trust task. Behaviour in such exchanges involves complexities related to each agent's preference for equity with their partner, beliefs about the partner's appetite for equity, beliefs about the partner's model of their partner, and so on. Agents may also plan different numbers of steps into the future. Providing a computationally precise account of the behaviour is an essential step towards understanding what underlies choices. A natural framework for this is that of an interactive partially observable Markov decision process (IPOMDP). However, the various complexities make IPOMDPs inordinately computationally challenging. Here, we show how to approximate the solution for the multi-round trust task using a variant of the Monte-Carlo tree search algorithm. We demonstrate that the algorithm is efficient and effective, and therefore can be used to invert observations of behavioural choices. We use generated behaviour to elucidate the richness and sophistication of interactive inference.
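    The sketch below illustrates the core Monte-Carlo tree search loop (selection, expansion, rollout, backup) on a hypothetical simplified trust exchange, in which an investor repeatedly chooses what fraction of an endowment to send to a partner with a fixed repayment rule. The toy model and its parameters are assumptions for illustration, not the IPOMDP from the paper.

```python
# Minimal MCTS (UCT) sketch on a toy trust-like exchange; the environment model
# below is a hypothetical stand-in, not the paper's interactive POMDP.
import math, random

ACTIONS = [0.0, 0.5, 1.0]     # fraction of the endowment the investor sends
ROUNDS = 4                    # planning horizon

def step(state, action):
    """state = (round, cumulative payoff); partner returns half of the tripled amount."""
    rnd, total = state
    returned = 1.5 * action
    return (rnd + 1, total + (1 - action) + returned)

class Node:
    def __init__(self, state):
        self.state, self.children, self.visits, self.value = state, {}, 0, 0.0

def uct(node, c=1.4):
    # Pick the child maximising mean value plus an exploration bonus.
    return max(node.children.items(),
               key=lambda kv: kv[1].value / (kv[1].visits + 1e-9)
               + c * math.sqrt(math.log(node.visits + 1) / (kv[1].visits + 1e-9)))

def rollout(state):
    while state[0] < ROUNDS:
        state = step(state, random.choice(ACTIONS))
    return state[1]

def search(root_state, iterations=2000):
    root = Node(root_state)
    for _ in range(iterations):
        node, path = root, [root]
        while node.state[0] < ROUNDS and len(node.children) == len(ACTIONS):
            _, node = uct(node)                       # selection
            path.append(node)
        if node.state[0] < ROUNDS:                    # expansion of one untried action
            a = next(a for a in ACTIONS if a not in node.children)
            node.children[a] = Node(step(node.state, a))
            node = node.children[a]
            path.append(node)
        reward = rollout(node.state)                  # random rollout to the horizon
        for n in path:                                # backup
            n.visits += 1
            n.value += reward
    return max(root.children, key=lambda a: root.children[a].visits)

print("first-round action:", search((0, 0.0)))
```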

    Strategy Generation for Partially Observable Stochastic Games

    The aim of this thesis is to design and implement a modification of the PG-HSVI algorithm (which was designed to solve partially observable stochastic games) that might improve its ability to solve larger problems. For this, we apply an incremental strategy generation method. The ISG-PG-HSVI algorithm introduced in this thesis starts by simplifying the original game, removing some of the actions available to one of the players. It then solves this simplified variant of the game and saves the result. Next, it adds some of the missing actions back and solves the enlarged game, reusing the results from the previous iteration. This repeats until the full original game is solved. Furthermore, we design and implement a heuristic to investigate whether it matters which actions are added back to the game first. Finally, we present the results of comparing these algorithms in experiments.
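    A minimal sketch of the incremental strategy generation outer loop described above is given below. Here solve_restricted_game and pick_next are hypothetical placeholders standing in for a PG-HSVI-style solver and the proposed action-ordering heuristic; the interface is an assumption, not the thesis implementation.

```python
# Hypothetical sketch of the incremental strategy generation outer loop.
# `solve_restricted_game` stands in for a PG-HSVI-style solver and `pick_next`
# for the action-ordering heuristic; both are assumptions, not the thesis code.
def incremental_solve(all_actions, initial_actions, pick_next, solve_restricted_game):
    """Grow one player's action set step by step, re-solving with a warm start."""
    actions = set(initial_actions)                            # simplified game to start from
    solution = solve_restricted_game(actions, warm_start=None)
    while actions != set(all_actions):
        actions.add(pick_next(set(all_actions) - actions))    # heuristic picks what to restore
        solution = solve_restricted_game(actions, warm_start=solution)
    return solution                                           # solution of the full game
```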