49 research outputs found

    Using a theory of mind to find best responses to memory-one strategies

    Memory-one strategies are a set of Iterated Prisoner's Dilemma strategies that have been praised for their mathematical tractability and performance against single opponents. This manuscript investigates best response memory-one strategies with a theory of mind for their opponents. The results add to the literature showing that extortionate play is not always optimal, by showing that optimal play is often not extortionate. They also provide evidence that memory-one strategies suffer from their limited memory in multi-agent interactions and can be outperformed by optimised strategies with longer memory. We have developed a theory that has allowed us to explore the entire space of memory-one strategies. The framework presented is suitable for studying memory-one strategies in the Prisoner's Dilemma, but also in evolutionary processes such as the Moran process. Furthermore, results on the stability of defection in populations of memory-one strategies are also obtained.
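    As a rough illustration of why memory-one strategies are considered mathematically tractable, the sketch below treats a match between two such strategies as a four-state Markov chain and reads the long-run payoff off its stationary distribution. It is not taken from the manuscript: the function names are hypothetical, the payoff values (R, S, T, P) = (3, 0, 5, 1) are the commonly used ones and are an assumption here, and plain power iteration is used for simplicity.

```python
def transition_matrix(p, q):
    """4x4 Markov chain over the states CC, CD, DC, DD (player 1's move first).

    p[i] is the probability that player 1 cooperates after state i.
    q[i] is the same for player 2, indexed from player 2's own viewpoint,
    so its CD and DC entries are swapped before use.
    """
    q_seen = (q[0], q[2], q[1], q[3])
    return [
        [a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]
        for a, b in zip(p, q_seen)
    ]


def stationary_distribution(M, iterations=10_000):
    """Approximate the stationary distribution by power iteration.

    Assumption: the chain converges from a uniform start, which holds
    whenever every cooperation probability lies strictly between 0 and 1.
    """
    v = [0.25] * 4
    for _ in range(iterations):
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    return v


def long_run_payoff(p, q, payoffs=(3, 0, 5, 1)):
    """Expected per-round payoff to player 1; payoffs are (R, S, T, P)."""
    v = stationary_distribution(transition_matrix(p, q))
    return sum(prob * value for prob, value in zip(v, payoffs))


# Hypothetical example: a mostly reciprocal player against a random one.
print(long_run_payoff((0.9, 0.1, 0.9, 0.1), (0.5, 0.5, 0.5, 0.5)))
```

    Searching for a best response then amounts to maximising `long_run_payoff` over the four entries of `p`, averaged over whichever opponents are of interest.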

    An Evolutionary Game Theoretic Model of Rhino Horn Devaluation

    Rhino populations are at a critical level due to the demand for rhino horn and the subsequent poaching. Wildlife managers attempt to secure rhinos with approaches that devalue the horn, the most common of which is dehorning. Game theory has been used to examine the interaction of poachers and wildlife managers, where a manager can either 'dehorn' their rhinos or leave the horn attached, and poachers may behave 'selectively' or 'indiscriminately'. The approach described in this paper builds on this previous work and investigates the interactions between the poachers. We build an evolutionary game theoretic model and determine which strategy is preferred by a poacher in different populations of poachers. The purpose of this work is to discover whether conditions exist that encourage the poachers to behave selectively, that is, to kill only those rhinos with full horns. The analytical results show that full devaluation of all rhinos will likely lead to indiscriminate poaching. In turn, they show that devaluing rhinos can only be effective when implemented along with a strong disincentive framework. This paper aims to contribute to the research required for an informed discussion in the lively debate on legalising the rhino horn trade.

    Understanding responses to environments for the Prisoner's Dilemma: A meta analysis, multidimensional optimisation and machine learning approach

    This thesis investigates the behaviour that Iterated Prisoner's Dilemma strategies should adopt in response to different environments. The Iterated Prisoner's Dilemma (IPD) is a topic in game theory that has attracted academic attention due to its applications in understanding the balance between cooperation and competition in social and biological settings. This thesis uses a variety of mathematical and computational fields such as linear algebra, research software engineering, data mining, network theory, natural language processing, data analysis, mathematical optimisation, resultant theory, Markov modelling, agent-based simulation, heuristics and machine learning. The literature around the IPD has been exploring the performance of strategies in the game for years. The results of this thesis contribute to the discussion of successful performances using various novel approaches. Initially, this thesis evaluates the performance of 195 strategies in 45,600 computer tournaments. A large portion of the 195 strategies are drawn from the known and named strategies in the IPD literature, including many previous tournament winners. The 45,600 computer tournaments include tournament variations such as tournaments with noise, probabilistic match length, and both noise and probabilistic match length. This diversity of strategies and tournament types has resulted in the largest and most diverse collection of computer tournaments in the field. The impact of features on the performance of the 195 strategies is evaluated using modern machine learning and statistical techniques. The results reinforce the idea that there are properties associated with success: be nice, be provocable and generous, be a little envious, be clever, and adapt to the environment.
    Secondly, this thesis explores well-performing behaviour, focusing on a specific set of IPD strategies called memory-one, and specifically a subset of them that are considered extortionate. These strategies have gained much attention in the research field and have been acclaimed for their performance against single opponents. This thesis uses mathematical modelling to explore the best responses to a collection of memory-one strategies as a multidimensional non-linear optimisation problem, and the benefits of extortionate/manipulative behaviour. The results contribute to the discussion that behaving in an extortionate way is not the optimal play in the IPD, and provide evidence that memory-one strategies suffer from their limited memory in multi-agent interactions and can be outperformed by longer memory strategies. Following this, the thesis investigates best response strategies in the form of static sequences of moves. It introduces an evolutionary algorithm which can successfully identify best response sequences, and uses a list of 192 opponents to generate a large data set of best response sequences. This data set is then used to train a type of recurrent neural network called the long short-term memory network, which has not gained much attention in the literature. A number of long short-term memory networks are trained to predict the actions of the best response sequences. The trained networks are used to introduce a total of 24 new IPD strategies which were shown to successfully win standard tournaments. From this research the following conclusions are drawn: there is not a single best strategy in the IPD for varying environments; however, there are properties associated with the strategies' success that are distinct to different environments. These properties both reinforce and contradict well-established results. They include being nice, opening with cooperation, being a little envious, being complex, adapting to the environment and using longer memory when possible.

    Evolution Reinforces Cooperation with the Emergence of Self-Recognition Mechanisms: an empirical study of the Moran process for the iterated Prisoner's dilemma

    We present insights and empirical results from an extensive numerical study of the evolutionary dynamics of the iterated prisoner's dilemma. Fixation probabilities for Moran processes are obtained for all pairs of 164 different strategies including classics such as TitForTat, zero determinant strategies, and many more sophisticated strategies. Players with long memories and sophisticated behaviours outperform many strategies that perform well in a two-player setting. Moreover, we introduce several strategies trained with evolutionary algorithms to excel at the Moran process. These strategies are excellent invaders and resistors of invasion and in some cases naturally evolve handshaking mechanisms to resist invasion. The best invaders were those trained to maximize total payoff, while the best resistors invoke handshake mechanisms. This suggests that while maximizing individual payoff can lead to the evolution of cooperation through invasion, payoff-maximizing strategies, with their relatively weak invasion resistance, are not as evolutionarily stable as strategies employing handshake mechanisms.
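    For readers unfamiliar with fixation probabilities, the sketch below evaluates the textbook closed-form expression for a two-type Moran process with fixed pairwise payoffs. It is not the study's method, which is empirical and simulation-based. The function name, the payoff arguments, and the choice of fitness as the average payoff against the rest of the population are assumptions made for illustration.

```python
def fixation_probability(a, b, c, d, N):
    """Probability that a single mutant of type A takes over N - 1 residents of type B.

    a, b: payoff to A against A and against B.
    c, d: payoff to B against A and against B.
    Fitness is assumed to be the average payoff against the other N - 1
    individuals, so all payoffs must be positive.
    """
    total, product = 1.0, 1.0
    for i in range(1, N):  # i is the current number of mutants
        mutant_fitness = (a * (i - 1) + b * (N - i)) / (N - 1)
        resident_fitness = (c * i + d * (N - i - 1)) / (N - 1)
        # Ratio of the probabilities of losing versus gaining one mutant.
        product *= resident_fitness / mutant_fitness
        total += product
    return 1.0 / total


# Neutral case: every payoff is equal, so one mutant fixes with probability 1/N.
print(fixation_probability(1, 1, 1, 1, 10))
```

    In a study like the one above, the four payoffs would be replaced by the measured average match scores between a pair of strategies, and noisy or stochastic strategies would generally call for simulation instead of a closed form.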

    Reinforcement Learning Produces Dominant Strategies for the Iterated Prisoner's Dilemma

    We present tournament results and several powerful strategies for the Iterated Prisoner's Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human-designed strategy are also the top performers in noisy tournaments.

    A bibliometric study of research topics, collaboration, and centrality in the iterated prisoner's dilemma

    This manuscript explores the research topics and collaborative behaviour of authors in the field of the Prisoner's Dilemma using topic modeling and a graph theoretic analysis of the co-authorship network. The analysis identified five research topics in the Prisoner's Dilemma which have been relevant over the course of time. These are human subject research, biological studies, strategies, evolutionary dynamics on networks and modeling problems as a Prisoner's Dilemma game. Moreover, the results demonstrated the Prisoner's Dilemma is a field of continued interest, and that it is a collaborative field compared to other game theoretic fields. The co-authorship network suggests that authors are focused on their communities and that not many connections across the communities are made. The most central authors of the network are the authors connected to the main cluster. Through examining the networks of topics, it was uncovered that the main cluster is characterised by the collaboration of authors in a single topic. These findings add to the bibliometrics study in another field and present new questions and avenues of research to understand the reasons for the measured behaviours.