Search CORE

6 research outputs found

Understanding responses to environments for the Prisoner's Dilemma: A meta analysis, multidimensional optimisation and machine learning approach

Author: Glynatsi Nikoleta

This thesis investigates the behaviour that Iterated Prisoner’s Dilemma strategies should adopt as a response to different environments. The Iterated Prisoner’s Dilemma (IPD) is a particular topic of game theory that has attracted academic attention due to its applications in the understanding of the balance between cooperation and competition in social and biological settings. This thesis uses a variety of mathematical and computational fields such as linear algebra, research software engineering, data mining, network theory, natural language processing, data analysis, mathematical optimisation, resultant theory, Markov modelling, agent-based simulation, heuristics and machine learning. The literature around the IPD has been exploring the performance of strategies in the game for years. The results of this thesis contribute to the discussion of successful performances using various novel approaches. Initially, this thesis evaluates the performance of 195 strategies in 45,600 computer tournaments. A large portion of the 195 strategies are drawn from the known and named strategies in the IPD literature, including many previous tournament winners. The 45,600 computer tournaments include tournament variations such as tournaments with noise, probabilistic match length, and both noise and probabilistic match length. This diversity of strategies and tournament types has resulted in the largest and most diverse collection of computer tournaments in the field. The impact of features on the performance of the 195 strategies is evaluated using modern machine learning and statistical techniques. The results reinforce the idea that there are properties associated with success: be nice, be provocable and generous, be a little envious, be clever, and adapt to the environment. Secondly, this thesis explores well-performing behaviour focused on a specific set of IPD strategies called memory-one, and specifically a subset of them that are considered extortionate.
These strategies have gained much attention in the research field and have been acclaimed for their performance against single opponents. This thesis uses mathematical modelling to explore the best responses to a collection of memory-one strategies as a multidimensional non-linear optimisation problem, and the benefits of extortionate/manipulative behaviour. The results contribute to the discussion that behaving in an extortionate way is not the optimal play in the IPD, and provide evidence that memory-one strategies suffer from their limited memory in multi-agent interactions and can be outperformed by longer memory strategies. Following this, the thesis investigates best response strategies in the form of static sequences of moves. It introduces an evolutionary algorithm which can successfully identify best response sequences, and uses a list of 192 opponents to generate a large data set of best response sequences. This data set is then used to train a type of recurrent neural network called the long short-term memory network, which has not gained much attention in the literature. A number of long short-term memory networks are trained to predict the actions of the best response sequences. The trained networks are used to introduce a total of 24 new IPD strategies which were shown to successfully win standard tournaments. From this research the following conclusions are made: there is not a single best strategy in the IPD for varying environments; however, there are properties associated with the strategies’ success distinct to different environments. These properties reinforce and contradict well-established results. They include being nice, opening with cooperation, being a little envious, being complex, adapting to the environment and using longer memory when possible.

Online Research @ Cardiff

"Shit Happens": The Spontaneous Self-Organisation of Communal Boundary Latrines via Stigmergy in a Null Model of the European Badger, Meles meles

Author: Bullock Seth
Publication venue: Massachusetts Institute of Technology (MIT) Press
Publication date: 01/01/2016

Crossref

Explore Bristol Research

Understanding Language Evolution in Overlapping Generations of Reinforcement Learning Agents

Authors: Brace Lewys, Bullock Seth
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2016

Crossref

Explore Bristol Research

Alife as a Model Discipline for Policy-Relevant Simulation Modelling: Might "Worse" Simulations Fuel a Better Science-Policy Interface? (Extended Abstract)

Author: Bullock Seth
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2016

Crossref

Explore Bristol Research

Task Allocation in Foraging Robot Swarms: The Role of Information Sharing

Authors: Bullock Seth, Crowder Richard, Pitonakova Lenka
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2016

Autonomous task allocation is a desirable feature of robot swarms that collect and deliver items in scenarios where congestion, caused by accumulated items or robots, can temporarily interfere with swarm behaviour. In such settings, self-regulation of the workforce can prevent unnecessary energy consumption. We explore two types of self-regulation: non-social, where robots become idle upon experiencing congestion, and social, where robots broadcast information about congestion to their teammates in order to socially inhibit foraging. We show that while both types of self-regulation can lead to improved energy efficiency and increase the amount of resource collected, the speed with which information about congestion flows through a swarm affects the scalability of these algorithms.

Southampton (e-Prints Soton)

Crossref

Explore Bristol Research