Understanding responses to environments for the Prisoner's Dilemma: A meta analysis, multidimensional optimisation and machine learning approach
This thesis investigates the behaviour that Iterated Prisoner's Dilemma strategies should adopt as a response to different environments. The Iterated Prisoner's Dilemma (IPD) is a particular topic of game theory that has attracted academic attention due to its applications in the understanding of the balance between cooperation and competition in social and biological settings.
This thesis draws on a variety of mathematical and computational fields, such as linear algebra, research software engineering, data mining, network theory, natural language processing, data analysis, mathematical optimisation, resultant theory, Markov modelling, agent-based simulation, heuristics and machine learning.
The literature around the IPD has been exploring the performance of strategies in the game for years. The results of this thesis contribute to the discussion of successful performance using various novel approaches.
First, this thesis evaluates the performance of 195 strategies in 45,600 computer tournaments. A large portion of the 195 strategies are drawn from the known and named strategies in the IPD literature, including many previous tournament winners. The 45,600 computer tournaments include variations such as tournaments with noise, with probabilistic match length, and with both noise and probabilistic match length. This diversity of strategies and tournament types has resulted in the largest and most diverse collection of computer tournaments in the field. The impact of features on the performance of the 195 strategies is evaluated using modern machine learning and statistical techniques. The results reinforce the idea that there are properties associated with success. These are: be nice, be provocable and generous, be a little envious, be clever, and adapt to the environment.
Secondly, this thesis explores well-performing behaviour, focusing on a specific set of IPD strategies called memory-one strategies, and specifically on a subset of them that are considered extortionate. These strategies have gained much attention in the research field and have been acclaimed for their performance against single opponents. This thesis uses mathematical modelling to explore, as a multidimensional non-linear optimisation problem, the best responses to a collection of memory-one strategies, as well as the benefits of extortionate/manipulative behaviour. The results contribute to the argument that behaving in an extortionate way is not the optimal play in the IPD, and provide evidence that memory-one strategies suffer from their limited memory in multi-agent interactions and can be outperformed by longer-memory strategies.
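Memory-one strategies are usually written as four probabilities of cooperating after the previous round's outcome (CC, CD, DC, DD). When two such strategies meet, the rounds form a four-state Markov chain, and the long-run payoff follows from its stationary distribution. The Python sketch below computes that payoff with plain power iteration; the function names, the payoff values and the numerical method are assumptions chosen for clarity, and the thesis's own derivation (which lists resultant theory and Markov modelling among its tools) may differ.

```python
from typing import List, Sequence

# Player 1's payoffs in states CC, CD, DC, DD (assumed R=3, S=0, T=5, P=1)
PAYOFF_VECTOR = (3.0, 0.0, 5.0, 1.0)


def transition_matrix(p: Sequence[float], q: Sequence[float]) -> List[List[float]]:
    """Rows: current state (CC, CD, DC, DD) from player 1's view; columns: next state.

    p[i] and q[i] are cooperation probabilities after state i, each written from
    that player's own view, so player 2's CD and DC entries are swapped here."""
    q_seen = (q[0], q[2], q[1], q[3])
    return [[pc * qc, pc * (1 - qc), (1 - pc) * qc, (1 - pc) * (1 - qc)]
            for pc, qc in zip(p, q_seen)]


def stationary_distribution(m: List[List[float]], iterations: int = 10_000) -> List[float]:
    """Power iteration from the uniform distribution. Assumes all entries of p and q
    lie strictly in (0, 1), so every transition is possible and the limit is unique."""
    v = [0.25] * 4
    for _ in range(iterations):
        v = [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]
    return v


def long_run_payoff(p: Sequence[float], q: Sequence[float]) -> float:
    """Expected per-turn payoff to player 1 in the long run."""
    v = stationary_distribution(transition_matrix(p, q))
    return sum(prob * u for prob, u in zip(v, PAYOFF_VECTOR))


# Two players who each cooperate with probability 0.99, regardless of history
print(long_run_payoff((0.99,) * 4, (0.99,) * 4))
```

Averaging `long_run_payoff(p, q)` over a collection of opponents `q` and maximising it over the four components of `p` gives a non-linear optimisation problem of the kind described above. The sketch assumes every probability lies strictly between 0 and 1; deterministic strategies can make the chain reducible, and power iteration then depends on the starting distribution.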
Following this, the thesis investigates best response strategies in the form of static sequences of moves. It introduces an evolutionary algorithm that can successfully identify best response sequences, and uses a list of 192 opponents to generate a large data set of best response sequences. This data set is then used to train a type of recurrent neural network called the long short-term memory (LSTM) network, which has not gained much attention in the literature. A number of LSTM networks are trained to predict the actions of the best response sequences. The trained networks are used to introduce a total of 24 new IPD strategies, which were shown to successfully win standard tournaments.
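The idea of evolving a static best response sequence can be sketched with a generic genetic algorithm: a population of fixed-length move sequences is scored against a deterministic opponent, the better half is kept, and the rest is refilled by one-point crossover and bit-flip mutation. This is an illustrative Python sketch, not the evolutionary algorithm introduced in the thesis; the hypothetical names, parameter values and payoffs, and the simplification that the opponent reacts only to the sequence's own earlier moves, are assumptions.

```python
import random
from typing import Callable, List, Tuple

# Assumed payoffs to the sequence player: (own move, opponent move) -> score
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Simplification: a deterministic opponent that reacts only to our earlier moves
Opponent = Callable[[List[str]], str]


def tit_for_tat(our_moves: List[str]) -> str:
    """Cooperate first, then copy the sequence's previous move."""
    return "C" if not our_moves else our_moves[-1]


def score(sequence: List[str], opponent: Opponent) -> int:
    """Total payoff of playing a fixed sequence of moves against `opponent`."""
    return sum(PAYOFF[(move, opponent(sequence[:t]))] for t, move in enumerate(sequence))


def evolve_best_response(opponent: Opponent, length: int = 10, population: int = 30,
                         generations: int = 100, mutation_rate: float = 0.1,
                         seed: int = 0) -> Tuple[List[str], List[int]]:
    """Genetic algorithm with elitism (assumes length >= 2 and population >= 4).
    Returns the best sequence found and the best score at each generation."""
    rng = random.Random(seed)
    pop = [[rng.choice("CD") for _ in range(length)] for _ in range(population)]
    history: List[int] = []
    for _ in range(generations):
        ranked = sorted(pop, key=lambda s: score(s, opponent), reverse=True)
        elite = ranked[: population // 2]  # the better half survives unchanged
        history.append(score(elite[0], opponent))
        children: List[List[str]] = []
        while len(elite) + len(children) < population:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, length)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each move is flipped with probability mutation_rate
            child = [("D" if m == "C" else "C") if rng.random() < mutation_rate else m
                     for m in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=lambda s: score(s, opponent))
    return best, history


best, history = evolve_best_response(tit_for_tat)
print("".join(best), score(best, tit_for_tat))
```

Because the elite half is carried over unchanged, the best score never decreases from one generation to the next. Against Tit For Tat over 10 turns the highest possible score is 32 (cooperate for nine turns and defect on the last), which gives a simple bound to compare the sketch's output against.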
From this research the following conclusions are made: there is not a single best strategy in the IPD for varying environments; however, there are properties associated with the strategies' success that are specific to different environments. These properties both reinforce and contradict well-established results. They include being nice, opening with cooperation, being a little envious, being complex, adapting to the environment and using longer memory when possible.
Task Allocation in Foraging Robot Swarms: The Role of Information Sharing
Autonomous task allocation is a desirable feature of robot swarms that collect and deliver items in scenarios where congestion, caused by accumulated items or robots, can temporarily interfere with swarm behaviour. In such settings, self-regulation of the workforce can prevent unnecessary energy consumption. We explore two types of self-regulation: non-social, where robots become idle upon experiencing congestion, and social, where robots broadcast information about congestion to their teammates in order to socially inhibit foraging. We show that while both types of self-regulation can improve energy efficiency and increase the amount of resource collected, the speed with which information about congestion flows through a swarm affects the scalability of these algorithms.