A model for the evolution of reinforcement learning in fluctuating games
Many species are able to learn to associate behaviours with rewards, as this gives fitness advantages in changing environments. Social interactions between population members may, however, require more cognitive abilities than simple trial-and-error learning, in particular the capacity to form accurate hypotheses about the material payoff consequences of alternative action combinations. It is unclear in this context whether natural selection necessarily favours individuals who use information about the payoffs of untried actions (hypothetical payoffs), as opposed to simple reinforcement of realized payoffs. Here, we develop an evolutionary model in which individuals are genetically determined to use either trial-and-error learning or learning based on hypothetical reinforcements, and ask which learning rule is evolutionarily stable under pairwise symmetric two-action stochastic repeated games played over the individual's lifetime. Using stochastic approximation theory and simulations, we analyse the learning dynamics on the behavioural timescale and derive conditions under which trial-and-error learning outcompetes hypothetical reinforcement learning on the evolutionary timescale. This occurs in particular under repeated cooperative interactions with the same partner. By contrast, we find that hypothetical reinforcement learners tend to be favoured under random interactions, although stable polymorphisms can also arise in which trial-and-error learners are maintained at a low frequency. We conclude that specific game structures can select for trial-and-error learning even in the absence of costs of cognition, which illustrates that cost-free increased cognition can be counterselected under social interactions.
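The two learning rules contrasted in this abstract can be sketched as follows. This is a minimal, illustrative sketch only: the softmax action choice, the decreasing step sizes (as in stochastic approximation), and the Prisoner's-Dilemma-like payoff matrix are assumptions for the example, not values from the paper.

```python
import math
import random

def softmax_choice(values, beta, rng):
    """Choose action 0 or 1 with softmax (logit) probabilities."""
    z = [math.exp(beta * v) for v in values]
    return 0 if rng.random() * sum(z) < z[0] else 1

def update_realized(values, action, payoff, step):
    """Trial-and-error learning: reinforce only the action actually played."""
    values[action] += step * (payoff - values[action])

def update_hypothetical(values, payoffs, step):
    """Hypothetical reinforcement: update every action using the payoff it
    would have earned against the partner's realized action."""
    for a in (0, 1):
        values[a] += step * (payoffs[a] - values[a])

# Toy repeated game between one learner of each type (illustrative payoffs).
rng = random.Random(1)
payoff_matrix = [[3.0, 0.0],   # row player's payoff for (row, column) actions;
                 [5.0, 1.0]]   # a Prisoner's-Dilemma-like structure
v_trial, v_hypo = [0.0, 0.0], [0.0, 0.0]
for t in range(1, 2001):
    step = 1.0 / t  # decreasing step size, as in stochastic approximation
    a1 = softmax_choice(v_trial, beta=2.0, rng=rng)
    a2 = softmax_choice(v_hypo, beta=2.0, rng=rng)
    update_realized(v_trial, a1, payoff_matrix[a1][a2], step)
    # The symmetric game lets the hypothetical learner impute both payoffs:
    update_hypothetical(v_hypo, [payoff_matrix[a][a1] for a in (0, 1)], step)
```

The only difference between the two rules is which entries of the value vector each round's payoff information touches; the evolutionary competition in the paper is over exactly that difference.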
Neural network agent playing spin Hamiltonian games on a quantum computer
Quantum computing is expected to provide promising new approaches to the most
challenging problems in materials science, communication, search, machine
learning, and other domains. However, due to decoherence and gate-imperfection
errors, modern quantum computers present a very complex, dynamical, uncertain,
and fluctuating computational environment. We develop an autonomous agent that
interacts effectively with such an environment to solve problems in magnetism.
Using reinforcement learning, the agent is trained through self-play on
quantum devices to find the best possible approximation of a spin-Hamiltonian
ground state. We show that the agent can learn entanglement so as to imitate
the ground state of a quantum spin dimer. The experiments were conducted on
quantum computers provided by IBM. To compensate for decoherence, we use a
local spin-correction procedure derived from a general sum rule for the
spin-spin correlation functions of a quantum system with an even number of
antiferromagnetically coupled spins in the ground state. Our study paves the
way for a new family of neural-network eigensolvers for quantum computers.
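The sum rule invoked above can be checked exactly in the smallest case without any quantum hardware. The sketch below (plain Python, illustrative only) verifies that the singlet ground state of the antiferromagnetic Heisenberg dimer H = J S1·S2 has spin-spin correlation ⟨S1·S2⟩ = −3/4, the value fixed for two antiferromagnetically coupled spins whose ground state has total spin zero.

```python
import math

# Since (S1 + S2)^2 = S1^2 + S2^2 + 2 S1.S2 = 0 in the singlet,
# <S1.S2> = -(3/4 + 3/4) / 2 = -3/4.
J = 1.0
# H / J in the basis |00>, |01>, |10>, |11>, with spin-1/2 operators S = sigma/2:
H = [[0.25,  0.0,  0.0, 0.0],
     [0.0, -0.25,  0.5, 0.0],
     [0.0,  0.5, -0.25, 0.0],
     [0.0,  0.0,  0.0, 0.25]]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

# The singlet state (|01> - |10>) / sqrt(2):
singlet = [0.0, 1.0 / math.sqrt(2), -1.0 / math.sqrt(2), 0.0]

hv = matvec(H, singlet)
energy = sum(s * h for s, h in zip(singlet, hv))  # <psi| H |psi> = -3/4 * J
correlation = energy / J                           # <S1 . S2> = -3/4
```

Since H|singlet⟩ = −(3/4)J |singlet⟩ holds component by component, the singlet is an exact eigenstate, and the correlation value is the benchmark a decoherence correction on real devices would be measured against.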
The minority game: An economics perspective
This paper gives a critical account of the minority game literature. The
minority game is a simple congestion game: players need to choose between two
options, and those who have selected the option chosen by the minority win. The
learning model proposed in this literature seems to differ markedly from the
learning models commonly used in economics. We relate the learning model from
the minority game literature to standard game-theoretic learning models, and
show that in fact it shares many features with these models. However, the
predictions of the learning model differ considerably from the predictions of
most other learning models. We discuss the main predictions of the learning
model proposed in the minority game literature, and compare these to
experimental findings on congestion games.
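The learning model discussed in this literature can be sketched in a few lines. The parameter values below are illustrative assumptions; the defining features are the fixed lookup-table strategies and the virtual scoring of strategies that were not played.

```python
import random

def play_minority_game(n_agents=101, memory=3, n_strategies=2,
                       rounds=500, seed=0):
    """Simulate the basic minority game: an odd number of agents repeatedly
    choose between two options, and those on the minority side win."""
    rng = random.Random(seed)
    n_histories = 2 ** memory
    # Each agent holds fixed lookup-table strategies that map the recent
    # history of winning sides to an action in {0, 1}.
    strategies = [[[rng.randrange(2) for _ in range(n_histories)]
                   for _ in range(n_strategies)] for _ in range(n_agents)]
    scores = [[0] * n_strategies for _ in range(n_agents)]
    history = rng.randrange(n_histories)
    attendance = []
    for _ in range(rounds):
        actions = []
        for i in range(n_agents):
            # Each agent plays its currently best-scoring strategy.
            best = max(range(n_strategies), key=lambda s: scores[i][s])
            actions.append(strategies[i][best][history])
        n_ones = sum(actions)
        winning_side = 1 if n_ones < n_agents - n_ones else 0
        attendance.append(n_ones)
        # Virtual scoring: every strategy is credited as if it had been
        # played, i.e. forgone payoffs are perfectly imputed.
        for i in range(n_agents):
            for s in range(n_strategies):
                if strategies[i][s][history] == winning_side:
                    scores[i][s] += 1
        history = ((history << 1) | winning_side) % n_histories
    return attendance

attendance = play_minority_game()
```

The virtual scoring of untried strategies is the feature that connects this model to belief-based learning such as fictitious play, which is one way the economics literature relates it to standard game-theoretic learning models.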
Global adaptation in networks of selfish components: emergent associative memory at the system scale
In some circumstances complex adaptive systems composed of numerous self-interested agents can self-organise into structures that enhance global adaptation, efficiency or function. However, the general conditions for such an outcome are poorly understood and present a fundamental open question for domains as varied as ecology, sociology, economics, organismic biology and technological infrastructure design. In contrast, sufficient conditions for artificial neural networks to form structures that perform collective computational processes such as associative memory/recall, classification, generalisation and optimisation, are well-understood. Such global functions within a single agent or organism are not wholly surprising since the mechanisms (e.g. Hebbian learning) that create these neural organisations may be selected for this purpose, but agents in a multi-agent system have no obvious reason to adhere to such a structuring protocol or produce such global behaviours when acting from individual self-interest. However, Hebbian learning is actually a very simple and fully-distributed habituation or positive feedback principle. Here we show that when self-interested agents can modify how they are affected by other agents (e.g. when they can influence which other agents they interact with) then, in adapting these inter-agent relationships to maximise their own utility, they will necessarily alter them in a manner homologous with Hebbian learning. Multi-agent systems with adaptable relationships will thereby exhibit the same system-level behaviours as neural networks under Hebbian learning. For example, improved global efficiency in multi-agent systems can be explained by the inherent ability of associative memory to generalise by idealising stored patterns and/or creating new combinations of sub-patterns. 
Thus, distributed multi-agent systems can spontaneously exhibit adaptive global behaviours in the same sense, and by the same mechanism, as the organisational principles familiar from connectionist models of organismic learning.
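The claimed homology with Hebbian learning can be illustrated with a small sketch: binary agents settle to locally self-interested states given pairwise relationship weights, then each agent strengthens the relationships that currently raise its own utility, which is exactly a Hebbian update. All sizes, rates, and round counts here are illustrative assumptions, not values from the paper.

```python
import random

def settle(state, weights, rng, sweeps=20):
    """Each self-interested agent repeatedly picks the state that maximises
    its own utility sum_j w[i][j] * s[i] * s[j], given the others' states."""
    n = len(state)
    for _ in range(sweeps):
        for i in rng.sample(range(n), n):
            field = sum(weights[i][j] * state[j] for j in range(n) if j != i)
            state[i] = 1 if field >= 0 else -1
    return state

def hebbian_update(weights, state, rate=0.05):
    """Agent i's utility is sum_j w[i][j]*s[i]*s[j], so hill-climbing on its
    own relationship weights gives dw[i][j] ~ s[i]*s[j]: the Hebbian rule."""
    n = len(state)
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i][j] += rate * state[i] * state[j]

rng = random.Random(0)
n = 16
weights = [[0.0] * n for _ in range(n)]
# Repeatedly settle from random initial conditions and adapt relationships.
for _ in range(30):
    state = [rng.choice([-1, 1]) for _ in range(n)]
    settle(state, weights, rng)
    hebbian_update(weights, state)
# With adapted relationships the system behaves like an associative memory,
# falling into a previously reinforced configuration (or its mirror image).
final = settle([rng.choice([-1, 1]) for _ in range(n)], weights, rng)
```

Because the Hebbian increments are symmetric (dw[i][j] = dw[j][i]), the adapted system is formally a Hopfield-style network, which is why the associative-memory behaviours listed in the abstract carry over.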
Dynamical selection of Nash equilibria using Experience Weighted Attraction Learning: emergence of heterogeneous mixed equilibria
We study the distribution of strategies in a large game that models how
agents choose among different double auction markets. We classify the possible
mean field Nash equilibria, which include potentially segregated states where
an agent population can split into subpopulations adopting different
strategies. As the game is aggregative, the actual equilibrium strategy
distributions remain undetermined, however. We therefore compare with the
results of Experience-Weighted Attraction (EWA) learning, which at long times
leads to Nash equilibria in the appropriate limits of large intensity of
choice, low noise (long agent memory) and perfect imputation of missing scores
(fictitious play). The learning dynamics breaks the indeterminacy of the Nash
equilibria. Non-trivially, depending on how the relevant limits are taken, more
than one type of equilibrium can be selected. These include the standard
homogeneous mixed and heterogeneous pure states, but also \emph{heterogeneous
mixed} states where different agents play different strategies that are not all
pure. The analysis of EWA learning involves Fokker-Planck modeling combined
with large deviation methods. The theoretical results are confirmed by
multi-agent simulations.
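The EWA rule referenced in the abstract can be sketched as follows. This is a minimal illustration of the general update (after Camerer and Ho), with assumed parameter values (phi = rho = 0.9, lam = 5) and a toy two-strategy payoff vector, not the paper's calibration.

```python
import math
import random

def ewa_step(attractions, n_obs, chosen, payoffs, phi=0.9, rho=0.9, delta=1.0):
    """One Experience-Weighted Attraction update. delta = 1 fully imputes
    forgone payoffs (fictitious-play-like); delta = 0 reduces to
    reinforcement of the realized payoff only."""
    n_new = rho * n_obs + 1.0
    for j in range(len(attractions)):
        weight = delta + (1.0 - delta) * (1.0 if j == chosen else 0.0)
        attractions[j] = (phi * n_obs * attractions[j]
                          + weight * payoffs[j]) / n_new
    return n_new

def logit_choice(attractions, lam, rng):
    """Logit choice rule: a large intensity of choice lam concentrates
    probability on the highest-attraction strategy."""
    z = [math.exp(lam * a) for a in attractions]
    r = rng.random() * sum(z)
    acc = 0.0
    for j, w in enumerate(z):
        acc += w
        if r < acc:
            return j
    return len(z) - 1

# Toy run with two strategies whose imputed payoffs are fixed at 1 and 2:
# with delta = 1 the attractions converge to the payoffs themselves.
rng = random.Random(0)
attractions, n_obs = [0.0, 0.0], 1.0
for _ in range(200):
    chosen = logit_choice(attractions, lam=5.0, rng=rng)
    n_obs = ewa_step(attractions, n_obs, chosen, payoffs=[1.0, 2.0])
```

The limits named in the abstract correspond to parameter corners of this rule: large lam (intensity of choice), phi close to 1 (long memory, low noise), and delta close to 1 (perfect imputation of missing scores).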