    A model for the evolution of reinforcement learning in fluctuating games

    Many species are able to learn to associate behaviours with rewards, as this gives fitness advantages in changing environments. Social interactions between population members may, however, require more cognitive abilities than simple trial-and-error learning, in particular the capacity to make accurate hypotheses about the material payoff consequences of alternative action combinations. It is unclear in this context whether natural selection necessarily favours individuals who use information about the payoffs associated with untried actions (hypothetical payoffs), as opposed to simple reinforcement of realized payoffs. Here, we develop an evolutionary model in which individuals are genetically determined to use either trial-and-error learning or learning based on hypothetical reinforcements, and ask which learning rule is evolutionarily stable under pairwise symmetric two-action stochastic repeated games played over the individual's lifetime. Using stochastic approximation theory and simulations, we analyse the learning dynamics on the behavioural timescale and derive conditions under which trial-and-error learning outcompetes hypothetical reinforcement learning on the evolutionary timescale. This occurs in particular under repeated cooperative interactions with the same partner. By contrast, we find that hypothetical reinforcement learners tend to be favoured under random interactions, but stable polymorphisms can also arise in which trial-and-error learners are maintained at a low frequency. We conclude that specific game structures can select for trial-and-error learning even in the absence of costs of cognition, which illustrates that cost-free increased cognition can be counterselected under social interactions.
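
    The contrast between the two learning rules can be made concrete. Below is a minimal sketch, not the paper's model: a trial-and-error learner reinforces only the action it actually played with the realized payoff, while a hypothetical-reinforcement learner also updates untried actions with the payoffs they would have earned. The payoff matrix, the logit choice rule with intensity beta, and the 1/t step schedule are all illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(0)

        # Illustrative 2x2 payoff matrix for the row player (an assumption,
        # not one of the games analysed in the paper).
        PAYOFF = np.array([[4.0, 0.0],
                           [3.0, 2.0]])

        def choice_probs(values, beta=2.0):
            # Logit choice: higher-valued actions are played more often.
            z = np.exp(beta * (values - values.max()))
            return z / z.sum()

        def trial_and_error_update(values, action, realized_payoff, step):
            # Reinforce only the action actually taken, with its realized payoff.
            values[action] += step * (realized_payoff - values[action])

        def hypothetical_update(values, opponent_action, step):
            # Reinforce every action with the payoff it would have earned.
            for a in range(2):
                values[a] += step * (PAYOFF[a, opponent_action] - values[a])

        values = np.zeros(2)
        for t in range(1, 1001):
            step = 1.0 / t            # decreasing step, as in stochastic approximation
            opp = rng.integers(2)     # stand-in for the partner's action
            a = rng.choice(2, p=choice_probs(values))
            trial_and_error_update(values, a, PAYOFF[a, opp], step)

    Swapping the update call for hypothetical_update(values, opp, step) turns the same loop into the hypothetical-reinforcement learner.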

    Neural network agent playing spin Hamiltonian games on a quantum computer

    Quantum computing is expected to provide new and promising approaches to solving the most challenging problems in materials science, communication, search, machine learning and other domains. However, due to decoherence and gate-imperfection errors, modern quantum computers present a very complex, dynamical, uncertain and fluctuating computational environment. We develop an autonomous agent that interacts effectively with such an environment to solve magnetism problems. Using reinforcement learning, the agent is trained through self-play on quantum devices to find the best possible approximation of a spin Hamiltonian's ground state. We show that the agent can learn entanglement in order to imitate the ground state of a quantum spin dimer. The experiments were conducted on quantum computers provided by IBM. To compensate for decoherence, we use a local spin-correction procedure derived from a general sum rule for the spin-spin correlation functions of a quantum system with an even number of antiferromagnetically coupled spins in the ground state. Our study paves the way towards a new family of neural-network eigensolvers for quantum computers. Comment: a local spin-correction procedure was used to compensate for real-device errors; a comparison with the variational approach was added.
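
    For context, the target the agent is trained to imitate — the ground state of the antiferromagnetic spin-1/2 dimer — can be written down exactly. The sketch below is a plain numpy illustration, not the paper's code: it diagonalises the Heisenberg dimer Hamiltonian H = J S1.S2 and prints the singlet energy -3J/4 together with the exact correlation <S1.S2> = -3/4, the kind of constraint a sum-rule-based correction can exploit.

        import numpy as np

        # Spin-1/2 operators, S = sigma/2 (hbar = 1).
        sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
        sy = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
        sz = np.array([[1, 0], [0, -1]], dtype=complex) / 2

        # S1 . S2 on the two-spin Hilbert space.
        S1S2 = sum(np.kron(s, s) for s in (sx, sy, sz))

        J = 1.0                # antiferromagnetic coupling, J > 0
        H = J * S1S2           # Heisenberg dimer Hamiltonian

        evals, evecs = np.linalg.eigh(H)
        ground = evecs[:, 0]   # singlet ground state, energy -3J/4

        corr = np.real(ground.conj() @ S1S2 @ ground)
        print(evals[0], corr)  # both -0.75: the exact value a sum rule can pin down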

    The minority game: An economics perspective

    This paper gives a critical account of the minority-game literature. The minority game is a simple congestion game: players choose between two options, and those who have selected the option chosen by the minority win. The learning model proposed in this literature seems to differ markedly from the learning models commonly used in economics. We relate the learning model from the minority-game literature to standard game-theoretic learning models, and show that it in fact shares many features with these models. However, its predictions differ considerably from those of most other learning models. We discuss the main predictions of the learning model proposed in the minority-game literature, and compare these to experimental findings on congestion games. Comment: 30 pages, 4 figures.
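
    The game itself is compact enough to simulate directly. The following sketch implements the canonical Challet-Zhang setup on which the minority-game literature builds: each agent holds S fixed lookup-table strategies over the last M winning sides, plays whichever of its strategies has the highest virtual score, and after each round scores every strategy as if it had been played. The parameter values are illustrative.

        import numpy as np

        rng = np.random.default_rng(1)
        N, M, S, T = 101, 3, 2, 2000   # agents (odd), memory, strategies each, rounds

        # Each strategy maps any of the 2**M history patterns to an action in {0, 1}.
        strategies = rng.integers(2, size=(N, S, 2 ** M))
        scores = np.zeros((N, S))
        history = rng.integers(2 ** M)  # encoded window of the last M winning sides

        for t in range(T):
            best = scores.argmax(axis=1)                 # each agent's best strategy
            actions = strategies[np.arange(N), best, history]
            minority = int(actions.sum() < N / 2)        # side chosen by fewer agents
            # Virtual payoffs: every strategy is scored as if it had been played.
            scores += (strategies[:, :, history] == minority)
            history = ((history << 1) | minority) % (2 ** M)   # slide the window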

    Global adaptation in networks of selfish components: emergent associative memory at the system scale

    In some circumstances, complex adaptive systems composed of numerous self-interested agents can self-organise into structures that enhance global adaptation, efficiency or function. However, the general conditions for such an outcome are poorly understood and present a fundamental open question for domains as varied as ecology, sociology, economics, organismic biology and technological infrastructure design. In contrast, sufficient conditions for artificial neural networks to form structures that perform collective computational processes such as associative memory/recall, classification, generalisation and optimisation are well understood. Such global functions within a single agent or organism are not wholly surprising, since the mechanisms (e.g. Hebbian learning) that create these neural organisations may be selected for this purpose; but agents in a multi-agent system have no obvious reason to adhere to such a structuring protocol or to produce such global behaviours when acting from individual self-interest. However, Hebbian learning is actually a very simple and fully distributed habituation or positive-feedback principle. Here we show that when self-interested agents can modify how they are affected by other agents (e.g. when they can influence which other agents they interact with), then, in adapting these inter-agent relationships to maximise their own utility, they will necessarily alter them in a manner homologous with Hebbian learning. Multi-agent systems with adaptable relationships will thereby exhibit the same system-level behaviours as neural networks under Hebbian learning. For example, improved global efficiency in multi-agent systems can be explained by the inherent ability of associative memory to generalise by idealising stored patterns and/or creating new combinations of sub-patterns. Thus distributed multi-agent systems can spontaneously exhibit adaptive global behaviours in the same sense, and by the same mechanism, as the organisational principles familiar from connectionist models of organismic learning.
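
    The core claim — that a self-interested agent adjusting its relationships climbs the gradient of its own utility, and that this gradient is exactly the Hebbian product of the two agents' states — is easy to make concrete. Below is a minimal sketch under the assumption of a Hopfield-style utility u_i = s_i * sum_j w_ij s_j, so that d u_i / d w_ij = s_i * s_j; the paper's setting may differ in detail.

        import numpy as np

        rng = np.random.default_rng(2)
        n = 50
        w = rng.normal(scale=0.1, size=(n, n))   # inter-agent relationship weights
        np.fill_diagonal(w, 0.0)
        s = rng.choice([-1.0, 1.0], size=n)      # agent states/behaviours
        eta = 0.01                               # relationship-adaptation rate

        for step in range(5000):
            i = rng.integers(n)
            # Behavioural timescale: agent i best-responds to its local field.
            field = w[i] @ s
            s[i] = 1.0 if field >= 0 else -1.0
            # Relational timescale: agent i nudges its incoming weights up the
            # gradient of its own utility u_i = s_i * (w[i] @ s), i.e.
            # d u_i / d w_ij = s_i * s_j -- exactly a Hebbian update.
            w[i] += eta * s[i] * s
            w[i, i] = 0.0                        # no self-connection

    Run long enough, the weight matrix comes to store visited configurations as attractors, in the same way a Hopfield network under Hebbian learning does — the associative-memory behaviour the abstract describes.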

    Dynamical selection of Nash equilibria using Experience Weighted Attraction Learning: emergence of heterogeneous mixed equilibria

    We study the distribution of strategies in a large game that models how agents choose among different double-auction markets. We classify the possible mean-field Nash equilibria, which include potentially segregated states where the agent population can split into subpopulations adopting different strategies. As the game is aggregative, however, the actual equilibrium strategy distributions remain undetermined. We therefore compare with the results of Experience-Weighted Attraction (EWA) learning, which at long times leads to Nash equilibria in the appropriate limits of large intensity of choice, low noise (long agent memory) and perfect imputation of missing scores (fictitious play). The learning dynamics breaks the indeterminacy of the Nash equilibria. Non-trivially, depending on how the relevant limits are taken, more than one type of equilibrium can be selected. These include the standard homogeneous mixed and heterogeneous pure states, but also heterogeneous mixed states in which different agents play different strategies that are not all pure. The analysis of EWA learning involves Fokker-Planck modeling combined with large deviation methods. The theoretical results are confirmed by multi-agent simulations. Comment: 35 pages, 16 figures.
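
    EWA learning itself has a standard closed form (Camerer and Ho's parametrisation), worth stating because the abstract's limits map onto its parameters: the intensity of choice is the logit beta, long memory corresponds to phi near 1, and delta -> 1 gives perfect imputation of unplayed actions (fictitious play). A minimal sketch of one update follows; the helper names and default values are illustrative, not the paper's.

        import numpy as np

        def ewa_step(A, N, played, payoffs, phi=0.9, rho=0.9, delta=0.5):
            # One Experience-Weighted Attraction update (Camerer & Ho form):
            #   N(t)   = rho * N(t-1) + 1
            #   A_j(t) = (phi * N(t-1) * A_j(t-1) + w_j * pi_j(t)) / N(t),
            # where w_j = 1 for the action actually played and delta otherwise.
            N_new = rho * N + 1.0
            weight = delta * np.ones_like(A)
            weight[played] = 1.0          # realized action gets full weight
            A_new = (phi * N * A + weight * payoffs) / N_new
            return A_new, N_new

        def logit_choice(A, beta=5.0, rng=None):
            # Sample an action with intensity-of-choice beta.
            rng = rng or np.random.default_rng()
            p = np.exp(beta * (A - A.max()))
            p /= p.sum()
            return rng.choice(len(A), p=p)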