
    Learning with Opponent-Learning Awareness

    Multi-agent settings are quickly gathering importance in machine learning. This includes a plethora of recent work on deep multi-agent reinforcement learning, but it also extends to hierarchical RL, generative adversarial networks, and decentralised optimisation. In all these settings the presence of multiple learning agents renders the training problem non-stationary and often leads to unstable training or undesired final results. We present Learning with Opponent-Learning Awareness (LOLA), a method in which each agent shapes the anticipated learning of the other agents in the environment. The LOLA learning rule includes a term that accounts for the impact of one agent's policy on the anticipated parameter update of the other agents. Results show that the encounter of two LOLA agents leads to the emergence of tit-for-tat, and therefore cooperation, in the iterated prisoners' dilemma, while independent learning does not. In this domain, LOLA also receives higher payouts than a naive learner and is robust against exploitation by higher-order gradient-based methods. Applied to repeated matching pennies, LOLA agents converge to the Nash equilibrium. In a round-robin tournament we show that LOLA agents successfully shape the learning of a range of multi-agent learning algorithms from the literature, resulting in the highest average returns on the IPD. We also show that the LOLA update rule can be efficiently calculated using an extension of the policy gradient estimator, making the method suitable for model-free RL. The method thus scales to large parameter and input spaces and nonlinear function approximators. We apply LOLA to a grid-world task with an embedded social dilemma using recurrent policies and opponent modelling. By explicitly considering the learning of the other agent, LOLA agents learn to cooperate out of self-interest. The code is at github.com/alshedivat/lola
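    The first-order LOLA correction described in this abstract can be sketched for a two-player differentiable game: instead of following its naive payoff gradient, each agent also differentiates through the opponent's anticipated gradient step. The following is a minimal illustrative sketch (not the paper's policy-gradient implementation); the payoff functions, step sizes, and finite-difference approach are all assumptions chosen for readability.

    ```python
    # Illustrative two-player differentiable game: each agent i has a scalar
    # parameter theta_i and a payoff V_i it wants to maximise. These payoff
    # functions are made up for this sketch.
    def V1(t1, t2):
        return -t1 ** 2 + t1 * t2

    def V2(t1, t2):
        return -t2 ** 2 + 0.5 * t1 * t2

    def grad(f, t1, t2, eps=1e-5):
        """Central finite-difference gradient of f w.r.t. (t1, t2)."""
        d1 = (f(t1 + eps, t2) - f(t1 - eps, t2)) / (2 * eps)
        d2 = (f(t1, t2 + eps) - f(t1, t2 - eps)) / (2 * eps)
        return d1, d2

    def lola_step(t1, t2, alpha=0.1, eta=0.1, eps=1e-5):
        """One first-order LOLA update for agent 1's parameter t1."""
        dV1_d1, dV1_d2 = grad(V1, t1, t2)
        # The opponent's anticipated step is delta_t2 = eta * dV2/dt2; its
        # sensitivity to t1 is eta * d^2 V2 / (dt1 dt2), approximated here
        # by differencing the opponent's gradient in t1.
        d2V2_cross = (grad(V2, t1 + eps, t2)[1]
                      - grad(V2, t1 - eps, t2)[1]) / (2 * eps)
        # LOLA term: how agent 1's payoff changes through the opponent's
        # anticipated parameter update.
        lola_term = dV1_d2 * eta * d2V2_cross
        return t1 + alpha * (dV1_d1 + lola_term)
    ```

    A naive learner would use only `dV1_d1`; the extra `lola_term` is what lets a LOLA agent shape the opponent's learning.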

    Languages adapt to their contextual niche


    Adults are more efficient in creating and transmitting novel signalling systems than children

    Iterated language learning experiments have shown that meaningful and structured signalling systems emerge when there is pressure for signals to be both learnable and expressive. Yet such experiments have mainly been conducted with adults using language-like signals. Here we explore whether structured signalling systems can also emerge when signalling domains are unfamiliar and when the learners are children, with their well-attested cognitive and pragmatic limitations. In Experiment 1, we compared iterated learning of binary auditory sequences denoting small sets of meanings in chains of adults and 5-7-year-old children. Signalling systems became more learnable, but iconicity and structure did not emerge, despite a homonymy filter designed to keep the systems expressive. When the same types of signals were used in referential communication by adult and child dyads in Experiment 2, only the adults, not the children, were able to negotiate shared iconic and structured signals. Referential communication using their native language by 4-5-year-old children in Experiment 3 showed that only interaction with adults, not with peers, resulted in informative expressions. These findings suggest that the emergence and transmission of communication systems is unlikely to be driven by children, and point to the importance of learners' cognitive maturity and pragmatic expertise, as well as feedback-based scaffolding of communicative effectiveness by experts, during language evolution

    Cultural transmission results in convergence towards colour term universals.

    As in biological evolution, multiple forces are involved in cultural evolution. One force is analogous to selection and acts on differences in the fitness of aspects of culture by influencing whom people choose to learn from. Another force is analogous to mutation and influences how culture changes over time owing to errors in learning and the effects of cognitive biases. Which of these forces needs to be appealed to in explaining any particular aspect of human cultures is an open question. We present a study that explores this question empirically, examining the role that the cognitive biases influencing cultural transmission might play in universals of colour naming. In a large-scale laboratory experiment, participants were shown labelled examples from novel artificial systems of colour terms and were asked to classify other colours on the basis of those examples. The responses of each participant were used to generate the examples seen by subsequent participants. By simulating cultural transmission in the laboratory, we were able to isolate a single evolutionary force, the effects of cognitive biases, analogous to mutation, and examine its consequences. Our results show that this process produces convergence towards systems of colour terms similar to those seen across human languages, providing support for the conclusion that the effects of cognitive biases, brought out through cultural transmission, can account for universals in colour naming

    A Theoretical Analysis of Cooperative Behavior in Multi-Agent Q-learning

    A number of experimental studies have investigated whether cooperative behavior may emerge in multi-agent Q-learning. In some studies cooperative behavior did emerge; in others it did not. This report provides a theoretical analysis of this issue. The analysis focuses on multi-agent Q-learning in iterated prisoner's dilemmas. It is shown that, under certain assumptions, cooperative behavior may emerge when multi-agent Q-learning is applied in an iterated prisoner's dilemma. An important consequence of the analysis is that multi-agent Q-learning may result in non-Nash behavior. It is found experimentally that the theoretical results derived in this report are quite robust to violations of the underlying assumptions.

    Keywords: Cooperation; Multi-Agent Q-Learning; Multi-Agent Reinforcement Learning; Nash Equilibrium; Prisoner's Dilemma
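    The setting this abstract analyses, two independent Q-learners playing an iterated prisoner's dilemma, can be sketched as follows. This is a generic illustration, not the report's own model: the payoff matrix, the choice of the previous joint action as state, and all hyperparameters are assumptions for the sketch.

    ```python
    import random

    # Actions and the standard prisoner's dilemma payoff matrix (illustrative).
    C, D = 0, 1
    PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

    def run(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        """Two independent epsilon-greedy Q-learners in an iterated PD.

        State = previous joint action (None at the start); each agent keeps
        its own Q-table, updated without any model of the other agent.
        """
        rng = random.Random(seed)
        Q = [dict(), dict()]

        def q(i, s, a):
            return Q[i].get((s, a), 0.0)

        def act(i, s):
            if rng.random() < epsilon:
                return rng.choice([C, D])
            return C if q(i, s, C) >= q(i, s, D) else D

        s = None
        for _ in range(episodes):
            a = (act(0, s), act(1, s))
            r = PAYOFF[a]
            for i in (0, 1):
                # Standard Q-learning update; the next state is the joint
                # action just played.
                best_next = max(q(i, a, C), q(i, a, D))
                Q[i][(s, a[i])] = q(i, s, a[i]) + alpha * (
                    r[i] + gamma * best_next - q(i, s, a[i]))
            s = a
        return Q
    ```

    Whether such learners settle into mutual cooperation or mutual defection depends on the payoffs, exploration, and discounting, which is exactly the sensitivity the report's theoretical analysis addresses.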