
    Learning with Opponent-Learning Awareness

    Multi-agent settings are quickly gathering importance in machine learning. This includes a plethora of recent work on deep multi-agent reinforcement learning, but also can be extended to hierarchical RL, generative adversarial networks and decentralised optimisation. In all these settings the presence of multiple learning agents renders the training problem non-stationary and often leads to unstable training or undesired final results. We present Learning with Opponent-Learning Awareness (LOLA), a method in which each agent shapes the anticipated learning of the other agents in the environment. The LOLA learning rule includes a term that accounts for the impact of one agent's policy on the anticipated parameter update of the other agents. Results show that the encounter of two LOLA agents leads to the emergence of tit-for-tat and therefore cooperation in the iterated prisoners' dilemma, while independent learning does not. In this domain, LOLA also receives higher payouts compared to a naive learner, and is robust against exploitation by higher order gradient-based methods. Applied to repeated matching pennies, LOLA agents converge to the Nash equilibrium. In a round robin tournament we show that LOLA agents successfully shape the learning of a range of multi-agent learning algorithms from literature, resulting in the highest average returns on the IPD. We also show that the LOLA update rule can be efficiently calculated using an extension of the policy gradient estimator, making the method suitable for model-free RL. The method thus scales to large parameter and input spaces and nonlinear function approximators. We apply LOLA to a grid world task with an embedded social dilemma using recurrent policies and opponent modelling. By explicitly considering the learning of the other agent, LOLA agents learn to cooperate out of self-interest. The code is at github.com/alshedivat/lola
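
    The idea of shaping the opponent's anticipated update can be illustrated with a small sketch. It is not the authors' implementation: it assumes a one-shot (not repeated) matching-pennies game, scalar sigmoid policies, an opponent who takes one naive gradient step of assumed size `eta`, and finite differences in place of the policy gradient estimator. It also differentiates the lookahead value directly rather than through a Taylor expansion. All function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def v1(theta1, theta2):
    """Expected payoff of player 1 in one-shot matching pennies."""
    p1, p2 = sigmoid(theta1), sigmoid(theta2)  # probability of playing heads
    return (2 * p1 - 1) * (2 * p2 - 1)         # +1 on a match, -1 otherwise

def v2(theta1, theta2):
    """Player 2's expected payoff (the game is zero-sum)."""
    return -v1(theta1, theta2)

def ddx(f, x, h=1e-4):
    """Central finite-difference derivative of a scalar function."""
    return (f(x + h) - f(x - h)) / (2 * h)

def naive_grad(theta1, theta2):
    """Independent learner: treats the opponent's parameter as fixed."""
    return ddx(lambda t: v1(t, theta2), theta1)

def lola_style_grad(theta1, theta2, eta):
    """Gradient of player 1's value after the opponent's anticipated step.

    The opponent's step depends on theta1, so differentiating through it
    adds a term that the naive gradient lacks.
    """
    def lookahead_value(t1):
        step2 = eta * ddx(lambda t2: v2(t1, t2), theta2)  # anticipated update
        return v1(t1, theta2 + step2)
    return ddx(lookahead_value, theta1)
```

    With `eta = 0` the opponent is assumed not to learn and the two gradients coincide; with `eta > 0` the extra shaping term makes them differ.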

    Opponent Learning Awareness and Modelling in Multi-Objective Normal Form Games

    Many real-world multi-agent interactions consider multiple distinct criteria, i.e. the payoffs are multi-objective in nature. However, the same multi-objective payoff vector may lead to different utilities for each participant. Therefore, it is essential for an agent to learn about the behaviour of other agents in the system. In this work, we present the first study of the effects of such opponent modelling on multi-objective multi-agent interactions with non-linear utilities. Specifically, we consider two-player multi-objective normal form games (MONFGs) with non-linear utility functions under the scalarised expected returns optimisation criterion. We contribute novel actor-critic and policy gradient formulations to allow reinforcement learning of mixed strategies in this setting, along with extensions that incorporate opponent policy reconstruction and learning with opponent learning awareness (i.e., learning while considering the impact of one's policy when anticipating the opponent's learning step). Empirical results in five different MONFGs demonstrate that opponent learning awareness and modelling can drastically alter the learning dynamics in this setting. When equilibria are present, opponent modelling can confer significant benefits on agents that implement it. When there are no Nash equilibria, opponent learning awareness and modelling allow agents to still converge to meaningful solutions that approximate equilibria. Comment: Under review since 14 November 202
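
    The scalarised expected returns criterion mentioned above applies the utility function to the expected payoff vector. A minimal sketch follows, assuming a made-up 2x2 two-objective payoff table and a made-up product utility; neither comes from the paper, and the function names are hypothetical.

```python
def expected_payoff(pi1, pi2, payoffs):
    """Expected payoff vector under mixed strategies pi1 and pi2."""
    n_obj = len(payoffs[0][0])
    total = [0.0] * n_obj
    for i, p_i in enumerate(pi1):
        for j, p_j in enumerate(pi2):
            for k in range(n_obj):
                total[k] += p_i * p_j * payoffs[i][j][k]
    return total

def ser(pi1, pi2, payoffs, utility):
    """Scalarised expected returns: utility of the expected payoff vector."""
    return utility(expected_payoff(pi1, pi2, payoffs))

# Hypothetical payoff vectors, indexed as PAYOFFS[action1][action2].
PAYOFFS = [[(4.0, 0.0), (0.0, 0.0)],
           [(0.0, 0.0), (0.0, 4.0)]]

def product_utility(v):
    """Hypothetical non-linear utility that rewards balanced objectives."""
    return v[0] * v[1]
```

    In this made-up game every pure joint action has utility 0, while uniform mixing by both players yields the expected vector (1, 1) and utility 1, which shows why mixed strategies matter under a non-linear utility.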

    Developing Pre-Service Teachers’ Evidence-Based Argumentation skills on Socio-Scientific Issues

    We report on a study of the effect of meta-level awareness on the use of evidence in discourse. The participants were 66 pre-service teachers who were engaged in a dialogic activity. Meta-level awareness regarding the use of evidence in discourse was heightened in two ways: same-side peers collaborated on the computer in arguing against successive pairs of peers on the opposing side of an issue related to climate change, and participants engaged in explicit reflective activities on the use of evidence. Participants showed significant advances both in their skill of producing evidence-based arguments and counterarguments and in the accuracy of the evidence used. Advances were also observed at the meta-level, reflecting at least an implicit understanding that using evidence is an important goal of argumentation. Another group of pre-service teachers, who studied the role of evidence in science in the context of the regular curriculum and served as a control condition, did not exhibit comparable advances in the use of evidence in argumentation. Educational implications are discussed.

    From normal brain and behavior to schizophrenia

    Air Force Office of Scientific Research (F49620-01-1-0397); Office of Naval Research (N00014-01-1-0624)