2,041 research outputs found

    Partner Selection for the Emergence of Cooperation in Multi-Agent Systems Using Reinforcement Learning

    Get PDF
    Social dilemmas have been widely studied to explain how humans are able to cooperate in society. Considerable effort has been invested in designing artificial agents for social dilemmas that incorporate explicit agent motivations that are chosen to favor coordinated or cooperative responses. The prevalence of this general approach points towards the importance of achieving an understanding of both an agent's internal design and external environment dynamics that facilitate cooperative behavior. In this paper, we investigate how partner selection can promote cooperative behavior between agents who are trained to maximize a purely selfish objective function. Our experiments reveal that agents trained with this dynamic learn a strategy that retaliates against defectors while promoting cooperation with other agents resulting in a prosocial society.Comment:

    Learning with Opponent-Learning Awareness

    Full text link
    Multi-agent settings are quickly gathering importance in machine learning. This includes a plethora of recent work on deep multi-agent reinforcement learning, but also can be extended to hierarchical RL, generative adversarial networks and decentralised optimisation. In all these settings the presence of multiple learning agents renders the training problem non-stationary and often leads to unstable training or undesired final results. We present Learning with Opponent-Learning Awareness (LOLA), a method in which each agent shapes the anticipated learning of the other agents in the environment. The LOLA learning rule includes a term that accounts for the impact of one agent's policy on the anticipated parameter update of the other agents. Results show that the encounter of two LOLA agents leads to the emergence of tit-for-tat and therefore cooperation in the iterated prisoners' dilemma, while independent learning does not. In this domain, LOLA also receives higher payouts compared to a naive learner, and is robust against exploitation by higher order gradient-based methods. Applied to repeated matching pennies, LOLA agents converge to the Nash equilibrium. In a round robin tournament we show that LOLA agents successfully shape the learning of a range of multi-agent learning algorithms from literature, resulting in the highest average returns on the IPD. We also show that the LOLA update rule can be efficiently calculated using an extension of the policy gradient estimator, making the method suitable for model-free RL. The method thus scales to large parameter and input spaces and nonlinear function approximators. We apply LOLA to a grid world task with an embedded social dilemma using recurrent policies and opponent modelling. By explicitly considering the learning of the other agent, LOLA agents learn to cooperate out of self-interest. The code is at github.com/alshedivat/lola

    Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games

    Get PDF
    Many artificial intelligence (AI) applications often require multiple intelligent agents to work in a collaborative effort. Efficient learning for intra-agent communication and coordination is an indispensable step towards general AI. In this paper, we take StarCraft combat game as a case study, where the task is to coordinate multiple agents as a team to defeat their enemies. To maintain a scalable yet effective communication protocol, we introduce a Multiagent Bidirectionally-Coordinated Network (BiCNet ['bIknet]) with a vectorised extension of actor-critic formulation. We show that BiCNet can handle different types of combats with arbitrary numbers of AI agents for both sides. Our analysis demonstrates that without any supervisions such as human demonstrations or labelled data, BiCNet could learn various types of advanced coordination strategies that have been commonly used by experienced game players. In our experiments, we evaluate our approach against multiple baselines under different scenarios; it shows state-of-the-art performance, and possesses potential values for large-scale real-world applications.Comment: 10 pages, 10 figures. Previously as title: "Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games", Mar 201

    Social interaction for efficient agent learning from human reward

    Get PDF
    Abstract - Learning from rewards generated by a human trainer observing an agent in action has been proven to be a powerful method for teaching autonomous agents to perform challenging tasks, especially for those non-technical users. Since the efficacy of this approach depends critically on the reward the trainer provides, we consider how the interaction between the trainer and the agent should be designed so as to increase the efficiency of the training process. This article investigates the influence of the agent’s socio-competitive feedback on the human trainer’s training behavior and the agent’s learning. The results of our user study with 85 participants suggest that the agent’s passive socio-competitive feedback—showing performance and score of agents trained by trainers in a leaderboard—substantially increases the engagement of the participants in the game task and improves the agents’ performance, even though the participants do not directly play the game but instead train the agent to do so. Moreover, making this feedback active—sending the trainer her agent’s performance relative to others—further induces more participants to train agents longer and improves the agent’s learning. Our further analysis shows that agents trained by trainers affected by both the passive and active social feedback could obtain a higher performance under a score mechanism that could be optimized from the trainer’s perspective and the agent’s additional active social feedback can keep participants to further train agents to learn policies that can obtain a higher performance under such a score mechanism.Fundamental Research Funds for the Central Universities of China (Grant No. 841713015)China Postdoctoral Science Foundatio

    ACoPla: a Multiagent Simulator to Study Individual Strategies in Dynamic Situations

    Get PDF
    One important issue in multi-agent systems is how to define agents’ interaction strategies in dynamic open environments. Generally, agents’ behaviors, such as being cooperative/altruistic or competitive/adversarial, are defined a priori by their creators. However, this is a weak premise when considering interaction among anonymous self-interested agents. Whenever agents meet, there is always a decision to be made: what is the best group interaction strategy? We argue that the answer depends on the amount of information required to make a decision and on the deadline proximity for accomplishing the task in hand. In certain situations, it is to the agents’ advantage to exchange information with others, while in other situations there are no incentives for them to spend time doing so. Understanding effective behaviors according to the decision- making scenario is still an open issue in multi-agent systems. In this paper, we present a multi-agent simulator (ACoPla) to understand the correlations between agents’ interaction strategy, decision-making context and successful task accomplishment rate. Additionally, we develop a case study in the domain of site evacuation to exemplify our findings. Through this study, we detect the types of conditions under which cooperation becomes the preferred strategy, as the environment changes

    Multiagent systems: games and learning from structures

    Get PDF
    Multiple agents have become increasingly utilized in various fields for both physical robots and software agents, such as search and rescue robots, automated driving, auctions and electronic commerce agents, and so on. In multiagent domains, agents interact and coadapt with other agents. Each agent's choice of policy depends on the others' joint policy to achieve the best available performance. During this process, the environment evolves and is no longer stationary, where each agent adapts to proceed towards its target. Each micro-level step in time may present a different learning problem which needs to be addressed. However, in this non-stationary environment, a holistic phenomenon forms along with the rational strategies of all players; we define this phenomenon as structural properties. In our research, we present the importance of analyzing the structural properties, and how to extract the structural properties in multiagent environments. According to the agents' objectives, a multiagent environment can be classified as self-interested, cooperative, or competitive. We examine the structure from these three general multiagent environments: self-interested random graphical game playing, distributed cooperative team playing, and competitive group survival. In each scenario, we analyze the structure in each environmental setting, and demonstrate the structure learned as a comprehensive representation: structure of players' action influence, structure of constraints in teamwork communication, and structure of inter-connections among strategies. This structure represents macro-level knowledge arising in a multiagent system, and provides critical, holistic information for each problem domain. Last, we present some open issues and point toward future research

    Multiagent Deep Reinforcement Learning: Challenges and Directions Towards Human-Like Approaches

    Full text link
    This paper surveys the field of multiagent deep reinforcement learning. The combination of deep neural networks with reinforcement learning has gained increased traction in recent years and is slowly shifting the focus from single-agent to multiagent environments. Dealing with multiple agents is inherently more complex as (a) the future rewards depend on the joint actions of multiple players and (b) the computational complexity of functions increases. We present the most common multiagent problem representations and their main challenges, and identify five research areas that address one or more of these challenges: centralised training and decentralised execution, opponent modelling, communication, efficient coordination, and reward shaping. We find that many computational studies rely on unrealistic assumptions or are not generalisable to other settings; they struggle to overcome the curse of dimensionality or nonstationarity. Approaches from psychology and sociology capture promising relevant behaviours such as communication and coordination. We suggest that, for multiagent reinforcement learning to be successful, future research addresses these challenges with an interdisciplinary approach to open up new possibilities for more human-oriented solutions in multiagent reinforcement learning.Comment: 37 pages, 6 figure

    From supply chains to demand networks. Agents in retailing: the electrical bazaar

    Get PDF
    A paradigm shift is taking place in logistics. The focus is changing from operational effectiveness to adaptation. Supply Chains will develop into networks that will adapt to consumer demand in almost real time. Time to market, capacity of adaptation and enrichment of customer experience seem to be the key elements of this new paradigm. In this environment emerging technologies like RFID (Radio Frequency ID), Intelligent Products and the Internet, are triggering a reconsideration of methods, procedures and goals. We present a Multiagent System framework specialized in retail that addresses these changes with the use of rational agents and takes advantages of the new market opportunities. Like in an old bazaar, agents able to learn, cooperate, take advantage of gossip and distinguish between collaborators and competitors, have the ability to adapt, learn and react to a changing environment better than any other structure. Keywords: Supply Chains, Distributed Artificial Intelligence, Multiagent System.Postprint (published version
    • …
    corecore