2,041 research outputs found
Partner Selection for the Emergence of Cooperation in Multi-Agent Systems Using Reinforcement Learning
Social dilemmas have been widely studied to explain how humans are able to
cooperate in society. Considerable effort has been invested in designing
artificial agents for social dilemmas that incorporate explicit agent
motivations that are chosen to favor coordinated or cooperative responses. The
prevalence of this general approach points towards the importance of achieving
an understanding of both an agent's internal design and external environment
dynamics that facilitate cooperative behavior. In this paper, we investigate
how partner selection can promote cooperative behavior between agents who are
trained to maximize a purely selfish objective function. Our experiments reveal
that agents trained with this dynamic learn a strategy that retaliates against
defectors while promoting cooperation with other agents, resulting in a
prosocial society.
Learning with Opponent-Learning Awareness
Multi-agent settings are quickly gathering importance in machine learning.
This includes a plethora of recent work on deep multi-agent reinforcement
learning, but also can be extended to hierarchical RL, generative adversarial
networks and decentralised optimisation. In all these settings the presence of
multiple learning agents renders the training problem non-stationary and often
leads to unstable training or undesired final results. We present Learning with
Opponent-Learning Awareness (LOLA), a method in which each agent shapes the
anticipated learning of the other agents in the environment. The LOLA learning
rule includes a term that accounts for the impact of one agent's policy on the
anticipated parameter update of the other agents. Results show that the
encounter of two LOLA agents leads to the emergence of tit-for-tat and
therefore cooperation in the iterated prisoners' dilemma, while independent
learning does not. In this domain, LOLA also receives higher payouts compared
to a naive learner, and is robust against exploitation by higher order
gradient-based methods. Applied to repeated matching pennies, LOLA agents
converge to the Nash equilibrium. In a round robin tournament we show that LOLA
agents successfully shape the learning of a range of multi-agent learning
algorithms from literature, resulting in the highest average returns on the
IPD. We also show that the LOLA update rule can be efficiently calculated using
an extension of the policy gradient estimator, making the method suitable for
model-free RL. The method thus scales to large parameter and input spaces and
nonlinear function approximators. We apply LOLA to a grid world task with an
embedded social dilemma using recurrent policies and opponent modelling. By
explicitly considering the learning of the other agent, LOLA agents learn to
cooperate out of self-interest. The code is at github.com/alshedivat/lola
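For readers unfamiliar with the iterated prisoner's dilemma (IPD) and tit-for-tat mentioned in the abstract above, the following is a minimal sketch of the game itself. It does not implement LOLA or any learning rule. The payoff values, the function names, and the fixed hand-written strategies are illustrative assumptions, not taken from the paper or its code.

```python
# Illustrative sketch only: the payoff numbers and all names below are
# assumptions chosen for this example, not values from the LOLA paper.
from typing import Callable, List, Tuple

C, D = "C", "D"  # cooperate, defect

# (my_move, their_move) -> my payoff, using a conventional ordering
# temptation (5) > reward (3) > punishment (1) > sucker (0).
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}

# A strategy maps the opponent's past moves to the next move.
Strategy = Callable[[List[str]], str]


def tit_for_tat(opponent_history: List[str]) -> str:
    """Cooperate first, then copy the opponent's previous move."""
    return C if not opponent_history else opponent_history[-1]


def always_defect(opponent_history: List[str]) -> str:
    """Defect regardless of what the opponent has done."""
    return D


def play(a: Strategy, b: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Play a fixed number of rounds and return the total payoff of each side."""
    history_a: List[str] = []
    history_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = a(history_b)  # each side sees only the other's past moves
        move_b = b(history_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print("TFT vs TFT:", play(tit_for_tat, tit_for_tat))
    print("TFT vs always-defect:", play(tit_for_tat, always_defect))
    print("always-defect vs always-defect:", play(always_defect, always_defect))
```

Under these assumed payoffs, two tit-for-tat players sustain mutual cooperation and each earns more than either of two mutual defectors, while tit-for-tat loses only the first round against a pure defector. This is the sense in which tit-for-tat is associated with cooperation in the IPD.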
Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games
Many artificial intelligence (AI) applications require multiple
intelligent agents to work in a collaborative effort. Efficient learning for
intra-agent communication and coordination is an indispensable step towards
general AI. In this paper, we take the StarCraft combat game as a case study, where
the task is to coordinate multiple agents as a team to defeat their enemies. To
maintain a scalable yet effective communication protocol, we introduce a
Multiagent Bidirectionally-Coordinated Network (BiCNet ['bIknet]) with a
vectorised extension of actor-critic formulation. We show that BiCNet can
handle different types of combats with arbitrary numbers of AI agents for both
sides. Our analysis demonstrates that without any supervision such as human
demonstrations or labelled data, BiCNet could learn various types of advanced
coordination strategies that have been commonly used by experienced game
players. In our experiments, we evaluate our approach against multiple
baselines under different scenarios; it shows state-of-the-art performance and
has potential value for large-scale real-world applications.
Comment: 10 pages, 10 figures. Previously titled "Multiagent
Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat
Games", Mar 201
Social interaction for efficient agent learning from human reward
Learning from rewards generated by a human trainer observing an agent in action has proven to be a powerful method for teaching autonomous agents to perform challenging tasks, especially for non-technical users. Since the efficacy of this approach depends critically on the reward the trainer provides, we consider how the interaction between the trainer and the agent should be designed so as to increase the efficiency of the training process. This article investigates the influence of the agent’s socio-competitive feedback on the human trainer’s training behavior and the agent’s learning. The results of our user study with 85 participants suggest that the agent’s passive socio-competitive feedback (showing the performance and score of agents trained by trainers in a leaderboard) substantially increases the engagement of the participants in the game task and improves the agents’ performance, even though the participants do not directly play the game but instead train the agent to do so. Moreover, making this feedback active (sending the trainer her agent’s performance relative to others) induces more participants to train agents longer and further improves the agent’s learning. Our further analysis shows that agents trained by trainers affected by both the passive and active social feedback could obtain a higher performance under a score mechanism that could be optimized from the trainer’s perspective, and that the agent’s additional active social feedback can keep participants training agents to learn policies that obtain a higher performance under such a score mechanism.
Fundamental Research Funds for the Central Universities of China (Grant No. 841713015); China Postdoctoral Science Foundatio
ACoPla: a Multiagent Simulator to Study Individual Strategies in Dynamic Situations
One important issue in multi-agent systems is how to define agents’ interaction strategies in dynamic open environments. Generally, agents’ behaviors, such as being cooperative/altruistic or competitive/adversarial, are defined a priori by their creators. However, this is a weak premise when considering interaction among anonymous self-interested agents. Whenever agents meet, there is always a decision to be made: what is the best group interaction strategy? We argue that the answer depends on the amount of information required to make a decision and on the deadline proximity for accomplishing the task at hand. In certain situations, it is to the agents’ advantage to exchange information with others, while in other situations there are no incentives for them to spend time doing so. Understanding effective behaviors according to the decision-making scenario is still an open issue in multi-agent systems. In this paper, we present a multi-agent simulator (ACoPla) to understand the correlations between agents’ interaction strategy, decision-making context and successful task accomplishment rate. Additionally, we develop a case study in the domain of site evacuation to exemplify our findings. Through this study, we detect the types of conditions under which cooperation becomes the preferred strategy as the environment changes.
Multiagent systems: games and learning from structures
Multiple agents have become increasingly utilized in various fields for both physical robots and software agents, such as search and rescue robots, automated driving, auctions and electronic commerce agents, and so on. In multiagent domains, agents interact and coadapt with other agents. Each agent's choice of policy depends on the others' joint policy to achieve the best available performance. During this process, the environment evolves and is no longer stationary, where each agent adapts to proceed towards its target. Each micro-level step in time may present a different learning problem which needs to be addressed. However, in this non-stationary environment, a holistic phenomenon forms along with the rational strategies of all players; we define this phenomenon as structural properties.
In our research, we present the importance of analyzing the structural properties, and how to extract the structural properties in multiagent environments. According to the agents' objectives, a multiagent environment can be classified as self-interested, cooperative, or competitive. We examine the structure from these three general multiagent environments: self-interested random graphical game playing, distributed cooperative team playing, and competitive group survival. In each scenario, we analyze the structure in each environmental setting, and demonstrate the structure learned as a comprehensive representation: structure of players' action influence, structure of constraints in teamwork communication, and structure of inter-connections among strategies. This structure represents macro-level knowledge arising in a multiagent system, and provides critical, holistic information for each problem domain. Finally, we present some open issues and point toward future research.
Multiagent Deep Reinforcement Learning: Challenges and Directions Towards Human-Like Approaches
This paper surveys the field of multiagent deep reinforcement learning. The
combination of deep neural networks with reinforcement learning has gained
increased traction in recent years and is slowly shifting the focus from
single-agent to multiagent environments. Dealing with multiple agents is
inherently more complex as (a) the future rewards depend on the joint actions
of multiple players and (b) the computational complexity of functions
increases. We present the most common multiagent problem representations and
their main challenges, and identify five research areas that address one or
more of these challenges: centralised training and decentralised execution,
opponent modelling, communication, efficient coordination, and reward shaping.
We find that many computational studies rely on unrealistic assumptions or are
not generalisable to other settings; they struggle to overcome the curse of
dimensionality or nonstationarity. Approaches from psychology and sociology
capture promising relevant behaviours such as communication and coordination.
We suggest that, for multiagent reinforcement learning to be successful, future
research should address these challenges with an interdisciplinary approach to open
up new possibilities for more human-oriented solutions in multiagent
reinforcement learning.
Comment: 37 pages, 6 figure
From supply chains to demand networks. Agents in retailing: the electrical bazaar
A paradigm shift is taking place in logistics. The focus is changing from operational effectiveness to adaptation. Supply chains will develop into networks that adapt to consumer demand in almost real time. Time to market, capacity for adaptation and enrichment of the customer experience seem to be the key elements of this new paradigm. In this environment, emerging technologies such as RFID (Radio Frequency ID), Intelligent Products and the Internet are triggering a reconsideration of methods, procedures and goals. We present a Multiagent System framework specialized in retail that addresses these changes with the use of rational agents and takes advantage of the new market opportunities. As in an old bazaar, agents that are able to learn, cooperate, take advantage of gossip and distinguish between collaborators and competitors can adapt, learn and react to a changing environment better than any other structure. Keywords: Supply Chains, Distributed Artificial Intelligence, Multiagent System.
Postprint (published version