    Evolution of Cooperative Hunting in Artificial Multi-layered Societies

    The complexity of cooperative behavior is a crucial issue in multiagent-based social simulation. In this paper, an agent-based model is proposed to study the evolution of cooperative hunting behaviors in an artificial society. In this model, the standard stag hunt game is modified into a new situation with social hierarchy and penalty. The agent society is divided into multiple layers with supervisors and subordinates. In each layer, the society is divided into multiple clusters. A supervisor controls all subordinates in a cluster locally. Subordinates interact with rivals through reinforcement learning, and report learning information to their corresponding supervisor. Supervisors process the reported information through repeated affiliation-based aggregation and by information exchange with other supervisors, then pass down the reprocessed information to subordinates as guidance. Subordinates, in turn, update learning information according to this guidance, following the "win-stay, lose-shift" strategy. Experiments are carried out to test the evolution of cooperation in this closed-loop, semi-supervised emergent system with different parameters. We also study the variations and phase transitions in this game setting. Comment: Conflict of interest with our previous collaborators. Thus, we retract the preprint. We retract all earlier versions of the paper as well, but due to the arXiv policy, previous versions cannot be removed. We ask that you ignore the abstract and earlier versions, do not refer to or distribute them further, and we apologize for any inconvenience caused. Thank you.

    Improving Coordination in Small-Scale Multi-Agent Deep Reinforcement Learning through Memory-driven Communication

    Deep reinforcement learning algorithms have recently been used to train multiple interacting agents in a centralised manner whilst keeping their execution decentralised. When the agents can only acquire partial observations and are faced with tasks requiring coordination and synchronisation skills, inter-agent communication plays an essential role. In this work, we propose a framework for multi-agent training using deep deterministic policy gradients that enables concurrent, end-to-end learning of an explicit communication protocol through a memory device. During training, the agents learn to perform read and write operations enabling them to infer a shared representation of the world. We empirically demonstrate that concurrent learning of the communication device and individual policies can improve inter-agent coordination and performance in small-scale systems. Our experimental results show that the proposed method achieves superior performance in scenarios with up to six agents. We illustrate how different communication patterns can emerge on six different tasks of increasing complexity. Furthermore, we study the effects of corrupting the communication channel, provide a visualisation of the time-varying memory content as the underlying task is being solved, and validate the building blocks of the proposed memory device through ablation studies.
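
    The read/write mechanism described above can be pictured as a small shared vector that every agent consumes and overwrites at each timestep. Below is a minimal sketch of that idea in PyTorch, not the paper's architecture; the module names, layer sizes and tanh activations are illustrative assumptions.

```python
# Minimal sketch of a shared, memory-driven communication channel (PyTorch).
# Dimensions, module names and activations are illustrative assumptions.
import torch
import torch.nn as nn

class MemoryChannel(nn.Module):
    """Agent that reads from and writes to a shared memory vector each step."""
    def __init__(self, obs_dim, mem_dim, act_dim):
        super().__init__()
        self.read = nn.Linear(obs_dim + mem_dim, mem_dim)     # interpret memory
        self.write = nn.Linear(obs_dim + mem_dim, mem_dim)    # propose new memory
        self.policy = nn.Linear(obs_dim + mem_dim, act_dim)   # act on obs + message

    def forward(self, obs, memory):
        x = torch.cat([obs, memory], dim=-1)
        message = torch.tanh(self.read(x))       # what the agent reads out
        new_memory = torch.tanh(self.write(x))   # what the agent writes back
        action = torch.tanh(self.policy(torch.cat([obs, message], dim=-1)))
        return action, new_memory

# Two agents sharing one memory vector (hypothetical sizes).
obs_dim, mem_dim, act_dim = 8, 16, 2
agents = [MemoryChannel(obs_dim, mem_dim, act_dim) for _ in range(2)]
memory = torch.zeros(1, mem_dim)
observations = [torch.randn(1, obs_dim) for _ in agents]
for agent, obs in zip(agents, observations):
    action, memory = agent(obs, memory)   # each agent reads, acts, then overwrites
```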

    Value-Decomposition Networks For Cooperative Multi-Agent Learning

    We study the problem of cooperative multi-agent reinforcement learning with a single joint reward signal. This class of learning problems is difficult because of the often large combined action and observation spaces. In the fully centralized and decentralized approaches, we find the problem of spurious rewards and a phenomenon we call the "lazy agent" problem, which arises due to partial observability. We address these problems by training individual agents with a novel value decomposition network architecture, which learns to decompose the team value function into agent-wise value functions. We perform an experimental evaluation across a range of partially-observable multi-agent domains and show that learning such value-decompositions leads to superior results, in particular when combined with weight sharing, role information and information channels.
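
    The core decomposition assumption is that the team value is the sum of per-agent utilities, so a single TD error on the joint reward can be backpropagated into each agent's network. Here is a minimal sketch in PyTorch; the network sizes and names are illustrative, not the paper's exact architecture.

```python
# Minimal sketch of value decomposition (PyTorch); sizes are illustrative.
import torch
import torch.nn as nn

class AgentQ(nn.Module):
    """Per-agent utility network Q_i(o_i, a_i)."""
    def __init__(self, obs_dim, n_actions, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, obs):
        return self.net(obs)   # Q-values for every action of this agent

def team_q(agent_nets, observations, actions):
    """Q_tot is the sum of the chosen agent-wise Q-values (the VDN assumption)."""
    q_values = [net(o).gather(-1, a.unsqueeze(-1)).squeeze(-1)
                for net, o, a in zip(agent_nets, observations, actions)]
    return torch.stack(q_values, dim=0).sum(dim=0)

# A single TD target on the joint reward is applied to q_tot; gradients flow
# back through the sum into each agent's network.
nets = [AgentQ(obs_dim=10, n_actions=4) for _ in range(3)]
obs = [torch.randn(5, 10) for _ in nets]           # batch of 5 per agent
acts = [torch.randint(0, 4, (5,)) for _ in nets]
q_tot = team_q(nets, obs, acts)                    # shape: (5,)
```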

    Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning

    We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions. Causal influence is assessed using counterfactual reasoning. At each timestep, an agent simulates alternate actions that it could have taken, and computes their effect on the behavior of other agents. Actions that lead to bigger changes in other agents' behavior are considered influential and are rewarded. We show that this is equivalent to rewarding agents for having high mutual information between their actions. Empirical results demonstrate that influence leads to enhanced coordination and communication in challenging social dilemma environments, dramatically improving the learning curves of the deep RL agents, and leading to more meaningful learned communication protocols. The influence rewards for all agents can be computed in a decentralized way by enabling agents to learn a model of other agents using deep neural networks. In contrast, key previous works on emergent communication in the MARL setting were unable to learn diverse policies in a decentralized manner and had to resort to centralized training. Consequently, the influence reward opens up a window of new opportunities for research in this area.
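
    The influence reward can be read as a divergence between what another agent did given the influencer's actual action and what it would have done under the influencer's counterfactual actions. Below is a minimal numerical sketch, assuming small discrete action sets and access to a (possibly learned) model of the other agent's conditional policy; all names and numbers are illustrative.

```python
# Minimal sketch of a counterfactual influence reward for two agents A and B.
# Policies and action sets are illustrative placeholders.
import numpy as np

def influence_reward(p_b_given_a, p_a, chosen_a):
    """KL( p(a_B | a_A = chosen) || sum_a' p(a_A = a') p(a_B | a_A = a') )."""
    conditional = p_b_given_a[chosen_a]                  # B's policy given A's real action
    marginal = (p_a[:, None] * p_b_given_a).sum(axis=0)  # counterfactual marginal over A
    return float(np.sum(conditional * np.log(conditional / marginal)))

# Agent A has 3 actions, agent B has 2; each row of p_b_given_a is B's policy
# given one counterfactual action of A (a learned model in the decentralized case).
p_b_given_a = np.array([[0.9, 0.1],
                        [0.5, 0.5],
                        [0.2, 0.8]])
p_a = np.array([0.3, 0.4, 0.3])   # A's own policy over its actions
reward = influence_reward(p_b_given_a, p_a, chosen_a=0)   # larger KL = more influence
```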

    Cooperative coevolution of real predator robots and virtual robots in the pursuit domain

    The pursuit domain, or predator-prey problem, is a standard testbed for the study of coordination techniques. Although its problem setup is apparently simple, it is challenging for research on emergent swarm intelligence. This paper presents a particle swarm optimization (PSO) based cooperative coevolutionary algorithm for the predator robots, called CCPSO-R, where real and virtual robots coexist for the first time in an evolutionary algorithm (EA). Virtual robots sample and explore the vicinity of the corresponding real robot and act as its action space, while the real robots form the predator swarm that actually pursues the prey robot without fixed behavior rules, under the immediate guidance of a fitness function designed in a modular manner with very limited domain knowledge. In addition, kinematic limits and collision avoidance considerations are integrated into the update rules of the robots. Experiments are conducted on a scalable predator robot swarm with 4 types of prey, and the statistical results show the reliability, generality, and scalability of the proposed CCPSO-R. Finally, the code for this paper is publicly available at: https://github.com/LijunSun90/pursuitCCPSO_R
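
    As a rough illustration of a PSO-style update with kinematic limits like the one described above, here is a minimal sketch for a single predator; the coefficients, the placeholder fitness function and the velocity clamp are assumptions for illustration, not the CCPSO-R design.

```python
# Minimal sketch of one PSO-style position/velocity update with a kinematic limit.
# Coefficients and the fitness function are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
w, c1, c2 = 0.7, 1.5, 1.5   # inertia and acceleration coefficients
v_max = 0.5                 # kinematic limit on step length

def fitness(pos, prey_pos):
    return -np.linalg.norm(pos - prey_pos)   # closer to the prey is better (placeholder)

def pso_step(pos, vel, personal_best, global_best, prey_pos):
    r1, r2 = rng.random(2)
    vel = w * vel + c1 * r1 * (personal_best - pos) + c2 * r2 * (global_best - pos)
    speed = np.linalg.norm(vel)
    if speed > v_max:                        # enforce the kinematic limit
        vel *= v_max / speed
    new_pos = pos + vel                      # candidate position ("virtual robot")
    if fitness(new_pos, prey_pos) > fitness(personal_best, prey_pos):
        personal_best = new_pos
    return new_pos, vel, personal_best
```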

    Modelling and simulation of complex systems: an approach based on multi-level agents

    A complex system is made up of many components with many interactions, so the design of systems such as simulation systems, cooperative systems, or assistance systems requires very accurate modelling of the interactional and communicational levels. The agent-based approach provides an adapted abstraction level for this problem. After having studied the organizational context and communicative capacities of agent-based systems, in order to simulate the reorganization of a flexible manufacturing system, to regulate an urban transport system, and to simulate an epidemic detection system, our thoughts on the interactional level were inspired by human-machine interface models, especially those in "cognitive engineering". To provide a general framework for agent-based complex systems modelling, we then proposed a scale of four behaviours that agents may adopt in their complex systems (reactive, routine, cognitive, and collective). To complete the description of multi-level agent models, which is the focus of this paper, we illustrate our modelling and discuss our ongoing work on each level. Comment: 10 pages; IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 6, No 1, November 2011
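
    The four-level behaviour scale can be pictured as a simple dispatch from behaviour level to handler. A minimal sketch follows, where only the level names come from the abstract and everything else is illustrative.

```python
# Minimal sketch of a four-level behaviour scale as a dispatch over handlers.
# Level names follow the abstract; the handlers are illustrative placeholders.
from enum import Enum, auto

class Level(Enum):
    REACTIVE = auto()    # pre-wired stimulus-response
    ROUTINE = auto()     # learned, habitual procedures
    COGNITIVE = auto()   # deliberation over an internal model
    COLLECTIVE = auto()  # coordination with other agents

class MultiLevelAgent:
    def __init__(self):
        self.handlers = {
            Level.REACTIVE: lambda percept: f"reflex({percept})",
            Level.ROUTINE: lambda percept: f"routine({percept})",
            Level.COGNITIVE: lambda percept: f"plan({percept})",
            Level.COLLECTIVE: lambda percept: f"negotiate({percept})",
        }

    def act(self, percept, level):
        return self.handlers[level](percept)

agent = MultiLevelAgent()
print(agent.act("obstacle", Level.REACTIVE))   # lowest level handles it directly
```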

    Modeling the Formation of Social Conventions from Embodied Real-Time Interactions

    What is the role of real-time control and learning in the formation of social conventions? To answer this question, we propose a computational model that matches human behavioral data in a social decision-making game that was analyzed in both discrete-time and continuous-time setups. Furthermore, unlike previous approaches, our model takes into account the role of sensorimotor control loops in embodied decision-making scenarios. For this purpose, we introduce the Control-based Reinforcement Learning (CRL) model. CRL is grounded in the Distributed Adaptive Control (DAC) theory of mind and brain, where low-level sensorimotor control is modulated through perceptual and behavioral learning in a layered structure. CRL follows these principles by implementing a feedback control loop handling the agent's reactive behaviors (pre-wired reflexes), along with an adaptive layer that uses reinforcement learning to maximize long-term reward. We test our model in a multi-agent game-theoretic task in which coordination must be achieved to find an optimal solution. We show that CRL is able to reach human-level performance on standard game-theoretic metrics such as efficiency in acquiring rewards and fairness in reward distribution. Comment: 16 pages, 7 figures
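
    The layered structure can be approximated as a pre-wired reflex that, when triggered, overrides an adaptive layer trained by reinforcement learning. Below is a minimal sketch with tabular Q-learning standing in for the adaptive layer; the states, actions and reflex rule are illustrative assumptions, not the CRL implementation.

```python
# Minimal sketch of a two-layer controller: reactive reflex + adaptive Q-learning.
# States, actions, reward handling and the reflex rule are illustrative.
import random
from collections import defaultdict

ACTIONS = ["left", "right", "wait"]
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
q = defaultdict(float)   # adaptive layer: Q(state, action)

def reactive_layer(obs):
    """Pre-wired reflex: avoid an imminent collision regardless of learning."""
    if obs.get("collision_imminent"):
        return "wait"
    return None

def adaptive_layer(state):
    """Epsilon-greedy choice from the learned action values."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def act(obs, state):
    reflex = reactive_layer(obs)
    return reflex if reflex is not None else adaptive_layer(state)

def learn(state, action, reward, next_state):
    """Standard one-step Q-learning update for the adaptive layer."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
```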

    A Framework for learning multi-agent dynamic formation strategy in real-time applications

    Formation strategy is one of the most important components of many multi-agent systems, with many applications to real-world problems. In this paper, a framework for learning this task in a limited domain (restricted environment) is proposed. In this framework, agents learn either directly, by observing an expert's behavior, or indirectly, by observing other agents' or objects' behavior. First, a group of algorithms for learning formation strategy based on limited features is presented. Due to the distributed and complex nature of many multi-agent systems, it is impossible to include all features directly in the learning process; thus, a modular scheme is proposed in order to reduce the number of features. In this method, some important features influence learning indirectly rather than being involved directly as input features. This framework has the ability to dynamically assign a group of positions to a group of agents to improve system performance. In addition, it can change the formation strategy when the context changes. Finally, this framework is able to automatically produce many complex and flexible formation strategy algorithms without directly involving an expert to present and implement them. Comment: 27 pages, 9 figures
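
    The dynamic assignment of a group of positions to a group of agents can be illustrated with a simple cost-based matching; the greedy nearest-agent rule below is an assumption for illustration, not the learning framework of the paper.

```python
# Minimal sketch of assigning formation slots to agents by distance.
# The greedy rule and the example coordinates are illustrative.
import numpy as np

def assign_positions(agent_xy, slot_xy):
    """Greedily give each formation slot to the nearest still-unassigned agent."""
    cost = np.linalg.norm(agent_xy[:, None, :] - slot_xy[None, :, :], axis=-1)
    assignment, free_agents = {}, set(range(len(agent_xy)))
    for slot in np.argsort(cost.min(axis=0)):          # tightest slots first
        agent = min(free_agents, key=lambda a: cost[a, slot])
        assignment[int(slot)] = agent
        free_agents.remove(agent)
    return assignment

agents = np.array([[0.0, 0.0], [2.0, 1.0], [4.0, 0.0]])
slots = np.array([[1.0, 1.0], [3.0, 1.0], [5.0, 1.0]])   # desired formation
print(assign_positions(agents, slots))                    # {slot_index: agent_index}
```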

    ES-CTC: A Deep Neuroevolution Model for Cooperative Intelligent Freeway Traffic Control

    Cooperative intelligent freeway traffic control is an important application in intelligent transportation systems, which is expected to improve the mobility of freeway networks. In this paper, we propose a deep neuroevolution model, called ES-CTC, to achieve a cooperative control scheme of ramp metering, differential variable speed limit, and lane change control agents for improving freeway traffic. In this model, graph convolutional networks are used to learn more meaningful spatial patterns from traffic sensors, and a knowledge-sharing layer is designed for communication between different agents. The proposed network structure allows different agents to share knowledge with each other and to execute actions asynchronously. In order to address the delayed reward and action asynchronism issues, an evolution strategy is used to train the agents under stochastic traffic demands. The experimental results on a simulated freeway section indicate that ES-CTC is a viable approach and outperforms several existing methods. Comment: 7 pages
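
    Training with an evolution strategy sidesteps delayed rewards and asynchronous actions because only whole-episode returns are needed. Here is a minimal sketch of an ES parameter update against a black-box episode simulator; the placeholder simulator, sizes and hyperparameters are assumptions for illustration, not ES-CTC itself.

```python
# Minimal sketch of an evolution-strategies update for shared controller parameters.
# The episode_return() simulator and all hyperparameters are placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_params, population, sigma, lr = 64, 20, 0.1, 0.02
theta = np.zeros(n_params)   # shared controller parameters

def episode_return(params):
    """Placeholder for one simulated freeway episode under stochastic demand."""
    return -np.sum((params - 1.0) ** 2) + rng.normal(scale=0.1)

for generation in range(100):
    noise = rng.standard_normal((population, n_params))
    returns = np.array([episode_return(theta + sigma * eps) for eps in noise])
    advantages = (returns - returns.mean()) / (returns.std() + 1e-8)
    theta += lr / (population * sigma) * noise.T @ advantages   # ES gradient estimate
```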

    How individuals learn to take turns: Emergence of alternating cooperation in a congestion game and the prisoner's dilemma

    In many social dilemmas, individuals tend to generate a situation with low payoffs instead of a system optimum ("tragedy of the commons"). Is the routing of traffic a similar problem? In order to address this question, we present experimental results on humans playing a route choice game in a computer laboratory, which allow one to study decision behavior in repeated games beyond the Prisoner's Dilemma. We will focus on whether individuals manage to find a cooperative and fair solution compatible with the system-optimal road usage. We find that individuals tend towards a user equilibrium with equal travel times in the beginning. However, after many iterations, they often establish a coherent oscillatory behavior, as taking turns performs better than applying pure or mixed strategies. The resulting behavior is fair and compatible with system-optimal road usage. In spite of the complex dynamics leading to coordinated oscillations, we have identified mathematical relationships quantifying the observed transition process. Our main experimental discoveries for 2- and 4-person games can be explained with a novel reinforcement learning model for an arbitrary number of persons, which is based on past experience and trial-and-error behavior. Gains in the average payoff seem to be an important driving force for the innovation of time-dependent response patterns, i.e. the evolution of more complex strategies. Our findings are relevant for decision support systems and routing in traffic or data networks. Comment: For related work see http://www.helbing.org
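
    The reported turn-taking can be illustrated with a toy aspiration-based "win-stay, lose-shift" learner in a two-person route-choice game; the payoffs and parameters below are illustrative assumptions, not the experimental model from the paper.

```python
# Minimal sketch of a two-person route-choice game with aspiration-based learners.
# Payoffs, aspiration level and shift probability are illustrative placeholders.
import random

def payoff(route_a, route_b):
    """Road 1 is fast when used alone but congested when shared (hypothetical numbers)."""
    if route_a == route_b == 1:
        return 1, 1          # both on the fast road: congestion
    if route_a == route_b == 2:
        return 2, 2          # both on the slow road
    return (4, 2) if route_a == 1 else (2, 4)   # lone user of road 1 does best

choices = [1, 2]
aspiration, shift_prob = 2.5, 0.8
for t in range(20):
    pa, pb = payoff(*choices)
    print(f"round {t}: choices={choices}, payoffs={(pa, pb)}")
    for i, p in enumerate((pa, pb)):
        if p < aspiration and random.random() < shift_prob:   # lose-shift
            choices[i] = 3 - choices[i]                        # switch road
    # when exactly one dissatisfied player shifts, alternating use of the fast
    # road can emerge, yielding fair and near-optimal average payoffs
```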