Evolution of Cooperative Hunting in Artificial Multi-layered Societies
The complexity of cooperative behavior is a crucial issue in multiagent-based
social simulation. In this paper, an agent-based model is proposed to study the
evolution of cooperative hunting behaviors in an artificial society. In this
model, the standard hunting game of stag is modified into a new situation with
social hierarchy and penalty. The agent society is divided into multiple layers
with supervisors and subordinates. In each layer, the society is divided into
multiple clusters. A supervisor controls all subordinates in a cluster locally.
Subordinates interact with rivals through reinforcement learning, and report
learning information to their corresponding supervisor. Supervisors process the
reported information through repeated affiliation-based aggregation and by
information exchange with other supervisors, then pass down the reprocessed
information to subordinates as guidance. Subordinates, in turn, update learning
information according to guidance, following the "win stay, lose shift"
strategy. Experiments are carried out to test the evolution of cooperation in
this closed-loop, semi-supervised emergent system with different parameters. We also study the variations and phase transitions in this game setting.
Comment: Conflict of interest with our previous collaborators; we therefore retract the preprint. We retract all earlier versions of the paper as well, but due to arXiv policy, previous versions cannot be removed. We ask that you ignore the abstract and earlier versions and do not refer to or distribute them further, and we apologize for any inconvenience caused. Thank you.
Improving Coordination in Small-Scale Multi-Agent Deep Reinforcement Learning through Memory-driven Communication
Deep reinforcement learning algorithms have recently been used to train
multiple interacting agents in a centralised manner whilst keeping their
execution decentralised. When the agents can only acquire partial observations
and are faced with tasks requiring coordination and synchronisation skills,
inter-agent communication plays an essential role. In this work, we propose a
framework for multi-agent training using deep deterministic policy gradients
that enables concurrent, end-to-end learning of an explicit communication
protocol through a memory device. During training, the agents learn to perform
read and write operations enabling them to infer a shared representation of the
world. We empirically demonstrate that concurrent learning of the communication
device and individual policies can improve inter-agent coordination and
performance in small-scale systems. Our experimental results show that the
proposed method achieves superior performance in scenarios with up to six
agents. We illustrate how different communication patterns can emerge on six
different tasks of increasing complexity. Furthermore, we study the effects of
corrupting the communication channel, provide a visualisation of the
time-varying memory content as the underlying task is being solved and validate
the building blocks of the proposed memory device through ablation studies.
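The shared memory device described above can be sketched roughly as follows. This is a minimal stand-in with fixed linear read/write maps; the paper learns these operations end-to-end with deep deterministic policy gradients, and all names and shapes here are illustrative assumptions:

```python
import numpy as np

class SharedMemory:
    """Toy shared memory channel between agents (illustrative sketch)."""

    def __init__(self, mem_size, obs_size, seed=0):
        rng = np.random.default_rng(seed)
        self.m = np.zeros(mem_size)  # shared memory content
        # In the paper these maps are learned; here they are fixed random.
        self.W_write = rng.standard_normal((mem_size, obs_size)) * 0.1
        self.W_read = rng.standard_normal((obs_size, mem_size)) * 0.1

    def write(self, observation, gate=0.5):
        # Blend the old content with an encoding of the new observation,
        # so agents gradually build a shared representation of the world.
        self.m = (1 - gate) * self.m + gate * np.tanh(self.W_write @ observation)

    def read(self):
        # Each agent would condition its policy on this read vector.
        return np.tanh(self.W_read @ self.m)
```

In the actual framework the read and write operations are differentiable and trained jointly with each agent's policy, which is what lets a communication protocol emerge.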
Value-Decomposition Networks For Cooperative Multi-Agent Learning
We study the problem of cooperative multi-agent reinforcement learning with a
single joint reward signal. This class of learning problems is difficult
because of the often large combined action and observation spaces. In both the fully centralized and the fully decentralized approach, we find the problem of spurious rewards and a phenomenon we call the "lazy agent" problem, which arises due to
partial observability. We address these problems by training individual agents
with a novel value decomposition network architecture, which learns to
decompose the team value function into agent-wise value functions. We perform
an experimental evaluation across a range of partially-observable multi-agent
domains and show that learning such value-decompositions leads to superior
results, in particular when combined with weight sharing, role information and
information channels.
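The decomposition idea admits a compact sketch: the team value is the sum of agent-wise values, and the single joint TD error drives each agent's own update. This is a tabular stand-in for the deep architecture; the names, learning rate, and greedy bootstrap are illustrative assumptions:

```python
import numpy as np

class ValueDecompositionTeam:
    """Tabular sketch of a value-decomposition team (illustrative only)."""

    def __init__(self, n_agents, n_obs, n_actions, lr=0.1, gamma=0.9):
        # One small Q-table per agent instead of one deep network each.
        self.qs = [np.zeros((n_obs, n_actions)) for _ in range(n_agents)]
        self.lr, self.gamma = lr, gamma

    def joint_q(self, obs, actions):
        # The team value function is the sum of agent-wise value functions.
        return sum(q[o, a] for q, o, a in zip(self.qs, obs, actions))

    def update(self, obs, actions, team_reward, next_obs):
        # Bootstrap target built from the single joint reward signal.
        target = team_reward + self.gamma * sum(
            q[o].max() for q, o in zip(self.qs, next_obs))
        td_error = target - self.joint_q(obs, actions)
        # The shared TD error flows into every agent's own table, so no
        # agent can stay "lazy" while the team reward is mispredicted.
        for q, o, a in zip(self.qs, obs, actions):
            q[o, a] += self.lr * td_error
```

Because each agent conditions only on its own observation, execution stays decentralized even though the decomposition is trained from the joint reward.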
Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
We propose a unified mechanism for achieving coordination and communication
in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for
having causal influence over other agents' actions. Causal influence is
assessed using counterfactual reasoning. At each timestep, an agent simulates
alternate actions that it could have taken, and computes their effect on the
behavior of other agents. Actions that lead to bigger changes in other agents'
behavior are considered influential and are rewarded. We show that this is
equivalent to rewarding agents for having high mutual information between their
actions. Empirical results demonstrate that influence leads to enhanced
coordination and communication in challenging social dilemma environments,
dramatically accelerating the learning of the deep RL agents, and leading
to more meaningful learned communication protocols. The influence rewards for
all agents can be computed in a decentralized way by enabling agents to learn a
model of other agents using deep neural networks. In contrast, key previous
works on emergent communication in the MARL setting were unable to learn
diverse policies in a decentralized manner and had to resort to centralized
training. Consequently, the influence reward opens up a window of new
opportunities for research in this area.
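The counterfactual influence reward can be illustrated with a small sketch. It computes the divergence between a listener's policy given the speaker's actual action and the listener's marginal policy over counterfactual speaker actions, which is a sample-based view of the mutual-information reading above. The listener model and distributions here are assumptions, not the paper's code:

```python
import numpy as np

def influence_reward(listener_policy, state, speaker_actions,
                     speaker_action, speaker_prior):
    """KL divergence between the listener's policy conditioned on the
    speaker's real action and its marginal over counterfactual actions.

    listener_policy(state, a) -> probability vector over listener actions.
    """
    p_real = listener_policy(state, speaker_action)
    # Marginalise the listener's behaviour over what the speaker *could*
    # have done, weighted by the speaker's action prior.
    p_marginal = sum(pr * listener_policy(state, a)
                     for a, pr in zip(speaker_actions, speaker_prior))
    return float(np.sum(p_real * np.log(p_real / p_marginal)))
```

A listener that ignores the speaker yields zero influence; a listener whose behavior shifts with the speaker's action yields a positive reward, which is exactly what the mechanism reinforces.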
Cooperative coevolution of real predator robots and virtual robots in the pursuit domain
The pursuit domain, or predator-prey problem, is a standard testbed for the study of coordination techniques. Although its problem setup appears simple, it remains challenging for research on emergent swarm intelligence.
This paper presents a particle swarm optimization (PSO) based cooperative
coevolutionary algorithm for the predator robots, called CCPSO-R, where real
and virtual robots coexist for the first time in an evolutionary algorithm
(EA). Virtual robots sample and explore the vicinity of the corresponding real robot and act as its action space, while the real robots form the predator swarm that actually pursues the prey robot, without fixed behavior rules, under the immediate guidance of the fitness function, which is designed in a modular manner with very limited domain knowledge. In addition, kinematic
limits and collision avoidance considerations are integrated into the update
rules of the robots. Experiments are conducted on a scalable predator-robot swarm against four types of prey; the statistical results show the reliability, generality, and scalability of the proposed CCPSO-R. Finally, the code for this paper is publicly available at: https://github.com/LijunSun90/pursuitCCPSO_R
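One CCPSO-R-style step for a single predator might look roughly like the following. The fitness function, constants, and the choice of best positions are illustrative assumptions, not the released code at the URL above:

```python
import numpy as np

def pso_step(real_pos, virtual_pos, virtual_vel, personal_best, global_best,
             fitness, w=0.7, c1=1.5, c2=1.5, v_max=0.5, rng=None):
    """Sketch of one cooperative-coevolution step for one real predator.

    The virtual robots sample the vicinity of the real robot; the real
    robot then moves to the fittest virtual position (its "action space").
    """
    rng = rng or np.random.default_rng(0)
    r1 = rng.random(virtual_pos.shape)
    r2 = rng.random(virtual_pos.shape)
    # Standard PSO velocity update toward personal and global bests.
    virtual_vel = (w * virtual_vel
                   + c1 * r1 * (personal_best - virtual_pos)
                   + c2 * r2 * (global_best - virtual_pos))
    # Kinematic limit: clip to speeds the real robot could achieve.
    virtual_vel = np.clip(virtual_vel, -v_max, v_max)
    virtual_pos = virtual_pos + virtual_vel
    # The real robot adopts the best virtual robot's position.
    scores = np.array([fitness(p) for p in virtual_pos])
    best = virtual_pos[scores.argmax()]
    return best, virtual_pos, virtual_vel
```

With a fitness that rewards closing the distance to the prey, each step pulls the real predator toward better sampled positions while respecting the velocity bound.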
Modelling and simulation of complex systems: an approach based on multi-level agents
A complex system is made up of many components with many interactions, so the design of systems such as simulation systems, cooperative systems, or assistance systems requires very accurate modelling of the interactional and communicational levels. The agent-based approach provides a well-adapted level of abstraction for this problem. After having studied the organizational context and communicative capacities of agent-based systems, to simulate the reorganization of a flexible manufacturing system, to regulate an urban transport system, and to simulate an
epidemic detection system, our thoughts on the interactional level were
inspired by human-machine interface models, especially those in "cognitive
engineering". To provide a general framework for agent-based complex systems
modelling, we then proposed a scale of four behaviours that agents may adopt in
their complex systems (reactive, routine, cognitive, and collective). To
complete the description of multi-level agent models, which is the focus of
this paper, we illustrate our modelling and discuss our ongoing work on each
level.
Comment: 10 pages; IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 6, No 1, November 201
Modeling the Formation of Social Conventions from Embodied Real-Time Interactions
What is the role of real-time control and learning in the formation of social
conventions? To answer this question, we propose a computational model that
matches human behavioral data in a social decision-making game that was
analyzed in both discrete-time and continuous-time setups. Furthermore, unlike
previous approaches, our model takes into account the role of sensorimotor
control loops in embodied decision-making scenarios. For this purpose, we
introduce the Control-based Reinforcement Learning (CRL) model. CRL is grounded
in the Distributed Adaptive Control (DAC) theory of mind and brain, where
low-level sensorimotor control is modulated through perceptual and behavioral
learning in a layered structure. CRL follows these principles by implementing a
feedback control loop handling the agent's reactive behaviors (pre-wired
reflexes), along with an adaptive layer that uses reinforcement learning to
maximize long-term reward. We test our model in a multi-agent game-theoretic
task in which coordination must be achieved to find an optimal solution. We
show that CRL is able to reach human-level performance on standard
game-theoretic metrics such as efficiency in acquiring rewards and fairness in
reward distribution.
Comment: 16 pages, 7 figures
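The layered structure of CRL — a pre-wired reactive layer modulated by an adaptive learning layer — can be sketched as follows. The states, actions, rewards, and the hand-off rule between layers are assumptions for illustration, not the paper's implementation:

```python
import random

class CRLAgent:
    """Toy sketch of a control-based RL agent with two layers."""

    def __init__(self, n_states, n_actions, epsilon=0.1, lr=0.1, gamma=0.9,
                 seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.epsilon, self.lr, self.gamma = epsilon, lr, gamma
        self.rng = random.Random(seed)

    def reflex(self, state):
        # Reactive layer: a fixed, pre-wired sensorimotor mapping.
        return state % len(self.q[0])

    def act(self, state):
        # Trial-and-error exploration by the adaptive layer.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q[state]))
        # Until the adaptive layer has learned values, the pre-wired
        # reactive behavior is expressed.
        if not any(self.q[state]):
            return self.reflex(state)
        qs = self.q[state]
        return max(range(len(qs)), key=qs.__getitem__)

    def learn(self, s, a, r, s2):
        # Adaptive layer: simple Q-learning toward long-term reward.
        target = r + self.gamma * max(self.q[s2])
        self.q[s][a] += self.lr * (target - self.q[s][a])
```

The key idea mirrored here is that behavior starts reactive and is progressively taken over by learned values as reward experience accumulates.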
A Framework for learning multi-agent dynamic formation strategy in real-time applications
Formation strategy is one of the most important parts of many multi-agent
systems with many applications in real world problems. In this paper, a
framework for learning this task in a limited domain (restricted environment)
is proposed. In this framework, agents learn either directly by observing an
expert behavior or indirectly by observing other agents or objects behavior.
First, a group of algorithms for learning formation strategy based on limited
features will be presented. Due to distributed and complex nature of many
multi-agent systems, it is impossible to include all features directly in the
learning process; thus, a modular scheme is proposed in order to reduce the
number of features. In this method, some important features have indirect
influence in learning instead of directly involving them as input features.
This framework has the ability to dynamically assign a group of positions to a
group of agents to improve system performance. In addition, it can change the
formation strategy when the context changes. Finally, this framework is able to
automatically produce many complex and flexible formation strategy algorithms
without directly involving an expert to present and implement such complex
algorithms.
Comment: 27 pages, 9 figures
ES-CTC: A Deep Neuroevolution Model for Cooperative Intelligent Freeway Traffic Control
Cooperative intelligent freeway traffic control is an important application
in intelligent transportation systems, which is expected to improve the
mobility of freeway networks. In this paper, we propose a deep neuroevolution
model, called ES-CTC, to achieve a cooperative control scheme of ramp metering,
differential variable speed limits and lane change control agents for improving
freeway traffic. In this model, graph convolutional networks are used to learn more meaningful spatial patterns from traffic sensors, and a knowledge-sharing layer is designed for communication between different agents. The proposed neural network structure allows different agents to share knowledge with each other and to execute actions asynchronously. In order to address the delayed-reward and action-asynchronism issues, an evolution strategy is utilized to train
the agents under stochastic traffic demands. The experimental results on a
simulated freeway section indicate that ES-CTC is a viable approach and
outperforms several existing methods.
Comment: 7 pages
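The evolution-strategies update at the core of such a training scheme can be sketched in a few lines. The fitness function below is a stand-in; in the paper it would come from simulated freeway episodes under stochastic demands:

```python
import numpy as np

def es_step(theta, fitness, n_samples=50, sigma=0.1, lr=0.05, rng=None):
    """One evolution-strategies update of a flat parameter vector theta.

    fitness(theta) -> scalar reward; higher is better.
    """
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal((n_samples, theta.size))
    # Evaluate perturbed parameter vectors; delayed rewards are fine
    # because only whole-episode fitness is needed, not per-step credit.
    rewards = np.array([fitness(theta + sigma * e) for e in eps])
    # Normalise rewards and move theta along the estimated gradient.
    advantage = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    return theta + lr / (n_samples * sigma) * eps.T @ advantage
```

Because the update needs only episode-level fitness values, it sidesteps the delayed-reward and asynchronous-action issues that complicate temporal-difference training here.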
How individuals learn to take turns: Emergence of alternating cooperation in a congestion game and the prisoner's dilemma
In many social dilemmas, individuals tend to generate a situation with low
payoffs instead of a system optimum ("tragedy of the commons"). Is the routing
of traffic a similar problem? In order to address this question, we present
experimental results on humans playing a route choice game in a computer
laboratory, which allow one to study decision behavior in repeated games beyond
the Prisoner's Dilemma. We will focus on whether individuals manage to find a
cooperative and fair solution compatible with the system-optimal road usage. We
find that individuals tend towards a user equilibrium with equal travel times
in the beginning. However, after many iterations, they often establish a
coherent oscillatory behavior, as taking turns performs better than applying
pure or mixed strategies. The resulting behavior is fair and compatible with
system-optimal road usage. In spite of the complex dynamics leading to
coordinated oscillations, we have identified mathematical relationships
quantifying the observed transition process. Our main experimental discoveries
for 2- and 4-person games can be explained with a novel reinforcement learning
model for an arbitrary number of persons, which is based on past experience and
trial-and-error behavior. Gains in the average payoff seem to be an important
driving force for the innovation of time-dependent response patterns, i.e. the
evolution of more complex strategies. Our findings are relevant for decision
support systems and routing in traffic or data networks.
Comment: For related work see http://www.helbing.or
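A minimal trial-and-error simulation of the two-person route choice game can illustrate the early phase described above, in which players settle into an equilibrium with equal payoffs. The payoffs and the probabilistic shift rule are toy assumptions; reproducing the later turn-taking oscillations requires the richer history-dependent model of the paper:

```python
import random

def route_choice(n_rounds=2000, seed=1):
    """Two players repeatedly choose between two routes; a route shared
    by both is congested and pays less than a route used alone."""
    rng = random.Random(seed)
    choices = [0, 0]                  # both players start on route 0
    payoff = {1: 2.0, 2: 1.0}         # payoff by number of users on a route
    history = []
    for _ in range(n_rounds):
        counts = [choices.count(r) for r in (0, 1)]
        for i in (0, 1):
            p = payoff[counts[choices[i]]]
            # Lose-shift: switch with some probability after congestion;
            # win-stay: keep a route you had to yourself.
            if p < 2.0 and rng.random() < 0.5:
                choices[i] = 1 - choices[i]
        history.append(tuple(choices))
    return history
```

Under this rule the players quickly reach and then keep the fair split (one per route), matching the user-equilibrium phase; the experimentally observed oscillatory turn-taking emerges only with asymmetric payoffs and memory of past rounds.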