Guided Deep Reinforcement Learning for Swarm Systems
In this paper, we investigate how to learn to control a group of cooperative
agents with limited sensing capabilities such as robot swarms. The agents have
only very basic sensor capabilities, yet in a group they can accomplish
sophisticated tasks, such as distributed assembly or search and rescue tasks.
Learning a policy for a group of agents is difficult due to distributed partial
observability of the state. Here, we follow a guided approach where a critic
has central access to the global state during learning, which simplifies the
policy evaluation problem from a reinforcement learning point of view. For
example, we can get the positions of all robots of the swarm using a camera
image of a scene. This camera image is only available to the critic and not to
the control policies of the robots. We follow an actor-critic approach, where
the actors base their decisions only on locally sensed information. In
contrast, the critic is learned based on the true global state. Our algorithm
uses deep reinforcement learning to approximate both the Q-function and the
policy. The performance of the algorithm is evaluated on two tasks with simple
simulated 2D agents: 1) finding and maintaining a certain distance to each
other and 2) locating a target.
Comment: 15 pages, 8 figures, accepted at the AAMAS 2017 Autonomous Robots and
Multirobot Systems (ARMS) Workshop
Real-time Learning and Planning in Environments with Swarms: A Hierarchical and a Parameter-based Simulation Approach
Swarms can be applied in many relevant domains, such as patrolling or rescue. They usually follow simple local rules, leading to complex emergent behavior. Given their wide applicability, an agent may need to make decisions in an environment containing a swarm that is not under its control, and that may even be an antagonist. Predicting the behavior of each swarm member is a great challenge, and must be done under real-time constraints, since swarm members usually move constantly, following quick reactive algorithms. We propose the first two solutions for this novel problem, showing integrated on-line learning and planning for decision-making with unknown swarms: (i) we learn an ellipse abstraction of the swarm based on statistical models, and predict its future parameters using time series; (ii) we learn the algorithm parameters followed by each swarm member, in order to directly simulate them. In our experiments, we reach an objective significantly faster than local repulsive forces do, at the cost of success rate in some situations. Additionally, we show that this is a challenging problem for reinforcement learning.
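As an illustration of what an ellipse abstraction of a swarm could look like, the following is a minimal sketch that fits an ellipse to 2D member positions from their mean and covariance. The abstract does not specify the statistical model, so the covariance-based fit, the function name `fit_ellipse`, and the `scale` parameter are all assumptions made for this example; the time-series prediction of future parameters is not shown.

```python
import math

def fit_ellipse(positions, scale=2.0):
    """Summarize 2D swarm positions as an ellipse (hypothetical helper).

    Assumption: the ellipse is derived from the sample mean and covariance;
    `scale` sets how many standard deviations the semi-axes span.
    """
    n = len(positions)
    cx = sum(p[0] for p in positions) / n
    cy = sum(p[1] for p in positions) / n

    # Entries of the 2x2 sample covariance matrix.
    sxx = sum((p[0] - cx) ** 2 for p in positions) / n
    syy = sum((p[1] - cy) ** 2 for p in positions) / n
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in positions) / n

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    lam_major = half_trace + radius
    lam_minor = max(half_trace - radius, 0.0)

    # Orientation of the major axis relative to the x-axis.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)

    return {
        "center": (cx, cy),
        "semi_major": scale * math.sqrt(lam_major),
        "semi_minor": scale * math.sqrt(lam_minor),
        "angle": angle,
    }
```

A planner could then track these five numbers over time rather than every individual member, which is one way to read the "hierarchical" approach named in the title.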
Deep Reinforcement Learning for Swarm Systems
Recently, deep reinforcement learning (RL) methods have been applied
successfully to multi-agent scenarios. Typically, these methods rely on a
concatenation of agent states to represent the information content required for
decentralized decision making. However, concatenation scales poorly to swarm
systems with a large number of homogeneous agents as it does not exploit the
fundamental properties inherent to these systems: (i) the agents in the swarm
are interchangeable and (ii) the exact number of agents in the swarm is
irrelevant. Therefore, we propose a new state representation for deep
multi-agent RL based on mean embeddings of distributions. We treat the agents
as samples of a distribution and use the empirical mean embedding as input for
a decentralized policy. We define different feature spaces of the mean
embedding using histograms, radial basis functions and a neural network learned
end-to-end. We evaluate the representation on two well-known problems from the
swarm literature (rendezvous and pursuit evasion), in both a globally and a
locally observable setup. For the local setup, we furthermore introduce simple
communication protocols. Of all approaches, the mean embedding representation
using neural network features enables the richest information exchange between
neighboring agents, facilitating the development of more complex collective
strategies.
Comment: 31 pages, 12 figures, version 3 (published in JMLR Volume 20)
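The mean embedding idea described in this abstract can be sketched in a few lines. The example below uses radial basis function features, one of the three feature spaces the abstract names; the function names, the scalar observations, and the `centers` and `bandwidth` parameters are assumptions made for illustration and are not taken from the paper.

```python
import math

def rbf_features(x, centers, bandwidth=1.0):
    """Map one scalar observation to Gaussian RBF activations (hypothetical helper)."""
    return [math.exp(-((x - c) ** 2) / (2.0 * bandwidth ** 2)) for c in centers]

def mean_embedding(observations, centers, bandwidth=1.0):
    """Average the feature vectors of all observed agents.

    The output length depends only on the number of centers, not on how
    many agents were observed, and the order of the agents does not matter.
    """
    if not observations:
        return [0.0] * len(centers)
    feats = [rbf_features(x, centers, bandwidth) for x in observations]
    n = len(feats)
    return [sum(column) / n for column in zip(*feats)]
```

Because the features are averaged, the resulting vector has a fixed size and is invariant to permutations of the agents, which matches the two swarm properties listed in the abstract: interchangeable agents and an irrelevant exact agent count.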
Cost Adaptation for Robust Decentralized Swarm Behaviour
Decentralized receding horizon control (D-RHC) provides a mechanism for
coordination in multi-agent settings without a centralized command center.
However, combining a set of different goals, costs, and constraints to form an
efficient optimization objective for D-RHC can be difficult. To allay this
problem, we use a meta-learning process -- cost adaptation -- which generates
the optimization objective for D-RHC to solve based on a set of human-generated
priors (cost and constraint functions) and an auxiliary heuristic. We use this
adaptive D-RHC method for control of mesh-networked swarm agents. This
formulation allows a wide range of tasks to be encoded and can account for
network delays, heterogeneous capabilities, and increasingly large swarms
through the adaptation mechanism. We leverage the Unity3D game engine to build
a simulator capable of introducing artificial networking failures and delays in
the swarm. Using the simulator we validate our method on an example coordinated
exploration task. We demonstrate that cost adaptation allows for more efficient
and safer task completion under varying environment conditions and increasingly
large swarm sizes. We release our simulator and code to the community for
future work.
Comment: Accepted to IEEE/RSJ International Conference on Intelligent Robots
and Systems (IROS), 201
Local Communication Protocols for Learning Complex Swarm Behaviors with Deep Reinforcement Learning
Swarm systems constitute a challenging problem for reinforcement learning
(RL) as the algorithm needs to learn decentralized control policies that can
cope with limited local sensing and communication abilities of the agents.
While it is often difficult to directly define the behavior of the agents,
simple communication protocols can be defined more easily using prior knowledge
about the given task. In this paper, we propose a number of simple
communication protocols that can be exploited by deep reinforcement learning to
find decentralized control policies in a multi-robot swarm environment. The
protocols are based on histograms that encode the local neighborhood relations
of the agents and can also transmit task-specific information, such as the
shortest distance and direction to a desired target. In our framework, we use
an adaptation of Trust Region Policy Optimization to learn complex
collaborative tasks, such as formation building and building a communication
link. We evaluate our findings in a simulated 2D-physics environment, and
compare the implications of different communication protocols.
Comment: 13 pages, 4 figures, version 2, accepted at ANTS 201
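A histogram that encodes local neighborhood relations, as mentioned in this abstract, might look like the sketch below. It bins neighbors within sensing range by distance and bearing. The function name, the bin counts, and the `sensing_range` parameter are assumptions chosen for the example; the paper's actual protocol definitions may differ.

```python
import math

def neighbor_histogram(agent_xy, neighbors_xy, sensing_range=2.0,
                       n_dist_bins=3, n_bearing_bins=4):
    """Count neighbors per (distance, bearing) bin (hypothetical helper).

    Neighbors at or beyond `sensing_range` are ignored, which models
    limited local sensing.
    """
    hist = [[0] * n_bearing_bins for _ in range(n_dist_bins)]
    ax, ay = agent_xy
    for nx, ny in neighbors_xy:
        dx, dy = nx - ax, ny - ay
        dist = math.hypot(dx, dy)
        if dist >= sensing_range:
            continue  # outside the local sensing radius
        d_bin = min(int(dist / sensing_range * n_dist_bins), n_dist_bins - 1)
        # Bearing in [0, 2*pi), measured from the positive x-axis.
        bearing = math.atan2(dy, dx) % (2.0 * math.pi)
        b_bin = min(int(bearing / (2.0 * math.pi) * n_bearing_bins),
                    n_bearing_bins - 1)
        hist[d_bin][b_bin] += 1
    return hist
```

The flattened histogram has a fixed length regardless of how many neighbors are in range, so it can serve directly as a policy input.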
Inverse Reinforcement Learning in Swarm Systems
Inverse reinforcement learning (IRL) has become a useful tool for learning
behavioral models from demonstration data. However, IRL remains mostly
unexplored for multi-agent systems. In this paper, we show how the principle of
IRL can be extended to homogeneous large-scale problems, inspired by the
collective swarming behavior of natural systems. In particular, we make the
following contributions to the field: 1) We introduce the swarMDP framework, a
sub-class of decentralized partially observable Markov decision processes
endowed with a swarm characterization. 2) Exploiting the inherent homogeneity
of this framework, we reduce the resulting multi-agent IRL problem to a
single-agent one by proving that the agent-specific value functions in this
model coincide. 3) To solve the corresponding control problem, we propose a
novel heterogeneous learning scheme that is particularly tailored to the swarm
setting. Results on two example systems demonstrate that our framework is able
to produce meaningful local reward models from which we can replicate the
observed global system dynamics.
Comment: 9 pages, 8 figures; version 2, version accepted at AAMAS 201