4,554 research outputs found
Inverse Reinforcement Learning in Swarm Systems
Inverse reinforcement learning (IRL) has become a useful tool for learning
behavioral models from demonstration data. However, IRL remains mostly
unexplored for multi-agent systems. In this paper, we show how the principle of
IRL can be extended to homogeneous large-scale problems, inspired by the
collective swarming behavior of natural systems. In particular, we make the
following contributions to the field: 1) We introduce the swarMDP framework, a
sub-class of decentralized partially observable Markov decision processes
endowed with a swarm characterization. 2) Exploiting the inherent homogeneity
of this framework, we reduce the resulting multi-agent IRL problem to a
single-agent one by proving that the agent-specific value functions in this
model coincide. 3) To solve the corresponding control problem, we propose a
novel heterogeneous learning scheme that is particularly tailored to the swarm
setting. Results on two example systems demonstrate that our framework is able
to produce meaningful local reward models from which we can replicate the
observed global system dynamics.Comment: 9 pages, 8 figures; ### Version 2 ### version accepted at AAMAS 201
Guided Deep Reinforcement Learning for Swarm Systems
In this paper, we investigate how to learn to control a group of cooperative
agents with limited sensing capabilities such as robot swarms. The agents have
only very basic sensor capabilities, yet in a group they can accomplish
sophisticated tasks, such as distributed assembly or search and rescue tasks.
Learning a policy for a group of agents is difficult due to distributed partial
observability of the state. Here, we follow a guided approach where a critic
has central access to the global state during learning, which simplifies the
policy evaluation problem from a reinforcement learning point of view. For
example, we can get the positions of all robots of the swarm using a camera
image of a scene. This camera image is only available to the critic and not to
the control policies of the robots. We follow an actor-critic approach, where
the actors base their decisions only on locally sensed information. In
contrast, the critic is learned based on the true global state. Our algorithm
uses deep reinforcement learning to approximate both the Q-function and the
policy. The performance of the algorithm is evaluated on two tasks with simple
simulated 2D agents: 1) finding and maintaining a certain distance to each
others and 2) locating a target.Comment: 15 pages, 8 figures, accepted at the AAMAS 2017 Autonomous Robots and
Multirobot Systems (ARMS) Worksho
Local Communication Protocols for Learning Complex Swarm Behaviors with Deep Reinforcement Learning
Swarm systems constitute a challenging problem for reinforcement learning
(RL) as the algorithm needs to learn decentralized control policies that can
cope with limited local sensing and communication abilities of the agents.
While it is often difficult to directly define the behavior of the agents,
simple communication protocols can be defined more easily using prior knowledge
about the given task. In this paper, we propose a number of simple
communication protocols that can be exploited by deep reinforcement learning to
find decentralized control policies in a multi-robot swarm environment. The
protocols are based on histograms that encode the local neighborhood relations
of the agents and can also transmit task-specific information, such as the
shortest distance and direction to a desired target. In our framework, we use
an adaptation of Trust Region Policy Optimization to learn complex
collaborative tasks, such as formation building and building a communication
link. We evaluate our findings in a simulated 2D-physics environment, and
compare the implications of different communication protocols.Comment: 13 pages, 4 figures, version 2, accepted at ANTS 201
Optimizing collective fieldtaxis of swarming agents through reinforcement learning
Swarming of animal groups enthralls scientists in fields ranging from biology
to physics to engineering. Complex swarming patterns often arise from simple
interactions between individuals to the benefit of the collective whole. The
existence and success of swarming, however, nontrivially depend on microscopic
parameters governing the interactions. Here we show that a machine-learning
technique can be employed to tune these underlying parameters and optimize the
resulting performance. As a concrete example, we take an active matter model
inspired by schools of golden shiners, which collectively conduct phototaxis.
The problem of optimizing the phototaxis capability is then mapped to that of
maximizing benefits in a continuum-armed bandit game. The latter problem
accepts a simple reinforcement-learning algorithm, which can tune the
continuous parameters of the model. This result suggests the utility of
machine-learning methodology in swarm-robotics applications.Comment: 6 pages, 3 figure
Deep Reinforcement Learning for Swarm Systems
Recently, deep reinforcement learning (RL) methods have been applied
successfully to multi-agent scenarios. Typically, these methods rely on a
concatenation of agent states to represent the information content required for
decentralized decision making. However, concatenation scales poorly to swarm
systems with a large number of homogeneous agents as it does not exploit the
fundamental properties inherent to these systems: (i) the agents in the swarm
are interchangeable and (ii) the exact number of agents in the swarm is
irrelevant. Therefore, we propose a new state representation for deep
multi-agent RL based on mean embeddings of distributions. We treat the agents
as samples of a distribution and use the empirical mean embedding as input for
a decentralized policy. We define different feature spaces of the mean
embedding using histograms, radial basis functions and a neural network learned
end-to-end. We evaluate the representation on two well known problems from the
swarm literature (rendezvous and pursuit evasion), in a globally and locally
observable setup. For the local setup we furthermore introduce simple
communication protocols. Of all approaches, the mean embedding representation
using neural network features enables the richest information exchange between
neighboring agents facilitating the development of more complex collective
strategies.Comment: 31 pages, 12 figures, version 3 (published in JMLR Volume 20
A Review on the Application of Natural Computing in Environmental Informatics
Natural computing offers new opportunities to understand, model and analyze
the complexity of the physical and human-created environment. This paper
examines the application of natural computing in environmental informatics, by
investigating related work in this research field. Various nature-inspired
techniques are presented, which have been employed to solve different relevant
problems. Advantages and disadvantages of these techniques are discussed,
together with analysis of how natural computing is generally used in
environmental research.Comment: Proc. of EnviroInfo 201
Learning Opposites Using Neural Networks
Many research works have successfully extended algorithms such as
evolutionary algorithms, reinforcement agents and neural networks using
"opposition-based learning" (OBL). Two types of the "opposites" have been
defined in the literature, namely \textit{type-I} and \textit{type-II}. The
former are linear in nature and applicable to the variable space, hence easy to
calculate. On the other hand, type-II opposites capture the "oppositeness" in
the output space. In fact, type-I opposites are considered a special case of
type-II opposites where inputs and outputs have a linear relationship. However,
in many real-world problems, inputs and outputs do in fact exhibit a nonlinear
relationship. Therefore, type-II opposites are expected to be better in
capturing the sense of "opposition" in terms of the input-output relation. In
the absence of any knowledge about the problem at hand, there seems to be no
intuitive way to calculate the type-II opposites. In this paper, we introduce
an approach to learn type-II opposites from the given inputs and their outputs
using the artificial neural networks (ANNs). We first perform \emph{opposition
mining} on the sample data, and then use the mined data to learn the
relationship between input and its opposite . We have validated
our algorithm using various benchmark functions to compare it against an
evolving fuzzy inference approach that has been recently introduced. The
results show the better performance of a neural approach to learn the
opposites. This will create new possibilities for integrating oppositional
schemes within existing algorithms promising a potential increase in
convergence speed and/or accuracy.Comment: To appear in proceedings of the 23rd International Conference on
Pattern Recognition (ICPR 2016), Cancun, Mexico, December 201
- …