Search and Pursuit-Evasion in Mobile Robotics: A Survey
This paper surveys recent results in pursuit-evasion
and autonomous search relevant to applications
in mobile robotics. We provide a taxonomy of search
problems that highlights the differences resulting from
varying assumptions on the searchers, targets, and the
environment. We then list a number of fundamental
results in the areas of pursuit-evasion and probabilistic
search, and we discuss field implementations on mobile
robotic systems. In addition, we highlight current open
problems in the area and explore avenues for future
work.
Deep Reinforcement Learning for Swarm Systems
Recently, deep reinforcement learning (RL) methods have been applied
successfully to multi-agent scenarios. Typically, these methods rely on a
concatenation of agent states to represent the information content required for
decentralized decision making. However, concatenation scales poorly to swarm
systems with a large number of homogeneous agents as it does not exploit the
fundamental properties inherent to these systems: (i) the agents in the swarm
are interchangeable and (ii) the exact number of agents in the swarm is
irrelevant. Therefore, we propose a new state representation for deep
multi-agent RL based on mean embeddings of distributions. We treat the agents
as samples of a distribution and use the empirical mean embedding as input for
a decentralized policy. We define different feature spaces of the mean
embedding using histograms, radial basis functions and a neural network learned
end-to-end. We evaluate the representation on two well known problems from the
swarm literature (rendezvous and pursuit evasion), in a globally and locally
observable setup. For the local setup we furthermore introduce simple
communication protocols. Of all approaches, the mean embedding representation
using neural network features enables the richest information exchange between
neighboring agents facilitating the development of more complex collective
strategies.
Comment: 31 pages, 12 figures, version 3 (published in JMLR Volume 20).
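The core representational idea above can be sketched in a few lines: map each neighbor's observation through a fixed feature function and average the results. This is a minimal illustration only; the function names, the choice of RBF features, and all dimensions below are assumptions for the sketch, not the paper's implementation.

```python
import numpy as np

def rbf_features(x, centers, gamma=1.0):
    """Map one agent observation x to RBF features w.r.t. fixed centers.

    Each feature is exp(-gamma * ||x - c||^2) for a center c.
    """
    diffs = centers - x               # shape: (num_centers, obs_dim)
    return np.exp(-gamma * np.sum(diffs ** 2, axis=1))

def mean_embedding(neighbor_obs, centers, gamma=1.0):
    """Empirical mean embedding: average the feature maps of all neighbors.

    The result has a fixed dimension (num_centers) no matter how many
    neighbors there are, and is invariant to their ordering -- the two
    swarm properties (interchangeability, irrelevant count) the abstract
    highlights.
    """
    feats = np.stack([rbf_features(x, centers, gamma) for x in neighbor_obs])
    return feats.mean(axis=0)

# Usage: 5 neighbors with 2-D observations, 8 randomly placed RBF centers.
rng = np.random.default_rng(0)
centers = rng.uniform(-1, 1, size=(8, 2))
obs = rng.uniform(-1, 1, size=(5, 2))
phi = mean_embedding(obs, centers)    # fixed-size input for a policy network
```

Because the embedding is an average, feeding it to a decentralized policy keeps the input size constant as the swarm grows or shrinks, which is exactly why concatenation is avoided.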
Compact Representation of Value Function in Partially Observable Stochastic Games
Value methods for solving stochastic games with partial observability model
the uncertainty about states of the game as a probability distribution over
possible states. The dimension of this belief space is the number of states.
For many practical problems, for example in security, there are exponentially
many possible states, which prevents existing algorithms from scaling to
real-world problems. To this end, we propose an abstraction technique that
addresses this issue of the curse of dimensionality by projecting
high-dimensional beliefs to characteristic vectors of significantly lower
dimension (e.g., marginal probabilities). Our two main contributions are (1) a
novel compact representation of the uncertainty in partially observable
stochastic games and (2) a novel algorithm over this compact representation
that builds on existing state-of-the-art algorithms for solving stochastic
games with partial observability. Experimental evaluation confirms that the
new algorithm over the compact representation dramatically improves
scalability compared to the state of the art.
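The projection step described above, from a belief over exponentially many joint states down to a short characteristic vector of marginals, can be illustrated as follows. This is a hedged sketch under the assumption that each state factors into binary variables (e.g., which targets are present); the function name and encoding are illustrative, not the paper's construction.

```python
import numpy as np
from itertools import product

def belief_to_marginals(belief, num_vars, domain=2):
    """Project a joint belief over domain**num_vars states onto
    per-variable marginal probabilities (a low-dimensional
    'characteristic vector' of the high-dimensional belief).
    """
    # Enumerate joint states in the same order as the belief vector.
    states = list(product(range(domain), repeat=num_vars))
    marginals = np.zeros((num_vars, domain))
    for p, s in zip(belief, states):
        for i, v in enumerate(s):
            marginals[i, v] += p      # accumulate P(var_i = v)
    return marginals

# Usage: a joint belief over 2**3 = 8 states, compressed to 3 marginals.
belief = np.array([0.1, 0.0, 0.2, 0.1, 0.05, 0.15, 0.3, 0.1])
m = belief_to_marginals(belief, num_vars=3)
# m[i, 1] is P(var_i = 1); the 8-dim belief becomes a 3-dim summary.
```

The dimensionality drop is the point: the belief dimension grows exponentially in the number of variables, while the marginal vector grows only linearly, at the cost of discarding correlations between variables.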
Path Planning Problems with Side Observations: When Colonels Play Hide-and-Seek
Resource allocation games such as the famous Colonel Blotto (CB) and
Hide-and-Seek (HS) games are often used to model a large variety of practical
problems, but only in their one-shot versions. Indeed, due to their extremely
large strategy space, it remains an open question how one can efficiently learn
in these games. In this work, we show that the online CB and HS games can be
cast as path planning problems with side-observations (SOPPP): at each stage, a
learner chooses a path on a directed acyclic graph and suffers the sum of
losses that are adversarially assigned to the corresponding edges; and she then
receives semi-bandit feedback with side-observations (i.e., she observes the
losses on the chosen edges plus some others). We propose a novel algorithm,
EXP3-OE, the first-of-its-kind with guaranteed efficient running time for SOPPP
without requiring any auxiliary oracle. We provide an expected-regret bound of
EXP3-OE in SOPPP matching the order of the best benchmark in the literature.
Moreover, we introduce additional assumptions on the observability model under
which we can further improve the regret bounds of EXP3-OE. We illustrate the
benefit of using EXP3-OE in SOPPP by applying it to the online CB and HS games.
Comment: Previously, this work appeared as arXiv:1911.09023, which was
mistakenly submitted as a new article (it has been submitted to be withdrawn).
This is a preprint of the work published in Proceedings of the 34th AAAI
Conference on Artificial Intelligence (AAAI).
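The feedback model above, observing the losses of the chosen action plus some side-observed ones, can be illustrated with a simplified exponential-weights learner over plain arms rather than DAG paths. To be clear, this is not the EXP3-OE algorithm: the observation graph, arm count, losses, and step size below are assumptions made for the sketch, which only demonstrates the importance-weighted loss estimates that side observations enable.

```python
import numpy as np

def exp3_side_obs(losses, obs_graph, eta=0.1, rng=None):
    """Exponential-weights learner with side observations (a simplified
    sketch of the semi-bandit-with-side-observations setting).

    obs_graph[a] is the set of arms whose losses are revealed when arm a
    is played; losses is a (T, K) array of adversarial losses in [0, 1].
    Returns the average loss suffered over the T rounds.
    """
    rng = rng or np.random.default_rng(0)
    T, K = losses.shape
    w = np.ones(K)
    total = 0.0
    for t in range(T):
        p = w / w.sum()
        a = rng.choice(K, p=p)            # sample an arm from the weights
        total += losses[t, a]
        for b in obs_graph[a]:            # every loss revealed this round
            # Importance weight by the probability that b's loss is
            # observed: the chance of playing ANY arm that reveals b.
            obs_prob = sum(p[c] for c in range(K) if b in obs_graph[c])
            w[b] *= np.exp(-eta * losses[t, b] / obs_prob)
    return total / T

# Usage: 3 arms; playing any arm also reveals arm 0's loss as a side
# observation, so arm 0 never needs to be explored directly.
obs_graph = {0: {0}, 1: {0, 1}, 2: {0, 2}}
losses = np.tile([0.9, 0.1, 0.8], (200, 1))   # arm 1 is clearly best
avg = exp3_side_obs(losses, obs_graph)
```

The design point the sketch shows is that side observations shrink the importance weights (the divisor `obs_prob` is larger than the arm's own play probability), which lowers the variance of the loss estimates and, in EXP3-OE's analysis, tightens the regret bound.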