Deep Reinforcement Learning for Swarm Systems
Recently, deep reinforcement learning (RL) methods have been applied
successfully to multi-agent scenarios. Typically, these methods rely on a
concatenation of agent states to represent the information content required for
decentralized decision making. However, concatenation scales poorly to swarm
systems with a large number of homogeneous agents as it does not exploit the
fundamental properties inherent to these systems: (i) the agents in the swarm
are interchangeable and (ii) the exact number of agents in the swarm is
irrelevant. Therefore, we propose a new state representation for deep
multi-agent RL based on mean embeddings of distributions. We treat the agents
as samples of a distribution and use the empirical mean embedding as input for
a decentralized policy. We define different feature spaces of the mean
embedding using histograms, radial basis functions and a neural network learned
end-to-end. We evaluate the representation on two well-known problems from the
swarm literature (rendezvous and pursuit evasion), in a globally and locally
observable setup. For the local setup we furthermore introduce simple
communication protocols. Of all approaches, the mean embedding representation
using neural network features enables the richest information exchange between
neighboring agents, facilitating the development of more complex collective
strategies.
Comment: 31 pages, 12 figures, version 3 (published in JMLR Volume 20)
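The mean-embedding idea above can be sketched in a few lines. This is a minimal, illustrative version using radial basis function features (one of the three feature spaces the abstract names); the function names, the Gaussian RBF form, and the default bandwidth are my assumptions, not the paper's exact construction:

```python
import math

def rbf_features(x, centers, bandwidth=1.0):
    # phi(x): one Gaussian RBF activation per center (assumed feature map)
    return [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                     / (2 * bandwidth ** 2))
            for c in centers]

def mean_embedding(neighbor_states, centers, bandwidth=1.0):
    # Empirical mean embedding mu = (1/N) * sum_i phi(x_i).
    # It is permutation-invariant and has fixed dimension regardless of
    # the number of neighbors N, which is exactly why it suits swarms of
    # interchangeable agents.
    feats = [rbf_features(x, centers, bandwidth) for x in neighbor_states]
    return [sum(col) / len(feats) for col in zip(*feats)]

# Hypothetical usage: 3 neighbors with 2-D states, 2 RBF centers.
states = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
centers = [(0.0, 0.0), (1.0, 1.0)]
mu = mean_embedding(states, centers)
```

Because the embedding is an average of per-agent features, reordering the agents or changing the swarm size leaves the policy input format unchanged, unlike state concatenation.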
Non-equilibrium statistical mechanics of Minority Games
In this paper I give a brief introduction to a family of simple but
non-trivial models designed to increase our understanding of collective
processes in markets, the so-called Minority Games, and their non-equilibrium
statistical mechanical analysis. Since the most commonly studied members of
this family define disordered stochastic processes without detailed balance,
the canonical technique for finding exact solutions is found to be generating
functional analysis à la De Dominicis, as originally developed in the
spin-glass community.
Comment: 14 pages, short review for Cergy 2002 conference proceedings
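The basic Minority Game dynamics the review introduces can be sketched as a toy simulation: an odd number of agents repeatedly choose one of two sides, the minority side wins, and each agent plays the best-scoring of a few fixed random strategies that map the recent outcome history to an action. All parameter names and defaults here are mine; actual studies average over the quenched strategy disorder, which is what the generating functional analysis handles exactly:

```python
import random

def minority_game(n_agents=101, memory=3, n_strategies=2, steps=200, seed=0):
    # Toy Minority Game: n_agents (odd) pick +1/-1 each round; the
    # minority side wins. Each agent holds n_strategies random lookup
    # tables from the last `memory` outcomes to an action.
    rng = random.Random(seed)
    n_hist = 2 ** memory
    # strategies[a][s][h] = action of agent a's strategy s on history h
    strategies = [[[rng.choice((-1, 1)) for _ in range(n_hist)]
                   for _ in range(n_strategies)] for _ in range(n_agents)]
    scores = [[0] * n_strategies for _ in range(n_agents)]
    history = rng.randrange(n_hist)
    attendance = []
    for _ in range(steps):
        actions = []
        for a in range(n_agents):
            best = max(range(n_strategies), key=lambda s: scores[a][s])
            actions.append(strategies[a][best][history])
        total = sum(actions)              # A(t); minority side wins
        attendance.append(total)
        winning = -1 if total > 0 else 1
        for a in range(n_agents):
            for s in range(n_strategies):
                # virtual scoring: reward strategies that predicted the minority
                if strategies[a][s][history] == winning:
                    scores[a][s] += 1
        history = ((history << 1) | (1 if winning == 1 else 0)) % n_hist
    return attendance
```

The returned attendance series A(t) is the quantity whose fluctuations are the central object of the statistical mechanical analysis.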
Promotion of cooperation induced by nonlinear attractive effect in spatial Prisoner's Dilemma game
We introduce nonlinear attractive effects into a spatial Prisoner's Dilemma
game where the players located on a square lattice can either cooperate with
their nearest neighbors or defect. In every generation, each player updates its
strategy by first choosing one of its neighbors with a probability
proportional to the attractiveness A ∝ M^α of the neighbor, where M is the
payoff collected by it and α (≥ 0) is a free parameter characterizing the
extent of the nonlinear effect; and then adopting the chosen neighbor's
strategy with a probability dependent on their payoff difference. Using
Monte Carlo simulations, we investigate the density of cooperators ρ_C in
the stationary state for different values of α. It is shown that the
introduction of such an attractive effect remarkably promotes the emergence
and persistence of cooperation over a wide range of the temptation to
defect. In particular, for large values of α, i.e., strong nonlinear
attractive effects, the system exhibits two absorbing states (all
cooperators or all defectors) separated by an active state (coexistence of
cooperators and defectors) when varying the temptation to defect b. In the
critical region where ρ_C goes to zero, the extinction behavior is power
law-like, ρ_C ∝ (b_c − b)^β, where b_c is the critical temptation and the
exponent β accords approximately with the critical exponent of
two-dimensional directed percolation and depends weakly on the value of α.
Comment: 7 pages, 4 figures
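The update rule described in this abstract can be sketched as a Monte Carlo step. The following is an illustrative version under stated assumptions: weak Prisoner's Dilemma payoffs (R = 1, T = b, S = P = 0, so payoffs are nonnegative), attractiveness proportional to a power M^α of the neighbor's accumulated payoff, and a Fermi-type imitation probability with noise K = 0.1. The helper names, the specific payoff matrix, and the Fermi form are my choices, consistent with but not guaranteed identical to the paper's:

```python
import math, random

def payoff(grid, x, y, L, b):
    # Accumulated payoff of player (x, y) against its 4 nearest neighbors
    # on an L x L periodic lattice; 1 = cooperator, 0 = defector.
    s = grid[x][y]
    total = 0.0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        o = grid[(x + dx) % L][(y + dy) % L]
        if s == 1 and o == 1:
            total += 1.0        # mutual cooperation: R = 1
        elif s == 0 and o == 1:
            total += b          # defector exploits cooperator: T = b
    return total

def mc_step(grid, L, b, alpha, K=0.1, rng=random):
    # One Monte Carlo step: each update, a random player picks a neighbor
    # with probability proportional to M^alpha (the nonlinear attractive
    # effect), then imitates it with the assumed Fermi probability
    # 1 / (1 + exp((M_self - M_neigh) / K)).
    for _ in range(L * L):
        x, y = rng.randrange(L), rng.randrange(L)
        nbrs = [((x + dx) % L, (y + dy) % L)
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        weights = [payoff(grid, nx, ny, L, b) ** alpha for nx, ny in nbrs]
        if sum(weights) == 0:
            nx, ny = rng.choice(nbrs)   # all payoffs zero: uniform choice
        else:
            nx, ny = rng.choices(nbrs, weights=weights)[0]
        m_self = payoff(grid, x, y, L, b)
        m_nbr = payoff(grid, nx, ny, L, b)
        if rng.random() < 1.0 / (1.0 + math.exp((m_self - m_nbr) / K)):
            grid[x][y] = grid[nx][ny]
```

Estimating ρ_C then amounts to running many such steps and averaging the fraction of cooperators in the stationary regime.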
Algorithms for Game Metrics
Simulation and bisimulation metrics for stochastic systems provide a
quantitative generalization of the classical simulation and bisimulation
relations. These metrics capture the similarity of states with respect to
quantitative specifications written in the quantitative μ-calculus and
related probabilistic logics. We first show that the metrics provide a bound
for the difference in long-run average and discounted average behavior across
states, indicating that the metrics can be used both in system verification,
and in performance evaluation. For turn-based games and MDPs, we provide a
polynomial-time algorithm for the computation of the one-step metric distance
between states. The algorithm is based on linear programming; it improves on
the previous known exponential-time algorithm based on a reduction to the
theory of reals. We then present PSPACE algorithms for both the decision
problem and the problem of approximating the metric distance between two
states, matching the best known algorithms for Markov chains. For the
bisimulation kernel of the metric our algorithm works in time O(n^4) for both
turn-based games and MDPs, improving the previously best known O(n^9 · log(n))
time algorithm for MDPs. For a concurrent game G, we show that
computing the exact distance between states is at least as hard as computing
the value of concurrent reachability games and the square-root-sum problem in
computational geometry. We show that checking whether the metric distance is
bounded by a rational r can be done via a reduction to the theory of real
closed fields, involving a formula with three quantifier alternations, yielding
O(|G|^O(|G|^5)) time complexity, improving the previously known reduction,
which yielded O(|G|^O(|G|^7)) time complexity. These algorithms can be iterated
to approximate the metrics using binary search.
Comment: 27 pages. Full version of the paper accepted at FSTTCS 200
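The final step, turning a bounded-distance decision procedure into an approximation algorithm, is a generic binary search. The sketch below assumes only a black-box predicate `is_at_most(r)` answering "is the metric distance at most r?" (in the paper this is the theory-of-reals check); the function name and the [0, 1] range for the metric are my assumptions:

```python
def approximate_metric(is_at_most, lo=0.0, hi=1.0, eps=1e-6):
    # Binary search on a monotone decision procedure: each call to
    # is_at_most(mid) halves the interval [lo, hi] known to contain
    # the exact distance, so eps accuracy needs O(log(1/eps)) calls.
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if is_at_most(mid):
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2
```

Each query costs one run of the decision procedure, so the overall complexity is the decision procedure's cost times log(1/eps).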