
    Deep Reinforcement Learning for Swarm Systems

    Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents, as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions, and a neural network learned end-to-end. We evaluate the representation on two well-known problems from the swarm literature (rendezvous and pursuit evasion), in a globally and locally observable setup. For the local setup we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents, facilitating the development of more complex collective strategies.

    Comment: 31 pages, 12 figures, version 3 (published in JMLR Volume 20)
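    The empirical mean embedding described in the abstract can be sketched in a few lines: map each observed neighbor state through a feature function and average the results. The RBF feature map below is a hypothetical example (the centers and bandwidth are illustrative, not taken from the paper); the averaging itself is what yields permutation invariance and independence from the exact agent count.

```python
import math

def rbf_features(x, centers=(-1.0, 0.0, 1.0), gamma=4.0):
    """Hypothetical radial-basis feature map over a scalar observation
    (e.g. a neighbor's relative position); centers/gamma are illustrative."""
    return [math.exp(-gamma * (x - c) ** 2) for c in centers]

def mean_embedding(neighbor_states, feature_fn=rbf_features):
    """Empirical mean embedding: average the feature map over all observed
    neighbor states.  The result does not depend on the ordering of the
    agents, and its dimension does not depend on how many there are."""
    feats = [feature_fn(s) for s in neighbor_states]
    k = len(feats[0])
    return [sum(f[i] for f in feats) / len(feats) for i in range(k)]
```

    Because the embedding is an average, any reordering of the same neighbor states produces an identical policy input, which is exactly the interchangeability property the paper exploits.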

    Non-equilibrium statistical mechanics of Minority Games

    In this paper I give a brief introduction to a family of simple but non-trivial models designed to increase our understanding of collective processes in markets, the so-called Minority Games, and their non-equilibrium statistical mechanical analysis. Since the most commonly studied members of this family define disordered stochastic processes without detailed balance, the canonical technique for finding exact solutions is found to be generating functional analysis à la De Dominicis, as originally developed in the spin-glass community.

    Comment: 14 pages, short review for Cergy 2002 conference proceedings
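    The core payoff rule of a Minority Game is compact enough to state in code. The sketch below shows only that rule: agents choose one of two sides and those on the minority side win. The full model in the literature additionally equips each agent with lookup-table strategies conditioned on a shared history of recent outcomes, which is omitted here; all names are illustrative.

```python
import random

def play_round(choices):
    """One Minority Game round: the minority side wins.
    choices is a list of +1/-1 (e.g. buy/sell); returns the winning side."""
    attendance = sum(choices)
    return -1 if attendance > 0 else 1  # minority is opposite the majority

random.seed(0)
n_agents = 101  # odd, so a strict minority always exists
choices = [random.choice((-1, 1)) for _ in range(n_agents)]
winning_side = play_round(choices)
winners = sum(1 for c in choices if c == winning_side)
```

    With an odd number of agents the attendance sum is never zero, so every round has a well-defined minority of fewer than half the agents.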

    Promotion of cooperation induced by nonlinear attractive effect in spatial Prisoner's Dilemma game

    We introduce nonlinear attractive effects into a spatial Prisoner's Dilemma game where the players located on a square lattice can either cooperate with their nearest neighbors or defect. In every generation, each player updates its strategy by first choosing one of the neighbors with a probability proportional to $\mathcal{A}^\alpha$, where $\mathcal{A}$ denotes the attractiveness of the neighbor (the payoff collected by it) and $\alpha$ ($\geq 0$) is a free parameter characterizing the extent of the nonlinear effect, and then adopting that neighbor's strategy with a probability dependent on their payoff difference. Using Monte Carlo simulations, we investigate the density $\rho_C$ of cooperators in the stationary state for different values of $\alpha$. It is shown that the introduction of such an attractive effect remarkably promotes the emergence and persistence of cooperation over a wide range of the temptation to defect. In particular, for large values of $\alpha$, i.e., strong nonlinear attractive effects, the system exhibits two absorbing states (all cooperators or all defectors) separated by an active state (coexistence of cooperators and defectors) when varying the temptation to defect. In the critical region where $\rho_C$ goes to zero, the extinction behavior follows a power law, $\rho_C \sim (b_c - b)^{\beta}$, where the exponent $\beta$ agrees approximately with the critical exponent ($\beta \approx 0.584$) of two-dimensional directed percolation and depends weakly on the value of $\alpha$.

    Comment: 7 pages, 4 figures
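    The two-stage update rule in the abstract can be sketched directly: pick a neighbor with probability proportional to its payoff raised to the power alpha, then imitate with a payoff-difference-dependent probability. The imitation step below uses the common Fermi form as a stand-in; the paper's exact functional form and noise parameter may differ, and all names are illustrative.

```python
import math
import random

def pick_neighbor(neighbors, payoff, alpha):
    """Choose a neighbor with probability proportional to A**alpha, where
    A is that neighbor's payoff (the nonlinear attractive effect).
    Falls back to a uniform choice when every weight is zero."""
    weights = [payoff[n] ** alpha for n in neighbors]
    total = sum(weights)
    if total == 0.0:
        return random.choice(neighbors)
    r = random.uniform(0.0, total)
    acc = 0.0
    for n, w in zip(neighbors, weights):
        acc += w
        if r <= acc:
            return n
    return neighbors[-1]

def adoption_probability(own_payoff, neighbor_payoff, noise=0.1):
    """Payoff-difference imitation rule, written in the common Fermi form
    (an assumption; the paper specifies its own rule)."""
    return 1.0 / (1.0 + math.exp((own_payoff - neighbor_payoff) / noise))
```

    Note that alpha = 0 recovers uniform neighbor choice (every weight is 1), while large alpha concentrates the selection on the highest-payoff neighbor, matching the strong-attraction regime discussed in the abstract.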

    Algorithms for Game Metrics

    Simulation and bisimulation metrics for stochastic systems provide a quantitative generalization of the classical simulation and bisimulation relations. These metrics capture the similarity of states with respect to quantitative specifications written in the quantitative μ-calculus and related probabilistic logics. We first show that the metrics provide a bound for the difference in long-run average and discounted average behavior across states, indicating that the metrics can be used both in system verification and in performance evaluation. For turn-based games and MDPs, we provide a polynomial-time algorithm for the computation of the one-step metric distance between states. The algorithm is based on linear programming; it improves on the previously known exponential-time algorithm based on a reduction to the theory of reals. We then present PSPACE algorithms for both the decision problem and the problem of approximating the metric distance between two states, matching the best known algorithms for Markov chains. For the bisimulation kernel of the metric, our algorithm works in time O(n^4) for both turn-based games and MDPs, improving the previously best known O(n^9 log n) time algorithm for MDPs. For a concurrent game G, we show that computing the exact distance between states is at least as hard as computing the value of concurrent reachability games and the square-root-sum problem in computational geometry. We show that checking whether the metric distance is bounded by a rational r can be done via a reduction to the theory of real closed fields, involving a formula with three quantifier alternations, yielding O(|G|^O(|G|^5)) time complexity and improving the previously known reduction, which yielded O(|G|^O(|G|^7)) time complexity. These algorithms can be iterated to approximate the metrics using binary search.

    Comment: 27 pages. Full version of the paper accepted at FSTTCS 200
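    The fixpoint character of such metrics can be illustrated in a drastically simplified setting. The paper's algorithms handle stochastic games, where the one-step distance requires solving a transportation (Kantorovich) linear program over successor distributions; the sketch below covers only the deterministic special case, where each state has a unique successor and that LP collapses to a single point distance. All names and the discount factor are illustrative assumptions.

```python
def bisim_metric(reward, succ, gamma=0.9, iters=100):
    """Discounted bisimulation pseudometric, deterministic special case.
    Iterates the fixpoint
        d(s, t) = max(|r(s) - r(t)|, gamma * d(succ[s], succ[t]))
    starting from d = 0, which converges to the least fixpoint.
    reward: state -> float, succ: state -> unique successor state."""
    states = list(reward)
    d = {(s, t): 0.0 for s in states for t in states}
    for _ in range(iters):
        d = {(s, t): max(abs(reward[s] - reward[t]),
                         gamma * d[(succ[s], succ[t])])
             for s in states for t in states}
    return d
```

    Even in this toy form the metric exhibits the property the abstract emphasizes: two states that differ only several steps in the future are separated by a discounted amount, so the distance bounds the difference in their discounted behavior.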