54 research outputs found

    Game-theoretical control with continuous action sets

    Full text link
    Motivated by the recent applications of game-theoretical learning techniques to the design of distributed control systems, we study a class of control problems that can be formulated as potential games with continuous action sets, and we propose an actor-critic reinforcement learning algorithm that provably converges to equilibrium in this class of problems. The method employed is to analyse the learning process under study through a mean-field dynamical system that evolves in an infinite-dimensional function space (the space of probability distributions over the players' continuous controls). To do so, we extend the theory of finite-dimensional two-timescale stochastic approximation to an infinite-dimensional, Banach space setting, and we prove that the continuous dynamics of the process converge to equilibrium in the case of potential games. These results combine to give a provably-convergent learning algorithm in which players do not need to keep track of the controls selected by the other agents.Comment: 19 page

    COllective INtelligence with task assignment

    Get PDF
    In this paper we study the COllective INtelligence (COIN) framework of Wolpert et al. for dispersion games (Grenager, Powers and Shoham, 2002) and variants of the EL Farol Bar problem. These settings constitute difficult MAS problems where fine-grained coordination between the agents is required. We enhance the COIN framework to dramatically improve convergence results for MAS with a large number of agents. The increased convergence properties for the dispersion games are competitive with especially tailored strategies for solving dispersion games. The enhancements to the COIN framework proved to be essential to solve the more complex variants of the El Farol Bar-like problem

    Multi Agent Reward Analysis for Learning in Noisy Domains

    Get PDF
    In many multi agent learning problems, it is difficult to determine, a priori, the agent reward structure that will lead to good performance. This problem is particularly pronounced in continuous, noisy domains ill-suited to simple table backup schemes commonly used in TD(lambda)/Q-learning. In this paper, we present a new reward evaluation method that allows the tradeoff between coordination among the agents and the difficulty of the learning problem each agent faces to be visualized. This method is independent of the learning algorithm and is only a function of the problem domain and the agents reward structure. We then use this reward efficiency visualization method to determine an effective reward without performing extensive simulations. We test this method in both a static and a dynamic multi-rover learning domain where the agents have continuous state spaces and where their actions are noisy (e.g., the agents movement decisions are not always carried out properly). Our results show that in the more difficult dynamic domain, the reward efficiency visualization method provides a two order of magnitude speedup in selecting a good reward. Most importantly it allows one to quickly create and verify rewards tailored to the observational limitations of the domain

    A Scalable and Robust Multi-Agent Approach to Distributed Optimization

    Get PDF
    Modularizing a large optimization problem so that the solutions to the subproblems provide a good overall solution is a challenging problem. In this paper we present a multi-agent approach to this problem based on aligning the agent objectives with the system objectives, obviating the need to impose external mechanisms to achieve collaboration among the agents. This approach naturally addresses scaling and robustness issues by ensuring that the agents do not rely on the reliable operation of other agents We test this approach in the difficult distributed optimization problem of imperfect device subset selection [Challet and Johnson, 2002]. In this problem, there are n devices, each of which has a "distortion", and the task is to find the subset of those n devices that minimizes the average distortion. Our results show that in large systems (1000 agents) the proposed approach provides improvements of over an order of magnitude over both traditional optimization methods and traditional multi-agent methods. Furthermore, the results show that even in extreme cases of agent failures (i.e., half the agents fail midway through the simulation) the system remains coordinated and still outperforms a failure-free and centralized optimization algorithm
    • …
    corecore