Game-theoretical control with continuous action sets
Motivated by the recent applications of game-theoretical learning techniques
to the design of distributed control systems, we study a class of control
problems that can be formulated as potential games with continuous action sets,
and we propose an actor-critic reinforcement learning algorithm that provably
converges to equilibrium in this class of problems. The method employed is to
analyse the learning process under study through a mean-field dynamical system
that evolves in an infinite-dimensional function space (the space of
probability distributions over the players' continuous controls). To do so, we
extend the theory of finite-dimensional two-timescale stochastic approximation
to an infinite-dimensional, Banach space setting, and we prove that the
continuous dynamics of the process converge to equilibrium in the case of
potential games. These results combine to give a provably convergent learning
algorithm in which players do not need to keep track of the controls selected
by the other agents.
COllective INtelligence with task assignment
In this paper we study the COllective INtelligence (COIN) framework of Wolpert et al. for dispersion games (Grenager, Powers and Shoham, 2002) and variants of the El Farol Bar problem. These settings constitute difficult MAS problems in which fine-grained coordination between the agents is required. We enhance the COIN framework to dramatically improve convergence results for MAS with a large number of agents. The improved convergence properties for the dispersion games are competitive with strategies specially tailored to solving dispersion games. The enhancements to the COIN framework proved essential for solving the more complex variants of the El Farol Bar-like problem.
Multi Agent Reward Analysis for Learning in Noisy Domains
In many multi-agent learning problems, it is difficult to determine, a priori, the agent reward structure that will lead to good performance. This problem is particularly pronounced in continuous, noisy domains ill-suited to the simple table backup schemes commonly used in TD(lambda)/Q-learning. In this paper, we present a new reward evaluation method that makes it possible to visualize the tradeoff between coordination among the agents and the difficulty of the learning problem each agent faces. This method is independent of the learning algorithm and is a function only of the problem domain and the agents' reward structure. We then use this reward efficiency visualization method to determine an effective reward without performing extensive simulations. We test this method in both a static and a dynamic multi-rover learning domain where the agents have continuous state spaces and where their actions are noisy (e.g., the agents' movement decisions are not always carried out properly). Our results show that in the more difficult dynamic domain, the reward efficiency visualization method provides a speedup of two orders of magnitude in selecting a good reward. Most importantly, it allows one to quickly create and verify rewards tailored to the observational limitations of the domain.
A Scalable and Robust Multi-Agent Approach to Distributed Optimization
Modularizing a large optimization problem so that the solutions to the subproblems provide a good overall solution is a challenging problem. In this paper we present a multi-agent approach to this problem based on aligning the agent objectives with the system objectives, obviating the need to impose external mechanisms to achieve collaboration among the agents. This approach naturally addresses scaling and robustness issues by ensuring that the agents do not rely on the reliable operation of other agents. We test this approach in the difficult distributed optimization problem of imperfect device subset selection [Challet and Johnson, 2002]. In this problem, there are n devices, each of which has a "distortion", and the task is to find the subset of those n devices that minimizes the average distortion. Our results show that in large systems (1000 agents) the proposed approach provides improvements of over an order of magnitude over both traditional optimization methods and traditional multi-agent methods. Furthermore, the results show that even in extreme cases of agent failures (i.e., half the agents fail midway through the simulation) the system remains coordinated and still outperforms a failure-free, centralized optimization algorithm.
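To make the subset selection objective above concrete, the following is a minimal sketch of the problem statement only, not of the multi-agent method the abstract describes. It assumes (this is not stated in the abstract) that distortions are signed numbers and that "average distortion" means the magnitude of the mean over the chosen devices. The function names `average_distortion` and `best_subset` are hypothetical, and the exhaustive search is only feasible for very small n.

```python
from itertools import combinations


def average_distortion(distortions, subset):
    """Magnitude of the mean distortion over the chosen device indices.

    Assumption: distortions are signed, so they can cancel each other.
    """
    return abs(sum(distortions[i] for i in subset)) / len(subset)


def best_subset(distortions):
    """Exhaustively search all non-empty subsets (illustrative baseline).

    The cost grows as 2**n, which is why large systems need a different
    approach, such as the distributed one the abstract proposes.
    """
    n = len(distortions)
    best, best_score = None, float("inf")
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            score = average_distortion(distortions, subset)
            if score < best_score:
                best, best_score = subset, score
    return best, best_score


if __name__ == "__main__":
    # Four hypothetical devices with made-up integer distortions.
    subset, score = best_subset([5, -4, 1, 3])
    print(subset, score)
```

With 1000 devices, as in the experiments mentioned above, the number of candidate subsets is far too large for this kind of enumeration.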