A visual demonstration of convergence properties of cooperative coevolution
We introduce a model for cooperative coevolutionary algorithms (CCEAs) using partial mixing, which allows us to compute the expected long-run convergence of such algorithms when individuals' fitness is based on the maximum payoff of some N evaluations with partners chosen at random from the other population. Using this model, we devise novel visualization mechanisms to attempt to qualitatively explain a difficult-to-conceptualize pathology in CCEAs: the tendency for them to converge to suboptimal Nash equilibria. We further demonstrate visually how increasing the size of N, or biasing the fitness to include an ideal-collaboration factor, both improve the likelihood of optimal convergence, and under which initial population configurations they offer little help.
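The max-of-N fitness assessment described in this abstract can be illustrated with a small Monte Carlo sketch. It is not the authors' partial-mixing model: the payoff matrix `PAYOFF`, the partner population, and the function names are hypothetical, chosen only to show how a larger N can favour the strategy that belongs to the optimal pairing.

```python
import random

# Hypothetical payoff matrix (an assumption, not from the paper).
# Row = strategy in population 1, column = partner strategy in population 2.
# Pairing (0, 0) is the global optimum; strategies 1 and 2 form a broad,
# suboptimal plateau that pays off with most partners.
PAYOFF = [
    [10, 0, 0],
    [0, 7, 7],
    [0, 7, 7],
]

def collaboration_fitness(strategy, partners, payoff, n_evals, rng):
    """Fitness = maximum payoff over n_evals partners drawn at random."""
    sampled = [rng.choice(partners) for _ in range(n_evals)]
    return max(payoff[strategy][p] for p in sampled)

def mean_fitness(strategy, partners, n_evals, payoff=PAYOFF, trials=2000, seed=0):
    """Monte Carlo estimate of the expected max-of-N fitness."""
    rng = random.Random(seed)
    total = sum(
        collaboration_fitness(strategy, partners, payoff, n_evals, rng)
        for _ in range(trials)
    )
    return total / trials

if __name__ == "__main__":
    # Partner population in which the optimal partner (0) is rare.
    partners = [0, 1, 1, 2, 2]
    for n in (1, 10):
        print(n, mean_fitness(0, partners, n), mean_fitness(1, partners, n))
```

With N = 1 the optimal strategy is rarely paired with its ideal partner and so looks worse than the plateau strategy. With N = 10 it is much more likely to meet that partner at least once, and its estimated fitness overtakes the plateau strategy.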
Cooperative coevolution of control for a real multirobot system
The potential of cooperative coevolutionary algorithms (CCEAs) as a tool for evolving control for heterogeneous multirobot teams has been shown in several previous works. The vast majority of these works have, however, been confined to simulation-based experiments. In this paper, we present one of the first demonstrations of a real multirobot system, operating outside laboratory conditions, with controllers synthesised by CCEAs. We evolve control for an aquatic multirobot system that has to perform a cooperative predator-prey pursuit task. The evolved controllers are transferred to real hardware, and their performance is assessed in a non-controlled outdoor environment. Two approaches are used to evolve control: a standard fitness-driven CCEA, and novelty-driven coevolution. We find that both approaches are able to evolve teams that transfer successfully to the real robots. Novelty-driven coevolution is able to evolve a broad range of successful team behaviours, which we test on the real multirobot system.
Novelty-driven cooperative coevolution
Cooperative coevolutionary algorithms (CCEAs) rely on multiple coevolving populations for the evolution of solutions composed of coadapted components. CCEAs enable, for instance, the evolution of cooperative multiagent systems composed of heterogeneous agents, where each agent is modelled as a component of the solution. Previous works have, however, shown that CCEAs are biased toward stability: the evolutionary process tends to converge prematurely to stable states instead of (near-)optimal solutions. In this study, we show how novelty search can be used to avoid the counterproductive attraction to stable states in coevolution. Novelty search is an evolutionary technique that drives evolution toward behavioural novelty and diversity rather than exclusively pursuing a static objective. We evaluate three novelty-based approaches that rely on, respectively, (1) the novelty of the team as a whole, (2) the novelty of the agents’ individual behaviour, and (3) the combination of the two. We compare the proposed approaches with traditional fitness-driven cooperative coevolution in three simulated multirobot tasks. Our results show that team-level novelty scoring is the most effective approach, significantly outperforming fitness-driven coevolution at multiple levels. Novelty-driven cooperative coevolution can substantially increase the potential of CCEAs while maintaining a computational complexity that scales well with the number of populations.
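A common way to score behavioural novelty in novelty search is the mean distance from a behaviour descriptor to its k nearest neighbours among previously seen behaviours. The sketch below assumes that formulation; the abstract does not specify the descriptor, the distance, or k, so the function name, the Euclidean distance, and `k=3` are illustrative assumptions. The same function could score either a team-level descriptor or an individual agent's descriptor.

```python
import math

def novelty(behaviour, others, k=3):
    """Mean Euclidean distance from `behaviour` to its k nearest neighbours.

    `behaviour` is a numeric descriptor (an assumed representation);
    `others` holds descriptors from the current population and archive.
    """
    if not others:
        return 0.0  # nothing to compare against yet
    nearest = sorted(math.dist(behaviour, o) for o in others)[:k]
    return sum(nearest) / len(nearest)

if __name__ == "__main__":
    seen = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0), (10.0, 10.0)]
    # A behaviour inside the cluster scores lower than one far from it.
    print(novelty((0.0, 0.0), seen))
    print(novelty((20.0, 20.0), seen))
```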
Theoretical and implementation improvements for difference evaluation functions
Multiagent learning with cooperative coevolutionary algorithms is a critical area of research, and is relevant to many real-world applications including air traffic control, distributed sensor network control, and game-theoretic applications such as border patrol. A key difficulty in multiagent learning is the credit assignment problem, where the impact of each individual agent on the overall system performance must be ascertained. Difference evaluation functions aim to solve this credit assignment problem, by approximating the effect that each agent has on the system evaluation function. Difference evaluations have proven to produce superior learned policies in many multiagent settings.
Although difference evaluations have produced excellent empirical results, there are still three key research questions that must be addressed regarding their usefulness in real-world systems. More specifically, the performance, theoretical advantages, and methodology for implementation must be addressed in order to demonstrate that difference evaluations are practical for use in real-world multiagent learning. These research questions are addressed in this dissertation. The first contribution of this dissertation is to demonstrate that difference evaluations may be extended and combined with other coordination mechanisms, resulting in superior learned performance. The second contribution of this dissertation is to derive conditions which guarantee that difference evaluations will outperform traditional coordination mechanisms. The third and final contribution of this dissertation is to demonstrate that difference evaluations may be approximated using only local knowledge, allowing for their implementation in any generic multiagent learning setting. By addressing the performance, theoretical foundation, and implementation concerns of difference evaluations, this dissertation provides a detailed analysis demonstrating the usefulness of difference evaluation functions in multiagent learning systems.
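In its usual form, a difference evaluation compares the system evaluation G with the value G would take if agent i were removed (or replaced by a default action), D_i(z) = G(z) - G(z_{-i}). The sketch below assumes the removal variant and a toy G that counts distinct targets covered by the team; both are illustrative assumptions, not the dissertation's domains or its local approximation.

```python
def system_evaluation(joint_action):
    """Hypothetical G: number of distinct targets covered by the team."""
    return len(set(joint_action))

def difference_evaluation(joint_action, i, G=system_evaluation):
    """D_i = G(z) - G(z without agent i): agent i's marginal contribution."""
    without_i = joint_action[:i] + joint_action[i + 1:]
    return G(joint_action) - G(without_i)

if __name__ == "__main__":
    # Agents 0 and 1 both cover target "A"; agent 2 alone covers "B".
    joint = ["A", "A", "B"]
    print([difference_evaluation(joint, i) for i in range(len(joint))])
```

The two agents that duplicate each other's coverage receive zero credit, while the agent covering a target alone receives full credit. This is the credit assignment signal the abstract refers to.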
Cooperative coevolution of morphologically heterogeneous robots
Morphologically heterogeneous multirobot teams have shown significant potential in many applications. While cooperative coevolutionary algorithms can be used for synthesising controllers for heterogeneous multirobot systems, they have been almost exclusively applied to morphologically homogeneous systems. In this paper, we investigate if and how cooperative coevolutionary algorithms can be used to evolve behavioural control for a morphologically heterogeneous multirobot system. Our experiments rely on a simulated task, where a ground robot with a simple sensor-actuator configuration must cooperate tightly with a more complex aerial robot to find and collect items in the environment. We first show how differences in the number and complexity of skills each robot has to learn can impair the effectiveness of cooperative coevolution. We then show how coevolution’s effectiveness can be improved using incremental evolution or novelty-driven coevolution. Despite its limitations, we show that coevolution is a viable approach for synthesising control for morphologically heterogeneous systems.
Multiagent Learning Through Indirect Encoding
Designing a system of multiple, heterogeneous agents that cooperate to achieve a common goal is a difficult task, but it is also a common real-world problem. Multiagent learning addresses this problem by training the team to cooperate through a learning algorithm. However, most traditional approaches treat multiagent learning as a combination of multiple single-agent learning problems. This perspective leads to many inefficiencies in learning such as the problem of reinvention, whereby fundamental skills and policies that all agents should possess must be rediscovered independently for each team member. For example, in soccer, all the players know how to pass and kick the ball, but a traditional algorithm has no way to share such vital information because it has no way to relate the policies of agents to each other. In this dissertation a new approach to multiagent learning that seeks to address these issues is presented. This approach, called multiagent HyperNEAT, represents teams as a pattern of policies rather than individual agents. The main idea is that an agent’s location within a canonical team layout (such as a soccer team at the start of a game) tends to dictate its role within that team, called the policy geometry. For example, as soccer positions move from goal to center they become more offensive and less defensive, a concept that is compactly represented as a pattern. The first major contribution of this dissertation is a new method for evolving neural network controllers called HyperNEAT, which forms the foundation of the second contribution and primary focus of this work, multiagent HyperNEAT. Multiagent learning in this dissertation is investigated in predator-prey, room-clearing, and patrol domains, providing a real-world context for the approach.
Interestingly, because the teams in multiagent HyperNEAT are represented as patterns they can scale up to an infinite number of multiagent policies that can be sampled from the policy geometry as needed. Thus the third contribution is a method for teams trained with multiagent HyperNEAT to dynamically scale their size without further learning. Fourth, the capabilities to both learn and scale in multiagent HyperNEAT are compared to the traditional multiagent SARSA(λ) approach in a comprehensive study. The fifth contribution is a method for efficiently learning and encoding multiple policies for each agent on a team to facilitate learning in multi-task domains. Finally, because there is significant interest in practical applications of multiagent learning, multiagent HyperNEAT is tested in a real-world military patrolling application with actual Khepera III robots. The ultimate goal is to provide a new perspective on multiagent learning and to demonstrate the practical benefits of training heterogeneous, scalable multiagent teams through generative encoding.
Evolving team compositions by agent swapping
Optimizing collective behavior in multiagent systems requires algorithms to find not only appropriate individual behaviors but also a suitable composition of agents within a team. Over the last two decades, evolutionary methods have emerged as a promising approach for the design of agents and their compositions into teams. The choice of a crossover operator that facilitates the evolution of optimal team composition is recognized to be crucial, but so far, it has never been thoroughly quantified. Here, we highlight the limitations of two different crossover operators that exchange entire agents between teams: restricted agent swapping (RAS) that exchanges only corresponding agents between teams and free agent swapping (FAS) that allows an arbitrary exchange of agents. Our results show that RAS suffers from premature convergence, whereas FAS entails insufficient convergence. Consequently, in both cases, the exploration and exploitation aspects of the evolutionary algorithm are not well balanced, resulting in the evolution of suboptimal team compositions. To overcome this problem, we propose combining the two methods. Our approach first applies FAS to explore the search space and then RAS to exploit it. This mixed approach is a much more efficient strategy for the evolution of team compositions compared to either strategy on its own. Our results suggest that such a mixed agent-swapping algorithm should always be preferred whenever the optimal composition of individuals in a multiagent system is unknown.
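The two crossover operators and the FAS-then-RAS schedule can be sketched as follows. The abstract does not say how many agents are exchanged per crossover or when the switch happens, so the single-agent swap and the `switch_generation` parameter are assumptions made for illustration. Teams are represented as plain lists of agents.

```python
import random

def restricted_agent_swap(team_a, team_b, rng):
    """RAS: exchange the agents at the same position in both teams."""
    a, b = list(team_a), list(team_b)
    i = rng.randrange(len(a))
    a[i], b[i] = b[i], a[i]
    return a, b

def free_agent_swap(team_a, team_b, rng):
    """FAS: exchange agents at independently chosen positions."""
    a, b = list(team_a), list(team_b)
    i, j = rng.randrange(len(a)), rng.randrange(len(b))
    a[i], b[j] = b[j], a[i]
    return a, b

def mixed_swap(team_a, team_b, generation, switch_generation, rng):
    """Explore with FAS early, then exploit with RAS (assumed schedule)."""
    if generation < switch_generation:
        return free_agent_swap(team_a, team_b, rng)
    return restricted_agent_swap(team_a, team_b, rng)

if __name__ == "__main__":
    rng = random.Random(0)
    team_a, team_b = ["a0", "a1", "a2"], ["b0", "b1", "b2"]
    print(mixed_swap(team_a, team_b, generation=5, switch_generation=50, rng=rng))
    print(mixed_swap(team_a, team_b, generation=80, switch_generation=50, rng=rng))
```

RAS keeps every agent in its original slot, which preserves role structure but limits which compositions can be reached. FAS can move an agent into any slot of the other team, which widens exploration at the cost of disrupting established roles.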
Design of Complex Engineered Systems Using Multiagent Coordination
This thesis is the combination of two research publications working toward a unified strategy in which the design of complex engineered systems can be completed using a multiagent coordination approach. Current engineered system modeling techniques segment large complex models into multiple groups to be simulated independently. These methods restrict the evaluations of such complex systems, as their failure properties are typically unknown until they are experienced in operation. In an effort to help engineers to design complex engineered systems, this research proposes that a distributed yet non-legislated approach can be used in the design processes by splitting up the overall system into specific teams. The approach specifically hypothesizes that multiagent credit assignment can be used to effectively determine how to properly incentivize subsystem designers so that the global set of system-level objectives can be achieved.
The first publication presents a multiagent systems based approach for designing a self-replicating robotic manufacturing factory in space. The simulation in this work is able to present the coordination of the agents during the construction of the factory as the parameters of the learning algorithm are changed. The results show the advantage of using a learning algorithm to design a large system. The second publication presents a hybrid approach to design complex engineered systems, providing a method in which design decisions can be reconciled without the need for either detailed interaction models or external legislating mechanisms. The results of this paper demonstrate that a team of autonomous agents using a cooperative coevolutionary algorithm can effectively design a complex engineered system.
Each publication utilized a system model to illustrate and simulate the methods and potential results. By designing complex systems with a multiagent coordination approach, a design methodology can be developed in an effort to reduce design uncertainty and provide mechanisms through which the system-level impact of decisions can be estimated without explicitly modeling such interactions.
Multiagent Q-learning with Sub-Team Coordination
In many real-world cooperative multiagent reinforcement learning (MARL) tasks, teams of agents can rehearse together before deployment, but then communication constraints may force individual agents to execute independently when deployed. Centralized training and decentralized execution (CTDE), which focuses mainly on this setting, has become increasingly popular in recent years. In the value-based MARL branch, a credit assignment mechanism is typically used to factorize the team reward into each individual's reward; individual-global-max (IGM) is a condition on the factorization ensuring that agents' action choices coincide with the team's optimal joint action. However, current architectures fail to consider local coordination within sub-teams that should be exploited for more effective factorization, leading to faster learning. We propose a novel value factorization framework, called multiagent Q-learning with sub-team coordination (QSCAN), to flexibly represent sub-team coordination while honoring the IGM condition. QSCAN encompasses the full spectrum of sub-team coordination according to sub-team size, ranging from the monotonic value function class to the entire IGM function class, with familiar methods such as QMIX and QPLEX located at the respective extremes of the spectrum. Experimental results show that QSCAN's performance dominates state-of-the-art methods in matrix games, predator-prey tasks, and the Switch challenge in MA-Gym. Additionally, QSCAN achieves comparable performance to those methods in a selection of StarCraft II micro-management tasks.
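The IGM condition can be checked by brute force on small tabular examples: the joint action formed by each agent's greedy choice must equal the action that maximises the joint value. The sketch below is not QSCAN, QMIX, or QPLEX. It assumes unique maxima, uses a hand-written weighted sum with positive weights as a simple stand-in for a monotonic mixer, and all values and names are hypothetical.

```python
from itertools import product

def satisfies_igm(q_individual, q_joint):
    """True if per-agent greedy actions match the joint greedy action.

    q_individual: one list of action values per agent.
    q_joint: callable mapping a tuple of actions to a joint value.
    Assumes maxima are unique (ties are not handled).
    """
    greedy = tuple(max(range(len(q)), key=q.__getitem__) for q in q_individual)
    joint_actions = product(*(range(len(q)) for q in q_individual))
    return greedy == max(joint_actions, key=q_joint)

# Hypothetical per-agent utilities for two agents.
Q1 = [1.0, 3.0]
Q2 = [2.0, 0.5, 1.0]

def monotonic_mix(actions):
    """Weighted sum with positive weights: increasing in each agent's value."""
    return 0.5 * Q1[actions[0]] + 2.0 * Q2[actions[1]]

# A joint table whose optimum (0, 1) disagrees with the greedy choices (1, 0).
MISMATCHED = {(0, 0): 0.0, (0, 1): 8.0, (0, 2): 0.0,
              (1, 0): 5.0, (1, 1): -12.0, (1, 2): 0.0}

if __name__ == "__main__":
    print(satisfies_igm([Q1, Q2], monotonic_mix))
    print(satisfies_igm([Q1, Q2], MISMATCHED.__getitem__))
```

The monotonic mix passes the check because raising any agent's individual value can only raise the joint value. The mismatched table fails it, which is the situation a factorization must rule out if agents are to act greedily and independently at execution time.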