32 research outputs found
Novel approaches to cooperative coevolution of heterogeneous multiagent systems
Doctoral thesis, Informatics (Informatics Engineering), Universidade de Lisboa, Faculdade de Ciências, 2017.
Heterogeneous multirobot systems are characterised by the morphological and/or behavioural heterogeneity of their constituent robots. These systems have a number of advantages over the more common homogeneous multirobot systems: they can leverage specialisation for increased efficiency, and they can solve tasks that are beyond the reach of any single type of robot, by combining the capabilities of different robots. Manually designing control for heterogeneous systems is a challenging endeavour, since the desired system behaviour has to be decomposed into behavioural rules for the individual robots, in such a way that the team as a whole cooperates and takes advantage of specialisation. Evolutionary robotics is a promising alternative that can be used to automate the synthesis of controllers for multirobot systems, but so far, research in the field has mostly focused on homogeneous systems, such as swarm robotics systems. Cooperative coevolutionary algorithms (CCEAs) are a type of evolutionary algorithm that facilitates the evolution of control for heterogeneous systems, by working over a decomposition of the problem. In a typical CCEA application, each agent evolves in a separate population, with the evaluation of each agent depending on the cooperation with agents from the other coevolving populations. A CCEA is thus capable of projecting the large search space into multiple smaller, and more manageable, search spaces. Unfortunately, the use of cooperative coevolutionary algorithms is associated with a number of challenges. Previous works have shown that CCEAs are not necessarily attracted to the global optimum, but often converge to mediocre stable states; they can be inefficient when applied to large teams; and they have not yet been demonstrated in real robotic systems, nor in morphologically heterogeneous multirobot systems.
In this thesis, we propose novel methods for overcoming the fundamental challenges in cooperative coevolutionary algorithms mentioned above, and study them in multirobot domains: we propose novelty-driven cooperative coevolution, in which premature convergence is avoided by encouraging behavioural novelty; and we propose Hyb-CCEA, an extension of CCEAs that places the team heterogeneity under evolutionary control, significantly improving its scalability with respect to the team size. These two approaches have in common that they take into account the exploration of the behaviour space by the evolutionary process. Besides relying on the fitness function for the evaluation of the candidate solutions, the evolutionary process analyses the behaviour of the evolving agents to improve the effectiveness of the evolutionary search. The ultimate goal of our research is to achieve general methods that can effectively synthesise controllers for heterogeneous multirobot systems, and therefore help to realise the full potential of this type of system. To this end, we demonstrate the proposed approaches in a variety of multirobot domains used in previous works, and we study the application of CCEAs to new robotics domains, including a morphologically heterogeneous system and a real robotic system.
Fundação para a Ciência e a Tecnologia (FCT, PEst-OE/EEI/LA0008/2011)
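The basic CCEA loop described in the abstract above (one population per agent, each individual evaluated together with collaborators from the other populations) can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the two-agent toy task, the function names `evaluate_team` and `ccea`, the single-float "controllers", and the truncation-plus-Gaussian-mutation scheme are all assumptions made for the example. It implements neither novelty-driven coevolution nor Hyb-CCEA.

```python
import random


def evaluate_team(team):
    """Hypothetical team fitness: highest (zero) when the two agents' values sum to 1."""
    a, b = team
    return -abs((a + b) - 1.0)


def ccea(generations=30, pop_size=10, seed=0):
    rng = random.Random(seed)
    # One population per agent; each individual is a single float standing in for a controller.
    pops = [[rng.uniform(-2, 2) for _ in range(pop_size)] for _ in range(2)]
    # Representatives: the collaborators used when evaluating the other population.
    reps = [pop[0] for pop in pops]
    for _ in range(generations):
        for i, pop in enumerate(pops):
            def fitness(ind):
                # An individual is scored as part of a team with the other representatives.
                team = list(reps)
                team[i] = ind
                return evaluate_team(team)

            ranked = sorted(pop, key=fitness, reverse=True)
            elite = ranked[: pop_size // 2]                     # truncation selection
            children = [e + rng.gauss(0, 0.1) for e in elite]   # Gaussian mutation
            pops[i] = elite + children
            reps[i] = ranked[0]                                 # best individual becomes the representative
    return reps, evaluate_team(reps)


representatives, team_fitness = ccea()
```

Each population only ever searches its own one-dimensional space, which is the decomposition the abstract refers to; the choice of a single best collaborator per population is one common option among several.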
Recommended from our members
Distortion of Agent States for Improved Coordination
Many real-world problems have partial solutions or intermediary steps which can lead toward solving the problem. When assembling robotic teams to solve these problems, we have intuition about which intermediary steps are more useful than others. We examine methods to identify and apply designer intuition to tightly coupled multiagent problems. In a method analogous to potential-based reward shaping, we shape the perceived value of points of interest (POI) in a ground-rover observation problem based on the potential for further team coordination if the observing agent goes to observe that POI. These state distortion methods utilize information from the current world, and as such are constructed independently of the learning method used. Methods for direct state shaping from Nasroullahi [1], which are based on future prediction about other agents’ actions, are extended into the future. From this initial work, we were inspired to create new methods that readily scale to POI problems of arbitrary coupling dimension. These new methods show no performance degradation in less coupled domains, and sustained operational capacity in more difficult, tightly coupled problems where traditional methods break down. This field of direct state distortion for increased cooperation and performance is relatively unexplored, and we conclude by laying out future directions for this area of work.
Key Words: multiagent learning, state-shaping methods, tightly coupled problem
Multiagent learning for locomotion and coordination in tensegrity robotics
Tensegrity structures are composed of pure compressional elements that are connected via a network of pure tensional elements. The concept of tensegrity promises numerous advantages to the field of robotics. Tensegrity robots are, however, notoriously difficult to control due to their oscillatory nature and the nonlinear interaction between their components. Multiagent learning, a subtopic of artificial intelligence, provides the tools to address the challenges of tensegrity robots. In multiagent learning, multiple entities simultaneously learn a task together while interacting with each other through the environment. This approach can be applied at two different levels: both to coordinate teams of multiple robots, and to control a single robot where different agents control different parts of the robot. In this work, we consider both cases, and apply two multiagent learning approaches (Reinforcement Learning and Evolutionary Algorithms) to tensegrity robotics problems at different levels. First, we take the model of an icosahedron robot, and use multiagent learning to control different parts. We use coevolutionary algorithms and fitness shaping to develop a learning-based, robust rolling locomotion algorithm. After the locomotion aspect, we study multi-robot coordination using multiagent reinforcement learning and reward shaping methods. At this phase, we study reward shaping and develop methods that use it to improve the cooperation between multiple tensegrity robots. We explain how these results are simulated and validated by using physical tensegrity robots. Last, we explain how these results helped the design and development of a tensegrity robot with rolling capability: SUPERBall.
D₊₊ : Structural Credit Assignment in Tightly Coupled Multiagent Domains
Autonomous multiagent teams can be used in complex exploration tasks to both expedite the exploration and improve its efficiency. However, the use of multiagent systems presents additional challenges. Specifically, in domains where the agents' actions are tightly coupled, coordinating multiple agents to achieve cooperative behavior at the group level is difficult. In this work, we demonstrate that reward shaping can greatly benefit learning in tightly coupled multiagent exploration tasks. We argue that in tightly coupled domains, effective coordination depends on rewarding stepping stone actions: actions that would improve the system's objective but are not rewarded because other agents have not yet found their proper actions. To this end, we build upon current work in the multiagent structural credit assignment literature and extend the idea of counterfactuals introduced in difference evaluation functions.
Difference evaluation functions have a number of properties that make them ideal as a learning signal, such as sensitivity to an agent's actions and alignment with the global system objective. However, they fail to tackle the coordination problem in domains where the agent coupling is tight. Extending the idea of counterfactuals, we propose a novel reward structure, D₊₊. We investigate the performance of D₊₊ in two different multiagent domains. We show that while both global team performance and the difference evaluation function fail to properly reward the stepping stone actions, our proposed algorithm successfully rewards such behaviors and provides superior performance (166% performance improvement and a fourfold convergence speed-up) compared to policies learned using either the global reward or the difference reward.
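The failure mode described above can be illustrated with the standard form of the difference evaluation, D_i = G(z) - G(z_-i), where G is the global objective and z_-i is the joint state with agent i removed. The sketch below is an assumption-laden toy, not the thesis's domain or code: the POI assignment encoding, the `coupling` parameter, and the function names are hypothetical, and D₊₊ itself is not implemented because the abstract does not specify it.

```python
from collections import Counter


def global_reward(assignments, coupling):
    """G(z): number of POIs observed by at least `coupling` agents at once."""
    counts = Counter(a for a in assignments if a is not None)
    return sum(1 for n in counts.values() if n >= coupling)


def difference_reward(assignments, agent, coupling):
    """D_i = G(z) - G(z_-i): change in G when agent i is removed from the system."""
    without = list(assignments)
    without[agent] = None
    return global_reward(assignments, coupling) - global_reward(without, coupling)


# Three agents; each entry is the POI an agent observes (None = idle).
# Every POI needs 2 simultaneous observers (tight coupling).
lonely = ["A", "B", None]   # agent 0 is alone at POI "A": a stepping stone action
paired = ["A", "A", "B"]    # agents 0 and 1 jointly observe POI "A"

stepping_stone_signal = difference_reward(lonely, 0, coupling=2)
paired_signal = difference_reward(paired, 0, coupling=2)
```

In the `lonely` case both G and D are zero for agent 0, even though going to POI "A" is exactly what a teammate would need it to do; only once a partner arrives (`paired`) does the difference reward become positive.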
Adaptive Multiagent Traffic Management for Autonomous Robotic Systems
There is growing commercial interest in the use of unmanned aerial vehicles (UAVs) in urban environments, specifically for package delivery applications. However, the size, complexity and sheer numbers of expected UAVs make conventional air traffic management that relies on human air traffic controllers infeasible. To enable UAVs to operate safely and efficiently in congested environments, it is essential to develop autonomous UAV management strategies.
We introduce a dynamic hierarchical traffic control model that reacts to traffic conditions instantaneously to reduce congestion in the airspace. An obstacle-filled airspace lends itself to modelling as a graph structure similar to a road network. We introduce controller agents, which set costs across the airspace. These agents control traffic similarly to adaptive metering lights in highway traffic. UAVs then plan their paths based on the costs (e.g. conflicts, or delays) they see for traversing particular parts of the airspace. This gives us a decentralized method for reducing traffic in an airspace.
Our hierarchical structure allows us to separate the traffic reduction problem from the individual robot navigation problem. Each robot does not explicitly coordinate with others in the airspace. Instead, robots execute their own individual internal cost-based planner to travel between locations. We then use neuro-evolution to provide incentives to these cost-based planners to reduce traffic in the environment.
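The separation between cost-setting agents and individual planners can be sketched as follows. This is an illustrative toy under stated assumptions: the four-node graph, the `costs` dictionary keyed by directed edge, and the function name `plan_path` are hypothetical; Dijkstra's algorithm stands in for the A* planner mentioned later in the abstract; and the costs are fixed by hand here rather than learned through neuro-evolution.

```python
import heapq


def plan_path(graph, costs, start, goal):
    """Dijkstra search where edge weight = base travel time + agent-assigned cost."""
    frontier = [(0.0, start, [start])]
    visited = set()
    while frontier:
        dist, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, dist
        if node in visited:
            continue
        visited.add(node)
        for nxt, travel_time in graph.get(node, {}).items():
            extra = costs.get((node, nxt), 0.0)  # cost set by a controller agent, if any
            heapq.heappush(frontier, (dist + travel_time + extra, nxt, path + [nxt]))
    return None, float("inf")


# Hypothetical airspace graph: node -> {neighbour: base travel time}.
graph = {"S": {"A": 1.0, "B": 2.0}, "A": {"G": 1.0}, "B": {"G": 2.0}}

# With no added costs the UAV takes the shorter route through A.
free_path, free_cost = plan_path(graph, {}, "S", "G")

# A controller agent raises the cost of edge S->A (e.g. because it is congested),
# and the same planner now routes through B without any explicit coordination.
tolled_path, tolled_cost = plan_path(graph, {("S", "A"): 5.0}, "S", "G")
```

The planner never sees other UAVs; it only sees costs, which is what lets the traffic reduction problem be handled separately from navigation.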
Traffic quality can be expressed in several different ways. We first evaluate our traffic reduction policies in terms of 'conflicts', which characterize situations where an aircraft comes too close to another for safety in a physical space. We then examine traffic in terms of the amount of 'delay' that all agents incur, which assumes that there is a structure to ensure only a safe number of UAVs occupy the same area. Finally, we look at the total travel time that a UAV can expect to take from the moment it enters the airspace until the time it gets to its destination.
To explore the UAV traffic management (UTM) problem without waiting for a full simulation of UAVs running A*, we develop an abstraction of the UTM domain that preserves the core problem. We then investigate performance under differing levels of traffic, as well as two different agent structures. Our results show similar performance for both agent definitions, with delay reduction of up to 68% in high traffic cases.
With a fast version of the UTM problem, we explore the effect of redefining the control structure such that links, or edges of the UTM graph, set costs individually. This shifts the control paradigm toward controlling directional travel rather than areas in the space, as was the case with sector agents used in previous approaches. Due to our graph structure, we find that there are far more control elements in the link agent approach than in the sector agent approach. We identify a tradeoff: link agents give finer control, but the coordination problem for the sector agents is easier because there are fewer sector agents. This indicates that we can get better performance out of a more distributed link-based setup if we address the challenges of multiagent coordination. However, the UAV traffic management domain presents a uniquely difficult coordination problem: each agent's action can affect the perceived value of every other agent's actions. This means that there is an excessive amount of noise in the system, as another agent's action can have a large impact on the reward an agent receives.
We reduce the amount of multiagent noise by reducing the number of agents that are capable of learning. We identify that some agents have more ability to influence traffic based on the topology and traffic profile of the graph. This metric we call impactfulness. We use this metric to improve the learning by removing less impactful agents from the learning process, making a more stationary system in which the impactful agents can learn.
The contributions of this work are to:
- Introduce a cost-based traffic management approach that is platform-agnostic and fast to implement.
- Develop a multiagent approach to setting costs in this traffic management system that is adaptive to traffic conditions and learns long-term effects of management decisions.
- Create an abstraction of UAV traffic that captures key physical attributes, creating a fast and flexible simulation method.
- Quantify agent contributions to system performance by experimenting with single agent learning, single agent exclusion, and a sliding number of agents learning in the system.
Keywords: Planning, UAV, Multiagen
USING COEVOLUTION IN COMPLEX DOMAINS
Genetic Algorithms are a computational model inspired by Darwin's theory of evolution. They have a broad range of applications, from function optimization to solving robotic control problems. Coevolution is an extension of Genetic Algorithms in which more than one population is evolved at the same time. Coevolution can be done in two ways: cooperatively, in which populations jointly try to solve an evolutionary problem, or competitively. Coevolution has been shown to be useful in solving many problems, yet its application in complex domains still needs to be demonstrated.
Robotic soccer is a complex domain that has a dynamic and noisy environment. Many Reinforcement Learning techniques have been applied to the robotic soccer domain, since it is a great test bed for many machine learning methods. However, the success of Reinforcement Learning methods has been limited due to the huge state space of the domain. Evolutionary Algorithms have also been used to tackle this domain; nevertheless, their application has been limited to a small subset of the domain, and no attempt has been shown to succeed in solving the whole problem.
This thesis will try to answer the question of whether coevolution can be applied successfully to complex domains. Three techniques are introduced to tackle the robotic soccer problem. First, an incremental learning algorithm is used to achieve a desirable performance on some soccer tasks. Second, a hierarchical coevolution paradigm is introduced to allow coevolution to scale up in solving the problem. Third, an orchestration mechanism is utilized to manage the learning processes.
Multi-Objective Optimization in Multiagent Systems
Cooperative multiagent systems are used as solution concepts in many application domains including air traffic control, satellite communications, and extraplanetary exploration. As systems become more distributed and complex, we observe three phenomena. First, these systems cannot be accurately modeled, rendering traditional model-based control methods inadequate. Second, system parameters are highly coupled in a nonlinear manner, making it difficult for humans to develop heuristic-based control policies. Finally, these systems are distributed to the point that a centralized controller is either impractical or infeasible. These types of systems are often inherently multi-objective; unfortunately, they are not treated as such in most multiagent research. To date, there has been little research attention given to multi-objective multiagent systems.
This dissertation addresses these systems from a learning-based approach to optimize system performance in four ways: (i) deriving a form of credit assignment compatible for use with multi-objective problems; (ii) deriving multiagent equivalents to state-of-the-art multi-objective evolutionary algorithms (MOEAs); (iii) developing a fast, effective multiagent multi-objective algorithm that outperforms state-of-the-art MOEAs in as little as one tenth of the computation time; and (iv) integrating the previously developed algorithm into a multiagent system.
An exploration of evolutionary computation applied to frequency modulation audio synthesis parameter optimisation
With the ever-increasing complexity of sound synthesisers, there is a growing demand for automated parameter estimation and sound space navigation techniques. This thesis explores the potential for evolutionary computation to automatically map known sound qualities onto the parameters of frequency modulation synthesis. Within this exploration are original contributions in the domain of synthesis parameter estimation and, within the developed system, evolutionary computation, in the form of the evolutionary algorithms that drive the underlying optimisation process. Based upon the requirement for the parameter estimation system to deliver multiple search space solutions, existing evolutionary algorithmic architectures are augmented to enable niching, while maintaining the strengths of the original algorithms. Two novel evolutionary algorithms are proposed in which cluster analysis is used to identify and maintain species within the evolving populations. A conventional evolution strategy and a cooperative coevolution strategy are defined, with cluster-orientated operators that enable the simultaneous optimisation of multiple search space solutions at distinct optima. A test methodology is developed that enables components of the synthesis matching problem to be identified and isolated, enabling the performance of different optimisation techniques to be compared quantitatively. A system is consequently developed that evolves sound matches using conventional frequency modulation synthesis models, and the effectiveness of different evolutionary algorithms is assessed and compared in application to both static and time-varying sound matching problems. Performance of the system is then evaluated by interview with expert listeners. The thesis closes with a reflection on the algorithms and systems which have been developed, discussing possibilities for the future of automated synthesis parameter estimation techniques, and how they might be employed.
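To make the sound matching problem concrete, the sketch below shows a simple two-operator FM tone and an error function of the kind an evolutionary algorithm could minimise over the synthesis parameters. It is an assumption-based illustration only: the function names, the sample rate, and the time-domain mean squared error are choices made for this example, and the abstract does not state which FM models or error measure the thesis actually uses.

```python
import math


def fm_tone(carrier_hz, modulator_hz, index, n_samples=256, sample_rate=8000):
    """Simple two-operator FM: sin(2*pi*fc*t + index * sin(2*pi*fm*t))."""
    return [
        math.sin(2 * math.pi * carrier_hz * t
                 + index * math.sin(2 * math.pi * modulator_hz * t))
        for t in (n / sample_rate for n in range(n_samples))
    ]


def matching_error(params, target):
    """Mean squared waveform error between a candidate parameter set and the target sound."""
    candidate = fm_tone(*params, n_samples=len(target))
    return sum((c - s) ** 2 for c, s in zip(candidate, target)) / len(target)


# Hypothetical target sound with known parameters (carrier, modulator, modulation index).
target = fm_tone(440.0, 220.0, 2.0)

exact_error = matching_error((440.0, 220.0, 2.0), target)
wrong_index_error = matching_error((440.0, 220.0, 0.5), target)
```

Different parameter sets can produce similar sounds, which is one reason the abstract asks the search to return several solutions at distinct optima rather than a single best match.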