411 research outputs found

    Predator-prey survival pressure is sufficient to evolve swarming behaviors

    Comprehending how local interactions give rise to global collective behavior is of utmost importance in both biological and physical research. Traditional agent-based models often rely on static rules that fail to capture the dynamic strategies of the biological world. Reinforcement learning has been proposed as a solution, but most previous methods adopt handcrafted reward functions that implicitly or explicitly encourage the emergence of swarming behaviors. In this study, we propose a minimal predator-prey coevolution framework based on mixed cooperative-competitive multiagent reinforcement learning, and adopt a reward function based solely on the fundamental survival pressure: prey receive a reward of -1 if caught by predators, while predators receive a reward of +1. Surprisingly, our analysis of this approach reveals an unexpectedly rich diversity of emergent behaviors for both prey and predators, including flocking and swirling behaviors for prey, as well as dispersion tactics, confusion, and marginal predation phenomena for predators. Overall, our study provides novel insights into the collective behavior of organisms and highlights the potential applications in swarm robotics.
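
    A minimal sketch of this survival-pressure reward, assuming 2D positions and a catch radius (both names illustrative); reading the +1 as a per-catch predator bonus is our interpretation of the scheme:

```python
# Minimal sketch of a pure survival-pressure reward; no shaping terms,
# so any swarming must emerge from these sparse signals alone.
import numpy as np

def survival_rewards(prey_pos, predator_pos, catch_radius=0.05):
    """Return (prey_rewards, predator_rewards).

    prey_pos     : (n_prey, 2) array of prey positions
    predator_pos : (n_pred, 2) array of predator positions
    """
    # Pairwise distances, shape (n_prey, n_pred)
    dists = np.linalg.norm(prey_pos[:, None, :] - predator_pos[None, :, :], axis=-1)
    caught = dists < catch_radius                 # boolean catch matrix
    prey_rewards = -1.0 * caught.any(axis=1)      # -1 if caught by any predator
    predator_rewards = +1.0 * caught.sum(axis=0)  # +1 per prey caught (assumed)
    return prey_rewards, predator_rewards
```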

    Reinforcement Learning Agents acquire Flocking and Symbiotic Behaviour in Simulated Ecosystems

    In nature, group behaviours such as flocking, as well as cross-species symbiotic partnerships, are observed in vastly different forms and circumstances. We hypothesize that such strategies can arise in response to generic predator-prey pressures in a spatial environment with range-limited sensation and action. We evaluate whether these forms of coordination can emerge through independent multi-agent reinforcement learning in simple multiple-species ecosystems. In contrast to prior work, we avoid hand-crafted shaping rewards, specific actions, or dynamics that would directly encourage coordination across agents. Instead, we test whether coordination emerges as a consequence of adaptation alone, without encouraging these specific forms of coordination, which have only indirect benefit. Our simulated ecosystems consist of a generic food chain involving three trophic levels: apex predator, mid-level predator, and prey. We conduct experiments on two different platforms, a 3D physics engine with tens of agents as well as a 2D grid world with up to thousands. The results clearly confirm our hypothesis and show substantial coordination both within and across species. To obtain these results, we leverage and adapt recent advances in deep reinforcement learning within an ecosystem training protocol featuring homogeneous groups of independent agents from different species (sets of policies), acting in many different random combinations in parallel habitats. The policies use neural network architectures that are invariant to agent individuality but not to type (species) and that generalize across varying numbers of observed agents. While the emergence of complexity in artificial ecosystems has long been studied in the artificial life community, the focus has been more on individual complexity, genetic algorithms, and explicit modelling, and less on the group complexity and reinforcement learning emphasized in this article. Contrary to what the name and intuition suggest, reinforcement learning here adapts over evolutionary history rather than a lifetime, addressing the sequential optimization of fitness that is usually approached with genetic algorithms in the artificial life community. We shift from procedures to objectives, allowing us to bring powerful new machinery to bear, and we see the emergence of complex behaviour from a sequence of simple optimization problems.
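
    The invariance property can be illustrated with a small pooling encoder: one embedding per species, mean-pooled so that the order and number of observed agents do not matter. A hedged sketch; the layer sizes and the two-species split are assumptions, not the paper's architecture:

```python
# Sketch of an observation encoder that is invariant to agent identity
# but not to species, and handles variable agent counts via mean pooling.
import torch
import torch.nn as nn

class SpeciesPoolEncoder(nn.Module):
    def __init__(self, feat_dim=4, hidden=64):
        super().__init__()
        # One embedding MLP per species: permutation within a species is
        # irrelevant after pooling, but species remain distinguishable.
        self.prey_mlp = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.pred_mlp = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, hidden)

    def forward(self, prey_obs, pred_obs):
        # prey_obs: (n_prey, feat_dim), pred_obs: (n_pred, feat_dim).
        # Mean over axis 0 makes the code size- and order-invariant.
        prey_code = self.prey_mlp(prey_obs).mean(dim=0)
        pred_code = self.pred_mlp(pred_obs).mean(dim=0)
        return torch.relu(self.head(torch.cat([prey_code, pred_code])))
```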

    Imitation Learning for Swarm Control using Variational Inference

    Swarms are groups of robots that can coordinate, cooperate, and communicate to achieve tasks that may be impossible for a single robot. These systems exhibit complex dynamical behavior, similar to that observed in physics, neuroscience, finance, biology, and social and communication networks. For instance, in biology, schools of fish, swarms of bacteria, and colonies of termites exhibit flocking behavior to achieve simple and complex tasks. Modeling the dynamics of flocking in animals is challenging, as we usually do not have full knowledge of the dynamics of the system or of how individual agents interact; the environment of swarms is also very noisy and chaotic, and we can usually observe only the individual trajectories of the agents. This work presents a technique to discover and understand the underlying governing dynamics of these systems, and how their agents interact, from observation data alone, using variational inference in an unsupervised manner. This is done by modeling the observed system dynamics as graphs and reconstructing the dynamics using variational autoencoders with multiple message-passing operations in the encoder and decoder. By achieving this, we can apply our understanding of the complex behavior of animal swarms to robotic systems, imitating the flocking behavior of animals and performing decentralized control of robotic swarms. The approach relies on data-driven model discovery to learn local decentralized controllers that mimic the motion constraints and policies of animal flocks. To verify and validate this technique, experiments were conducted on observations of schools of fish and on synthetic data from the boids model.
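
    The graph-based reconstruction can be sketched as a single message-passing round of the kind such encoders and decoders stack (in the spirit of neural relational inference); the fully connected graph and layer sizes below are assumptions:

```python
# One message-passing round over agent states modeled as graph nodes:
# node pairs -> edge messages -> aggregated messages -> updated nodes.
import torch
import torch.nn as nn

class MessagePassing(nn.Module):
    def __init__(self, node_dim=4, hidden=32):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(2 * node_dim, hidden), nn.ReLU())
        self.node_mlp = nn.Sequential(nn.Linear(node_dim + hidden, hidden), nn.ReLU())

    def forward(self, x):
        # x: (n_agents, node_dim) observed states (e.g. position, velocity).
        n = x.shape[0]
        senders = x.unsqueeze(1).expand(n, n, -1)    # senders[i, j] = x[i]
        receivers = x.unsqueeze(0).expand(n, n, -1)  # receivers[i, j] = x[j]
        msgs = self.edge_mlp(torch.cat([senders, receivers], dim=-1))
        agg = msgs.sum(dim=0)                        # sum incoming messages per node
        return self.node_mlp(torch.cat([x, agg], dim=-1))
```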

    Solving the potential field local minimum problem using internal agent states

    We propose a new, extended artificial potential field method that uses dynamic internal agent states. The internal states are modelled as a dynamical system of coupled first-order differential equations that manipulate the potential field in which the agent is situated. The internal state dynamics are forced by the interaction of the agent with the external environment. Local equilibria in the potential field are then manipulated by the internal states and transformed from stable equilibria into unstable equilibria, allowing escape from local minima in the potential field. This new methodology successfully solves reactive path-planning problems, such as a complex maze with multiple local minima, which cannot be solved using conventional static potential fields.
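
    A hedged sketch of the core mechanism, with illustrative dynamics rather than the paper's exact equations: a first-order internal state, forced by the agent's interaction with the field, adds a rotational term that destabilizes local minima:

```python
# Sketch: gradient descent on a static potential U, perturbed by an
# internal state s governed by a forced first-order ODE.
import numpy as np

def step(pos, s, grad_U, dt=0.01, k=5.0, lam=1.0):
    """One integration step of agent position (2,) and internal state s."""
    g = grad_U(pos)
    # The internal state rises near equilibria (where the gradient vanishes)
    # and decays elsewhere: ds/dt = k * exp(-|g|) - lam * s  (illustrative).
    s = s + dt * (k * np.exp(-np.linalg.norm(g)) - lam * s)
    # The state injects a rotational term that no gradient field can produce,
    # turning a stable local minimum into an unstable equilibrium.
    rot = s * np.array([-g[1], g[0]])
    pos = pos + dt * (-g + rot)
    return pos, s
```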

    Emergence and resilience in multi-agent reinforcement learning

    Our world represents an enormous multi-agent system (MAS), consisting of a plethora of agents that make decisions under uncertainty to achieve certain goals. The interaction of agents constantly affects our world in various ways, leading to the emergence of interesting phenomena like life forms and civilizations that can last for many years while withstanding various kinds of disturbances. Building artificial MAS that are able to adapt and survive similarly to natural MAS is a major goal in artificial intelligence, as a wide range of potential real-world applications like autonomous driving, multi-robot warehouses, and cyber-physical production systems can be straightforwardly modeled as MAS. Multi-agent reinforcement learning (MARL) is a promising approach to building such systems, and it has achieved remarkable progress in recent years. However, state-of-the-art MARL commonly assumes very idealized conditions in order to optimize performance in best-case scenarios, while neglecting further aspects that are relevant to the real world. In this thesis, we address emergence and resilience in MARL, two aspects that are important for building artificial MAS that adapt and survive as effectively as natural MAS do. We first focus on emergent cooperation from the local interaction of self-interested agents and introduce a peer incentivization approach based on mutual acknowledgments. We then propose to exploit emergent phenomena to further improve coordination in large cooperative MAS via decentralized planning or hierarchical value function factorization. To maintain multi-agent coordination in the presence of partial changes, similar to classic distributed systems, we present adversarial methods to improve and evaluate resilience in MARL. Finally, we briefly cover a selection of further topics that are relevant to advancing MARL towards real-world applicability.
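
    The mutual-acknowledgment idea can be sketched as simple reward shaping: each self-interested agent sends a small token to its neighbors when its own outcome improved, and shaped rewards add the tokens received. The token value and improvement test below are assumptions, not the thesis's exact protocol:

```python
# Sketch of peer incentivization via mutual acknowledgments.
def shaped_rewards(rewards, prev_rewards, neighbors, token=0.5):
    """rewards, prev_rewards: dict agent -> float; neighbors: dict agent -> list."""
    # An agent acknowledges its peers when its own reward did not decrease.
    sent = {a: token if rewards[a] >= prev_rewards[a] else 0.0 for a in rewards}
    # Each agent's shaped reward = own reward + acknowledgments received.
    return {a: rewards[a] + sum(sent[b] for b in neighbors[a]) for a in rewards}
```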

    Aquarium: A Comprehensive Framework for Exploring Predator-Prey Dynamics through Multi-Agent Reinforcement Learning Algorithms

    Recent advances in Multi-Agent Reinforcement Learning have prompted the modeling of intricate interactions between agents in simulated environments. In particular, predator-prey dynamics have captured substantial interest, and various simulations have been tailored to unique requirements. To prevent further time-intensive developments, we introduce Aquarium, a comprehensive Multi-Agent Reinforcement Learning environment for predator-prey interaction that enables the study of emergent behavior. Aquarium is open source and offers seamless integration with the PettingZoo framework, allowing a quick start with proven algorithm implementations. It features physics-based agent movement on a two-dimensional, edge-wrapping plane. The agent-environment interaction (observations, actions, rewards) and the environment settings (agent speed, prey reproduction, predator starvation, and others) are fully customizable. Besides a resource-efficient visualization, Aquarium supports recording video files, providing a visual comprehension of agent behavior. To demonstrate the environment's capabilities, we conduct preliminary studies which use PPO to train multiple prey agents to evade a predator. In accordance with the literature, we find that Individual Learning results in worse performance than Parameter Sharing, which significantly improves coordination and sample efficiency. Accepted at ICAAR.
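
    Since Aquarium integrates with PettingZoo, a rollout would follow the standard PettingZoo parallel API; the `aquarium` package name and its constructor arguments below are hypothetical placeholders, not the documented API:

```python
# Rollout plumbing for a PettingZoo-style parallel environment.
# import aquarium                                        # hypothetical package
# env = aquarium.parallel_env(n_prey=16, n_predators=1)  # hypothetical args

def run_episode(env, policies):
    """Roll out one episode; policies: dict agent_id -> callable(obs) -> action."""
    observations, infos = env.reset()
    totals = {agent: 0.0 for agent in env.agents}
    while env.agents:  # env.agents shrinks as agents terminate
        actions = {a: policies[a](observations[a]) for a in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)
        for a, r in rewards.items():
            totals[a] += r
    return totals
```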

    Distributed Control of a Swarm of Autonomous Unmanned Aerial Vehicles

    With the increasing use of Unmanned Aerial Vehicles (UAVs) in military operations, there is a growing need to develop new methods of control and navigation for these vehicles. This investigation proposes the use of an adaptive swarming algorithm that utilizes local state information to influence the overall behavior of each individual agent in the swarm, based upon the agent's current position in the battlespace. In order to investigate the ability of this algorithm to control UAVs in a cooperative manner, a swarm architecture is developed that allows for on-line modification of basic rules. Adaptation is achieved by using a set of behavior coefficients that define the weight at which each of four basic rules is asserted in an individual, based upon local state information. An Evolutionary Strategy (ES) is employed to create initial sets of behavior coefficients. Using this technique, three distinct emergent swarm behaviors are evolved, and each behavior is investigated in terms of the ability of the adaptive swarming algorithm to achieve the desired emergent behavior by modifying the simple rules of each agent. Finally, each of the three behaviors is analyzed visually, using a graphical representation of the simulation, and numerically, using a set of metrics developed for this investigation.
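
    The rule-weighting scheme can be sketched as a weighted blend of four classic swarming rules; the particular rule set (cohesion, separation, alignment, migration) and the coefficient lookup are illustrative assumptions, since the paper does not name its four rules here:

```python
# Sketch: an agent's velocity command as a coefficient-weighted blend of rules.
import numpy as np

def command(agent_pos, agent_vel, flock_pos, flock_vel, goal, coeffs):
    """coeffs: (4,) behavior coefficients selected from the agent's local state."""
    cohesion   = flock_pos.mean(axis=0) - agent_pos   # move toward flock center
    separation = (agent_pos - flock_pos).sum(axis=0)  # push away from neighbors
    alignment  = flock_vel.mean(axis=0) - agent_vel   # match neighbor velocity
    migration  = goal - agent_pos                     # head toward the waypoint
    rules = np.stack([cohesion, separation, alignment, migration])  # (4, 2)
    return coeffs @ rules                             # weighted blend, shape (2,)
```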