
    Aprendizagem de coordenação em sistemas multi-agente [Coordination learning in multi-agent systems]

    The ability of an agent to coordinate with others within a system is a valuable property in multi-agent systems. Agents either cooperate as a team to accomplish a common goal, or adapt to opponents to complete different goals without being exploited. Research has shown that learning multi-agent coordination is significantly more complex than learning policies in single-agent environments, and requires a variety of techniques to deal with the properties of a system where agents learn concurrently. This thesis aims to determine how machine learning can be used to achieve coordination within a multi-agent system. It asks what techniques can be used to tackle the increased complexity of such systems and their credit-assignment challenges, how to achieve coordination, and how to use communication to improve the behavior of a team. Many algorithms for competitive environments are tabular, preventing their use with high-dimensional or continuous state spaces, and they may be biased against specific equilibrium strategies. This thesis proposes multiple deep learning extensions for competitive environments, allowing algorithms to reach equilibrium strategies in complex and partially observable environments while relying only on local information. A tabular algorithm is also extended with a new update rule that eliminates its bias against deterministic strategies. Current state-of-the-art approaches for cooperative environments rely on deep learning to handle the environment's complexity and benefit from a centralized learning phase. Solutions that incorporate communication between agents often prevent agents from being executed in a distributed manner. This thesis proposes a multi-agent algorithm where agents learn communication protocols to compensate for local partial observability, while remaining independently executed.
A centralized learning phase can incorporate additional environment information to increase the robustness and speed with which a team converges to successful policies. The algorithm outperforms current state-of-the-art approaches in a wide variety of multi-agent environments. A permutation-invariant network architecture is also proposed to increase the scalability of the algorithm to large team sizes. Further research is needed to identify how the techniques proposed in this thesis, for cooperative and competitive environments, can be used in unison in mixed environments, and whether they are adequate for general artificial intelligence. (Financial support from FCT and FSE under the III Quadro Comunitário de Apoio; Programa Doutoral em Informática.)
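As context for the tabular methods this abstract contrasts its deep extensions against, the sketch below shows independent tabular Q-learning, the simplest multi-agent baseline, in which each agent treats its concurrently learning peers as part of the environment (the source of the non-stationarity the abstract mentions). This is an illustrative baseline only, not the thesis's algorithm; all names and hyperparameters are hypothetical.

```python
import random
from collections import defaultdict

class IndependentQAgent:
    """Tabular Q-learning agent that ignores other learners: their
    changing policies simply appear as environment non-stationarity."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, eps=0.1):
        self.q = defaultdict(float)   # (state, action) -> value
        self.actions = actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, state):
        """Epsilon-greedy action selection over local Q-values."""
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Standard one-step Q-learning update from local experience."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

Because each agent updates only from its own local experience, this baseline scales to any team size but offers none of the centralized-training benefits the thesis builds on.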

    The evolution of distributed sensing and collective computation in animal populations

    Many animal groups exhibit rapid, coordinated collective motion. Yet, the evolutionary forces that cause such collective responses to evolve are poorly understood. Here, we develop analytical methods and evolutionary simulations based on experimental data from schooling fish. We use these methods to investigate how populations evolve within unpredictable, time-varying resource environments. We show that populations evolve toward a distinctive regime in behavioral phenotype space, where small responses of individuals to local environmental cues cause spontaneous changes in the collective state of groups. These changes resemble phase transitions in physical systems. Through these transitions, individuals evolve the emergent capacity to sense and respond to resource gradients (i.e., individuals perceive gradients via social interactions, rather than sensing gradients directly), and to allocate themselves among distinct, distant resource patches. Our results yield new insight into how natural selection, acting on selfish individuals, results in the highly effective collective responses evident in nature. (Funding: National Science Foundation (NSF), Office of Naval Research, Army Research Office, Human Frontier Science Program, James S. McDonnell Foundation.)

    Bioinspired approaches for coordination and behaviour adaptation of aerial robot swarms

    Behavioural adaptation is a pervasive component of a myriad of animal societies. A well-known strategy, the Lévy walk, has been commonly linked to such adaptation in foraging animals, where the motion of individuals couples periods of localized search with long, straight forward motions. Despite the vast number of studies on Lévy walks in computational ecology, it was only in the past decade that the first studies applied this concept to robotics tasks. This Thesis therefore draws inspiration from the Lévy walk behaviour, and its recent applications to robotics, to design biologically inspired models for two swarm robotics tasks, aiming to increase performance with respect to the state of the art. The first task is cooperative surveillance, where the aim is to deploy a swarm so that, at any point in time, regions of the domain are observed by multiple robots simultaneously. One of the contributions of this Thesis is the Lévy Swarm Algorithm, which augments the concept of the Lévy walk with Reynolds' flocking rules to achieve both exploration and coordination in a swarm of unmanned aerial vehicles. The second task is adaptive foraging in environments of clustered rewards. In such environments, behavioural adaptation is of paramount importance to modulate the transition between exploitation and exploration. Nature enables these adaptive changes by coupling behaviour to the fluctuation of hormones that are mostly regulated by the endocrine system. This Thesis draws further inspiration from Nature and proposes a second model, the Endocrine Lévy Walk, which employs an Artificial Endocrine System as a modulating mechanism of the Lévy walk behaviour. The Endocrine Lévy Walk is compared with the Yuragi model (Nurzaman et al., 2010) in both simulated and physical experiments, where it shows increased performance in terms of search efficiency, energy efficiency, and number of rewards found.
The Endocrine Lévy Walk is then augmented to consider social interactions between members of the swarm by mimicking the behaviour of fireflies, where individuals attract others upon finding suitable environmental conditions. This extended model, the Endocrine Lévy Firefly, is compared to the Levy+ model (Sutantyo et al., 2013) and the Adaptive Collective Lévy Walk (Nauta et al., 2020). This comparison is likewise made in both simulated and physical experiments and assessed in terms of search efficiency, number of rewards found, and cluster search efficiency, strengthening the argument in favour of the Endocrine Lévy Firefly as a promising approach to tackle collaborative foraging.
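The defining feature of the Lévy walk the abstract builds on is its heavy-tailed step-length distribution, which mixes many short, localized moves with rare long relocations. A minimal sketch of that core mechanism, using inverse-transform sampling from a power law, is shown below; it is an illustration of the generic Lévy walk, not any of the thesis's models, and the function names and parameter defaults are assumptions.

```python
import math
import random

def levy_step(mu=2.0, l_min=1.0):
    """Sample a step length from a power law p(l) ~ l^(-mu), l >= l_min,
    via inverse-transform sampling. For 1 < mu <= 3 this gives the
    heavy tail characteristic of Levy walks."""
    u = random.random()
    return l_min * (1.0 - u) ** (-1.0 / (mu - 1.0))

def levy_walk(n_steps, mu=2.0):
    """2-D Levy walk: each step pairs a uniformly random heading with a
    power-law step length, so the path alternates dense local search
    with occasional long straight relocations."""
    x = y = 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        theta = random.uniform(0.0, 2.0 * math.pi)
        length = levy_step(mu)
        x += length * math.cos(theta)
        y += length * math.sin(theta)
        path.append((x, y))
    return path
```

Adaptive variants like those described above would modulate `mu` (or the switch between local and ballistic motion) online, e.g. via an artificial endocrine signal, rather than keep it fixed.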

    SWARM INTELLIGENCE AND STIGMERGY: ROBOTIC IMPLEMENTATION OF FORAGING BEHAVIOR

    Swarm intelligence in multi-robot systems has become an important area of research within collective robotics. Researchers have gained inspiration from biological systems and proposed a variety of industrial, commercial, and military robotics applications. In order to bridge the gap between theory and application, a strong focus is required on robotic implementation of swarm intelligence. To date, theoretical research and computer simulations have dominated the field, with few successful demonstrations of swarm-intelligent robotic systems. In this thesis, a study of intelligent foraging behavior via indirect communication between simple individual agents is presented. Models of foraging are reviewed and analyzed with respect to the system dynamics and their dependence on important parameters. Computer simulations are also conducted to gain an understanding of foraging behavior in systems with large populations. Finally, a novel robotic implementation is presented. The experiment successfully demonstrates cooperative group foraging behavior without direct communication. Trail-laying and trail-following are employed to produce the required stigmergic cooperation. Real robots are shown to achieve increased task efficiency, as a group, resulting from indirect interactions. Experimental results also confirm that trail-based group foraging systems can adapt to dynamic environments.
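The trail-laying and trail-following mechanism described above can be sketched as operations on a shared pheromone grid: agents deposit marks where they travel, marks evaporate over time (which is what lets the system adapt to dynamic environments), and agents climb the local pheromone gradient. This is a generic stigmergy sketch under assumed names and parameters, not the thesis's robot implementation.

```python
GRID = 20  # side length of the shared pheromone grid (illustrative)

def make_grid():
    """Empty shared environment: zero pheromone everywhere."""
    return [[0.0] * GRID for _ in range(GRID)]

def deposit(grid, x, y, amount=1.0):
    """Trail-laying: an agent marks its current cell."""
    grid[y][x] += amount

def evaporate(grid, rate=0.1):
    """Exponential pheromone decay; stale trails fade, so the map
    tracks a changing environment."""
    for row in grid:
        for i in range(len(row)):
            row[i] *= (1.0 - rate)

def best_neighbor(grid, x, y):
    """Trail-following: move toward the strongest neighboring mark."""
    candidates = [(x + dx, y + dy)
                  for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                  if (dx, dy) != (0, 0)
                  and 0 <= x + dx < GRID and 0 <= y + dy < GRID]
    return max(candidates, key=lambda c: grid[c[1]][c[0]])
```

Because agents interact only through the grid, cooperation emerges without any direct agent-to-agent communication, which is the defining property of stigmergy.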

    Evolution of Memory in Reactive Artificial Neural Networks

    In the neuronal circuits of natural and artificial agents, memory is usually implemented with recurrent connections, since recurrence allows past agent state to affect present, ongoing behavior. Here, an interesting question arises in the context of evolution: how could reactive agents have evolved into cognitive ones with internalized memory? This study strives to answer that question by simulating neuroevolution in artificial neural networks, under the hypothesis that internalization of external material interaction is a plausible evolutionary path leading to a fully internalized memory system. A series of computational experiments was performed to gradually verify this hypothesis. The first experiment demonstrated that external materials can be used as memory aids by memoryless reactive artificial agents in a simple 1-dimensional environment. Here, the reactive artificial agents used environmental markers as memory references to succeed in a ball-catching task that requires memory. Motivated by the result of the first experiment, an extended experiment was conducted to tackle a more complex memory problem using the same principle of external material interaction. This time, the reactive artificial agents were tasked with remembering the locations of food items and the nest in a 2-dimensional environment. Such path-following behavior is a trivial foraging strategy of various lower animals such as ants and fish. The final experiment was designed to show the evolution of internal recurrence. In this experiment, I showed the evolutionary advantage of external material interaction by comparing the results of neural network topology evolution algorithms with and without the material interaction mechanism. The result confirmed that the agents with external material interaction learned to solve the memory task faster and more accurately.
The results of these experiments provide insights into a possible evolutionary route to an internalized memory. The use of external material interaction can help reactive artificial agents go beyond the functionality restricted by their simple network structure. Moreover, it allows much faster convergence, with higher accuracy, than the topological evolution of the artificial agents. These results suggest one plausible evolutionary path from reactive agents, through external material interaction, to recurrent structures.
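The "external memory" idea above can be illustrated with a toy 1-D world: a purely reactive policy, mapping the current observation alone to an action with no internal state, still solves a delayed-recall task by writing to the environment, i.e. dropping a marker while the target is visible and later homing in on that marker. This toy is only an illustration of the principle, not the thesis's evolved networks; the observation format and names are hypothetical.

```python
def step(agent_pos, obs, env_markers):
    """One step of a memoryless agent on an integer line.

    obs = (target_visible, target_pos, marker_pos). The agent cannot
    store target_pos internally; instead it externalizes it as a
    marker, which re-enters later steps through the observation."""
    target_visible, target_pos, marker_pos = obs
    if target_visible and target_pos not in env_markers:
        env_markers.add(target_pos)   # write the memory into the world
        return agent_pos
    if marker_pos is not None:        # react to the visible marker
        if marker_pos > agent_pos:
            return agent_pos + 1
        if marker_pos < agent_pos:
            return agent_pos - 1
    return agent_pos
```

The policy itself is a pure function of the current observation; all persistence lives in `env_markers`, the environment's state, which is exactly the mechanism the experiments above rely on.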