2,136 research outputs found

    Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning

    Decentralized cooperative multi-agent deep reinforcement learning (MARL) can be a versatile learning framework, particularly in scenarios where centralized training is either not possible or not practical. One of the critical challenges in decentralized deep MARL is the non-stationarity of the learning environment when multiple agents are learning concurrently. A commonly used and efficient scheme for decentralized MARL is independent learning, in which agents concurrently update their policies independently of each other. We first show that independent learning does not always converge, while sequential learning, where agents update their policies one after another in a sequence, is guaranteed to converge to an agent-by-agent optimal solution. In sequential learning, when one agent updates its policy, all other agents' policies are kept fixed, alleviating the challenge of non-stationarity due to simultaneous updates in other agents' policies. However, it can be slow because only one agent is learning at any time, so it might not always be practical. In this work, we propose a decentralized cooperative MARL algorithm based on multi-timescale learning. In multi-timescale learning, all agents learn simultaneously, but at different learning rates. In our proposed method, when one agent updates its policy, other agents are allowed to update their policies as well, but at a slower rate. This speeds up sequential learning, while also minimizing non-stationarity caused by other agents updating concurrently. Multi-timescale learning outperforms state-of-the-art decentralized learning methods on a set of challenging multi-agent cooperative tasks in the epymarl (Papoudakis et al., 2020) benchmark. This can be seen as a first step towards more general decentralized cooperative deep MARL methods based on multi-timescale learning.
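    The multi-timescale idea described in this abstract can be illustrated with a minimal tabular sketch. It is not the paper's deep RL algorithm: the stateless matrix game, the round-robin rotation of the "fast" role, and every name and default value here (`learning_rates`, `switch_every`, `fast_lr`, `slow_lr`, `epsilon`) are assumptions made for illustration only.

```python
import random


def learning_rates(t, n_agents, switch_every, fast_lr, slow_lr):
    """Per-agent learning rates at step t: one agent is fast, the rest are slow.

    The fast role rotates round-robin every `switch_every` steps (an assumed schedule).
    """
    fast_agent = (t // switch_every) % n_agents
    return [fast_lr if i == fast_agent else slow_lr for i in range(n_agents)]


def multi_timescale_q_learning(payoff, steps=6000, fast_lr=0.5, slow_lr=0.05,
                               switch_every=500, epsilon=0.1, seed=0):
    """Two independent learners on a shared-reward matrix game.

    payoff[a0][a1] is the common reward when agent 0 plays a0 and agent 1 plays a1.
    Both agents update on every step, but at different rates.
    """
    rng = random.Random(seed)
    n_agents = 2
    n_actions = len(payoff)
    q = [[0.0] * n_actions for _ in range(n_agents)]

    for t in range(steps):
        # Each agent picks an action epsilon-greedily from its own Q-values.
        actions = []
        for i in range(n_agents):
            if rng.random() < epsilon:
                actions.append(rng.randrange(n_actions))
            else:
                actions.append(max(range(n_actions), key=lambda a: q[i][a]))

        reward = payoff[actions[0]][actions[1]]

        # All agents learn simultaneously; only the step size differs.
        rates = learning_rates(t, n_agents, switch_every, fast_lr, slow_lr)
        for i in range(n_agents):
            q[i][actions[i]] += rates[i] * (reward - q[i][actions[i]])

    return q


if __name__ == "__main__":
    coordination_game = [[1.0, 0.0], [0.0, 1.0]]
    print(multi_timescale_q_learning(coordination_game))
```

    Setting `slow_lr=0.0` in this sketch recovers sequential learning (only one agent updates at a time), and setting `slow_lr` equal to `fast_lr` recovers plain independent learning, so the two schemes the abstract contrasts sit at the ends of the same range.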

    MANSA: Learning Fast and Slow in Multi-Agent Systems

    In multi-agent reinforcement learning (MARL), independent learning (IL) often shows remarkable performance and easily scales with the number of agents. Yet, using IL can be inefficient and runs the risk of failing to train successfully, particularly in scenarios that require agents to coordinate their actions. Using centralised learning (CL) enables MARL agents to quickly learn how to coordinate their behaviour, but employing CL everywhere is often prohibitively expensive in real-world applications. Besides, using CL in value-based methods often needs strong representational constraints (e.g. the individual-global-max condition) that can lead to poor performance if violated. In this paper, we introduce a novel plug & play IL framework named Multi-Agent Network Selection Algorithm (MANSA), which selectively employs CL only at states that require coordination. At its core, MANSA has an additional agent that uses switching controls to quickly learn the best states to activate CL during training, using CL only where necessary and vastly reducing the computational burden of CL. Our theory proves that MANSA preserves cooperative MARL convergence properties, boosts IL performance and can optimally make use of a fixed budget on the number of CL calls. We show empirically in Level-based Foraging (LBF) and StarCraft Multi-agent Challenge (SMAC) that MANSA achieves fast, superior and more reliable performance while making 40% fewer CL calls in SMAC and using only 1% of CL calls in LBF.

    An Energy-Aware Algorithm for Large Scale Foraging Systems

    The foraging task is one of the canonical testbeds for cooperative robotics, in which a collection of coordinated robots have to find and transport one or more objects to one or more specific storage points. Swarm robotics has been widely considered in such situations, due to its strengths such as robustness, simplicity and scalability. Typical multi-robot foraging systems currently consider tens to hundreds of agents. This paper presents a new algorithm called Energy-aware Cooperative Switching Algorithm for Foraging (EC-SAF) that manages thousands of robots. We therefore investigate the scalability of the EC-SAF algorithm and the parameters that can affect energy efficiency over time. Results indicate that EC-SAF is scalable and effective in reducing swarm energy consumption compared to an energy-aware version of the well-known reference c-marking algorithm (Ec-marking).

    COORDINATION OF LEADER-FOLLOWER MULTI-AGENT SYSTEM WITH TIME-VARYING OBJECTIVE FUNCTION

    This thesis aims to introduce a new framework for the distributed control of multi-agent systems with adjustable swarm control objectives. Our goal is twofold: 1) to provide an overview of how time-varying objectives in the control of autonomous systems may be applied to the distributed control of multi-agent systems with variable autonomy level, and 2) to introduce a framework to incorporate the proposed concept into fundamental swarm behaviors such as aggregation and leader tracking. Leader-follower multi-agent systems are considered in this study, and a general form of time-dependent artificial potential function is proposed to describe the varying objectives of the system in the case of complete information exchange. Using Lyapunov methods, the stability and boundedness of the agents' trajectories under single-order and higher-order dynamics are analyzed. Illustrative numerical simulations are presented to demonstrate the validity of our results. Then, we extend these results to multi-agent systems with limited information exchange and switching communication topology. The first steps towards the realization of an experimental framework have been made, with the ultimate goal of verifying the simulation results in practice.

    Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit Problem

    We define and analyze a multi-agent multi-armed bandit problem in which decision-making agents can observe the choices and rewards of their neighbors. Neighbors are defined by a network graph with heterogeneous and stochastic interconnections. These interactions are determined by the sociability of each agent, which corresponds to the probability that the agent observes its neighbors. We design an algorithm for each agent to maximize its own expected cumulative reward and prove performance bounds that depend on the sociability of the agents and the network structure. We use the bounds to predict the rank ordering of agents according to their performance and verify the accuracy analytically and computationally.
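    The observation model in this abstract (each agent sees its neighbors' choices and rewards with a probability equal to its sociability) can be sketched as follows. The abstract does not specify the agents' algorithm, so the UCB1-style index used here is a stand-in, and the Bernoulli arms, the function name `run_social_bandit`, and all parameters are assumptions made for illustration.

```python
import math
import random


def run_social_bandit(arm_means, neighbors, sociability, horizon=2000, seed=0):
    """Simulate agents playing the same Bernoulli bandit on a network.

    neighbors[i]   : list of agents whose plays agent i may observe
    sociability[i] : probability that agent i observes its neighbors on a given round
    Returns (total_reward per agent, observation counts per agent and arm).
    """
    rng = random.Random(seed)
    n_agents, n_arms = len(neighbors), len(arm_means)
    counts = [[0] * n_arms for _ in range(n_agents)]  # observations per arm
    sums = [[0.0] * n_arms for _ in range(n_agents)]  # observed reward per arm
    total_reward = [0.0] * n_agents

    for _ in range(horizon):
        picks = []
        for i in range(n_agents):
            untried = [a for a in range(n_arms) if counts[i][a] == 0]
            if untried:
                arm = untried[0]
            else:
                n_obs = sum(counts[i])
                # UCB1-style index over everything agent i has observed so far.
                arm = max(
                    range(n_arms),
                    key=lambda a: sums[i][a] / counts[i][a]
                    + math.sqrt(2.0 * math.log(n_obs) / counts[i][a]),
                )
            reward = 1.0 if rng.random() < arm_means[arm] else 0.0
            picks.append((arm, reward))
            total_reward[i] += reward

        for i in range(n_agents):
            observed = [picks[i]]  # an agent always sees its own play
            if rng.random() < sociability[i]:
                observed += [picks[j] for j in neighbors[i]]
            for arm, reward in observed:
                counts[i][arm] += 1
                sums[i][arm] += reward

    return total_reward, counts


if __name__ == "__main__":
    rewards, _ = run_social_bandit(
        arm_means=[0.2, 0.5, 0.8],
        neighbors=[[1], [0]],
        sociability=[0.0, 1.0],
    )
    print(rewards)
```

    In this sketch an agent with sociability 0 learns only from its own plays, while an agent with sociability 1 also collects one extra observation per neighbor on every round.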

    Multi‑Agent Foraging: state‑of‑the‑art and research challenges

    The foraging task is one of the canonical testbeds for cooperative robotics, in which a collection of robots has to search for and transport objects to specific storage point(s). In this paper, we investigate the Multi-Agent Foraging (MAF) problem from several perspectives that we analyze in depth. First, we define the Foraging Problem according to literature definitions. Then we analyze previously proposed taxonomies and propose a new foraging taxonomy characterized by four principal axes: Environment, Collective, Strategy and Simulation. We summarize related foraging works and classify them through our new foraging taxonomy. Then, we discuss the real implementation of MAF and present a comparison between some related foraging works, considering important features that show the extensibility, reliability and scalability of MAF systems. Finally, we present and discuss recent trends in this field, emphasizing the various challenges that could enhance the existing MAF solutions and make them realistic.