
    Dynamic Partition of Collaborative Multiagent Based on Coordination Trees

    In research on team Markov games, it is difficult for an individual agent to dynamically calculate the reward of its collaborating agents. We present a coordination tree structure whose nodes are either single agents or subsets of agents. Two kinds of tree weights are defined, describing the cost of an agent collaborating with an agent subset. Using these coordination trees, we can compute a collaborative agent subset and its minimal collaboration cost. Experiments on a Markov game were carried out with this algorithm, and the results show that the method outperforms related multi-agent reinforcement-learning methods based on alterable collaborative teams.
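
    The following sketch illustrates the kind of structure the abstract describes: a tree whose nodes are single agents or agent subsets, with two weights per node standing in for the collaboration costs. The names (CTNode, join_cost, coord_cost) and the cost rule are illustrative assumptions, not the paper's actual definitions.

        # Minimal Python sketch of a coordination-tree lookup.
        # The weight semantics below are assumed for illustration only.
        from dataclasses import dataclass, field
        from typing import FrozenSet, List, Optional, Tuple


        @dataclass
        class CTNode:
            """Coordination-tree node: a single agent or an agent subset."""
            agents: FrozenSet[int]      # agent(s) represented by this node
            join_cost: float = 0.0      # assumed weight 1: cost of joining this subset
            coord_cost: float = 0.0     # assumed weight 2: cost of coordinating inside it
            children: List["CTNode"] = field(default_factory=list)


        def min_cost_team(root: CTNode, agent: int) -> Tuple[Optional[FrozenSet[int]], float]:
            """Return the cheapest subset containing `agent`, with its total cost."""
            best: Tuple[Optional[FrozenSet[int]], float] = (None, float("inf"))
            stack = [root]
            while stack:
                node = stack.pop()
                cost = node.join_cost + node.coord_cost
                if agent in node.agents and cost < best[1]:
                    best = (node.agents, cost)
                stack.extend(node.children)
            return best


        # Toy tree: the full team {0, 1, 2} is expensive to coordinate,
        # the sub-team {0, 1} is cheaper, and agent 2 can act alone.
        root = CTNode(frozenset({0, 1, 2}), coord_cost=3.0, children=[
            CTNode(frozenset({0, 1}), join_cost=1.0, coord_cost=0.5),
            CTNode(frozenset({2}), join_cost=0.2),
        ])
        print(min_cost_team(root, 1))   # -> (frozenset({0, 1}), 1.5)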

    Multiagent systems: games and learning from structures

    Multiagent systems are increasingly used in many fields, as both physical robots and software agents: search-and-rescue robots, automated driving, auction and electronic-commerce agents, and so on. In multiagent domains, agents interact and co-adapt with one another, and each agent's choice of policy depends on the others' joint policy to achieve the best available performance. During this process the environment evolves and is no longer stationary, as each agent adapts in pursuit of its own target. Each micro-level time step may therefore present a different learning problem that needs to be addressed. In this non-stationary environment, however, a holistic phenomenon forms alongside the rational strategies of all players; we define this phenomenon as structural properties. In our research, we present the importance of analyzing these structural properties and show how to extract them in multiagent environments. According to the agents' objectives, a multiagent environment can be classified as self-interested, cooperative, or competitive. We examine structure in three general multiagent environments: self-interested random graphical game playing, distributed cooperative team playing, and competitive group survival. In each scenario, we analyze the structure of the environmental setting and demonstrate the learned structure as a comprehensive representation: the structure of players' action influence, the structure of constraints in teamwork communication, and the structure of interconnections among strategies. This structure represents macro-level knowledge arising in a multiagent system and provides critical, holistic information for each problem domain. Finally, we present some open issues and point toward future research.
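
    The "structure of players' action influence" mentioned above can be pictured as a graph over players, as in standard graphical games where each player's payoff depends only on its neighbours' actions. The sketch below is an illustrative assumption of such a representation; the influence map, payoff, and best_response functions are toy constructions, not the thesis's models.

        from typing import Dict, List

        # Hypothetical action-influence structure for a four-player graphical game:
        # player i's payoff depends only on its own action and its neighbours' actions.
        influence: Dict[int, List[int]] = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

        def payoff(player: int, actions: Dict[int, int]) -> float:
            """Toy local payoff: fraction of the neighbourhood matching the player's action."""
            neighbourhood = [actions[player]] + [actions[j] for j in influence[player]]
            return neighbourhood.count(actions[player]) / len(neighbourhood)

        def best_response(player: int, actions: Dict[int, int], choices=(0, 1)) -> int:
            """A best response only needs the neighbours' actions, not the full joint action."""
            return max(choices, key=lambda a: payoff(player, {**actions, player: a}))

        joint = {0: 0, 1: 1, 2: 1, 3: 0}
        print({i: best_response(i, joint) for i in influence})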

    Independent Learning Approaches: Overcoming Multi-Agent Learning Pathologies In Team-Games

    Deep Neural Networks enable Reinforcement Learning (RL) agents to learn behaviour policies directly from high-dimensional observations. As a result, the field of Deep Reinforcement Learning (DRL) has seen a great number of successes. Recently the sub-field of Multi-Agent DRL (MADRL) has received an increased amount of attention. However, additional considerations are required when using RL in multi-agent systems. For instance, Independent Learners (ILs) lack the convergence guarantees of many single-agent RL approaches, even in domains that do not require a MADRL approach. Furthermore, ILs must often overcome a number of learning pathologies to converge upon an optimal joint-policy. Numerous IL approaches have been proposed to facilitate cooperation, including hysteretic Q-learning (Matignon et al., 2007) and leniency (Panait et al., 2006). Recently LMRL2, a variant of leniency, proved robust against a number of pathologies in low-dimensional domains, including miscoordination, relative overgeneralization, stochasticity, the alter-exploration problem, and the moving-target problem (Wei and Luke, 2016). In contrast, the majority of work on ILs in MADRL focuses on an amplified moving-target problem, caused by neural networks being trained with potentially obsolete samples drawn from experience replay memories. In this thesis we combine advances from research on ILs with DRL algorithms. First, however, we evaluate the robustness of tabular approaches along each of the above pathology dimensions. Upon identifying a number of weaknesses that prevent LMRL2 from consistently converging upon optimal joint-policies, we propose a new version of leniency, Distributed-Lenient Q-learning (DLQ). We find that DLQ delivers state-of-the-art performance in strategic-form and Markov games from the multi-agent reinforcement learning literature. We subsequently scale leniency to MADRL, introducing the Lenient (Double) Deep Q-Network (LDDQN). We empirically evaluate LDDQN with extensions of the Cooperative Multi-Agent Object Transportation Problem (Buşoniu et al., 2010), finding that LDDQN outperforms hysteretic deep Q-learners in domains with multiple dropzones yielding stochastic rewards. Finally, to evaluate deep ILs along each pathology dimension, we introduce a new MADRL environment: the Apprentice Firemen Game (AFG). We find that lenient and hysteretic approaches fail to consistently learn near-optimal joint-policies in the AFG. To address these pathologies we introduce Negative Update Intervals-DDQN (NUI-DDQN), a MADRL algorithm which discards episodes yielding cumulative rewards outside the range of expanding intervals. NUI-DDQN consistently gravitates towards optimal joint-policies in deterministic and stochastic reward settings of the AFG, overcoming the outlined pathologies.
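
    For context, the two independent-learner baselines the abstract builds on can be sketched as follows: hysteretic Q-learning applies a smaller learning rate to negative TD errors, while lenient learners probabilistically ignore negative updates early in learning. The hyper-parameter values and the leniency-decay schedule below are illustrative assumptions, not the exact formulations of Matignon et al. (2007), Panait et al. (2006), or this thesis.

        import random
        from collections import defaultdict


        class HystereticQ:
            """Hysteretic Q-learning: optimistic, asymmetric learning rates (alpha > beta)."""
            def __init__(self, alpha=0.1, beta=0.01, gamma=0.95):
                self.q = defaultdict(float)
                self.alpha, self.beta, self.gamma = alpha, beta, gamma

            def update(self, s, a, r, s_next, next_actions):
                target = r + self.gamma * max(self.q[(s_next, a2)] for a2 in next_actions)
                delta = target - self.q[(s, a)]
                rate = self.alpha if delta >= 0 else self.beta   # shrink estimates only reluctantly
                self.q[(s, a)] += rate * delta


        class LenientQ(HystereticQ):
            """Lenient Q-learning: ignore negative updates with a probability that decays
            as the state-action pair is visited more often (illustrative schedule)."""
            def __init__(self, alpha=0.1, gamma=0.95, decay=0.995):
                super().__init__(alpha=alpha, beta=alpha, gamma=gamma)
                self.leniency = defaultdict(lambda: 1.0)
                self.decay = decay

            def update(self, s, a, r, s_next, next_actions):
                target = r + self.gamma * max(self.q[(s_next, a2)] for a2 in next_actions)
                delta = target - self.q[(s, a)]
                forgive = random.random() < self.leniency[(s, a)]
                self.leniency[(s, a)] *= self.decay
                if delta >= 0 or not forgive:    # early on, low returns are forgiven
                    self.q[(s, a)] += self.alpha * delta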