
    Asynchronous stochastic approximations with asymptotically biased errors and deep multi-agent learning

    Asynchronous stochastic approximations (SAs) are an important class of model-free algorithms, tools and techniques that are popular in multi-agent and distributed control scenarios. To counter Bellman's curse of dimensionality, such algorithms are coupled with function approximations. Although the learning/control problem becomes more tractable, function approximations affect stability and convergence. In this paper, we present verifiable sufficient conditions for stability and convergence of asynchronous SAs with biased approximation errors. The theory developed herein is used to analyze Policy Gradient methods and noisy Value Iteration schemes. Specifically, we analyze the asynchronous approximate counterparts of the policy gradient (A2PG) and value iteration (A2VI) schemes. It is shown that the stability of these algorithms is unaffected by biased approximation errors, provided they are asymptotically bounded. With respect to convergence (of A2VI and A2PG), a relationship between the limiting set and the approximation errors is established. Finally, experimental results are presented that support the theory.
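    The link between bounded approximation errors and the limiting set can be illustrated with a toy example. The sketch below is not the paper's A2VI scheme: it assumes a hypothetical two-state, two-action MDP with deterministic transitions, updates one state at a time (a simple stand-in for asynchrony), and adds a bias drawn uniformly from [-eps, eps] to every Bellman backup. Under these assumptions the iterates should settle within roughly eps / (1 - GAMMA) of the exact values, which is the standard bound for value iteration with bounded per-step error.

```python
import random

GAMMA = 0.9  # discount factor (assumed for illustration)

# Hypothetical toy MDP: REWARD[s][a] and NEXT_STATE[s][a], deterministic transitions.
REWARD = [[1.0, 0.0], [0.0, 2.0]]
NEXT_STATE = [[0, 1], [0, 1]]


def backup(values, s):
    """Exact Bellman optimality backup for a single state s."""
    return max(REWARD[s][a] + GAMMA * values[NEXT_STATE[s][a]] for a in range(2))


def exact_values(sweeps=2000):
    """Reference solution from plain synchronous value iteration."""
    v = [0.0, 0.0]
    for _ in range(sweeps):
        v = [backup(v, s) for s in range(2)]
    return v


def async_biased_vi(eps, updates=5000, seed=0):
    """Update one randomly chosen state per step, adding a bias bounded by eps."""
    rng = random.Random(seed)
    v = [0.0, 0.0]
    for _ in range(updates):
        s = rng.randrange(2)              # only one component is updated
        bias = rng.uniform(-eps, eps)     # bounded approximation error
        v[s] = backup(v, s) + bias
    return v


EPS = 0.05
V_STAR = exact_values()
V_NOISY = async_biased_vi(EPS)
MAX_GAP = max(abs(a - b) for a, b in zip(V_STAR, V_NOISY))
print("exact:", V_STAR)
print("biased async:", V_NOISY)
print("max gap:", MAX_GAP, "bound:", EPS / (1 - GAMMA))
```

    The iterates remain bounded despite the bias, and the gap to the exact values scales with the error bound rather than vanishing.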

    Linear Stochastic Approximation Algorithms and Group Consensus over Random Signed Networks: A Technical Report with All Proofs

    This paper studies linear stochastic approximation (SA) algorithms and their application to multi-agent systems in engineering and sociology. As our main contribution, we provide necessary and sufficient conditions for convergence of linear SA algorithms to a deterministic or random final vector. We also characterize the convergence rate of the system when it is convergent. Moreover, whereas traditional SA algorithms use non-negative gain functions, this paper also considers the case in which the gain functions may take arbitrary real values. Using our general treatment, we provide necessary and sufficient conditions to reach consensus and group consensus for first-order discrete-time multi-agent systems over random signed networks and with state-dependent noise. Finally, we extend our results to the setting of multi-dimensional linear SA algorithms and characterize the behavior of the multi-dimensional Friedkin-Johnsen model over random interaction networks.

    Finite-Sample Analysis For Decentralized Batch Multi-Agent Reinforcement Learning With Networked Agents

    Despite the increasing interest in multi-agent reinforcement learning (MARL) in multiple communities, understanding its theoretical foundation has long been recognized as a challenging problem. In this work, we address this problem by providing a finite-sample analysis for decentralized batch MARL with networked agents. Specifically, we consider two decentralized MARL settings, where teams of agents are connected by time-varying communication networks, and either collaborate or compete in a zero-sum game setting, without any central controller. These settings cover many conventional MARL settings in the literature. For both settings, we develop batch MARL algorithms that can be implemented in a decentralized fashion, and quantify the finite-sample errors of the estimated action-value functions. Our error analysis captures how the function class, the number of samples within each iteration, and the number of iterations determine the statistical accuracy of the proposed algorithms. Compared to the finite-sample bounds for single-agent RL, our results involve additional error terms caused by decentralized computation, which is inherent in our decentralized MARL setting. This work appears to be the first finite-sample analysis for batch MARL, and it sheds light on both the sample and computational efficiency of MARL algorithms in general.

    Min-Max Tours for Task Allocation to Heterogeneous Agents

    We consider a scenario consisting of a set of heterogeneous mobile agents located at a depot, and a set of tasks dispersed over a geographic area. The agents are partitioned into different types. The tasks are partitioned into specialized tasks that can only be done by agents of a certain type, and generic tasks that can be done by any agent. The distances between each pair of tasks are specified, and satisfy the triangle inequality. Given this scenario, we address the problem of allocating these tasks among the available agents (subject to type compatibility constraints) while minimizing the maximum cost, over all agents, of touring the allocated tasks and returning to the depot. This problem is NP-hard, and we give a three-phase algorithm that provides a 5-factor approximation, regardless of the total number of agents and the number of agents of each type. We also show that in the special case where there is only one agent of each type, the algorithm has an approximation factor of 4.

    Intent-aware Multi-agent Reinforcement Learning

    This paper proposes an intent-aware multi-agent planning framework as well as a learning algorithm. Under this framework, an agent plans in the goal space to maximize the expected utility. The planning process takes the belief of other agents' intents into consideration. Instead of formulating the learning problem as a partially observable Markov decision process (POMDP), we propose a simple but effective linear function approximation of the utility function. It is based on the observation that, for humans, other people's intents influence our utility for a goal. The proposed framework has several major advantages: (i) it is computationally feasible and guaranteed to converge; (ii) it can easily integrate existing intent prediction and low-level planning algorithms; (iii) it does not suffer from sparse feedback in the action space. We evaluate our algorithm on a real-world problem that is non-episodic and in which the number of agents and goals can vary over time. Our algorithm is trained in a scene in which aerial robots and humans interact, and tested in a novel scene with a different environment. Experimental results show that our algorithm achieves the best performance and that human-like behaviors emerge during the dynamic process. Comment: ICRA 201

    Efficient sensor network planning method using approximate potential game

    This paper addresses information-based sensing point selection from a set of possible sensing locations, which determines a set of measurement points maximizing the mutual information between the sensor measurements and the variables of interest. A potential game approach has been applied to the distributed implementation of decision making for cooperative sensor planning. When a sensor network involves a large number of sensing agents, the local utility function for a sensing agent is hard to compute, because it depends on the other agents' decisions while each sensing agent is inherently limited in both its communication and computational capabilities. Accordingly, a local utility function for each agent should be approximated to accommodate limitations in information gathering and processing. We propose an approximation method for a local utility function using only a portion of the decisions of other agents. The part of the decisions that each agent considers is called the neighboring set for the agent. The error induced by the approximation is also analyzed, and to keep the error small we propose a neighbor selection algorithm that chooses the neighbor set for each agent in a greedy way. The selection algorithm is based on the correlation information between one agent's measurement selection and the other agents' selections. Furthermore, we show that a game with an approximate local utility function has an ϵ-equilibrium and that the set of equilibria includes the Nash equilibrium of the original potential game. We demonstrate the validity of our approximation method through two numerical examples on simplified weather forecasting and multi-target tracking. Comment: 24 pages, 4 figures, submitted to IJDSN (International Journal of Distributed Sensor Networks

    Finding Consensus in Multi-Agent Networks Using Heat Kernel Pagerank

    We present a new and efficient algorithm for determining a consensus value for a network of agents. Unlike existing algorithms, our algorithm evaluates the consensus value for very large networks using heat kernel pagerank. We consider two frameworks for the consensus problem: a weighted average consensus among all agents, and consensus in a leader-following formation. Using a heat kernel pagerank approximation, we give consensus algorithms that run in time sublinear in the size of the network, and provide quantitative analysis of the tradeoff between performance guarantees and error estimates.
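    For orientation, heat kernel pagerank is commonly defined as rho_{t,f} = e^{-t} * sum_{k>=0} (t^k / k!) * f P^k, where P = D^{-1} A is the random-walk transition matrix, f is a starting row vector and t is a temperature parameter. The sketch below evaluates a truncated version of that series directly. It is only an illustration of the definition, not the sublinear-time approximation the abstract refers to; the graph, the function name `heat_kernel_pagerank` and the truncation length `terms` are assumptions chosen for the example.

```python
import math


def heat_kernel_pagerank(adj, f, t, terms=60):
    """Truncated series e^{-t} * sum_k (t^k / k!) * f P^k with P = D^{-1} A.

    adj   : adjacency matrix as a list of lists (every node needs degree > 0)
    f     : starting row vector (for example a probability distribution)
    t     : temperature parameter
    terms : number of series terms kept (assumed large enough for the chosen t)
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    # Row-stochastic random-walk matrix P = D^{-1} A.
    P = [[adj[i][j] / degrees[i] for j in range(n)] for i in range(n)]

    result = [0.0] * n
    walk = list(f)          # holds f P^k, starting at k = 0
    weight = math.exp(-t)   # holds e^{-t} t^k / k!, starting at k = 0
    for k in range(terms):
        for j in range(n):
            result[j] += weight * walk[j]
        # Advance the walk by one step: (f P^k) P.
        walk = [sum(walk[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= t / (k + 1)
    return result


# Small hypothetical undirected graph on four nodes.
ADJ = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
RHO = heat_kernel_pagerank(ADJ, [1.0, 0.0, 0.0, 0.0], t=2.0)
print(RHO, sum(RHO))
```

    Because P is row-stochastic and the Poisson weights sum to one, a probability vector f yields an output that is again (up to truncation error) a probability vector.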

    Truthful Mechanisms via Greedy Iterative Packing

    An important research thread in algorithmic game theory studies the design of efficient truthful mechanisms that approximate the optimal social welfare. A fundamental question is whether an \alpha-approximation algorithm translates into an \alpha-approximate truthful mechanism. It is well-known that plugging an \alpha-approximation algorithm into the VCG technique may not yield a truthful mechanism. Thus, it is natural to investigate properties of approximation algorithms that enable their use in truthful mechanisms. The main contribution of this paper is to identify a useful and natural property of approximation algorithms, which we call loser-independence; this property is applicable in the single-minded and single-parameter settings. Intuitively, a loser-independent algorithm does not change its outcome when the bid of a losing agent increases, unless that agent becomes a winner. We demonstrate that loser-independent algorithms can be employed as sub-procedures in a greedy iterative packing approach while preserving monotonicity. A greedy iterative approach provides a good approximation in the context of maximizing a non-decreasing submodular function subject to independence constraints. Our framework gives rise to truthful approximation mechanisms for various problems. Notably, some problems arise in online mechanism design. Comment: 20 pages, 1 figur

    A Framework for learning multi-agent dynamic formation strategy in real-time applications

    Formation strategy is one of the most important parts of many multi-agent systems, with many applications in real-world problems. In this paper, a framework for learning this task in a limited domain (restricted environment) is proposed. In this framework, agents learn either directly by observing an expert's behavior or indirectly by observing the behavior of other agents or objects. First, a group of algorithms for learning formation strategy based on limited features is presented. Due to the distributed and complex nature of many multi-agent systems, it is impossible to include all features directly in the learning process; thus, a modular scheme is proposed in order to reduce the number of features. In this method, some important features influence learning indirectly instead of being used directly as input features. This framework has the ability to dynamically assign a group of positions to a group of agents to improve system performance. In addition, it can change the formation strategy when the context changes. Finally, this framework is able to automatically produce many complex and flexible formation strategy algorithms without directly involving an expert to present and implement such complex algorithms. Comment: 27 pages, 9 figure

    A Survey and Critique of Multiagent Deep Reinforcement Learning

    Deep reinforcement learning (RL) has achieved outstanding results in recent years. This has led to a dramatic increase in the number of applications and methods. Recent works have explored learning beyond single-agent scenarios and have considered multiagent learning (MAL) scenarios. Initial results report successes in complex multiagent domains, although there are several challenges to be addressed. The primary goal of this article is to provide a clear overview of current multiagent deep reinforcement learning (MDRL) literature. Additionally, we complement the overview with a broader analysis: (i) we revisit previous key components, originally presented in MAL and RL, and highlight how they have been adapted to multiagent deep reinforcement learning settings. (ii) We provide general guidelines to new practitioners in the area: describing lessons learned from MDRL works, pointing to recent benchmarks, and outlining open avenues of research. (iii) We take a more critical tone raising practical challenges of MDRL (e.g., implementation and computational demands). We expect this article will help unify and motivate future research to take advantage of the abundant literature that exists (e.g., RL and MAL) in a joint effort to promote fruitful research in the multiagent community. Comment: Under review since Oct 2018. Earlier versions of this work had the title: "Is multiagent deep reinforcement learning the answer or the question? A brief survey