Asynchronous stochastic approximations with asymptotically biased errors and deep multi-agent learning
Asynchronous stochastic approximations (SAs) are an important class of
model-free algorithms popular in multi-agent and distributed control
settings. To counter Bellman's curse of dimensionality, such algorithms are
coupled with function approximations. Although this makes the
learning/control problem more tractable, function approximations
affect stability and convergence. In this paper, we present verifiable
sufficient conditions for stability and convergence of asynchronous SAs with
biased approximation errors. The theory developed herein is used to analyze
Policy Gradient methods and noisy Value Iteration schemes. Specifically, we
analyze the asynchronous approximate counterparts of the policy gradient (A2PG)
and value iteration (A2VI) schemes. It is shown that the stability of these
algorithms is unaffected by biased approximation errors, provided they are
asymptotically bounded. With respect to convergence (of A2VI and A2PG), a
relationship between the limiting set and the approximation errors is
established. Finally, experimental results are presented that support the
theory.
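As a rough illustration of the setting (not the paper's algorithm), the sketch below runs an asynchronous SA iteration in which a single randomly chosen coordinate is updated per step with zero-mean noise plus an asymptotically bounded bias term standing in for the function-approximation error; the mean field `h`, step sizes, and bias schedule are all hypothetical choices.

```python
import numpy as np

def async_sa(h, x0, n_steps=10_000, bias_bound=0.05, seed=0):
    """Asynchronous SA sketch: each step updates one randomly chosen
    coordinate with a noisy, asymptotically biased estimate of h(x)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for n in range(1, n_steps + 1):
        i = rng.integers(x.size)              # asynchronous: one coordinate per step
        a_n = 1.0 / n                         # steps: sum = inf, sum of squares < inf
        noise = rng.normal(0.0, 0.1)          # zero-mean martingale-difference noise
        bias = bias_bound * np.tanh(n / 1e3)  # bias stays bounded as n -> inf
        x[i] += a_n * (h(x)[i] + noise + bias)
    return x

# Example: h(x) = -x drives the iterate toward a bias-shifted neighborhood of 0.
print(async_sa(lambda x: -x, np.ones(4)))
```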
Linear Stochastic Approximation Algorithms and Group Consensus over Random Signed Networks: A Technical Report with All Proofs
This paper studies linear stochastic approximation (SA) algorithms and their
application to multi-agent systems in engineering and sociology. As our main
contribution, we provide necessary and sufficient conditions for the
convergence of linear SA algorithms to a deterministic or random final vector.
We also characterize the convergence rate when the system is convergent.
Moreover, unlike traditional SA algorithms with non-negative gain functions,
this paper also considers the case where the gain functions may take arbitrary
real values. Using our general treatment, we provide necessary and sufficient
conditions for reaching consensus and group consensus in first-order
discrete-time multi-agent systems over random signed networks and with
state-dependent noise. Finally, we extend our results to the setting of
multi-dimensional linear SA algorithms and characterize the behavior of the
multi-dimensional Friedkin-Johnsen model over random interaction networks.
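A minimal sketch of the kind of linear SA recursion studied here, with a randomly drawn signed weight matrix at each step; the sampler, step sizes, and network are hypothetical stand-ins, and the paper's precise conditions and noise model are not reproduced.

```python
import numpy as np

def linear_sa_consensus(sample_W, x0, steps=5_000):
    """Linear SA sketch: x_{k+1} = x_k + a_k (W_k - I) x_k, where W_k is a
    randomly drawn weight matrix whose entries may be negative (signed ties)."""
    x = np.asarray(x0, dtype=float).copy()
    I = np.eye(x.size)
    for k in range(1, steps + 1):
        a_k = 1.0 / k                        # decreasing gain
        x = x + a_k * (sample_W() - I) @ x   # consensus-type linear update
    return x
```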
Finite-Sample Analysis For Decentralized Batch Multi-Agent Reinforcement Learning With Networked Agents
Despite the increasing interest in multi-agent reinforcement learning (MARL)
in multiple communities, understanding its theoretical foundation has long been
recognized as a challenging problem. In this work, we address this problem by
providing a finite-sample analysis for decentralized batch MARL with networked
agents. Specifically, we consider two decentralized MARL settings, where teams
of agents are connected by time-varying communication networks, and either
collaborate or compete in a zero-sum game setting, without any central
controller. These settings cover many conventional MARL settings in the
literature. For both settings, we develop batch MARL algorithms that can be
implemented in a decentralized fashion, and quantify the finite-sample errors
of the estimated action-value functions. Our error analysis captures how the
function class, the number of samples within each iteration, and the number of
iterations determine the statistical accuracy of the proposed algorithms. Our
results, compared to the finite-sample bounds for single-agent RL, involve
additional error terms caused by decentralized computation, which is inherent
in our decentralized MARL setting. This work appears to be the first
finite-sample analysis for batch MARL, shedding light on both the sample and
computational efficiency of MARL algorithms in general.
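The following sketch is one plausible reading of the setup, not the paper's algorithm: it alternates a local fitted-Q step on each agent's batch with a consensus-style parameter mix over network neighbors; `fit_q` and the mixing weights are hypothetical placeholders.

```python
def decentralized_batch_marl(batches, mix_weights, fit_q, n_iters=10):
    """Sketch: each agent fits an action-value estimate from its own batch,
    then averages parameters with its neighbors (no central controller).

    batches:     one batch of transitions per agent
    mix_weights: mix_weights[i] maps neighbor index j -> weight (rows sum to 1)
    fit_q:       (batch, current params) -> new params, e.g. fitted Q-iteration
    """
    thetas = [fit_q(b, None) for b in batches]                     # initial local fits
    for _ in range(n_iters):
        thetas = [fit_q(b, th) for b, th in zip(batches, thetas)]  # local step
        thetas = [sum(w * thetas[j] for j, w in mix_weights[i].items())
                  for i in range(len(thetas))]                     # consensus step
    return thetas
```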
Min-Max Tours for Task Allocation to Heterogeneous Agents
We consider a scenario consisting of a set of heterogeneous mobile agents
located at a depot, and a set of tasks dispersed over a geographic area. The
agents are partitioned into different types. The tasks are partitioned into
specialized tasks that can only be done by agents of a certain type, and
generic tasks that can be done by any agent. The distances between each pair of
tasks are specified, and satisfy the triangle inequality. Given this scenario,
we address the problem of allocating these tasks among the available agents
(subject to type compatibility constraints) while minimizing the maximum cost
to tour the allocation by any agent and return to the depot. This problem is
NP-hard, and we give a three-phase algorithm for it that provides a 5-factor
approximation, regardless of the total number of agents and the number of
agents of each type. We also show that in the special case where there is only
one agent of each type, the algorithm has an approximation factor of 4.
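To make the objective concrete, here is a small helper (illustrative only, not the three-phase algorithm) that evaluates the min-max cost of a candidate allocation: each agent's cost is the tour that leaves the depot, visits its assigned tasks in order, and returns.

```python
def tour_cost(depot, tasks, dist):
    """Cost to leave the depot, visit the tasks in the given order, and return."""
    stops = [depot, *tasks, depot]
    return sum(dist(a, b) for a, b in zip(stops, stops[1:]))

def minmax_objective(depot, allocation, dist):
    """The objective being minimized: the maximum tour cost over all agents.
    allocation: agent -> ordered list of assigned tasks (type-compatible)."""
    return max(tour_cost(depot, tasks, dist) for tasks in allocation.values())
```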
Intent-aware Multi-agent Reinforcement Learning
This paper proposes an intent-aware multi-agent planning framework as well as
a learning algorithm. Under this framework, an agent plans in the goal space to
maximize the expected utility. The planning process takes the belief of other
agents' intents into consideration. Instead of formulating the learning problem
as a partially observable Markov decision process (POMDP), we propose a simple
but effective linear function approximation of the utility function. It is
based on the observation that, for humans, other people's intents influence
our utility for a goal. The proposed framework has several major advantages:
i) it is computationally feasible and guaranteed to converge; ii) it can
easily integrate existing intent prediction and low-level planning algorithms;
iii) it does not suffer from sparse feedback in the action space. We evaluate
our algorithm on a real-world problem that is non-episodic and in which the
number of agents and goals can vary over time. Our algorithm is trained in a
scene in which aerial robots and humans interact, and tested in a novel scene
with a different environment. Experimental results show that our algorithm
achieves the best performance and that human-like behaviors emerge during the
dynamic process.
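A minimal sketch of the linear utility approximation described above, assuming (hypothetically) that each goal is summarized by a feature vector concatenated with the agent's belief over others' intents; the feature design and weights are placeholders.

```python
import numpy as np

def utility(w, goal_features, intent_belief):
    """Linear approximation: utility for a goal is linear in the goal's
    features and the belief over other agents' intents."""
    return w @ np.concatenate([goal_features, intent_belief])

def plan_in_goal_space(w, goals, intent_belief):
    """Plan in goal space: pick the goal with the highest approximated utility.
    goals: goal name -> feature vector."""
    return max(goals, key=lambda g: utility(w, goals[g], intent_belief))
```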
Efficient sensor network planning method using approximate potential game
This paper addresses information-based sensing point selection: from a set of
candidate sensing locations, we determine the set of measurement points that
maximizes the mutual information between the sensor measurements and the
variables of interest. A potential game approach is applied to enable
distributed decision making for cooperative sensor planning.
When a sensor network involves a large number of sensing agents, the local
utility function for a sensing agent is hard to compute, because the local
utility function depends on the other agents' decisions while each sensing
agent is inherently faced with limitations in both its communication and
computational capabilities. Accordingly, a local utility function for each
agent should be approximated to accommodate limitations in information
gathering and processing. We propose an approximation method for a local
utility function using only a portion of the other agents' decisions. The set
of agents whose decisions an agent considers is called its neighboring set.
The error induced by the approximation is also analyzed, and to keep this
error small we propose a selection algorithm that chooses the neighboring set
for each agent in a greedy way. The selection algorithm is based on the
correlation between one agent's measurement selection and the other agents'
selections. Furthermore, we show that a game with an approximate local utility
function has an \epsilon-equilibrium, and that the set of such equilibria
includes the Nash equilibrium of the original potential game. We
demonstrate the validity of our approximation method through two numerical
examples on simplified weather forecasting and multi-target tracking.
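As an illustration of the two ingredients above, the sketch below greedily picks each agent's neighboring set from a correlation matrix and then evaluates a local utility using only those neighbors' decisions; the correlation matrix and utility function are hypothetical inputs.

```python
import numpy as np

def greedy_neighboring_set(corr, agent, k):
    """Greedily choose the k agents whose measurement selections are most
    correlated with this agent's (the selection is correlation-based)."""
    others = [j for j in range(corr.shape[0]) if j != agent]
    return sorted(others, key=lambda j: abs(corr[agent, j]), reverse=True)[:k]

def approx_local_utility(local_utility, my_decision, decisions, neighbors):
    """Approximate the local utility using only the neighboring set's decisions."""
    return local_utility(my_decision, {j: decisions[j] for j in neighbors})
```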
Finding Consensus in Multi-Agent Networks Using Heat Kernel Pagerank
We present a new and efficient algorithm for determining a consensus value
for a network of agents. Different from existing algorithms, our algorithm
evaluates the consensus value for very large networks using heat kernel
pagerank. We consider two frameworks for the consensus problem, a weighted
average consensus among all agents, and consensus in a leader-following
formation. Using a heat kernel pagerank approximation, we give consensus
algorithms that run in time sublinear in the size of the network, and provide
quantitative analysis of the tradeoff between performance guarantees and error
estimates.
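For intuition, heat kernel pagerank with seed vector s and temperature t is rho = e^{-t} * sum_k (t^k / k!) s P^k. The dense truncated-series sketch below illustrates the quantity itself; the paper's point is that it can be approximated in time sublinear in the network size via random-walk sampling, which this naive version does not do.

```python
import numpy as np
from math import exp

def heat_kernel_pagerank(P, s, t, K=50):
    """Naive truncated series rho = e^{-t} * sum_{k<K} (t^k / k!) * s P^k,
    with P a random-walk transition matrix and s a seed/weight row vector."""
    rho = np.zeros_like(s, dtype=float)
    term = s.astype(float)       # s P^0
    coeff = exp(-t)              # e^{-t} * t^0 / 0!
    for k in range(K):
        rho += coeff * term
        term = term @ P          # s P^{k+1}
        coeff *= t / (k + 1)     # next Taylor coefficient
    return rho
```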
Truthful Mechanisms via Greedy Iterative Packing
An important research thread in algorithmic game theory studies the design of
efficient truthful mechanisms that approximate the optimal social welfare. A
fundamental question is whether an \alpha-approximation algorithm translates
into an \alpha-approximate truthful mechanism. It is well-known that plugging
an \alpha-approximation algorithm into the VCG technique may not yield a
truthful mechanism. Thus, it is natural to investigate properties of
approximation algorithms that enable their use in truthful mechanisms.
The main contribution of this paper is to identify a useful and natural
property of approximation algorithms, which we call loser-independence; this
property is applicable in the single-minded and single-parameter settings.
Intuitively, a loser-independent algorithm does not change its outcome when the
bid of a losing agent increases, unless that agent becomes a winner. We
demonstrate that loser-independent algorithms can be employed as sub-procedures
in a greedy iterative packing approach while preserving monotonicity. A greedy
iterative approach provides a good approximation in the context of maximizing a
non-decreasing submodular function subject to independence constraints. Our
framework gives rise to truthful approximation mechanisms for various problems.
Notably, some of these problems arise in online mechanism design.
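A toy sketch of one greedy packing pass for single-minded bidders (a simplified stand-in, not the paper's full iterative framework): agents are considered in decreasing bid order and accepted when feasible, which keeps the allocation rule monotone; with a monotone rule, truthful payments follow from each winner's critical value.

```python
def greedy_pack(bids, feasible):
    """One greedy packing pass: consider agents by decreasing bid and accept
    each one whose bundle keeps the partial allocation feasible. Monotone:
    raising a bid can only move an agent earlier in the order.

    bids: agent -> bid value; feasible: list of winners -> bool."""
    winners = []
    for agent, _ in sorted(bids.items(), key=lambda kv: -kv[1]):
        if feasible(winners + [agent]):
            winners.append(agent)
    return winners
```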
A Framework for learning multi-agent dynamic formation strategy in real-time applications
Formation strategy is one of the most important components of many multi-agent
systems, with applications to many real-world problems. In this paper, a
framework for learning this task in a limited domain (restricted environment)
is proposed. In this framework, agents learn either directly by observing an
expert behavior or indirectly by observing other agents or objects behavior.
First, a group of algorithms for learning formation strategy based on limited
features is presented. Due to the distributed and complex nature of many
multi-agent systems, it is impossible to include all features directly in the
learning process; thus, a modular scheme is proposed to reduce the number of
features. In this method, some important features influence learning
indirectly rather than being included directly as input features.
This framework has the ability to dynamically assign a group of positions to a
group of agents to improve system performance. In addition, it can change the
formation strategy when the context changes. Finally, this framework is able to
automatically produce many complex and flexible formation strategy algorithms
without directly involving an expert to present and implement such complex
algorithms.
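As a hypothetical illustration of the dynamic position-assignment capability (not the paper's learning method), the sketch below matches agents to formation positions by minimizing total travel distance with the Hungarian algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_positions(agent_xy, position_xy):
    """Match agents to formation positions, minimizing total travel distance.
    agent_xy, position_xy: (n, 2) arrays of coordinates."""
    cost = np.linalg.norm(agent_xy[:, None, :] - position_xy[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
    return dict(zip(rows.tolist(), cols.tolist()))
```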
A Survey and Critique of Multiagent Deep Reinforcement Learning
Deep reinforcement learning (RL) has achieved outstanding results in recent
years. This has led to a dramatic increase in the number of applications and
methods. Recent works have explored learning beyond single-agent scenarios and
have considered multiagent learning (MAL) scenarios. Initial results report
successes in complex multiagent domains, although there are several challenges
to be addressed. The primary goal of this article is to provide a clear
overview of current multiagent deep reinforcement learning (MDRL) literature.
Additionally, we complement the overview with a broader analysis: (i) we
revisit previous key components, originally presented in MAL and RL, and
highlight how they have been adapted to multiagent deep reinforcement learning
settings. (ii) We provide general guidelines to new practitioners in the area:
describing lessons learned from MDRL works, pointing to recent benchmarks, and
outlining open avenues of research. (iii) We take a more critical tone raising
practical challenges of MDRL (e.g., implementation and computational demands).
We expect this article will help unify and motivate future research to take
advantage of the abundant literature that exists (e.g., RL and MAL) in a joint
effort to promote fruitful research in the multiagent community.