Scalable Planning and Learning for Multiagent POMDPs: Extended Version
Online, sample-based planning algorithms for POMDPs have shown great promise
in scaling to problems with large state spaces, but they become intractable for
large action and observation spaces. This is particularly problematic in
multiagent POMDPs where the action and observation space grows exponentially
with the number of agents. To combat this intractability, we propose a novel
scalable approach based on sample-based planning and factored value functions
that exploits structure present in many multiagent settings. This approach
applies not only in the planning case, but also in the Bayesian reinforcement
learning setting. Experimental results show that we are able to provide
high-quality solutions to large multiagent planning and learning problems.
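The abstract does not specify the factorization, so what follows is only a sketch under an assumption: the joint action value decomposes into pairwise terms along a chain of agents (a hypothetical coordination graph with made-up payoffs `pair_q`). It illustrates why factored value functions help with exponentially large joint action spaces: the best joint action can be found by variable elimination in time linear in the number of agents, instead of enumerating all |A|^n joint actions. It is not the paper's sample-based planner; `brute_force` is included only as a reference check.

```python
from itertools import product


def best_joint_action(pair_q, n_agents, n_actions):
    """Maximize sum_i pair_q[i][a_i][a_{i+1}] over a chain of agents.

    Variable elimination: eliminate agents left to right, keeping, for each
    action of the next agent, the best achievable value of the factors so far.
    """
    msg = [0.0] * n_actions  # msg[y]: best value of eliminated factors if the next agent plays y
    back = []  # back[i][y]: best action of agent i given that agent i+1 plays y
    for i in range(n_agents - 1):
        new_msg, choice = [], []
        for y in range(n_actions):
            best_x = max(range(n_actions), key=lambda x: msg[x] + pair_q[i][x][y])
            new_msg.append(msg[best_x] + pair_q[i][best_x][y])
            choice.append(best_x)
        msg, back = new_msg, back + [choice]
    # Pick the last agent's action, then follow the back-pointers to recover the rest.
    last = max(range(n_actions), key=lambda y: msg[y])
    actions = [last]
    for choice in reversed(back):
        actions.append(choice[actions[-1]])
    actions.reverse()
    return actions, msg[last]


def brute_force(pair_q, n_agents, n_actions):
    """Reference: enumerate all n_actions ** n_agents joint actions."""
    def total(a):
        return sum(pair_q[i][a[i]][a[i + 1]] for i in range(n_agents - 1))
    best = max(product(range(n_actions), repeat=n_agents), key=total)
    return list(best), total(best)


if __name__ == "__main__":
    # Hypothetical local payoffs for 3 agents with 2 actions each.
    pair_q = [
        [[1, 0], [0, 2]],  # factor between agents 0 and 1
        [[3, 0], [0, 1]],  # factor between agents 1 and 2
    ]
    print(best_joint_action(pair_q, 3, 2))  # ([0, 0, 0], 4.0)
    print(brute_force(pair_q, 3, 2))  # ([0, 0, 0], 4)
```

With more general coordination graphs the elimination order matters, but the idea is the same: each maximization touches only the agents that share a factor.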
Task-Based Information Compression for Multi-Agent Communication Problems with Channel Rate Constraints
A collaborative task is assigned to a multiagent system (MAS) in which agents
are allowed to communicate. The MAS runs over an underlying Markov decision
process and its task is to maximize the averaged sum of discounted one-stage
rewards. Although knowing the global state of the environment is necessary for
the optimal action selection of the MAS, agents are limited to individual
observations. Inter-agent communication can mitigate this local
observability; however, the limited rate of inter-agent communication
prevents agents from acquiring precise global state information. To
overcome this challenge, agents need to communicate their observations
compactly, so that the MAS sacrifices as little of the sum of rewards as possible.
We show that this problem is equivalent to a form of rate-distortion problem
which we call the task-based information compression. We introduce a scheme for
task-based information compression titled State aggregation for information
compression (SAIC), for which a state aggregation algorithm is analytically
designed. The SAIC is shown to be capable of achieving near-optimal performance
in terms of the achieved sum of discounted rewards. The proposed algorithm is
applied to a rendezvous problem and its performance is compared with several
benchmarks. Numerical experiments confirm the superiority of the proposed
algorithm.
Comment: 13 pages, 9 figures
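The abstract does not state the SAIC aggregation rule. As a loose illustration only, assume each local state has a known scalar value estimate and that a rate of R bits per message allows 2^R distinct messages. Grouping states with similar values into 2^R clusters then lets an agent send just the cluster index. The sketch swaps in 1-D k-means (Lloyd's algorithm) for the clustering step; the function names and inputs are hypothetical, and this is not the paper's analytically designed algorithm.

```python
def aggregate_states(values, rate_bits, iters=100):
    """Group states into at most 2**rate_bits clusters of similar value (1-D k-means).

    values[s] is an assumed scalar value estimate for local state s.
    Returns (labels, centers): labels[s] is the message index sent for state s.
    """
    k = 2 ** rate_bits
    if k > len(values):
        raise ValueError("need at least 2**rate_bits states")
    order = sorted(range(len(values)), key=lambda s: values[s])
    # Initialise centers at evenly spaced quantiles of the sorted values.
    centers = [values[order[(2 * j + 1) * len(values) // (2 * k)]] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each state is mapped to the nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        converged = new_centers == centers
        centers = new_centers
        if converged:
            break
    return labels, centers


def distortion(values, labels, centers):
    """Squared error from replacing each state's value with its cluster center."""
    return sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))


if __name__ == "__main__":
    vals = [0, 1, 2, 10, 11, 12]  # hypothetical state values
    labels, centers = aggregate_states(vals, rate_bits=1)
    print(labels, centers, distortion(vals, labels, centers))
```

A single bit suffices here because the states fall into two well-separated value groups; the distortion term is a crude stand-in for the reward lost by aggregation.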
Decentralized Control of Partially Observable Markov Decision Processes using Belief Space Macro-actions
The focus of this paper is on solving multi-robot planning problems in
continuous spaces with partial observability. Decentralized partially
observable Markov decision processes (Dec-POMDPs) are general models for
multi-robot coordination problems, but representing and solving Dec-POMDPs is
often intractable for large problems. To allow for a high-level representation
that is natural for multi-robot problems and scalable to large discrete and
continuous problems, this paper extends the Dec-POMDP model to the
decentralized partially observable semi-Markov decision process (Dec-POSMDP).
The Dec-POSMDP formulation allows asynchronous decision-making by the robots,
which is crucial in multi-robot domains. We also present an algorithm for
solving this Dec-POSMDP; the algorithm is much more scalable than previous methods
because it can incorporate closed-loop belief space macro-actions in planning. These
macro-actions are automatically constructed to produce robust solutions. The
proposed method's performance is evaluated on a complex multi-robot package
delivery problem under uncertainty, showing that our approach can naturally
represent multi-robot problems and provide high-quality solutions for
large-scale problems.
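To make asynchronous decision-making concrete, here is a minimal event-driven sketch. It assumes each robot follows a hypothetical policy that picks a new macro-action only when its current one terminates, that macro-actions have fixed durations, and that each macro-action's reward is collected at its completion time and discounted by gamma^t (a simplification; in a semi-Markov model reward may also accrue during execution). Belief-space planning and the paper's macro-action construction are not modeled.

```python
import heapq


def simulate(policies, durations, rewards, horizon, gamma=0.95):
    """Event-driven rollout of asynchronous macro-actions.

    policies[r](r, t) -> name of the next macro-action for robot r at time t.
    durations[m], rewards[m]: hypothetical fixed duration and completion reward of macro-action m.
    Returns the discounted return and a log of (time, robot, chosen macro-action).
    """
    queue = []  # (completion_time, robot, macro_action); one entry per robot
    log = []
    for r, policy in enumerate(policies):
        m = policy(r, 0.0)
        log.append((0.0, r, m))
        heapq.heappush(queue, (durations[m], r, m))
    total = 0.0
    while queue:
        t, r, m = heapq.heappop(queue)  # the earliest-finishing macro-action
        if t > horizon:
            break  # every remaining entry finishes even later
        total += gamma ** t * rewards[m]
        # Only robot r decides now; the others keep executing their macro-actions.
        nxt = policies[r](r, t)
        log.append((t, r, nxt))
        heapq.heappush(queue, (t + durations[nxt], r, nxt))
    return total, log


if __name__ == "__main__":
    durations = {"deliver": 3, "pickup": 2}
    rewards = {"deliver": 1.0, "pickup": 0.5}
    policies = [lambda r, t: "deliver", lambda r, t: "pickup"]
    total, log = simulate(policies, durations, rewards, horizon=6, gamma=1.0)
    print(total)  # 3.5
    for t, r, m in log:
        print(f"t={t}: robot {r} starts {m}")
```

In the log, robot 0 decides at times 0, 3 and 6 while robot 1 decides at 0, 2, 4 and 6, so the robots never need a shared decision clock.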
An Auction-based Coordination Strategy for Task-Constrained Multi-Agent Stochastic Planning with Submodular Rewards
In many domains, such as transportation and logistics, search and rescue, or
cooperative surveillance, tasks must be allocated while accounting for
possible execution uncertainties. Existing task coordination algorithms either
ignore the stochastic process or are computationally intensive. Taking
advantage of the weakly coupled structure of the problem and the opportunity
for coordination in advance, we propose a decentralized auction-based
coordination strategy using a newly formulated score function, which is
obtained by formulating the problem as task-constrained Markov decision
processes (MDPs). The proposed method guarantees convergence and at least 50%
optimality, provided that the reward function is submodular.
Furthermore, for large-scale applications, an approximate variant of the
proposed method, named Deep Auction, is also introduced; it uses neural
networks and thereby avoids the burden of constructing MDPs. Inspired by the
well-known actor-critic architecture, two Transformers are used to map
observations to action probabilities and cumulative rewards, respectively.
Finally, we demonstrate the performance of the two proposed
approaches in the context of drone deliveries, where stochastic planning
for the drone fleet is cast as a stochastic prize-collecting Vehicle Routing
Problem (VRP) with time windows. Simulation results are compared with
state-of-the-art methods in terms of solution quality, planning efficiency, and
scalability.
Comment: 17 pages, 5 figures
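The 50% guarantee is consistent with the classic bound for greedy maximization of a monotone submodular function when each task may go to at most one agent. The sketch below is a centralized simulation of a sequential greedy auction in which each agent bids its marginal gain; it assumes a hypothetical coverage-style value function (`CELLS`, `REACH`, `coverage_value` are made up) and does not reproduce the paper's MDP-derived score function or its decentralized message passing.

```python
def greedy_auction(agents, tasks, value):
    """Sequential single-item auction with marginal-gain bids.

    value(agent, bundle) -> float is assumed monotone submodular in the bundle.
    Each round, every agent bids its marginal gain on each unassigned task;
    the highest bid wins and that task is removed from the pool.
    """
    bundles = {a: [] for a in agents}
    unassigned = list(tasks)
    while unassigned:
        best = None  # (gain, agent, task) of the winning bid so far
        for a in agents:
            base = value(a, bundles[a])
            for t in unassigned:
                gain = value(a, bundles[a] + [t]) - base
                if best is None or gain > best[0]:
                    best = (gain, a, t)
        if best is None or best[0] <= 0:
            break  # no agent gains anything from the remaining tasks
        _, winner, task = best
        bundles[winner].append(task)
        unassigned.remove(task)
    return bundles


# Hypothetical coverage-style reward: each task covers some cells, and an agent
# only earns credit for tasks within its reach. Coverage is monotone submodular.
CELLS = {"t1": {1, 2}, "t2": {2, 3}, "t3": {4}}
REACH = {"d1": {"t1", "t2"}, "d2": {"t2", "t3"}}


def coverage_value(agent, bundle):
    covered = set()
    for t in bundle:
        if t in REACH[agent]:
            covered |= CELLS[t]
    return len(covered)


if __name__ == "__main__":
    bundles = greedy_auction(["d1", "d2"], ["t1", "t2", "t3"], coverage_value)
    print(bundles)  # {'d1': ['t1'], 'd2': ['t2', 't3']}
    print(sum(coverage_value(a, b) for a, b in bundles.items()))  # 5
```

Here the greedy allocation happens to reach the best total (5); in general the bound only promises at least half of the optimum.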
Cooperative Control and Potential Games
We present a view of cooperative control using the language of learning in games. We review the game-theoretic concepts of potential and weakly acyclic games, and demonstrate how several cooperative control problems, such as consensus and dynamic sensor coverage, can be formulated in these settings. Motivated by this connection, we build upon game-theoretic concepts to better accommodate a broader class of cooperative control problems. In particular, we extend existing learning algorithms to accommodate restricted action sets caused by the limitations of agent capabilities and group-based decision making. Furthermore, we introduce a new class of games, called sometimes weakly acyclic games, for time-varying objective functions and action sets, and provide distributed algorithms for convergence to an equilibrium.
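As a small illustration of the potential-game view of sensor coverage, the sketch below assumes sensors on a line of cells with hypothetical cell values and fixed restricted action sets. Each sensor's utility is its marginal contribution to total coverage, which makes total coverage an exact potential: any strict unilateral improvement strictly increases coverage, so asynchronous best responses must stop at a pure Nash equilibrium. The cell values, sensing radius, and function names are assumptions, and the paper's learning algorithms (including those for sometimes weakly acyclic games) are not reproduced.

```python
def coverage(positions, cell_value, radius=1):
    """Total value of cells within `radius` of at least one sensor (each cell counted once)."""
    covered = set()
    for p in positions:
        covered.update(c for c in cell_value if abs(c - p) <= radius)
    return sum(cell_value[c] for c in covered)


def marginal_utility(i, profile, cell_value):
    """Sensor i's utility: coverage with it minus coverage without it."""
    others = profile[:i] + profile[i + 1:]
    return coverage(profile, cell_value) - coverage(others, cell_value)


def best_response_dynamics(profile, action_sets, cell_value, max_rounds=100):
    """Let sensors update one at a time until none can strictly improve."""
    profile = list(profile)
    for _ in range(max_rounds):
        improved = False
        for i, allowed in enumerate(action_sets):
            best_a = profile[i]
            best_u = marginal_utility(i, profile, cell_value)
            for a in allowed:  # only positions this sensor is capable of reaching
                trial = profile[:i] + [a] + profile[i + 1:]
                u = marginal_utility(i, trial, cell_value)
                if u > best_u:
                    best_a, best_u = a, u
            if best_a != profile[i]:
                profile[i] = best_a
                improved = True
        if not improved:
            return profile  # no sensor can improve unilaterally: a pure Nash equilibrium
    return profile


if __name__ == "__main__":
    cells = {0: 1, 1: 1, 2: 1, 3: 5, 4: 5, 5: 1, 6: 1, 7: 1, 8: 3, 9: 3}
    action_sets = [[0, 1, 2, 3, 4], [5, 6, 7, 8]]
    final = best_response_dynamics([0, 5], action_sets, cells)
    print(final, coverage(final, cells))  # [2, 5] 14
```

In this example the dynamics stop at an equilibrium covering value 14, even though placing the sensors at cells 3 and 8 would cover 18: convergence is to an equilibrium, not necessarily to the global optimum.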