Scalable Planning and Learning for Multiagent POMDPs: Extended Version
Online, sample-based planning algorithms for POMDPs have shown great promise
in scaling to problems with large state spaces, but they become intractable for
large action and observation spaces. This is particularly problematic in
multiagent POMDPs, where the action and observation spaces grow exponentially
with the number of agents. To combat this intractability, we propose a novel
scalable approach based on sample-based planning and factored value functions
that exploits structure present in many multiagent settings. This approach
applies not only in the planning case, but also in the Bayesian reinforcement
learning setting. Experimental results show that we are able to provide high
quality solutions to large multiagent planning and learning problems
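The abstract does not spell out its factored value functions, but the reason factorization helps can be sketched generically: if the joint value decomposes into local terms over small subsets of agents, the joint maximization can be done by variable elimination instead of enumerating all joint actions. A minimal sketch, assuming an illustrative chain-structured coordination graph (the function name and structure are not from the paper):

```python
def max_plus_chain(q_local, n_agents, n_actions):
    """Maximize sum_i q_local[i][a_i][a_{i+1}] over all joint actions
    by variable elimination along a chain coordination graph.
    Cost is O(n_agents * n_actions^2) instead of n_actions**n_agents."""
    msg = [0.0] * n_actions  # best value achievable for each choice of the current agent
    for i in range(n_agents - 1):
        # eliminate agent i: fold its local payoff into a message for agent i+1
        msg = [max(msg[a] + q_local[i][a][a_next] for a in range(n_actions))
               for a_next in range(n_actions)]
    return max(msg)
```

For a chain of 10 agents with 5 actions each, this evaluates 10 × 25 terms rather than 5^10 ≈ 9.8 million joint actions, which is the kind of structure exploitation the abstract refers to.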
Multi-Objective Multi-Agent Planning for Jointly Discovering and Tracking Mobile Objects
We consider the challenging problem of online planning for a team of agents
to autonomously search and track a time-varying number of mobile objects under
the practical constraint of detection-range-limited onboard sensors. A standard
POMDP with a value function that either encourages discovery or accurate
tracking of mobile objects is inadequate to simultaneously meet the conflicting
goals of searching for undiscovered mobile objects whilst keeping track of
discovered objects. The planning problem is further complicated by
misdetections or false detections of objects caused by range limited sensors
and noise inherent to sensor measurements. We formulate a novel multi-objective
POMDP based on information theoretic criteria, and an online multi-object
tracking filter for the problem. Since controlling multiple agents is a
well-known combinatorial optimization problem, the cost of assigning control
actions to agents motivates a greedy algorithm. We prove that our proposed
multi-objective value function is a monotone submodular set function;
consequently, the greedy algorithm achieves a (1-1/e) approximation for
maximizing the submodular multi-objective function.
Comment: Accepted for publication at the Thirty-Fourth AAAI Conference on
Artificial Intelligence (AAAI-20). Adds Algorithm 1 and background on MPOMDP
and OSP.
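The (1-1/e) guarantee invoked above is the classical Nemhauser–Wolsey–Fisher result for greedily maximizing a monotone submodular set function under a cardinality constraint. A minimal generic sketch of that greedy scheme (not the paper's specific action-assignment algorithm):

```python
def greedy_max(ground_set, f, k):
    """Greedy maximization of a monotone submodular set function f
    under the cardinality constraint |S| <= k.
    Guarantees f(S) >= (1 - 1/e) * f(OPT) for such f."""
    selected = set()
    for _ in range(k):
        # pick the element with the largest marginal gain over the current set
        best = max((x for x in ground_set if x not in selected),
                   key=lambda x: f(selected | {x}) - f(selected),
                   default=None)
        if best is None:
            break
        selected.add(best)
    return selected
```

A coverage function, such as the number of objects covered by a set of sensors, is a standard example of a monotone submodular function to which this guarantee applies.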
Influence-Optimistic Local Values for Multiagent Planning --- Extended Version
Recent years have seen the development of methods for multiagent planning
under uncertainty that scale to tens or even hundreds of agents. However, most
of these methods either make restrictive assumptions on the problem domain, or
provide approximate solutions without any guarantees on quality. Methods in the
former category typically build on heuristic search using upper bounds on the
value function. Unfortunately, no techniques exist to compute such upper bounds
for problems with non-factored value functions. To allow for meaningful
benchmarking through measurable quality guarantees on a very general class of
problems, this paper introduces a family of influence-optimistic upper bounds
for factored decentralized partially observable Markov decision processes
(Dec-POMDPs) that do not have factored value functions. Intuitively, we derive
bounds on very large multiagent planning problems by subdividing them into
sub-problems and, within each sub-problem, making optimistic assumptions about
the influence that the rest of the system will exert.
We numerically compare the different upper bounds and demonstrate how we can
achieve a non-trivial guarantee that a heuristic solution for problems with
hundreds of agents is close to optimal. Furthermore, we provide evidence that
the upper bounds may improve the effectiveness of heuristic influence search,
and discuss further potential applications to multiagent planning.
Comment: Long version of IJCAI 2015 paper (and extended abstract at AAMAS 2015).
A Decoupling Principle for Simultaneous Localization and Planning Under Uncertainty in Multi-Agent Dynamic Environments
Simultaneous localization and planning for nonlinear stochastic systems under
process and measurement uncertainties is a challenging problem. In its most general
form, it is formulated as a stochastic optimal control problem in the space of feedback
policies. The Hamilton-Jacobi-Bellman equation provides the theoretical solution
of the optimal problem; but, as for almost all nonlinear stochastic systems,
solving the problem optimally is intractable. Moreover, even if an optimal
solution were obtained, it would require centralized control, while multi-agent
mobile robotic systems in dynamic environments require decentralized solutions.
In this study, we aim for a theoretically sound solution for various modes of
this problem, including the single-agent and multi-agent variations with perfect and
imperfect state information, where the underlying state, control and observation
spaces are continuous with discrete-time models. We introduce a decoupling principle
for planning and control of multi-agent nonlinear stochastic systems based on
small-noise asymptotics. Through this decoupling principle, under small noise, the
design of the real-time feedback law can be decoupled from the off-line design of the
nominal trajectory of the system. Further, for a multi-agent problem, the design of
the feedback laws for different agents can be decoupled from each other, reducing the
centralized problem to a decentralized problem requiring no communication during
execution. The resulting solution is quantifiably near-optimal.
We establish this result for all the above-mentioned variations, which results in
the following variants: Trajectory-optimized Linear Quadratic Regulator (T-LQR),
Multi-agent T-LQR (MT-LQR), Trajectory-optimized Linear Quadratic Gaussian
(T-LQG), and Multi-agent T-LQG (MT-LQG). The decoupling principle provides the
conditions under which a decentralized linear Gaussian system with a quadratic
approximation of the cost, obtained by linearization around an optimally designed
nominal trajectory, can be utilized to control the nonlinear system. The resulting decentralized
feedback solution at runtime, being decoupled with respect to the mobile
agents, requires no communication between the agents during the execution phase.
Moreover, the complexity of the solution vis-a-vis the computation of the nominal
trajectory as well as the closed-loop gains is tractable with low polynomial orders of
computation. Experimental implementation of the solution shows that the results
hold for moderate levels of noise with high probability.
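The runtime-feedback half of the T-LQR construction described above amounts, once the system has been linearized around the nominal trajectory, to a standard finite-horizon LQR backward pass. A minimal sketch under that assumption (the matrices A_t, B_t along the nominal trajectory are taken as given; this is not the thesis's full algorithm):

```python
import numpy as np

def tlqr_gains(A_list, B_list, Q, R, Qf):
    """Finite-horizon discrete-time LQR gains for deviation dynamics
    dx_{t+1} = A_t dx_t + B_t du_t, linearized around a nominal trajectory.
    Backward Riccati recursion; the feedback law is du_t = -K_t dx_t,
    i.e. u_t = u_t_nominal - K_t (x_t - x_t_nominal)."""
    P = Qf
    gains = []
    for A, B in zip(reversed(A_list), reversed(B_list)):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # optimal gain at this step
        P = Q + A.T @ P @ (A - B @ K)                       # Riccati cost-to-go update
        gains.append(K)
    return gains[::-1]  # gains in forward-time order
```

This separation, an offline nominal trajectory plus precomputed gains with no runtime communication, is exactly the structure the decoupling principle justifies under small noise.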
To further optimize the performance of this approach, we show how to design a
special cost function for the problem with imperfect state measurement that takes
advantage of the fact that the estimation covariance of a linear Gaussian system is
deterministic and does not depend on the observations. This design, which corresponds
to “belief space planning” in our framework, incorporates the consequently deterministic
cost of the stochastic feedback system into the deterministic design of the
nominal trajectory, yielding a nominal trajectory with the best estimation
performance. It then utilizes the T-LQG approach to design an optimal feedback
law to track the designed nominal trajectory. This iterative approach can be used to
further tune both the open loop as well as the decentralized feedback gain portions
of the overall design. We also provide the multi-agent variant of this approach based
on the MT-LQG method.
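The key fact exploited in this cost design, that the estimation covariance of a linear Gaussian system is deterministic, is visible directly in the Kalman filter covariance recursion: it uses only the model matrices and never a measured value, so it can be evaluated offline while designing the nominal trajectory. A minimal sketch with illustrative matrices:

```python
import numpy as np

def covariance_trajectory(A, C, W, V, P0, horizon):
    """Kalman filter error-covariance recursion for
    x_{t+1} = A x_t + w_t,  y_t = C x_t + v_t,  w ~ N(0, W), v ~ N(0, V).
    Note that no observed y_t appears anywhere below: the covariance
    sequence is deterministic and computable before any data arrives."""
    P = P0
    traj = [P0]
    for _ in range(horizon):
        P_pred = A @ P @ A.T + W                    # prediction step
        S = C @ P_pred @ C.T + V                    # innovation covariance
        K = P_pred @ C.T @ np.linalg.inv(S)         # Kalman gain
        P = (np.eye(A.shape[0]) - K @ C) @ P_pred   # measurement update
        traj.append(P)
    return traj
```

Because the whole sequence is a deterministic function of the trajectory-dependent matrices, an estimation-quality cost built from it can be folded into the deterministic nominal-trajectory optimization, as described above.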
Based on the near-optimality guarantees of the decoupling principle and the
T-LQG approach, we analyze the performance and correctness of a well-known heuristic
in robotic path planning. We show that optimizing measures of the observability
Gramian as a surrogate for estimation performance may provide irrelevant or misleading
trajectories for planning under observation uncertainty.
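The heuristic critiqued above scores candidate trajectories by measures of the observability Gramian. For a linearized discrete-time system that Gramian is easy to compute, which is part of its appeal; a minimal sketch (the example matrices are illustrative, not from the thesis):

```python
import numpy as np

def observability_gramian(A, C, horizon):
    """Finite-horizon discrete-time observability Gramian:
    G = sum_{k=0}^{T-1} (A^k)^T C^T C A^k.
    Scalar measures of G (smallest eigenvalue, trace, determinant) are
    often used as surrogates for estimation quality, but G ignores the
    process and measurement noise covariances that actually determine
    the estimation error, which is the gap the analysis above exposes."""
    n = A.shape[0]
    G = np.zeros((n, n))
    Ak = np.eye(n)  # running power A^k
    for _ in range(horizon):
        G += Ak.T @ C.T @ C @ Ak
        Ak = A @ Ak
    return G
```

A full-rank Gramian certifies observability, but, as argued above, ranking trajectories by such a noise-blind measure can mislead planning under observation uncertainty.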
We then consider systems with non-Gaussian perturbations. We propose an
alternative heuristic method that aims for fast planning in belief space under
non-Gaussian uncertainty: a particle-filter-based design that yields a convex
planning problem, implemented via a model predictive control strategy, in convex
environments, and a locally convex problem in non-convex environments.
The environment here refers to the complement of the region in Euclidean
space that contains the obstacles or “no fly zones”.
For non-convex dynamic environments, where the no-go regions change with time,
we design a special form of obstacle penalty function that incorporates
non-convex time-varying constraints into the cost function, so that the
decoupling principle still applies to these problems. However, similar to any constrained
problem, the quality of the optimal nominal trajectory is dependent on the
quality of the solution obtainable for the nonlinear optimization problem.
We simulate our algorithms for each of these problems in various challenging
scenarios, including several nonlinear robotic models and common measurement
models. In particular, we consider 2D and 3D dynamic environments with heterogeneous
holonomic and non-holonomic robots, and range and bearing sensing models.
Future research can potentially extend the results to more general situations,
including continuous-time models.