Influence-Optimistic Local Values for Multiagent Planning --- Extended Version
Recent years have seen the development of methods for multiagent planning
under uncertainty that scale to tens or even hundreds of agents. However, most
of these methods either make restrictive assumptions on the problem domain, or
provide approximate solutions without any guarantees on quality. Methods in the
former category typically build on heuristic search using upper bounds on the
value function. Unfortunately, no techniques exist to compute such upper bounds
for problems with non-factored value functions. To allow for meaningful
benchmarking through measurable quality guarantees on a very general class of
problems, this paper introduces a family of influence-optimistic upper bounds
for factored decentralized partially observable Markov decision processes
(Dec-POMDPs) that do not have factored value functions. Intuitively, we derive
bounds on very large multiagent planning problems by subdividing them into
sub-problems and, for each sub-problem, making optimistic assumptions about
the influence that will be exerted by the rest of the system.
We numerically compare the different upper bounds and demonstrate how we can
achieve a non-trivial guarantee that a heuristic solution for problems with
hundreds of agents is close to optimal. Furthermore, we provide evidence that
the upper bounds may improve the effectiveness of heuristic influence search,
and discuss further potential applications to multiagent planning.
Comment: Long version of IJCAI 2015 paper (and extended abstract at AAMAS 2015).
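The idea of bounding a large problem by making optimistic assumptions about external influence can be sketched with a toy example (all tables and values below are hypothetical, not taken from the paper):

```python
# Toy sketch of an influence-optimistic upper bound (hypothetical problem
# data, not the paper's algorithm). The joint problem is split into
# sub-problems; each sub-problem's value depends on its own local policy
# and on the "influence" exerted by the rest of the system. Crediting
# every sub-problem with its most favourable influence independently
# yields an upper bound on the optimal joint value.

# Hypothetical local value tables: value[(local_policy, influence)]
sub_problems = [
    {("a", "good"): 10.0, ("a", "bad"): 4.0,
     ("b", "good"): 7.0,  ("b", "bad"): 6.0},
    {("a", "good"): 8.0,  ("a", "bad"): 2.0,
     ("b", "good"): 5.0,  ("b", "bad"): 5.0},
]

def influence_optimistic_bound(subs):
    """Sum, over sub-problems, of the best local value achievable under
    the most favourable influence: an optimistic (admissible) bound."""
    return sum(max(table.values()) for table in subs)

upper = influence_optimistic_bound(sub_problems)  # 10 + 8 = 18
heuristic_value = 15.0   # value of some heuristic joint solution (made up)
gap = upper - heuristic_value
print(upper, gap)   # the heuristic solution is at most `gap` from optimal
```

Because every sub-problem is credited with its most favourable influence, the sum can never underestimate the optimal joint value, which is what turns the gap to a heuristic solution into a genuine quality guarantee.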
Influence-Based Abstraction for Multiagent Systems
This paper presents a theoretical advance by which factored POSGs can be decomposed into local models. We formalize the interface between such local models as the influence agents can exert on one another, and we prove that this interface is sufficient for decoupling them. The resulting influence-based abstraction substantially generalizes previous work on exploiting weakly-coupled agent interaction structures. Therein lie several important contributions. First, our general formulation sheds new light on the theoretical relationships among previous approaches, and promotes future empirical comparisons that could come from extending those approaches beyond the more specific problem contexts for which they were developed. More importantly, the influence-based approaches that we generalize have shown promising improvements in the scalability of planning for more restrictive models. Thus, our theoretical result serves as the foundation for practical algorithms that we anticipate will bring similar improvements to more general planning contexts, as well as to other domains such as approximate planning, decision-making in adversarial domains, and online learning.
United States. Air Force Office of Scientific Research. Multidisciplinary University Research Initiative (Project FA9550-09-1-0538)
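As a rough illustration of the decoupling claim, consider a toy two-agent setting (hypothetical rewards and names) in which agent 2's problem depends on agent 1 only through the marginal probability of a shared feature:

```python
# Toy illustration (all numbers and names hypothetical) of why an
# influence interface can suffice for decoupling: agent 2's local
# planning problem depends on agent 1 only through the marginal
# probability q that a shared state feature is set. Planning against q
# alone recovers exactly the value of planning against agent 1's full
# induced distribution.

# Agent 2's reward as a function of (its action, shared feature value):
R2 = {("go", 0): 1.0, ("go", 1): -2.0, ("wait", 0): 0.0, ("wait", 1): 0.5}

def best_response_to_influence(q):
    """Best local value for agent 2 given only the influence q = P(f=1)."""
    return max(q * R2[(a, 1)] + (1 - q) * R2[(a, 0)]
               for a in ("go", "wait"))

# Full evaluation against the feature distribution agent 1's policy induces:
dist = {0: 0.7, 1: 0.3}
best_joint = max(sum(p * R2[(a, f)] for f, p in dist.items())
                 for a in ("go", "wait"))

print(best_response_to_influence(dist[1]), best_joint)  # identical values
```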
Experience Filter: Using Past Experiences on Unseen Tasks or Environments
One of the bottlenecks of training autonomous vehicle (AV) agents is the
variability of training environments. Since learning optimal policies for
unseen environments is often very costly and requires substantial data
collection, it becomes computationally intractable to train the agent on every
possible environment or task the AV may encounter. This paper introduces a
zero-shot filtering approach to interpolate learned policies of past
experiences to generalize to unseen ones. We use an experience kernel to
correlate environments. These correlations are then exploited to produce
policies for new tasks or environments from learned policies. We demonstrate
our methods on an autonomous vehicle driving through T-intersections with
different characteristics, where its behavior is modeled as a partially
observable Markov decision process (POMDP). We first construct compact
representations of learned policies for POMDPs with unknown transition
functions given a dataset of sequential actions and observations. Then, we
filter parameterized policies of previously visited environments to generate
policies to new, unseen environments. We demonstrate our approaches on both an
actual AV and a high-fidelity simulator. Results indicate that our experience
filter offers a fast, low-effort, and near-optimal solution to create policies
for tasks or environments never seen before. Furthermore, the generated new
policies outperform the policy learned using the entire data collected from
past environments, suggesting that the correlation among different environments
can be exploited and irrelevant ones can be filtered out.
Comment: Accepted at IEEE Intelligent Vehicles Symposium (IV) 202
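A minimal sketch of the filtering idea, assuming a simple RBF kernel over hand-picked environment features (all names and numbers are hypothetical; the paper's actual kernel and policy representation may differ):

```python
import math

# Illustrative sketch, not the paper's construction: a kernel over
# environment feature vectors blends the policy parameters learned in
# past environments into a zero-shot policy for an unseen one.

def rbf_kernel(x, y, bandwidth=1.0):
    """Similarity between two environment feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * bandwidth ** 2))

def filter_policy(new_env, past_envs, past_params):
    """Kernel-weighted average of past policy parameters."""
    weights = [rbf_kernel(new_env, e) for e in past_envs]
    total = sum(weights)
    dim = len(past_params[0])
    return [sum(w * p[i] for w, p in zip(weights, past_params)) / total
            for i in range(dim)]

# Past T-intersections described by (traffic density, visibility):
past_envs = [(0.2, 0.9), (0.8, 0.3)]
past_params = [[1.0, 0.0], [0.0, 1.0]]   # learned policy parameters

new_env = (0.2, 0.9)   # identical to the first visited environment
blended = filter_policy(new_env, past_envs, past_params)
print(blended)   # dominated by, but not equal to, the first policy
```

Since the new environment matches the first one exactly, its policy receives the largest weight, while the dissimilar second environment still contributes a little; shrinking the bandwidth would suppress it further.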
Abstracting Influences for Efficient Multiagent Coordination Under Uncertainty.
When planning optimal decisions for teams of agents acting in uncertain domains, conventional methods explicitly coordinate all joint policy decisions and, in doing so, are inherently susceptible to the curse of dimensionality, as state, action, and observation spaces grow exponentially with the number of agents. With the goal of extending the scalability of optimal team coordination, the research presented in this dissertation examines how agents can reduce the amount of information they need to coordinate. Intuitively, to the extent that agents are weakly coupled, they can avoid the complexity of coordinating all decisions; they need instead only coordinate abstractions of their policies that convey their essential influences on each other.
In formalizing this intuition, I consider several complementary aspects of weakly-coupled problem structure, including agent scope size, corresponding to the number of an agent's peers whose decisions influence the agent's decisions, and degree of influence, corresponding to the proportion of unique influences that peers can feasibly exert. To exploit this structure, I introduce a transition-dependent decentralized POMDP model that efficiently decomposes into local decision models with shared state features. This context yields a novel characterization of influences as transition probabilities (compactly encoded using a dynamic Bayesian network). Not only is this influence representation provably sufficient for optimal coordination, but it also allows me to frame the subproblems of (1) proposing influences, (2) evaluating influences, and (3) computing optimal policies around influences as mixed-integer linear programs.
The primary advantage of working in the influence space is that there are potentially significantly fewer feasible influences than there are policies. Blending prior work on decoupled joint policy search and constraint optimization, I develop influence-space search algorithms that, for problems with a low degree of influence, compute optimal solutions orders of magnitude faster than policy-space search. When agents' influences are constrained, influence-space search also outperforms other state-of-the-art optimal solution algorithms. Moreover, by exploiting both degree of influence and agent scope size, I demonstrate scalability, substantially beyond the reach of prior optimal methods, to teams of 50 weakly-coupled transition-dependent agents.
Ph.D. dissertation, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/84614/1/witwicki_1.pd
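The claim that the influence space can be much smaller than the policy space can be illustrated with a toy example (hypothetical deterministic dynamics, not the dissertation's model):

```python
from itertools import product

# Toy illustration: an agent's policy affects a peer only through the
# trajectory of a single shared boolean feature. Many distinct local
# policies induce the same influence, so searching the influence space
# can be far cheaper than searching the policy space.

def influence(policy):
    """Per-step probability profile of the shared flag being set.
    Deterministic toy dynamics: the flag turns on the first time the
    agent plays action 1 and then stays on."""
    flag, profile = 0.0, []
    for action in policy:
        flag = max(flag, float(action))
        profile.append(flag)
    return tuple(profile)

horizon = 3
policies = list(product([0, 1], repeat=horizon))    # open-loop policies
influences = {influence(p) for p in policies}

print(len(policies), len(influences))   # 8 policies, 4 distinct influences
```

The gap widens with the horizon: with deterministic dynamics like these, the number of open-loop policies grows exponentially while the number of distinct influence profiles grows only linearly.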
Heuristic Search of Multiagent Influence Space
Multiagent planning under uncertainty has seen important progress in recent years. Two techniques, in particular, have substantially advanced the efficiency and scalability of planning. Multiagent heuristic search gains traction by pruning large portions of the joint policy space deemed suboptimal by heuristic bounds. Alternatively, influence-based abstraction reformulates the search space of joint policies into a smaller space of influences, which represent the probabilistic effects that agents' policies may exert on one another. These techniques have been used independently, but never together, to solve larger problems (for Dec-POMDPs and subclasses) than previously possible. In this paper, we take the logical albeit nontrivial next step of combining multiagent A* search and influence-based abstraction into a single algorithm. The mathematical foundation that we provide, such as partially-specified influence evaluation and an admissible heuristic definition, enables an investigation into whether the two techniques bring complementary gains. Our empirical results indicate that A* can provide significant computational savings on top of those already afforded by influence-space search, thereby making a significant contribution to the field of multiagent planning under uncertainty.
Fundacao para a Ciencia e a Tecnologia; Carnegie Mellon Portugal Program (Project CMU-PT/SIA/0023/2009); United States. Air Force Office of Scientific Research. Multidisciplinary University Research Initiative (Project FA9550-09-1-0538); NWO of the Netherlands (CATCH Project 640.005.003)
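A minimal sketch of how A* can operate over an influence space with an influence-optimistic admissible heuristic (toy values and constraint, not the paper's algorithm):

```python
import heapq

# Toy sketch of A* over an influence space. Nodes are partial assignments
# of an influence setting to each agent; the heuristic credits every
# still-unassigned agent with its best possible local value, making it
# admissible for maximisation. All names and numbers are hypothetical.

# Hypothetical local values: local_value[agent][influence]
local_value = [
    {"weak": 3.0, "strong": 5.0},
    {"weak": 4.0, "strong": 1.0},
    {"weak": 2.0, "strong": 2.5},
]

# Hypothetical feasibility constraint: at most one agent exerts "strong".
def feasible(assignment):
    return sum(1 for i in assignment if i == "strong") <= 1

def astar_influence_search(values):
    start = ()
    # min-heap on the negated f-value simulates a max-heap
    frontier = [(-sum(max(v.values()) for v in values), start)]
    while frontier:
        neg_f, partial = heapq.heappop(frontier)
        if len(partial) == len(values):
            return -neg_f, partial          # first goal popped is optimal
        agent = len(partial)
        for infl in values[agent]:
            nxt = partial + (infl,)
            if not feasible(nxt):
                continue                    # prune infeasible influences
            g = sum(values[i][nxt[i]] for i in range(len(nxt)))
            h = sum(max(values[i].values())
                    for i in range(len(nxt), len(values)))
            heapq.heappush(frontier, (-(g + h), nxt))
    return None

print(astar_influence_search(local_value))
```

The search prunes any branch whose optimistic completion cannot beat the best goal already reachable, which is where the computational savings on top of plain influence-space enumeration would come from.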