Near-Optimal Adversarial Policy Switching for Decentralized Asynchronous Multi-Agent Systems
A key challenge in multi-robot and multi-agent systems is generating
solutions that are robust to other self-interested or even adversarial parties
who actively try to prevent the agents from achieving their goals. The
practicality of existing works addressing this challenge is limited to only
small-scale synchronous decision-making scenarios or a single agent planning
its best response against a single adversary with fixed, procedurally
characterized strategies. In contrast, this paper considers a more realistic
class of problems where a team of asynchronous agents with limited observation
and communication capabilities must compete against multiple strategic
adversaries with changing strategies. This problem necessitates agents that can
coordinate to detect changes in adversary strategies and plan the best response
accordingly. Our approach first optimizes a set of stratagems that represent
these best responses. These optimized stratagems are then integrated into a
unified policy that can detect and respond when the adversaries change their
strategies. The near-optimality of the proposed framework is established
theoretically and demonstrated empirically in simulation and on hardware.
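The detect-and-respond idea can be illustrated with a minimal Python sketch. Everything here is hypothetical (class name, strategy labels, likelihood functions) and not taken from the paper; the stratagem-optimization step itself is assumed done offline. A unified policy maintains a posterior over which adversary strategy is currently active and plays the precomputed best response to the most likely one:

```python
class SwitchingPolicy:
    """Select among precomputed best-response stratagems by tracking a
    posterior over which adversary strategy is currently active."""

    def __init__(self, stratagems, likelihoods):
        # stratagems: dict strategy_name -> callable(observation) -> action
        # likelihoods: dict strategy_name -> callable(observation) -> P(obs | strategy)
        self.stratagems = stratagems
        self.likelihoods = likelihoods
        n = len(stratagems)
        self.posterior = {name: 1.0 / n for name in stratagems}

    def update(self, observation):
        # Bayesian update of the belief over the active adversary strategy.
        for name in self.posterior:
            self.posterior[name] *= self.likelihoods[name](observation)
        total = sum(self.posterior.values())
        if total == 0:  # numerical guard: reset to uniform
            self.posterior = {k: 1.0 / len(self.posterior) for k in self.posterior}
        else:
            self.posterior = {k: p / total for k, p in self.posterior.items()}

    def act(self, observation):
        # Play the best response to the most likely adversary strategy.
        best = max(self.posterior, key=self.posterior.get)
        return self.stratagems[best](observation)
```

A few consistent observations are enough for the posterior to concentrate, at which point the policy effectively switches stratagems.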
Can bounded and self-interested agents be teammates? Application to planning in ad hoc teams
Planning for ad hoc teamwork is challenging because it involves agents collaborating without any prior coordination or communication. The focus is on principled methods for a single agent to cooperate with others. This motivates investigating the ad hoc teamwork problem in the context of self-interested decision-making frameworks. Agents engaged in individual decision making in multiagent settings face the task of reasoning about other agents' actions, which may in turn involve reasoning about others. An established approximation that operationalizes this approach is to bound the infinite nesting from below by introducing level-0 models. For the purposes of this study, individual, self-interested decision making in multiagent settings is modeled using interactive dynamic influence diagrams (I-DIDs). These are graphical models with the benefit that they naturally offer a factored representation of the problem, allowing agents to ascribe dynamic models to others and reason about them. We demonstrate that an implication of bounded, finitely-nested reasoning by a self-interested agent is that it may not obtain optimal team solutions in cooperative settings when it is part of a team. We address this limitation by including models at level 0 whose solutions involve reinforcement learning. We show how the learning is integrated into planning in the context of I-DIDs. This facilitates optimal teammate behavior, and we demonstrate its applicability to ad hoc teamwork on several problem domains and configurations.
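A level-0 model whose solution involves reinforcement learning can be illustrated with plain tabular Q-learning. This sketch is generic (the function, toy dynamics, and hyperparameters are assumptions, not the paper's setup): a level-0 agent ignores the other agents entirely and learns directly from state and reward feedback, and the resulting policy is what a higher-level I-DID planner would ascribe to a teammate:

```python
import random

def q_learn(transitions, rewards, n_states, n_actions,
            episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning for a level-0 teammate model: the level-0 agent
    ignores other agents and learns from state/reward feedback alone."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(20):  # bounded episode length
            # epsilon-greedy action selection
            a = rng.randrange(n_actions) if rng.random() < epsilon \
                else max(range(n_actions), key=lambda x: Q[s][x])
            s2 = transitions[s][a]
            r = rewards[s][a]
            # standard Q-learning temporal-difference update
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

Solving this learner yields a concrete policy over actions, which is exactly the artifact a level-1 planner needs in place of a hand-specified level-0 model.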
Memory Bounded Open-Loop Planning in Large POMDPs using Thompson Sampling
State-of-the-art approaches to partially observable planning like POMCP are
based on stochastic tree search. While these approaches are computationally
efficient, they may still construct search trees of considerable size, which
could limit the performance due to restricted memory resources. In this paper,
we propose Partially Observable Stacked Thompson Sampling (POSTS), a memory
bounded approach to open-loop planning in large POMDPs, which optimizes a fixed
size stack of Thompson Sampling bandits. We empirically evaluate POSTS in four
large benchmark problems and compare its performance with different tree-based
approaches. We show that POSTS achieves competitive performance compared to
tree-based open-loop planning and offers a performance-memory tradeoff, making
it suitable for partially observable planning with highly restricted
computational and memory resources.
Comment: Presented at AAAI 201
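The fixed-size stack of Thompson Sampling bandits can be sketched as follows. This is a simplified illustration under assumed details (Gaussian posteriors with a shrinking standard deviation, a black-box `simulate` that returns the return of an open-loop action sequence); the paper's actual formulation may differ:

```python
import random

class GaussianBandit:
    """One Thompson Sampling bandit over actions, with a normal posterior
    on each action's mean return (known-variance simplification)."""
    def __init__(self, n_actions):
        self.counts = [0] * n_actions
        self.means = [0.0] * n_actions

    def sample_action(self, rng):
        # Thompson sampling: draw from each action's posterior, pick the argmax.
        draws = [rng.gauss(m, 1.0 / (c + 1) ** 0.5)
                 for m, c in zip(self.means, self.counts)]
        return max(range(len(draws)), key=lambda a: draws[a])

    def update(self, action, ret):
        self.counts[action] += 1
        self.means[action] += (ret - self.means[action]) / self.counts[action]

def posts_plan(simulate, n_actions, depth, budget=2000, seed=0):
    """Open-loop planning with a fixed-size stack of bandits, one per depth.
    Memory is O(depth * n_actions) regardless of the simulation budget."""
    rng = random.Random(seed)
    stack = [GaussianBandit(n_actions) for _ in range(depth)]
    for _ in range(budget):
        plan = [b.sample_action(rng) for b in stack]   # sample an open-loop plan
        ret = simulate(plan)                           # evaluate it in a simulator
        for b, a in zip(stack, plan):                  # credit the whole return
            b.update(a, ret)
    return [max(range(n_actions), key=lambda a: b.means[a]) for b in stack]
```

The contrast with tree search is the memory profile: a tree grows with the number of simulations, while the stack stays fixed at one bandit per planning depth.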
Individual Planning in Agent Populations: Exploiting Anonymity and Frame-Action Hypergraphs
Interactive partially observable Markov decision processes (I-POMDP) provide
a formal framework for planning for a self-interested agent in multiagent
settings. An agent operating in a multiagent environment must deliberate about
the actions that other agents may take and the effect these actions have on the
environment and the rewards it receives. Traditional I-POMDPs model this
dependence on the actions of other agents using joint action and model spaces.
Therefore, the solution complexity grows exponentially with the number of
agents, thereby complicating scalability. In this paper, we model and extend
anonymity and context-specific independence -- problem structures often present
in agent populations -- for computational gain. We empirically demonstrate the
efficiency from exploiting these problem structures by solving a new multiagent
problem involving more than 1,000 agents.
Comment: 8-page article plus a two-page appendix containing proofs, in Proceedings of the 25th International Conference on Automated Planning and Scheduling, 201
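The computational gain from action anonymity is easy to quantify with a small sketch (the function names are illustrative): when transitions and rewards depend only on how many agents of each frame take each action, the planner can reason over configurations (action counts) instead of joint actions, shrinking an exponential space to a polynomial one:

```python
from math import comb

def joint_action_count(n_agents, n_actions):
    # Joint action space: every agent's identity matters -> A^N.
    return n_actions ** n_agents

def configuration_count(n_agents, n_actions):
    # Under action anonymity only the count of agents per action matters:
    # multiset coefficient C(N + A - 1, A - 1), polynomial in N.
    return comb(n_agents + n_actions - 1, n_actions - 1)

def to_configuration(joint_action, n_actions):
    # Collapse a joint action (one action index per agent) to its
    # configuration: how many agents chose each action.
    counts = [0] * n_actions
    for a in joint_action:
        counts[a] += 1
    return tuple(counts)
```

For 1,000 agents and 3 actions, the joint space has 3^1000 elements while there are only 501,501 configurations, which is what makes populations of this size tractable.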
Iterative Online Planning in Multiagent Settings with Limited Model Spaces and PAC Guarantees
Methods for planning in multiagent settings often model other agents' possible behaviors. However, the space of these models -- whether these are policy trees, finite-state controllers or intentional models -- is very large and thus arbitrarily bounded. This may exclude the true model or the optimal model. In this paper, we present a novel iterative algorithm for online planning that considers a limited model space, updates it dynamically using data from interactions, and provides a provable and probabilistic bound on the approximation error. We ground this approach in the context of graphical models for planning in partially observable multiagent settings -- interactive dynamic influence diagrams. We empirically demonstrate that the limited model space facilitates fast solutions and that the true model often enters the limited model space.
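The flavor of a probabilistic consistency test over a limited model space can be sketched generically. This is not the paper's algorithm or bound; it is a simple illustration, using a Hoeffding confidence radius, of how interaction data can prune candidate models of another agent whose predictions disagree with observed behavior:

```python
from math import log, sqrt

def prune_models(candidates, observed_actions, n_actions, delta=0.05):
    """Keep candidate models whose predicted action frequencies are
    consistent with observed data, using a Hoeffding confidence radius.
    candidates: dict name -> list of predicted action probabilities."""
    n = len(observed_actions)
    # With probability >= 1 - delta (union bound over actions), each
    # empirical frequency is within eps of its true value.
    eps = sqrt(log(2.0 * n_actions / delta) / (2.0 * n))
    freq = [observed_actions.count(a) / n for a in range(n_actions)]
    kept = {}
    for name, pred in candidates.items():
        if all(abs(freq[a] - pred[a]) <= eps for a in range(n_actions)):
            kept[name] = pred
    return kept
```

As more interaction data arrives, eps shrinks, so inconsistent models are eliminated with high probability while a model matching the true behavior survives.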
A Framework for Sequential Planning in Multi-Agent Settings
This paper extends the framework of partially observable Markov decision
processes (POMDPs) to multi-agent settings by incorporating the notion of agent
models into the state space. Agents maintain beliefs over physical states of
the environment and over models of other agents, and they use Bayesian updates
to maintain their beliefs over time. The solutions map belief states to
actions. Models of other agents may include their belief states and are related
to agent types considered in games of incomplete information. We express the
agents' autonomy by postulating that their models are not directly manipulable
or observable by other agents. We show that important properties of POMDPs,
such as convergence of value iteration, the rate of convergence, and piece-wise
linearity and convexity of the value functions carry over to our framework. Our
approach complements a more traditional approach to interactive settings which
uses Nash equilibria as a solution paradigm. We seek to avoid some of the
drawbacks of equilibria which may be non-unique and do not capture
off-equilibrium behaviors. We do so at the cost of having to represent, process
and continuously revise models of other agents. Since the agents' beliefs may be
arbitrarily nested, the optimal solutions to decision making problems are only
asymptotically computable. However, approximate belief updates and
approximately optimal plans are computable. We illustrate our framework using a
simple application domain, and we show examples of belief updates and value
functions.
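A heavily simplified interactive belief update can be sketched in a few lines. The names and the toy dynamics below are assumptions, and one real complication is deliberately omitted (in the full framework the other agent's nested belief is itself updated, whereas here the ascribed models are static): the agent maintains a belief over (physical state, other-agent model) pairs and revises it by Bayes' rule using each model's predicted action distribution:

```python
def interactive_belief_update(belief, a_i, obs, models, T, O):
    """Simplified level-1 interactive belief update over (state, model) pairs.
    models: dict name -> dict action_j -> probability (the model's policy).
    T(s, a_i, a_j) -> dict next_state -> prob; O(s2, a_i, a_j) -> dict obs -> prob.
    NOTE: real I-POMDPs also update the other agent's nested belief; the
    ascribed models are treated as static here for brevity."""
    new_belief = {}
    for (s, m), p in belief.items():
        for a_j, p_aj in models[m].items():       # predict the other's action
            for s2, p_t in T(s, a_i, a_j).items():    # propagate the state
                p_o = O(s2, a_i, a_j).get(obs, 0.0)   # weight by the observation
                key = (s2, m)
                new_belief[key] = new_belief.get(key, 0.0) + p * p_aj * p_t * p_o
    total = sum(new_belief.values())
    return {k: v / total for k, v in new_belief.items()} if total > 0 else new_belief
```

Nesting arises because each ascribed model may itself contain a belief of this form, which is why exact solutions are only asymptotically computable and approximations are needed in practice.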
Structure in the Value Function of Two-Player Zero-Sum Games of Incomplete Information
Zero-sum stochastic games provide a rich model for competitive decision
making. However, under general forms of state uncertainty as considered in the
Partially Observable Stochastic Game (POSG), such decision making problems are
still not very well understood. This paper makes a contribution to the theory
of zero-sum POSGs by characterizing structure in their value function. In
particular, we introduce a new formulation of the value function for zs-POSGs
as a function of the "plan-time sufficient statistics" (roughly speaking the
information distribution in the POSG), which has the potential to enable
generalization over such information distributions. We further delineate this
generalization capability by proving a structural result on the shape of value
function: it exhibits concavity and convexity with respect to appropriately
chosen marginals of the statistic space. This result is a key precursor for
developing solution methods that may be able to exploit such structure.
Finally, we show how these results allow us to reduce a zs-POSG to a
"centralized" model with shared observations, thereby transferring results for
the latter, narrower class, to games with individual (private) observations.
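The structural result can be rendered schematically in notation assumed here (the exact choice of marginals and conditioning is specified in the paper and may differ from this paraphrase): with the plan-time sufficient statistic written as a distribution over the players' private information,

```latex
% sigma_t: plan-time sufficient statistic, a distribution over the
% players' information states (theta^1_t, theta^2_t) at stage t.
% Schematic statement of the concavity/convexity structure, where
% player 1 maximizes and player 2 minimizes:
V_t^*(\sigma_t)\ \text{is concave in the marginal over}\ \theta^1_t
\quad\text{and}\quad
\text{convex in the marginal over}\ \theta^2_t .
```

This mirrors the piecewise-linear-and-convex structure exploited by POMDP solvers, suggesting analogous value-function approximators for zs-POSGs.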
Decentralized Cooperative Planning for Automated Vehicles with Hierarchical Monte Carlo Tree Search
Today's automated vehicles lack the ability to cooperate implicitly with
others. This work presents a Monte Carlo Tree Search (MCTS) based approach for
decentralized cooperative planning using macro-actions for automated vehicles
in heterogeneous environments. Based on cooperative modeling of other agents
and Decoupled-UCT (a variant of MCTS), the algorithm evaluates the
state-action-values of each agent in a cooperative and decentralized manner,
explicitly modeling the interdependence of actions between traffic
participants. Macro-actions allow for temporal extension over multiple time
steps and increase the effective search depth requiring fewer iterations to
plan over longer horizons. Without predefined policies for macro-actions, the
algorithm simultaneously learns policies over and within macro-actions. The
proposed method is evaluated under several conflict scenarios, showing that the
algorithm can achieve effective cooperative planning with learned macro-actions
in heterogeneous environments.
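The Decoupled-UCT selection rule can be sketched at a single node (macro-actions and the tree itself are omitted; the function and payoff below are illustrative assumptions): each agent keeps its own visit counts and value estimates, picks its action independently via UCB1, and both agents are updated with the shared reward of the resulting joint action:

```python
from math import log, sqrt
import random

def decoupled_uct(payoff, n_actions, iters=3000, c=1.4, seed=0):
    """Decoupled-UCT at a single node for two agents: independent UCB1
    statistics per agent, joint action formed by combining their picks,
    both updated with the shared (cooperative) reward."""
    rng = random.Random(seed)
    counts = [[0] * n_actions, [0] * n_actions]
    values = [[0.0] * n_actions, [0.0] * n_actions]

    def ucb_pick(i, t):
        for a in range(n_actions):        # play each untried action once
            if counts[i][a] == 0:
                return a
        return max(range(n_actions),
                   key=lambda a: values[i][a] + c * sqrt(log(t) / counts[i][a]))

    for t in range(1, iters + 1):
        joint = (ucb_pick(0, t), ucb_pick(1, t))   # independent selections
        r = payoff(joint, rng)                     # shared cooperative reward
        for i, a in enumerate(joint):              # decoupled statistics update
            counts[i][a] += 1
            values[i][a] += (r - values[i][a]) / counts[i][a]
    return tuple(max(range(n_actions), key=lambda a: values[i][a]) for i in range(2))
```

Keeping the statistics per agent rather than per joint action is what avoids the exponential blow-up of the joint action space while still modeling the interdependence through the shared reward signal.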