207 research outputs found
Near-Optimal Adversarial Policy Switching for Decentralized Asynchronous Multi-Agent Systems
A key challenge in multi-robot and multi-agent systems is generating
solutions that are robust to other self-interested or even adversarial parties
who actively try to prevent the agents from achieving their goals. The
practicality of existing works addressing this challenge is limited to only
small-scale synchronous decision-making scenarios or a single agent planning
its best response against a single adversary with fixed, procedurally
characterized strategies. In contrast this paper considers a more realistic
class of problems where a team of asynchronous agents with limited observation
and communication capabilities need to compete against multiple strategic
adversaries with changing strategies. This problem necessitates agents that can
coordinate to detect changes in adversary strategies and plan the best response
accordingly. Our approach first optimizes a set of stratagems that represent
these best responses. These optimized stratagems are then integrated into a
unified policy that can detect and respond when the adversaries change their
strategies. The near-optimality of the proposed framework is established
theoretically as well as demonstrated empirically in simulation and hardware
Best Response Bayesian Reinforcement Learning for Multiagent Systems with State Uncertainty
It is often assumed that agents in multiagent systems with state uncertainty have full knowledge of the model of dy- namics and sensors, but in many cases this is not feasible. A more realistic assumption is that agents must learn about the environment and other agents while acting. Bayesian methods for reinforcement learning are promising for this type of learning because they allow model uncertainty to be considered explicitly and offer a principled way of dealing with the exploration/exploitation tradeoff. In this paper, we propose a Bayesian RL framework for best response learn- ing in which an agent has uncertainty over the environment and the policies of the other agents. This is a very general model that can incorporate different assumptions about the form of other policies. We seek to maximize performance and learn the appropriate models while acting in an online fashion by using sample-based planning built from power- ful Monte-Carlo tree search methods. We discuss the theo- retical properties of this approach and experimental results show that the learning approaches can significantly increase value when compared to initial models and policies
Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations
Control applications often feature tasks with similar, but not identical,
dynamics. We introduce the Hidden Parameter Markov Decision Process (HiP-MDP),
a framework that parametrizes a family of related dynamical systems with a
low-dimensional set of latent factors, and introduce a semiparametric
regression approach for learning its structure from data. In the control
setting, we show that a learned HiP-MDP rapidly identifies the dynamics of a
new task instance, allowing an agent to flexibly adapt to task variations
Individual Planning in Agent Populations: Exploiting Anonymity and Frame-Action Hypergraphs
Interactive partially observable Markov decision processes (I-POMDP) provide
a formal framework for planning for a self-interested agent in multiagent
settings. An agent operating in a multiagent environment must deliberate about
the actions that other agents may take and the effect these actions have on the
environment and the rewards it receives. Traditional I-POMDPs model this
dependence on the actions of other agents using joint action and model spaces.
Therefore, the solution complexity grows exponentially with the number of
agents thereby complicating scalability. In this paper, we model and extend
anonymity and context-specific independence -- problem structures often present
in agent populations -- for computational gain. We empirically demonstrate the
efficiency from exploiting these problem structures by solving a new multiagent
problem involving more than 1,000 agents.Comment: 8 page article plus two page appendix containing proofs in
Proceedings of 25th International Conference on Autonomous Planning and
Scheduling, 201
Can bounded and self-interested agents be teammates? Application to planning in ad hoc teams
Planning for ad hoc teamwork is challenging because it involves agents collaborating without any prior coordination or communication. The focus is on principled methods for a single agent to cooperate with others. This motivates investigating the ad hoc teamwork problem in the context of self-interested decision-making frameworks. Agents engaged in individual decision making in multiagent settings face the task of having to reason about other agents’ actions, which may in turn involve reasoning about others. An established approximation that operationalizes this approach is to bound the infinite nesting from below by introducing level 0 models. For the purposes of this study, individual, self-interested decision making in multiagent settings is modeled using interactive dynamic influence diagrams (I-DID). These are graphical models with the benefit that they naturally offer a factored representation of the problem, allowing agents to ascribe dynamic models to others and reason about them. We demonstrate that an implication of bounded, finitely-nested reasoning by a self-interested agent is that we may not obtain optimal team solutions in cooperative settings, if it is part of a team. We address this limitation by including models at level 0 whose solutions involve reinforcement learning. We show how the learning is integrated into planning in the context of I-DIDs. This facilitates optimal teammate behavior, and we demonstrate its applicability to ad hoc teamwork on several problem domains and configurations
- …