Planning for Decentralized Control of Multiple Robots Under Uncertainty
We describe a probabilistic framework for synthesizing control policies for
general multi-robot systems, given environment and sensor models and a cost
function. Decentralized, partially observable Markov decision processes
(Dec-POMDPs) are a general model of decision processes where a team of agents
must cooperate to optimize some objective (specified by a shared reward or cost
function) in the presence of uncertainty, but where communication limitations
mean that the agents cannot share their state, so execution must proceed in a
decentralized fashion. While Dec-POMDPs are typically intractable to solve for
real-world problems, recent research on the use of macro-actions in Dec-POMDPs
has significantly increased the size of problem that can be practically solved
as a Dec-POMDP. We describe this general model, and show how, in contrast to
most existing methods that are specialized to a particular problem class, it
can synthesize control policies that use whatever opportunities for
coordination are present in the problem, while trading off uncertainty in
outcomes, sensor information, and information about other agents. We use three
variations on a warehouse task to show that a single planner of this type can
generate cooperative behavior using task allocation, direct communication, and
signaling, as appropriate.
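To make the model concrete, here is a minimal sketch of a Dec-POMDP and its decentralized execution loop. All names and interfaces (DecPOMDP, run_episode, the callable fields) are illustrative assumptions, not the paper's code: the key points are that each agent acts only on its own observation history, and that a single shared reward couples the team.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class DecPOMDP:
    agents: List[str]
    # transition(state, joint_action) -> distribution over next states
    transition: Callable[[str, Tuple[str, ...]], Dict[str, float]]
    # observe(agent, state) -> local observation; no agent sees the full state
    observe: Callable[[str, str], str]
    # a single shared reward couples the team into one objective
    reward: Callable[[str, Tuple[str, ...]], float]

def run_episode(model: DecPOMDP, policies: Dict[str, Callable], s0: str,
                horizon: int) -> float:
    """Decentralized execution: each agent maps its own observation
    history to an action; there is no central controller at run time."""
    state, total = s0, 0.0
    histories: Dict[str, List[str]] = {i: [] for i in model.agents}
    for _ in range(horizon):
        for i in model.agents:
            histories[i].append(model.observe(i, state))
        joint = tuple(policies[i](histories[i]) for i in model.agents)
        total += model.reward(state, joint)
        dist = model.transition(state, joint)
        state = random.choices(list(dist), weights=list(dist.values()))[0]
    return total
```

A macro-action variant would replace the per-step action choice with a temporally extended option that runs until its own termination condition fires, which is what makes larger problems practical.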
OperA/ALIVE/OperettA
Comprehensive models for organizations must, on the one hand, be able to specify global goals and requirements but, on the other hand, cannot assume that particular actors will always act according to the needs and expectations of the system design. Concepts such as organizational rules (Zambonelli 2002), norms and institutions (Dignum and Dignum 2001; Esteva et al. 2002), and social structures (Parunak and Odell 2002) arise from the idea that the effective engineering of organizations needs high-level, actor-independent concepts and abstractions that explicitly define the organization in which agents live (Zambonelli 2002).
Scalable Planning and Learning for Multiagent POMDPs: Extended Version
Online, sample-based planning algorithms for POMDPs have shown great promise
in scaling to problems with large state spaces, but they become intractable for
large action and observation spaces. This is particularly problematic in
multiagent POMDPs, where the action and observation spaces grow exponentially
with the number of agents. To combat this intractability, we propose a novel
scalable approach based on sample-based planning and factored value functions
that exploits structure present in many multiagent settings. This approach
applies not only in the planning case, but also in the Bayesian reinforcement
learning setting. Experimental results show that we are able to provide
high-quality solutions to large multiagent planning and learning problems.
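As a rough illustration of the factored value functions the abstract mentions, the sketch below decomposes the joint Q-value into local terms over small subsets of agents, Q(a) = sum of Q_e(a_e) over factors e. The names and toy factors are assumptions, not the paper's method, and for clarity it maximizes by plain enumeration; the point of the factorization in practice is that algorithms such as max-plus or variable elimination avoid enumerating the joint action space.

```python
from itertools import product
from typing import Dict, List, Tuple

def best_joint_action(
    actions: List[str],
    n_agents: int,
    factors: List[Tuple[Tuple[int, ...], Dict[Tuple[str, ...], float]]],
) -> Tuple[Tuple[str, ...], float]:
    """Each factor is (agent indices, table mapping their local joint
    action to a value); the global value is the sum of factor values."""
    best, best_val = None, float("-inf")
    for joint in product(actions, repeat=n_agents):
        val = sum(table[tuple(joint[i] for i in scope)]
                  for scope, table in factors)
        if val > best_val:
            best, best_val = joint, val
    return best, best_val

# Example: 3 agents on a chain, with pairwise factors rewarding agreement.
pair = {(a, b): 1.0 if a == b else 0.0
        for a in ("left", "right") for b in ("left", "right")}
factors = [((0, 1), pair), ((1, 2), pair)]
print(best_joint_action(["left", "right"], 3, factors))
# -> (('left', 'left', 'left'), 2.0)
```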
On the Utility of Learning about Humans for Human-AI Coordination
While we would like agents that can coordinate with humans, current
algorithms such as self-play and population-based training create agents that
can coordinate with themselves. Agents that assume their partner to be optimal
or similar to them can converge to coordination protocols that fail to
understand and be understood by humans. To demonstrate this, we introduce a
simple environment that requires challenging coordination, based on the popular
game Overcooked, and learn a simple model that mimics human play. We evaluate
the performance of agents trained via self-play and population-based training.
These agents perform very well when paired with themselves, but when paired
with our human model, they are significantly worse than agents designed to play
with the human model. An experiment with a planning algorithm yields the same
conclusion, though only when the human-aware planner is given the exact human
model that it is playing with. A user study with real humans shows this pattern
as well, though less strongly. Qualitatively, we find that the gains come from
having the agent adapt to the human's gameplay. Given this result, we suggest
several approaches for designing agents that learn about humans in order to
better coordinate with them. Code is available at
https://github.com/HumanCompatibleAI/overcooked_ai. Published at NeurIPS 2019
(http://papers.nips.cc/paper/8760-on-the-utility-of-learning-about-humans-for-human-ai-coordination).
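A hedged sketch of the evaluation protocol described above: pair an agent with a copy of itself and with a learned human model, and compare average returns. The environment and policy interfaces here are hypothetical stand-ins, not the overcooked_ai API.

```python
from typing import Callable

Policy = Callable[[object], object]  # observation -> action

def paired_return(env, pi_a: Policy, pi_b: Policy, episodes: int = 100) -> float:
    """Average return of a two-player episode with pi_a as player 0
    and pi_b as player 1."""
    total = 0.0
    for _ in range(episodes):
        (obs_a, obs_b), done, ret = env.reset(), False, 0.0
        while not done:
            actions = (pi_a(obs_a), pi_b(obs_b))
            (obs_a, obs_b), reward, done = env.step(actions)
            ret += reward  # shared reward: both players get the same signal
        total += ret
    return total / episodes

def coordination_gap(env, self_play: Policy, human_model: Policy) -> float:
    """A positive gap means the self-play agent coordinates worse with
    the human model than with itself: the failure mode the paper studies."""
    with_self = paired_return(env, self_play, self_play)
    with_human = paired_return(env, self_play, human_model)
    return with_self - with_human
```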
Decentralized dynamic task allocation for UAVs with limited communication range
We present the Limited-range Online Routing Problem (LORP), which involves a
team of Unmanned Aerial Vehicles (UAVs) with limited communication range that
must autonomously coordinate to service task requests. We first show a general
approach to cast this dynamic problem as a sequence of decentralized task
allocation problems. Then we present two solutions both based on modeling the
allocation task as a Markov Random Field to subsequently assess decisions by
means of the decentralized Max-Sum algorithm. Our first solution assumes
independence between requests, whereas our second solution also considers the
UAVs' workloads. A thorough empirical evaluation shows that our workload-based
solution consistently outperforms current state-of-the-art methods in a wide
range of scenarios, lowering the average service time by up to 16%. In the
best-case scenario there is no gap between our decentralized solution and
centralized techniques. In the worst-case scenario we reduce the gap between
current decentralized and centralized techniques by 25%. Thus, our solution
becomes the method of choice for our problem.
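For intuition on the Max-Sum machinery mentioned above, here is a minimal sketch on a toy instance. All names, costs, and the problem encoding are illustrative assumptions, not the paper's model: tasks are variables whose domains are the UAVs in range, each UAV contributes a utility factor with a workload penalty, and messages are passed until each task picks its best UAV.

```python
from itertools import product

# Two task variables; each task's domain is the UAVs within range.
domains = {"t1": ["u1", "u2"], "t2": ["u2"]}
# One factor per UAV over the tasks it can reach.
scopes = {"u1": ["t1"], "u2": ["t1", "t2"]}

def uav_factor(uav, assignment):
    """Utility of `uav` under a joint assignment {task: uav}: negative
    service cost, plus a workload penalty for serving extra tasks."""
    cost = {"u1": {"t1": 2.0}, "u2": {"t1": 3.0, "t2": 1.0}}[uav]
    mine = [t for t, u in assignment.items() if u == uav]
    return -(sum(cost[t] for t in mine) + 2.0 * max(len(mine) - 1, 0))

# Factor-to-variable messages, initialised to zero.
r = {(f, x): {v: 0.0 for v in domains[x]}
     for f, ts in scopes.items() for x in ts}

for _ in range(5):  # a few synchronous message-passing rounds
    # Variable-to-factor: sum the messages from the *other* factors.
    q = {(x, f): {v: sum(r[(g, x)][v]
                         for g in scopes if x in scopes[g] and g != f)
                  for v in domains[x]}
         for f, ts in scopes.items() for x in ts}
    # Factor-to-variable: maximise factor utility plus incoming messages.
    for f, ts in scopes.items():
        for x in ts:
            others = [t for t in ts if t != x]
            for v in domains[x]:
                best = float("-inf")
                for combo in product(*(domains[o] for o in others)):
                    assign = dict(zip(others, combo))
                    assign[x] = v
                    val = uav_factor(f, assign) + sum(
                        q[(o, f)][assign[o]] for o in others)
                    best = max(best, val)
                r[(f, x)][v] = best

# Each task picks the value maximising its summed incoming messages.
print({x: max(domains[x], key=lambda v, x=x: sum(
    r[(f, x)][v] for f in scopes if x in scopes[f])) for x in domains})
# -> {'t1': 'u1', 't2': 'u2'}: u1 takes t1, keeping u2's workload low.
```

Each message only involves a UAV and the tasks in its range, which is what lets the computation run decentralized under the limited communication radius.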