An Auction-based Coordination Strategy for Task-Constrained Multi-Agent Stochastic Planning with Submodular Rewards
In many domains, such as transportation and logistics, search and rescue, or cooperative surveillance, tasks must be allocated while accounting for possible execution uncertainties. Existing task coordination algorithms either ignore the stochastic process or are computationally intensive. Taking advantage of the weakly coupled structure of the problem and the opportunity to coordinate in advance, we propose a decentralized auction-based coordination strategy using a newly formulated score function, derived by casting the problem as a set of task-constrained Markov decision processes (MDPs). The proposed method guarantees convergence and at least 50% optimality, provided the reward function is submodular. Furthermore, for large-scale applications, we also propose an approximate variant of the method, Deep Auction, which uses neural networks and avoids the burden of constructing the MDPs explicitly. Inspired by the well-known actor-critic architecture, it uses two Transformers to map observations to action probabilities and cumulative rewards, respectively. Finally, we demonstrate the performance of the two proposed approaches in the context of drone deliveries, where stochastic planning for the drone fleet is cast as a stochastic prize-collecting Vehicle Routing Problem (VRP) with time windows. Simulation results are compared with state-of-the-art methods in terms of solution quality, planning efficiency, and scalability.

Comment: 17 pages, 5 figures
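The 50% bound quoted in the abstract is the classical guarantee for greedy maximization of a monotone submodular function. A minimal, self-contained sketch of one such greedy auction loop, where the `Task` class and the `marginal_gain` scoring model are illustrative assumptions, not the paper's actual MDP-based score function, might look like:

```python
class Task:
    """A deliverable with a fixed base value (toy model)."""
    def __init__(self, value):
        self.value = value

def marginal_gain(bundle, task):
    """Toy submodular score: each additional task in an agent's bundle
    is worth less, so marginal gains are diminishing."""
    return task.value / (1 + len(bundle))

def greedy_auction(agents, tasks):
    """One task is sold per round to the highest marginal-gain bidder.
    For monotone submodular scores this greedy rule achieves at least
    1/2 of the optimal total score."""
    bundles = {a: [] for a in agents}
    unassigned = list(tasks)
    while unassigned:
        # Every agent bids its marginal gain for every remaining task.
        bids = [(marginal_gain(bundles[a], t), a, t)
                for a in agents for t in unassigned]
        _, winner, task = max(bids, key=lambda b: b[0])
        bundles[winner].append(task)
        unassigned.remove(task)
    return bundles
```

Because each extra task in a bundle adds diminishing value, the bids themselves are submodular, which is exactly the property the winner-takes-the-best greedy rule needs for its 1/2-optimality bound.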
Modeling Information Exchange Opportunities for Effective Human-Computer Teamwork
This paper studies information exchange in collaborative group activities involving mixed networks of people and computer agents. It introduces the concept of "nearly decomposable" decision-making problems to address the complexity of information-exchange decisions in such multi-agent settings. This class of decision-making problems arises in settings whose action structure requires agents to reason about only a subset of their partners' actions, but otherwise allows them to act independently. The paper presents a formal model of nearly decomposable decision-making problems, NED-MDPs, and defines an approximation algorithm, NED-DECOP, that computes efficient information-exchange strategies. The paper shows that NED-DECOP is more efficient than prior collaborative planning algorithms for this class of problem.

It also presents an empirical study of the information-exchange decisions made by the algorithm, investigating the extent to which people accept interruption requests from a computer agent. The context for the study is a game in which the agent can ask people for information that may benefit its individual performance and thus the group's collaboration. This study revealed the key factors affecting people's perception of the benefit of interruptions in this setting. The paper also describes the use of machine learning to predict the situations in which people deviate from the strategies generated by the algorithm, using a combination of domain features and features informed by the algorithm. The methodology followed in this work could form the basis for designing agents that effectively exchange information in collaborations with people.
RODE: Learning Roles to Decompose Multi-Agent Tasks
Role-based learning holds the promise of achieving scalable multi-agent
learning by decomposing complex tasks using roles. However, it is largely
unclear how to efficiently discover such a set of roles. To solve this problem,
we propose to first decompose joint action spaces into restricted role action
spaces by clustering actions according to their effects on the environment and
other agents. Learning a role selector based on action effects makes role
discovery much easier because it forms a bi-level learning hierarchy -- the
role selector searches in a smaller role space and at a lower temporal
resolution, while role policies learn in significantly reduced primitive
action-observation spaces. We further integrate information about action
effects into the role policies to boost learning efficiency and policy
generalization. By virtue of these advances, our method (1) outperforms the
current state-of-the-art MARL algorithms on 10 of the 14 scenarios that
comprise the challenging StarCraft II micromanagement benchmark and (2)
achieves rapid transfer to new environments with three times the number of
agents. Demonstration videos are available at
https://sites.google.com/view/rode-marl
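As a toy illustration of the role-discovery step described above, one could cluster per-action effect vectors with plain k-means, so that each cluster becomes a restricted role action space. The hand-written effect vectors and the use of vanilla k-means here are illustrative assumptions; RODE learns action representations rather than using raw effects:

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two effect vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means over action-effect vectors; each resulting
    cluster is a candidate 'role' action space."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: dist2(p, centers[c]))
            clusters[i].append(p)
        # Keep the old center if a cluster went empty.
        centers = [mean(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters
```

With, say, two "attack-like" effects (reduce enemy health) and two "move-like" effects (change position), the clustering separates them into two roles, and each role's policy then only searches over its own small action set.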
Coordinating decentralized learning and conflict resolution across agent boundaries
It is crucial for embedded systems to adapt to the dynamics of open environments. This adaptation process becomes especially challenging in the context of multiagent systems because of scalability, partial information accessibility, and the complex interactions among agents. It is a challenge for agents to learn good policies when they need to plan and coordinate in uncertain, dynamic environments, especially when they have large state spaces. It is also critical for agents operating in a multiagent system (MAS) to resolve conflicts among the learned policies of different agents, since such conflicts may have a detrimental influence on overall performance.
The focus of this research is to use a reinforcement-learning-based local optimization algorithm within each agent to learn multiagent policies in a decentralized fashion. These policies allow each agent to adapt to changes in environmental conditions while reorganizing the underlying multiagent network when needed. The research takes an adaptive approach to resolving conflicts that can arise between locally optimal agent policies. First, an algorithm that uses heuristic rules to locally resolve simple conflicts is presented. When the environment is more dynamic and uncertain, a mediator-based mechanism is employed to resolve more complicated conflicts and selectively expand the agents' state space during the learning process. For scenarios where mediator-based mechanisms with partially global views are ineffective, a more rigorous approach to global conflict resolution that synthesizes multiagent reinforcement learning (MARL) and distributed constraint optimization (DCOP) is developed. These mechanisms are evaluated in the context of a multiagent tornado-tracking application called NetRads. Empirical results show that these mechanisms significantly improve the performance of the tornado-tracking network for a variety of weather scenarios.
The major contributions of this work are: a state-of-the-art decentralized learning approach that supports agent interactions and reorganizes the underlying network when needed; the use of abstract classes of scenarios/states/actions that efficiently manage the exploration of the search space; novel conflict-resolution algorithms of increasing complexity that use heuristic rules, sophisticated automated negotiation mechanisms, and distributed constraint optimization methods, respectively; and finally, a rigorous study of the interplay between two popular theories used to solve multiagent problems, namely decentralized Markov decision processes and distributed constraint optimization.
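The simplest layer in the hierarchy above, heuristic rules that locally resolve conflicts between locally optimal policies, can be sketched as follows. The radar/sector names and value tables are made up for illustration and are not the NetRads system's actual interface; the heuristic shown is simply "higher expected value wins, loser falls back to its next-best choice":

```python
def resolve_conflicts(preferences):
    """preferences: {agent: [(value, sector), ...]} sorted best-first.
    Returns a conflict-free assignment {agent: sector}."""
    assignment = {}
    taken = {}
    # Process claims in order of decreasing learned value so the
    # "higher expected value wins" heuristic is honored globally.
    claims = sorted(
        ((value, agent, sector)
         for agent, prefs in preferences.items()
         for value, sector in prefs),
        key=lambda c: c[0],
        reverse=True)
    for value, agent, sector in claims:
        # Skip agents already placed and sectors already claimed,
        # which makes the loser fall through to its next-best option.
        if agent in assignment or sector in taken:
            continue
        assignment[agent] = sector
        taken[sector] = agent
    return assignment
```

A mediator-based or DCOP-based layer would replace this greedy pass with negotiation or distributed optimization when such local rules leave conflicts unresolved.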
Reasoning effectively under uncertainty for human-computer teamwork
As people are increasingly connected to other people and computer agents, forming mixed networks, collaborative teamwork offers great promise for transforming the way people perform their everyday activities and interact with computer agents. This thesis presents new representations and algorithms, developed to enable computer systems to function as effective team members in settings characterized by uncertainty and partial information.
For a collaboration to succeed in such settings, participants need to reason about the possible plans of others, to be able to adapt their plans as needed for coordination, and to support each other's activities. Reasoning on general teamwork models accordingly requires compact representations and efficient decision-theoretic mechanisms. This thesis presents Probabilistic Recipe Trees, a probabilistic representation of agents' beliefs about the probable plans of others, and decision-theoretic mechanisms that use this representation to manage helpful behavior by considering the costs and utilities of computer agents and people participating in collaborative activities. These mechanisms are shown to outperform axiomatic approaches in empirical studies.
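A minimal sketch of the Probabilistic Recipe Tree idea as described here, with OR nodes weighting alternative recipes by believed probability and expected utility computed bottom-up, might look like the following. The node classes and utility numbers are illustrative, not the thesis's actual formalism:

```python
class Action:
    """A leaf: a basic action with a known utility (toy model)."""
    def __init__(self, utility):
        self.utility = utility

class AndNode:
    """A recipe: all constituent subplans are performed."""
    def __init__(self, children):
        self.children = children

class OrNode:
    """Belief over alternative recipes: [(probability, node), ...]."""
    def __init__(self, alternatives):
        self.alternatives = alternatives

def expected_utility(node):
    """Bottom-up expected utility of the believed plan."""
    if isinstance(node, Action):
        return node.utility
    if isinstance(node, AndNode):
        return sum(expected_utility(c) for c in node.children)
    return sum(p * expected_utility(n) for p, n in node.alternatives)
```

A helper agent could compare `expected_utility` of the tree with and without a proposed helpful action to decide, decision-theoretically, whether the help is worth its cost.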
The thesis also addresses the challenge that agents participating in a collaborative activity need efficient decision-making algorithms for evaluating the effects of their actions on the collaboration, and they need to reason about the way other participants perceive these actions. This thesis identifies structural characteristics of settings in which computer agents and people collaborate and presents decentralized decision-making algorithms that exploit this structure to achieve up to exponential savings in computation time. Empirical studies with human subjects establish that the utility values computed by this algorithm are a good indicator of human behavior, but learning can help to better understand the way these values are perceived by people.
To demonstrate the usefulness of these teamwork capabilities, the thesis describes an application of collaborative teamwork ideas to a real-world setting: ridesharing. The computational model developed for forming collaborative rideshare plans addresses the challenge of guiding self-interested people toward collaboration in a dynamic setting. The empirical evaluation of the application on data collected from the real world demonstrates the value of collaboration for individual users and the environment.