Q-CP: Learning Action Values for Cooperative Planning
Research on multi-robot systems has demonstrated promising results in manifold applications and domains. Still, efficiently learning effective robot behaviors is very difficult, due to unstructured scenarios, high uncertainty, and large state dimensionality (e.g., hyper-redundant robots and groups of robots). To alleviate this problem, we present Q-CP, a cooperative model-based reinforcement learning algorithm that exploits action values to both (1) guide the exploration of the state space and (2) generate effective policies. Specifically, we exploit Q-learning to attack the curse of dimensionality in the iterations of a Monte-Carlo Tree Search. We implement and evaluate Q-CP on different stochastic cooperative (general-sum) games: (1) a simple cooperative navigation problem among 3 robots, (2) a cooperation scenario between a pair of KUKA YouBots performing hand-overs, and (3) a coordination task between two mobile robots entering a door. The obtained results show the effectiveness of Q-CP in the chosen applications, where action values drive the exploration and reduce the computational demand of the planning process while achieving good performance.
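The central mechanism this abstract describes, learned action values steering the node selection of a Monte-Carlo Tree Search, can be sketched as a UCT-style rule in which a Q-function replaces the usual zero-initialized value estimate. This is an illustrative sketch under stated assumptions, not the paper's implementation: `Q_TABLE`, the function names, and the exploration constant are all hypothetical.

```python
import math

# Hypothetical table of learned action values; in the Q-CP setting these
# would come from Q-learning and bias the tree search toward promising actions.
Q_TABLE = {}  # (state, action) -> estimated value

def q_value(state, action):
    """Return the learned Q-value, defaulting to 0.0 for unseen pairs."""
    return Q_TABLE.get((state, action), 0.0)

def select_action(state, actions, visits, child_visits, c=1.4):
    """UCT-style selection: the learned Q-value serves as the value estimate,
    so exploration concentrates on actions the Q-function already rates well,
    reducing the effective branching the search must cover."""
    def score(action):
        n = child_visits.get((state, action), 0)
        exploration_bonus = c * math.sqrt(math.log(visits + 1) / (n + 1))
        return q_value(state, action) + exploration_bonus
    return max(actions, key=score)
```

In a full search, `child_visits` would be updated as simulations complete; here the point is only that the argmax is shifted by the learned values rather than starting from a uniform prior.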
Tasks for Agent-Based Negotiation Teams: Analysis, Review, and Challenges
An agent-based negotiation team is a group of interdependent agents that join together as a single negotiation party due to their shared interests in the negotiation at hand. The reasons to employ an agent-based negotiation team may vary: (i) greater computation and parallelization capabilities, (ii) the ability to unite agents with different expertise and skills whose joint work makes it possible to tackle complex negotiation domains, (iii) the necessity to represent different stakeholders or different preferences in the same party (e.g., organizations, countries, and married couples). The topic of agent-based negotiation teams has recently been introduced in multi-agent research. Therefore, it is necessary to identify good practices, challenges, and related research that may help in advancing the state of the art in agent-based negotiation teams. For that reason, in this article we review the tasks to be carried out by agent-based negotiation teams. Each task is analyzed and related to current advances in different research areas. The analysis aims to identify special challenges that may arise due to the particularities of agent-based negotiation teams.
Comment: Engineering Applications of Artificial Intelligence, 201
Towards a Better Understanding of Learning with Multiagent Teams
While it has long been recognized that a team of individual learning agents can be greater than the sum of its parts, recent work has shown that larger teams are not necessarily more effective than smaller ones. In this paper, we study why and under which conditions certain team structures promote effective learning for a population of individual learning agents. We show that, depending on the environment, some team structures help agents learn to specialize into specific roles, resulting in more favorable global results. However, large teams create credit assignment challenges that reduce coordination, leading to large teams performing poorly compared to smaller ones. We support our conclusions with both theoretical analysis and empirical results.
Comment: 15 pages, 11 figures, published at the International Joint Conference on Artificial Intelligence (IJCAI) in 202
Learning for Multi-robot Cooperation in Partially Observable Stochastic Environments with Macro-actions
This paper presents a data-driven approach for multi-robot coordination in partially observable domains based on Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) and macro-actions (MAs). Dec-POMDPs provide a general framework for cooperative sequential decision making under uncertainty, and MAs allow temporally extended and asynchronous action execution. To date, most methods assume the underlying Dec-POMDP model is known a priori or a full simulator is available during planning time. Previous methods which aim to address these issues suffer from local optimality and sensitivity to initial conditions. Additionally, few hardware demonstrations involving a large team of heterogeneous robots and long planning horizons exist. This work addresses these gaps by proposing an iterative sampling-based Expectation-Maximization algorithm (iSEM) to learn policies using only trajectory data containing observations, MAs, and rewards. Our experiments show the algorithm is able to achieve better solution quality than state-of-the-art learning-based methods. We implement two variants of multi-robot Search and Rescue (SAR) domains (with and without obstacles) on hardware to demonstrate that the learned policies can effectively control a team of distributed robots to cooperate in a partially observable stochastic environment.
Comment: Accepted to the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017)
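The setting this abstract describes, learning only from trajectory data of observations, macro-actions, and rewards with no model or simulator, can be illustrated with a minimal first-visit-style Monte-Carlo value estimate over (observation, macro-action) pairs. This is not the iSEM algorithm itself, only a sketch of the data-driven setting; the trajectory layout is an assumption for illustration.

```python
from collections import defaultdict

def mc_macro_action_values(trajectories, gamma=0.95):
    """Estimate values of (observation, macro-action) pairs from logged data.

    trajectories: list of episodes, each a list of
    (observation, macro_action, reward) tuples, mirroring the kind of
    trajectory data the abstract says is the only learning input.
    """
    returns = defaultdict(list)
    for episode in trajectories:
        g = 0.0
        # Walk the episode backwards, accumulating the discounted return
        # earned from each step onward.
        for obs, ma, reward in reversed(episode):
            g = reward + gamma * g
            returns[(obs, ma)].append(g)
    # Average the sampled returns for each pair.
    return {key: sum(gs) / len(gs) for key, gs in returns.items()}
```

A real Dec-POMDP learner must additionally handle decentralization and asynchronous macro-action termination, which is where the paper's EM machinery comes in; this sketch only shows that value information is recoverable from logged trajectories alone.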
Can bounded and self-interested agents be teammates? Application to planning in ad hoc teams
Planning for ad hoc teamwork is challenging because it involves agents collaborating without any prior coordination or communication. The focus is on principled methods for a single agent to cooperate with others. This motivates investigating the ad hoc teamwork problem in the context of self-interested decision-making frameworks. Agents engaged in individual decision making in multiagent settings face the task of having to reason about other agents' actions, which may in turn involve reasoning about others. An established approximation that operationalizes this approach is to bound the infinite nesting from below by introducing level 0 models. For the purposes of this study, individual, self-interested decision making in multiagent settings is modeled using interactive dynamic influence diagrams (I-DIDs). These are graphical models with the benefit that they naturally offer a factored representation of the problem, allowing agents to ascribe dynamic models to others and reason about them. We demonstrate that an implication of bounded, finitely-nested reasoning by a self-interested agent is that, when the agent is part of a team, we may not obtain optimal team solutions in cooperative settings. We address this limitation by including models at level 0 whose solutions involve reinforcement learning. We show how the learning is integrated into planning in the context of I-DIDs. This facilitates optimal teammate behavior, and we demonstrate its applicability to ad hoc teamwork on several problem domains and configurations.
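The fix this abstract proposes, replacing the fixed level-0 model at the bottom of the reasoning hierarchy with a reinforcement learner, can be sketched in a simplified form: the level-0 teammate model is a tabular Q-learner, and the level-1 agent best-responds to its predicted action. This is a toy sketch under stated assumptions, not the I-DID machinery; the class, the payoff table, and all names are illustrative.

```python
import random
from collections import defaultdict

class Level0QLearner:
    """Tabular Q-learner standing in for the level-0 model of a teammate,
    in place of a fixed (e.g., uniformly random) level-0 policy."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)   # (state, action) -> value
        self.actions = actions
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.epsilon = epsilon        # exploration rate

    def act(self, state):
        """Epsilon-greedy action choice."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Standard one-step Q-learning update."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error

def best_response(payoff, predicted_teammate_action, my_actions):
    """Level-1 agent: best-respond to the action the learned level-0
    model predicts for the teammate. payoff maps (my_action, their_action)
    to this agent's utility (an illustrative stand-in for I-DID solving)."""
    return max(my_actions, key=lambda a: payoff[(a, predicted_teammate_action)])
```

The point of the sketch is the interface: because the level-0 model improves with experience rather than staying fixed, the level-1 best response can converge toward behavior consistent with an optimal teammate, which is the limitation of purely finitely-nested reasoning that the paper addresses.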