DOP: Deep Optimistic Planning with Approximate Value Function Evaluation
Research on reinforcement learning has demonstrated promising results in
many applications and domains. Still, efficiently learning effective robot
behaviors is very difficult, due to unstructured scenarios, high uncertainties,
and large state dimensionality (e.g. multi-agent systems or hyper-redundant
robots). To alleviate this problem, we present DOP, a deep model-based
reinforcement learning algorithm, which exploits action values to both (1)
guide the exploration of the state space and (2) plan effective policies.
Specifically, we exploit deep neural networks to learn Q-functions that are
used to attack the curse of dimensionality during a Monte-Carlo tree search.
Our algorithm, in fact, constructs upper confidence bounds on the learned value
function to select actions optimistically. We implement and evaluate DOP on
different scenarios: (1) a cooperative navigation problem, (2) a fetching task
for a 7-DOF KUKA robot, and (3) a human-robot handover with a humanoid robot
(both in simulation and real). The obtained results show the effectiveness of
DOP in the chosen applications, where action values drive the exploration and
reduce the computational demand of the planning process while achieving good
performance.

Comment: to appear as an extended abstract paper in the Proc. of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018), Stockholm, Sweden, July 10-15, 2018, IFAAMAS. arXiv admin note: text overlap with arXiv:1803.0029
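The optimistic action selection described in the abstract — upper confidence bounds built on a learned value function inside the tree search — can be sketched as follows. This is a minimal illustration of the UCB rule over Q-estimates, not the authors' implementation; the exploration constant `c` and the tabular interface are assumptions:

```python
import math

def optimistic_action(q_values, visit_counts, total_visits, c=1.0):
    """Select the action maximizing an upper confidence bound on Q.

    q_values[a]: learned action-value estimate for action a.
    visit_counts[a]: times action a was tried at this search node.
    """
    def ucb(a):
        n = visit_counts[a]
        if n == 0:
            return float("inf")  # always try unvisited actions first
        return q_values[a] + c * math.sqrt(math.log(total_visits) / n)
    return max(range(len(q_values)), key=ucb)

# Action 1 has the highest Q-estimate, but the rarely tried action 2
# receives a large exploration bonus and is selected optimistically.
q = [0.2, 0.9, 0.5]
counts = [10, 10, 1]
chosen = optimistic_action(q, counts, sum(counts), c=1.0)  # -> 2
```

As visit counts grow, the bonus shrinks and selection converges toward the greedy action, which is how the learned Q-function curbs the search's computational demand.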
Learning Multi-Agent Navigation from Human Crowd Data
The task of safely steering agents amidst static and dynamic obstacles has many applications in robotics, graphics, and traffic engineering. While decentralized solutions are essential for scalability and robustness, achieving globally efficient motions for the entire system of agents is equally important. In a traditional decentralized setting, each agent relies on an underlying local planning algorithm that takes as input a preferred velocity and the current state of the agent's neighborhood and then computes a new velocity for the next time-step that is collision-free and as close as possible to the preferred one. Typically, each agent promotes a goal-oriented preferred velocity, which can result in myopic behaviors, as actions that are locally optimal for one agent are not necessarily optimal for the global system of agents. In this thesis, we explore a human-inspired approach for efficient multi-agent navigation that allows each agent to intelligently adapt its preferred velocity based on feedback from the environment. Using supervised learning, we investigate different egocentric representations of the local conditions that the agents face and train various deep neural network architectures on extensive collections of human trajectory datasets to learn corresponding life-like velocities. During simulation, we use the learned velocities as high-level preferred-velocity signals passed as input to the underlying local planning algorithm of the agents. We evaluate our proposed framework using two state-of-the-art local methods, the ORCA method and the PowerLaw method. Qualitative and quantitative results on a range of scenarios show that adapting the preferred velocity results in more time- and energy-efficient navigation policies, allowing agents to reach their destinations faster compared to agents simulated with vanilla ORCA and PowerLaw.
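The decentralized pipeline described above — a learned model supplying the preferred velocity that a local planner then makes collision-free — can be sketched roughly as follows. The featurization, model, and planner interfaces here are illustrative assumptions, not the thesis code:

```python
import numpy as np

def goal_preferred_velocity(pos, goal, v_max=1.5):
    """Baseline goal-oriented preferred velocity: head straight for the
    goal at up to v_max, slowing down on arrival."""
    d = np.asarray(goal, float) - np.asarray(pos, float)
    dist = np.linalg.norm(d)
    if dist < 1e-9:
        return np.zeros(2)
    return d / dist * min(v_max, dist)

def navigate_step(pos, goal, neighbors, model, local_planner):
    """One decentralized time-step: a learned model maps a toy egocentric
    view of the neighborhood to a preferred velocity, which the underlying
    local planner (ORCA or PowerLaw in the thesis) converts into a
    collision-free velocity for the next time-step."""
    rel = [np.asarray(n, float) - pos for n in neighbors]  # egocentric offsets
    obs = np.concatenate([goal_preferred_velocity(pos, goal)] + rel)
    v_pref = model(obs)                       # learned "life-like" velocity
    return local_planner(pos, neighbors, v_pref)
```

The key design point the abstract makes is that only the high-level preferred-velocity signal is learned; collision avoidance itself stays with the proven local planner.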
"Guess what I'm doing": Extending legibility to sequential decision tasks
In this paper we investigate the notion of legibility in sequential decision
tasks under uncertainty. Previous works that extend legibility to scenarios
beyond robot motion either focus on deterministic settings or are
computationally too expensive. Our proposed approach, dubbed PoL-MDP, is able
to handle uncertainty while remaining computationally tractable. We establish
the advantages of our approach against state-of-the-art approaches in several
simulated scenarios of different complexity. We also showcase the use of our
legible policies as demonstrations for an inverse reinforcement learning agent,
establishing their superiority against the commonly used demonstrations based
on the optimal policy. Finally, we assess the legibility of our computed
policies through a user study where people are asked to infer the goal of a
mobile robot following a legible policy by observing its actions.
Asynchronous Multi-Agent Reinforcement Learning for Efficient Real-Time Multi-Robot Cooperative Exploration
We consider the problem of cooperative exploration where multiple robots need
to cooperatively explore an unknown region as fast as possible. Multi-agent
reinforcement learning (MARL) has recently become a trending paradigm for
solving this challenge. However, existing MARL-based methods adopt
action-making steps as the metric for exploration efficiency by assuming all
the agents are acting in a fully synchronous manner: i.e., every single agent
produces an action simultaneously and every single action is executed
instantaneously at each time step. Despite its mathematical simplicity, such a
synchronous MARL formulation can be problematic for real-world robotic
applications. In practice, different robots may take slightly
different wall-clock times to accomplish an atomic action or even periodically
get lost due to hardware issues. Simply waiting for every robot to be ready for
the next action can be particularly time-inefficient. Therefore, we propose an
asynchronous MARL solution, Asynchronous Coordination Explorer (ACE), to tackle
this real-world challenge. We first extend a classical MARL algorithm,
multi-agent PPO (MAPPO), to the asynchronous setting and additionally apply
action-delay randomization to force the learned policy to generalize better
to varying action delays in the real world. Moreover, each navigation agent is
represented as a team-size-invariant CNN-based policy, which greatly benefits
real-robot deployment by handling possible robot loss and allowing
bandwidth-efficient inter-agent communication through low-dimensional CNN
features. We first validate our approach in a grid-based scenario. Both
simulation and real-robot results show that ACE reduces actual exploration
time by over 10% compared with classical approaches. We also apply our
framework to a high-fidelity visual-based environment, Habitat, achieving 28%
improvement in exploration efficiency.

Comment: This paper is accepted by AAMAS 2023. The source code can be found in https://github.com/yang-xy20/async_mapp
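The asynchronous execution model with action-delay randomization that the abstract describes can be sketched as a small bookkeeping class. This is an illustration of the training-time idea — each agent's atomic action takes a random number of ticks, so agents re-select actions asynchronously instead of in lock-step — not the authors' code; the delay bounds are assumptions:

```python
import random

class DelayedExecutor:
    """Tracks per-agent action completion under randomized delays."""

    def __init__(self, n_agents, min_delay=1, max_delay=3, seed=0):
        self.rng = random.Random(seed)
        self.remaining = [0] * n_agents       # ticks left on current action
        self.current = [None] * n_agents
        self.min_delay, self.max_delay = min_delay, max_delay

    def needs_decision(self, i):
        """True when agent i has finished its action and must act again."""
        return self.remaining[i] == 0

    def assign(self, i, action):
        """Start an action; its duration is randomized during training so
        the policy generalizes to varying real-world delays."""
        self.current[i] = action
        self.remaining[i] = self.rng.randint(self.min_delay, self.max_delay)

    def tick(self):
        """Advance one tick; return indices of agents ready to act again."""
        for i in range(len(self.remaining)):
            if self.remaining[i] > 0:
                self.remaining[i] -= 1
        return [i for i in range(len(self.remaining)) if self.remaining[i] == 0]
```

The policy is queried only for the agents returned by `tick()`, so a slow or temporarily lost robot never blocks the rest of the team.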
Scalable Multi-Agent Reinforcement Learning through Intelligent Information Aggregation
We consider the problem of multi-agent navigation and collision avoidance
when observations are limited to the local neighborhood of each agent. We
propose InforMARL, a novel architecture for multi-agent reinforcement learning
(MARL) which uses local information intelligently to compute paths for all the
agents in a decentralized manner. Specifically, InforMARL aggregates
information about the local neighborhood of agents for both the actor and the
critic using a graph neural network and can be used in conjunction with any
standard MARL algorithm. We show that (1) in training, InforMARL has better
sample efficiency and performance than baseline approaches, despite using less
information, and (2) in testing, it scales well to environments with arbitrary
numbers of agents and obstacles.

Comment: 11 pages, 5 figures, 2 tables, 3 pages appendix, Code: https://github.com/nsidn98/InforMAR
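The neighborhood aggregation at the heart of the abstract — a graph neural network pooling local information into a fixed-size input for both actor and critic — can be sketched in one round of message passing. This is a minimal sketch of the general technique, not InforMARL's exact architecture; the weight matrices and tanh nonlinearity are assumptions:

```python
import numpy as np

def aggregate_neighborhood(x_self, x_neighbors, w_self, w_neigh):
    """Combine an agent's own features with the mean of its local
    neighbors' features, producing a fixed-size embedding regardless of
    how many neighbors are currently observed."""
    if len(x_neighbors) == 0:
        msg = np.zeros_like(x_self)           # isolated agent: no messages
    else:
        msg = np.mean(x_neighbors, axis=0)    # permutation-invariant pooling
    return np.tanh(w_self @ x_self + w_neigh @ msg)
```

Because the pooled message has the same dimension for any neighbor count and any neighbor ordering, the same policy can be deployed in environments with arbitrary numbers of agents and obstacles.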
A Multiagent Deep Reinforcement Learning Approach for Path Planning in Autonomous Surface Vehicles: The Ypacaraí Lake Patrolling Case
Article number 9330612

Autonomous surface vehicles (ASVs) excel at monitoring and measuring aquatic nutrients
due to their autonomy, mobility, and relatively low cost. When planning paths for such vehicles, the task
of patrolling with multiple agents is usually addressed with heuristic approaches, such as Reinforcement
Learning (RL), because of the complexity and high dimensionality of the problem. Not only do efficient paths
have to be designed, but addressing disturbances in movement or the battery’s performance is mandatory.
For this multiagent patrolling task, the proposed approach is based on a centralized Convolutional Deep
Q-Network, designed with a final independent dense layer for every agent to deal with scalability, with the
assumption that every agent has the same properties and capabilities. For this purpose, a tailored
reward function is created which penalizes illegal actions (such as collisions) and rewards visiting idle
cells (cells that remain unvisited for a long time). A comparison with various multiagent Reinforcement
Learning (MARL) algorithms has been done (Independent Q-Learning, Dueling Q-Network and multiagent
Double Deep Q-Learning) in a case-study scenario of the Ypacaraí lake in Asunción (Paraguay). The
trained multiagent policy leads to an average improvement of 15% compared to lawn-mower
trajectories and a 6% improvement over IDQL for the case study considered. When evaluating
training speed, the proposed approach runs three times faster than the independent algorithm.

Ministerio de Ciencia, Innovación y Universidades (España) RTI2018-098964-B-I00
Junta de Andalucía (España) PY18-RE000
CoMIX: A Multi-agent Reinforcement Learning Training Architecture for Efficient Decentralized Coordination and Independent Decision Making
Robust coordination skills enable agents to operate cohesively in shared
environments, working together towards a common goal while, ideally, not
hindering each other's progress. To this end, this paper presents Coordinated
QMIX (CoMIX), a novel training framework for decentralized agents that enables
emergent coordination through flexible policies, allowing at the same time
independent decision-making at individual level. CoMIX models selfish and
collaborative behavior as incremental steps in each agent's decision process.
This allows agents to dynamically adapt their behavior to different situations,
balancing independence and collaboration. Experiments using a variety of
simulation environments demonstrate that CoMIX outperforms baselines on
collaborative tasks. The results validate our incremental policy approach as
an effective technique for improving coordination in multi-agent systems.
Learning Coordination in Multi-Agent Systems
The ability for an agent to coordinate with others within a system is a
valuable property in multi-agent systems. Agents either cooperate as a team
to accomplish a common goal, or adapt to opponents to complete different
goals without being exploited. Research has shown that learning multi-agent
coordination is significantly more complex than learning policies in
single-agent environments, and requires a variety of techniques to deal with the
properties of a system where agents learn concurrently. This thesis aims to
determine how machine learning can be used to achieve coordination within
a multi-agent system. It asks what techniques can be used to tackle the
increased complexity of such systems and their credit assignment challenges,
how to achieve coordination, and how to use communication to improve the
behavior of a team.
Many algorithms for competitive environments are tabular-based, preventing
their use with high-dimension or continuous state-spaces, and may be
biased against specific equilibrium strategies. This thesis proposes multiple
deep learning extensions for competitive environments, allowing algorithms
to reach equilibrium strategies in complex and partially-observable environments,
relying only on local information. A tabular algorithm is also extended
with a new update rule that eliminates its bias against deterministic strategies.
Current state-of-the-art approaches for cooperative environments rely
on deep learning to handle the environment’s complexity and benefit from a
centralized learning phase. Solutions that incorporate communication between
agents often prevent agents from being executed in a distributed
manner. This thesis proposes a multi-agent algorithm where agents learn
communication protocols to compensate for local partial-observability, and
remain independently executed. A centralized learning phase can incorporate
additional environment information to increase the robustness and speed with
which a team converges to successful policies. The algorithm outperforms
current state-of-the-art approaches in a wide variety of multi-agent environments.
A permutation invariant network architecture is also proposed
to increase the scalability of the algorithm to large team sizes. Further research
is needed to identify how the techniques proposed in this thesis,
for cooperative and competitive environments, can be used in unison for mixed
environments, and whether they are adequate for general artificial intelligence.

Financial support from FCT and the ESF within the III Community Support Framework. Doctoral Programme in Informatics.