Context-Aware Sparse Deep Coordination Graphs
Learning sparse coordination graphs adaptive to the coordination dynamics
among agents is a long-standing problem in cooperative multi-agent learning.
This paper studies this problem and proposes a novel method using the variance
of payoff functions to construct context-aware sparse coordination topologies.
We theoretically ground our method by proving that the smaller the
variance of a payoff function, the less likely the action selection is to
change after the corresponding edge is removed. Moreover, we propose to learn action
representations to effectively reduce the influence of payoff functions'
estimation errors on graph construction. To empirically evaluate our method, we
present the Multi-Agent COordination (MACO) benchmark by collecting classic
coordination problems in the literature, increasing their difficulty, and
classifying them into different types. We carry out a case study and
experiments on the MACO and StarCraft II micromanagement benchmark to
demonstrate the dynamics of sparse graph learning, the influence of graph
sparseness, and the learning performance of our method.
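The variance criterion above can be sketched in a few lines: score each candidate edge by the variance of its pairwise payoff table and keep only the highest-variance edges. The toy payoff tables, edge set, and top-k budget below are illustrative assumptions, not the paper's graph-construction procedure.

```python
import numpy as np

def select_sparse_edges(payoffs, budget):
    """Keep the `budget` edges whose pairwise payoff functions have the
    largest variance; a low-variance payoff barely affects action
    selection, so its edge can be dropped (illustrative sketch).

    payoffs: dict mapping edge (i, j) -> 2-D array q_ij[a_i, a_j]
    """
    scores = {edge: float(np.var(q)) for edge, q in payoffs.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:budget])

# Toy example: three agents, three candidate edges with 2x2 payoff tables.
payoffs = {
    (0, 1): np.array([[1.0, -1.0], [-1.0, 1.0]]),   # high variance
    (1, 2): np.array([[0.5, 0.5], [0.5, 0.5]]),     # zero variance
    (0, 2): np.array([[0.1, 0.0], [0.0, 0.1]]),     # small variance
}
edges = select_sparse_edges(payoffs, budget=2)      # drops the flat edge
```

The zero-variance edge (1, 2) contributes the same payoff for every joint action, so removing it cannot change which actions maximize the sum.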
Coordination learning in multi-agent systems
The ability for an agent to coordinate with others within a system is a
valuable property in multi-agent systems. Agents either cooperate as a team
to accomplish a common goal, or adapt to opponents to complete different
goals without being exploited. Research has shown that learning multi-agent
coordination is significantly more complex than learning policies in single-agent
environments, and requires a variety of techniques to deal with the
properties of a system where agents learn concurrently. This thesis aims to
determine how machine learning can be used to achieve coordination within
a multi-agent system. It asks what techniques can be used to tackle the
increased complexity of such systems and their credit assignment challenges,
how to achieve coordination, and how to use communication to improve the
behavior of a team.
Many algorithms for competitive environments are tabular-based, preventing
their use with high-dimensional or continuous state spaces, and may be
biased against specific equilibrium strategies. This thesis proposes multiple
deep learning extensions for competitive environments, allowing algorithms
to reach equilibrium strategies in complex and partially-observable environments,
relying only on local information. A tabular algorithm is also extended
with a new update rule that eliminates its bias against deterministic strategies.
Current state-of-the-art approaches for cooperative environments rely
on deep learning to handle the environment’s complexity and benefit from a
centralized learning phase. Solutions that incorporate communication between
agents often prevent agents from being executed in a distributed
manner. This thesis proposes a multi-agent algorithm where agents learn
communication protocols to compensate for local partial-observability, and
remain independently executed. A centralized learning phase can incorporate
additional environment information to increase the robustness and speed with
which a team converges to successful policies. The algorithm outperforms
current state-of-the-art approaches in a wide variety of multi-agent environments.
A permutation-invariant network architecture is also proposed
to increase the scalability of the algorithm to large team sizes. Further research
is needed to identify how the techniques proposed in this thesis,
for cooperative and competitive environments, can be used in unison for mixed
environments, and whether they are adequate for general artificial intelligence.
Financial support from FCT and FSE under the III Community Support Framework. Doctoral Programme in Informatics.
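A permutation-invariant team representation of the kind proposed above is commonly built by embedding every agent's observation with shared weights and pooling with a symmetric operator. The layer shape, tanh nonlinearity, and mean-pooling choice below are assumptions for illustration, not the thesis's architecture.

```python
import numpy as np

def permutation_invariant_encoder(observations, w, b):
    """Embed each agent observation with the same weights, then mean-pool.
    Because the pooling operator is symmetric, shuffling the agents leaves
    the team representation unchanged (illustrative sketch).

    observations: array of shape (n_agents, obs_dim)
    w: (obs_dim, hidden) shared weight matrix, b: (hidden,) bias
    """
    h = np.tanh(observations @ w + b)   # shared per-agent embedding
    return h.mean(axis=0)               # symmetric pooling over agents

rng = np.random.default_rng(0)
obs = rng.normal(size=(4, 6))           # 4 agents, 6-dim observations
w, b = rng.normal(size=(6, 8)), rng.normal(size=8)

z = permutation_invariant_encoder(obs, w, b)
z_shuffled = permutation_invariant_encoder(obs[[2, 0, 3, 1]], w, b)
```

Because the shared embedding and mean pool ignore agent ordering, the same weights serve any team size, which is what makes such architectures scale to large teams.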
A self-learning intersection control system for connected and automated vehicles
This study proposes a Decentralized Sparse Coordination Learning System (DSCLS) based on Deep Reinforcement Learning (DRL) to control intersections under the Connected and Automated Vehicles (CAVs) environment. In this approach, roadway sections are divided into small areas; vehicles try to reserve their desired area ahead of time, based on having a common desired area with other CAVs; the vehicles would be in an independent or coordinated state. Individual CAVs are set accountable for decision-making at each step in both coordinated and independent states. In the training process, CAVs learn to minimize the overall delay at the intersection. Due to the chain impact of taking random actions in the training course, the trained model can deal with unprecedented volume circumstances, the main challenge in intersection management. Application of the model to a single-lane intersection with no turning movement as a proof-of-concept test reveals noticeable improvements in traffic measures compared to three other intersection control systems.
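The reservation step described above can be sketched as follows; the grid-cell slot representation and the rule for flagging a coordinated state are illustrative assumptions, not the DSCLS implementation.

```python
from collections import defaultdict

def classify_states(desired_areas):
    """Mark each CAV as 'coordinated' if any other vehicle requests the
    same area at the same time step, else 'independent' (illustrative).

    desired_areas: dict mapping vehicle id -> reserved (cell, time) slot
    """
    claims = defaultdict(list)
    for vehicle, slot in desired_areas.items():
        claims[slot].append(vehicle)
    return {
        vehicle: "coordinated" if len(claims[slot]) > 1 else "independent"
        for vehicle, slot in desired_areas.items()
    }

states = classify_states({
    "cav_1": ((3, 4), 10),   # same cell, same time step -> conflict
    "cav_2": ((3, 4), 10),
    "cav_3": ((0, 1), 10),   # uncontested reservation
})
```

Each vehicle then remains accountable for its own action in either state, matching the decentralized decision-making described above.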
A Spring Mass Damper (SMD) model is developed to control the platooning behavior of CAVs. In the SMD model, each vehicle is modeled as a mass coupled to its preceding vehicle by a spring and a damper; the spring constant and damper coefficient control the interaction between vehicles. Limits on communication range and on the number of vehicles per platoon are applied, and the SMD model governs both intra-platoon and inter-platoon interactions. Simulation of a regular highway section shows that the proposed platooning algorithm increases the maximum throughput by 29% and 63% under 50% and 100% market penetration rates of CAVs, respectively. A merging section with different volume combinations on the mainline and merging sections and different market penetration rates of CAVs is also modeled to test how inter-platoon spacing accommodates merging vehicles. A noticeable travel time reduction is observed in both the mainline and merging lanes under all volume combinations at market penetration rates (MPRs) of 80% and higher.
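The spring and damper coupling described above corresponds to the standard second-order car-following force law: a spring term on the gap error plus a damper term on the speed difference. The constants and desired gap below are illustrative, not the study's calibrated values.

```python
def smd_acceleration(gap, rel_speed, k=0.5, c=1.0, desired_gap=10.0):
    """Acceleration command for a follower modeled as a mass coupled to
    its predecessor by a spring (gap error) and a damper (speed
    difference). Illustrative constants.

    gap: spacing to the preceding vehicle [m]
    rel_speed: predecessor speed minus follower speed [m/s]
    k: spring constant, c: damper coefficient
    """
    return k * (gap - desired_gap) + c * rel_speed

# A follower 2 m too close and closing at 1 m/s receives a braking command.
a = smd_acceleration(gap=8.0, rel_speed=-1.0)
```

The spring term pulls the platoon toward the desired spacing while the damper dissipates oscillations, which is what keeps inter-vehicle gaps stable as volume grows.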
For a more reliable assessment of the DSCLS, the model is applied to a more realistic intersection with three approaching lanes in each direction and turning movements. The proposed algorithm decreases delay by 58%, 19%, and 13% in moderate, high, and extreme volume regimes, improving travel time accordingly. A comparison of safety measures reveals a 28% improvement in Post Encroachment Time (PET) in the extreme volume regime and minor improvements in the high and moderate volume regimes. Due to its limited acceleration and deceleration rates, the proposed model does not outperform the conventional control systems in environmental measures, including fuel consumption and CO2 emissions. However, the DSCLS noticeably outperforms its pixel-reservation counterpart with the same limited acceleration and deceleration rates. Applying the model to a corridor of four intersections shows the same trends in traffic, safety, and environmental measures as the single-intersection experiment.
An automated intersection control system for platooning CAVs is developed by combining the two proposed models, which remarkably improves traffic and safety measures, particularly in extreme volume regimes, compared to the regular DSCLS model.
Nowhere to Go: Benchmarking Multi-robot Collaboration in Target Trapping Environment
Collaboration is one of the most important factors in multi-robot systems.
Motivated by real-world applications and to further promote its
development, we propose a new benchmark to evaluate multi-robot collaboration
in Target Trapping Environment (T2E). In T2E, two kinds of robots (called
captor robot and target robot) share the same space. The captors aim to catch
the target collaboratively, while the target will try to escape from the trap.
Both the trapping and escaping process can use the environment layout to help
achieve the corresponding objective, which requires high collaboration between
robots and the utilization of the environment. For the benchmark, we present
and evaluate multiple learning-based baselines in T2E, and provide insights
into regimes of multi-robot collaboration. We also make our benchmark publicly
available and encourage researchers from related robotics disciplines to
propose, evaluate, and compare their solutions in this benchmark. Our project
is released at https://github.com/Dr-Xiaogaren/T2E.
Multiagent Cooperative Learning Strategies for Pursuit-Evasion Games
This study examines the pursuit-evasion problem of coordinating multiple robotic pursuers to locate and track a nonadversarial mobile evader in a dynamic environment. Two kinds of pursuit strategies are proposed, one for agents that cooperate with each other and the other for agents that operate independently. This work further employs probability theory to analyze the uncertain state information about the pursuers and the evaders, and uses case-based reasoning to equip agents with memories and learning abilities. Following the concepts of assimilation and accommodation, both positive-angle and bevel-angle strategies are developed to help agents adapt to their environment effectively. The case study uses the Recursive Porous Agent Simulation Toolkit (REPAST) to implement a multiagent system and demonstrates the superior performance of the proposed approaches in the pursuit-evasion game.
Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning
We investigate whether Deep Reinforcement Learning (Deep RL) is able to
synthesize sophisticated and safe movement skills for a low-cost, miniature
humanoid robot that can be composed into complex behavioral strategies in
dynamic environments. We used Deep RL to train a humanoid robot with 20
actuated joints to play a simplified one-versus-one (1v1) soccer game. We first
trained individual skills in isolation and then composed those skills
end-to-end in a self-play setting. The resulting policy exhibits robust and
dynamic movement skills such as rapid fall recovery, walking, turning, kicking
and more; and transitions between them in a smooth, stable, and efficient
manner - well beyond what is intuitively expected from the robot. The agents
also developed a basic strategic understanding of the game, and learned, for
instance, to anticipate ball movements and to block opponent shots. The full
range of behaviors emerged from a small set of simple rewards. Our agents were
trained in simulation and transferred to real robots zero-shot. We found that a
combination of sufficiently high-frequency control, targeted dynamics
randomization, and perturbations during training in simulation enabled
good-quality transfer, despite significant unmodeled effects and variations
across robot instances. Although the robots are inherently fragile, minor
hardware modifications together with basic regularization of the behavior
during training led the robots to learn safe and effective movements while
still performing in a dynamic and agile way. Indeed, even though the agents
were optimized for scoring, in experiments they walked 156% faster, took 63%
less time to get up, and kicked 24% faster than a scripted baseline, while
efficiently combining the skills to achieve the longer term objectives.
Examples of the emergent behaviors and full 1v1 matches are available on the
supplementary website.
Comment: Project website: https://sites.google.com/view/op3-socce
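The dynamics randomization credited above for zero-shot sim-to-real transfer is typically implemented by resampling physical parameters at the start of every training episode. The parameter names and ranges below are illustrative assumptions, not the paper's values.

```python
import random

def randomize_dynamics(rng):
    """Sample one set of perturbed simulator parameters per episode so
    the policy must stay robust across the spread it will meet on real
    hardware (illustrative parameter names and ranges).
    """
    return {
        "joint_friction": rng.uniform(0.8, 1.2),   # scale on nominal value
        "link_mass": rng.uniform(0.9, 1.1),
        "actuator_delay_ms": rng.uniform(0.0, 20.0),
        "push_force_n": rng.uniform(0.0, 5.0),     # random perturbation
    }

rng = random.Random(0)
params = randomize_dynamics(rng)   # resample at every episode reset
```

Training over this distribution, rather than one nominal simulator, is what makes the learned policy tolerate the unmodeled effects and per-robot variation noted above.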