191,966 research outputs found
Deep Reinforcement Learning for Swarm Systems
Recently, deep reinforcement learning (RL) methods have been applied
successfully to multi-agent scenarios. Typically, these methods rely on a
concatenation of agent states to represent the information content required for
decentralized decision making. However, concatenation scales poorly to swarm
systems with a large number of homogeneous agents as it does not exploit the
fundamental properties inherent to these systems: (i) the agents in the swarm
are interchangeable and (ii) the exact number of agents in the swarm is
irrelevant. Therefore, we propose a new state representation for deep
multi-agent RL based on mean embeddings of distributions. We treat the agents
as samples of a distribution and use the empirical mean embedding as input for
a decentralized policy. We define different feature spaces of the mean
embedding using histograms, radial basis functions and a neural network learned
end-to-end. We evaluate the representation on two well known problems from the
swarm literature (rendezvous and pursuit evasion), in a globally and locally
observable setup. For the local setup we furthermore introduce simple
communication protocols. Of all approaches, the mean embedding representation
using neural network features enables the richest information exchange between
neighboring agents facilitating the development of more complex collective
strategies.Comment: 31 pages, 12 figures, version 3 (published in JMLR Volume 20
Learning Reward Machines in Cooperative Multi-Agent Tasks
This paper presents a novel approach to Multi-Agent Reinforcement Learning
(MARL) that combines cooperative task decomposition with the learning of reward
machines (RMs) encoding the structure of the sub-tasks. The proposed method
helps deal with the non-Markovian nature of the rewards in partially observable
environments and improves the interpretability of the learnt policies required
to complete the cooperative task. The RMs associated with each sub-task are
learnt in a decentralised manner and then used to guide the behaviour of each
agent. By doing so, the complexity of a cooperative multi-agent problem is
reduced, allowing for more effective learning. The results suggest that our
approach is a promising direction for future research in MARL, especially in
complex environments with large state spaces and multiple agents.Comment: Neuro-symbolic AI for Agent and Multi-Agent Systems Workshop at
AAMAS'2
Multi-agent deep reinforcement learning with centralized training and decentralized execution for transportation infrastructure management
We present a multi-agent Deep Reinforcement Learning (DRL) framework for
managing large transportation infrastructure systems over their life-cycle.
Life-cycle management of such engineering systems is a computationally
intensive task, requiring appropriate sequential inspection and maintenance
decisions able to reduce long-term risks and costs, while dealing with
different uncertainties and constraints that lie in high-dimensional spaces. To
date, static age- or condition-based maintenance methods and risk-based or
periodic inspection plans have mostly addressed this class of optimization
problems. However, optimality, scalability, and uncertainty limitations are
often manifested under such approaches. The optimization problem in this work
is cast in the framework of constrained Partially Observable Markov Decision
Processes (POMDPs), which provides a comprehensive mathematical basis for
stochastic sequential decision settings with observation uncertainties, risk
considerations, and limited resources. To address significantly large state and
action spaces, a Deep Decentralized Multi-agent Actor-Critic (DDMAC) DRL method
with Centralized Training and Decentralized Execution (CTDE), termed as
DDMAC-CTDE is developed. The performance strengths of the DDMAC-CTDE method are
demonstrated in a generally representative and realistic example application of
an existing transportation network in Virginia, USA. The network includes
several bridge and pavement components with nonstationary degradation,
agency-imposed constraints, and traffic delay and risk considerations. Compared
to traditional management policies for transportation networks, the proposed
DDMAC-CTDE method vastly outperforms its counterparts. Overall, the proposed
algorithmic framework provides near optimal solutions for transportation
infrastructure management under real-world constraints and complexities
Learning in Nonzero-Sum Stochastic Games with Potentials
Multi-agent reinforcement learning (MARL) has become effective in tackling
discrete cooperative game scenarios. However, MARL has yet to penetrate
settings beyond those modelled by team and zero-sum games, confining it to a
small subset of multi-agent systems. In this paper, we introduce a new
generation of MARL learners that can handle nonzero-sum payoff structures and
continuous settings. In particular, we study the MARL problem in a class of
games known as stochastic potential games (SPGs) with continuous state-action
spaces. Unlike cooperative games, in which all agents share a common reward,
SPGs are capable of modelling real-world scenarios where agents seek to fulfil
their individual goals. We prove theoretically our learning method, SPot-AC,
enables independent agents to learn Nash equilibrium strategies in polynomial
time. We demonstrate our framework tackles previously unsolvable tasks such as
Coordination Navigation and large selfish routing games and that it outperforms
the state of the art MARL baselines such as MADDPG and COMIX in such scenarios.Comment: ICML 202
Adaptive trajectory-constrained exploration strategy for deep reinforcement learning
Deep reinforcement learning (DRL) faces significant challenges in addressing
the hard-exploration problems in tasks with sparse or deceptive rewards and
large state spaces. These challenges severely limit the practical application
of DRL. Most previous exploration methods relied on complex architectures to
estimate state novelty or introduced sensitive hyperparameters, resulting in
instability. To mitigate these issues, we propose an efficient adaptive
trajectory-constrained exploration strategy for DRL. The proposed method guides
the policy of the agent away from suboptimal solutions by leveraging incomplete
offline demonstrations as references. This approach gradually expands the
exploration scope of the agent and strives for optimality in a constrained
optimization manner. Additionally, we introduce a novel policy-gradient-based
optimization algorithm that utilizes adaptively clipped trajectory-distance
rewards for both single- and multi-agent reinforcement learning. We provide a
theoretical analysis of our method, including a deduction of the worst-case
approximation error bounds, highlighting the validity of our approach for
enhancing exploration. To evaluate the effectiveness of the proposed method, we
conducted experiments on two large 2D grid world mazes and several MuJoCo
tasks. The extensive experimental results demonstrate the significant
advantages of our method in achieving temporally extended exploration and
avoiding myopic and suboptimal behaviors in both single- and multi-agent
settings. Notably, the specific metrics and quantifiable results further
support these findings. The code used in the study is available at
\url{https://github.com/buaawgj/TACE}.Comment: 35 pages, 36 figures; accepted by Knowledge-Based Systems, not
publishe
Enhancing Exploration and Safety in Deep Reinforcement Learning
A Deep Reinforcement Learning (DRL) agent tries to learn a policy maximizing a long-term objective by trials and errors in large state spaces. However, this learning paradigm requires a non-trivial amount of interactions in the environment to achieve good performance. Moreover, critical applications, such as robotics, typically involve safety criteria to consider while designing novel DRL solutions. Hence, devising safe learning approaches with efficient exploration is crucial to avoid getting stuck in local optima, failing to learn properly, or causing damages to the surrounding environment. This thesis focuses on developing Deep Reinforcement Learning algorithms to foster efficient exploration and safer behaviors in simulation and real domains of interest, ranging from robotics to multi-agent systems. To this end, we rely both on standard benchmarks, such as SafetyGym, and robotic tasks widely adopted in the literature (e.g., manipulation, navigation). This variety of problems is crucial to assess the statistical significance of our empirical studies and the generalization skills of our approaches. We initially benchmark the sample efficiency versus performance trade-off between value-based and policy-gradient algorithms. This part highlights the benefits of using non-standard simulation environments (i.e., Unity), which also facilitates the development of further optimization for DRL. We also discuss the limitations of standard evaluation metrics (e.g., return) in characterizing the actual behaviors of a policy, proposing the use of Formal Verification (FV) as a practical methodology to evaluate behaviors over desired specifications. The second part introduces Evolutionary Algorithms (EAs) as a gradient-free complimentary optimization strategy. In detail, we combine population-based and gradient-based DRL to diversify exploration and improve performance both in single and multi-agent applications. For the latter, we discuss how prior Multi-Agent (Deep) Reinforcement Learning (MARL) approaches hinder exploration, proposing an architecture that favors cooperation without affecting exploration
- …