Toward multi-target self-organizing pursuit in a partially observable Markov game
The multi-target self-organizing pursuit (SOP) problem has wide
applications and has been considered a challenging self-organization game for
distributed systems, in which intelligent agents cooperatively pursue multiple
dynamic targets with partial observations. This work proposes a framework for
decentralized multi-agent systems to improve intelligent agents' search and
pursuit capabilities. We model a self-organizing system as a partially
observable Markov game (POMG) with the features of decentralization, partial
observation, and noncommunication. The proposed distributed algorithm, fuzzy
self-organizing cooperative coevolution (FSC2), is then leveraged to resolve the
three challenges in multi-target SOP: distributed self-organizing search (SOS),
distributed task allocation, and distributed single-target pursuit. FSC2
includes a coordinated multi-agent deep reinforcement learning method that
enables homogeneous agents to learn natural SOS patterns. Additionally, we
propose a fuzzy-based distributed task allocation method, which locally
decomposes multi-target SOP into several single-target pursuit problems. The
cooperative coevolution principle is employed to coordinate distributed
pursuers for each single-target pursuit problem. The uncertainties arising
from inherent partial observation and distributed decision-making in the POMG
can therefore be alleviated. The experimental results demonstrate that
distributed, noncommunicating multi-agent coordination with partial
observations is effective in all three subtasks, and that 2048 FSC2 agents can
perform efficient multi-target SOP with almost 100% capture rates.
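For context, the partially observable Markov game the system is modeled as can
be written, in standard notation (not necessarily the paper's own symbols), as
the tuple

\[ \mathcal{G} = \langle \mathcal{N}, \mathcal{S}, \{\mathcal{A}_i\}_{i \in \mathcal{N}}, \{\Omega_i\}_{i \in \mathcal{N}}, \mathcal{T}, O, \{R_i\}_{i \in \mathcal{N}}, \gamma \rangle, \]

where \(\mathcal{N}\) is the set of agents (pursuers), \(\mathcal{S}\) the set of
global states, \(\mathcal{A}_i\) and \(\Omega_i\) agent \(i\)'s actions and
observations, \(\mathcal{T}(s' \mid s, \mathbf{a})\) the transition function
under the joint action \(\mathbf{a}\), \(O(o_i \mid s')\) the observation
function, \(R_i\) the per-agent reward, and \(\gamma\) the discount factor.
Under the decentralization and noncommunication features above, each agent's
policy \(\pi_i(a_i \mid o_i)\) conditions only on its own local observation
history.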
Generative Exploration and Exploitation
Sparse reward is one of the biggest challenges in reinforcement learning
(RL). In this paper, we propose a novel method called Generative Exploration
and Exploitation (GENE) to overcome sparse reward. GENE automatically generates
start states to encourage the agent to explore the environment and to exploit
received reward signals. GENE can adaptively trade off between exploration and
exploitation according to the varying distributions of states experienced by
the agent as learning progresses. GENE relies on no prior knowledge about the
environment and can be combined with any RL algorithm, whether on-policy or
off-policy, single-agent or multi-agent. Empirically, we demonstrate that
GENE significantly outperforms existing methods in three tasks with only binary
rewards, including Maze, Maze Ant, and Cooperative Navigation. Ablation studies
verify the emergence of progressive exploration and automatic reversing.
Comment: AAAI'2
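The abstract does not specify how GENE's generator works internally; as a
rough, hypothetical sketch of the idea, the snippet below substitutes a simple
kernel-density estimate over visited states for the paper's learned generative
model, proposing rarely visited (low-density) states for exploration and
perturbed rewarded states for exploitation. All names here
(GeneStartStateGenerator, record, propose) and the assumption that the
simulator can reset to an arbitrary state are illustrative, not the paper's
API.

import numpy as np

class GeneStartStateGenerator:
    """Hypothetical sketch of GENE-style start-state generation."""

    def __init__(self, bandwidth=0.5):
        self.visited = []          # all states seen so far
        self.rewarded = []         # states where a reward was received
        self.bandwidth = bandwidth # kernel width for the density estimate

    def record(self, state, reward):
        # Track visited states and remember the rewarding ones.
        s = np.asarray(state, dtype=float)
        self.visited.append(s)
        if reward > 0:
            self.rewarded.append(s)

    def _density(self, state):
        # Gaussian kernel density of `state` under the visited-state set,
        # standing in for the paper's learned generative model.
        pts = np.stack(self.visited)
        d2 = np.sum((pts - state) ** 2, axis=1)
        return float(np.mean(np.exp(-d2 / (2 * self.bandwidth ** 2))))

    def propose(self, n_candidates=64):
        # Generate a start state: exploit near rewarded states half the
        # time (once any exist), otherwise explore a low-density state.
        assert self.visited, "record() some states before proposing"
        if self.rewarded and np.random.rand() < 0.5:
            base = self.rewarded[np.random.randint(len(self.rewarded))]
            return base + np.random.normal(0.0, 0.1, size=base.shape)
        idx = np.random.randint(len(self.visited), size=n_candidates)
        cands = [self.visited[i] for i in idx]
        scores = [self._density(c) for c in cands]
        return cands[int(np.argmin(scores))]

A training loop would call record(state, reward) on every transition and, at
episode boundaries, reset the environment to propose()'s output via a
simulator that supports resetting to arbitrary states; the actual method uses
a learned generative model rather than this density-estimate stand-in.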