Perimeter Control (PC) strategies have been proposed to manage urban road
networks in oversaturated conditions by monitoring the transfer flows of the
Protected Network (PN). Existing studies apply a uniform metering rate to all
cordon signals, ignoring the variety of local traffic states at the
intersection level; this may cause severe local congestion and undermine
network stability. This paper introduces a semi-model-dependent Multi-Agent
Reinforcement Learning (MARL) framework to conduct PC with heterogeneous cordon
signal behaviors. The proposed strategy integrates a MARL-based signal
control method with a centralized feedback PC policy and is applied to the
cordon signals of the PN. It operates as a two-stage system, with the feedback PC
strategy detecting the overall traffic state within the PN and then
distributing local instructions to cordon signals controlled by agents in the
MARL framework. Each cordon signal acts independently and differently, yielding
a loosely coupled, distributed PC for the PN. The model-free and model-based
methods are combined by reconstructing the action-value function of the local
agents with a PC feedback reward, without violating the integrity of the local
signal control policy learned during RL training. Through
numerical tests with different demand patterns in a microscopic traffic
environment, the proposed PC strategy (a) shows robustness, scalability, and
transferability, and (b) outperforms state-of-the-art model-based PC strategies
in increasing network throughput and reducing cordon queues and carbon emissions.
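The abstract does not state which feedback law the centralized PC layer uses. A common choice in the PC literature is a proportional-integral (PI) regulator that drives the PN accumulation toward a set-point near the critical value of the macroscopic fundamental diagram (MFD). The sketch below assumes such a regulator plus a hypothetical queue-proportional rule for turning the total permitted inflow into local instructions for the cordon signals. The class name, gains, bounds, and split rule are illustrative assumptions, not the authors' design.

```python
from dataclasses import dataclass


@dataclass
class PIPerimeterController:
    """Hypothetical PI feedback regulator on PN accumulation (veh)."""
    n_target: float  # assumed set-point accumulation, e.g. near the MFD critical value
    k_p: float       # proportional gain (illustrative)
    k_i: float       # integral gain (illustrative)
    q_min: float     # lower bound on total permitted inflow (veh/h)
    q_max: float     # upper bound on total permitted inflow (veh/h)
    q_prev: float    # permitted inflow applied in the previous control step
    n_prev: float    # accumulation measured in the previous control step

    def step(self, n_now: float) -> float:
        # Velocity-form PI law: q(k) = q(k-1) - K_P [n(k) - n(k-1)] + K_I [n_hat - n(k)]
        q = (self.q_prev
             - self.k_p * (n_now - self.n_prev)
             + self.k_i * (self.n_target - n_now))
        q = min(max(q, self.q_min), self.q_max)  # keep within physical bounds
        self.q_prev, self.n_prev = q, n_now
        return q


def distribute_inflow(q_total: float, queues: dict) -> dict:
    """Hypothetical rule: split total inflow over cordon signals in proportion to local queues."""
    if not queues:
        raise ValueError("at least one cordon signal is required")
    total = sum(queues.values())
    if total == 0:
        # No queues anywhere: share the permitted inflow equally
        return {sig: q_total / len(queues) for sig in queues}
    return {sig: q_total * qlen / total for sig, qlen in queues.items()}


if __name__ == "__main__":
    ctrl = PIPerimeterController(n_target=3000.0, k_p=2.0, k_i=1.0, q_min=500.0,
                                 q_max=6000.0, q_prev=3000.0, n_prev=3000.0)
    q = ctrl.step(3200.0)  # PN above set-point, so the permitted inflow is cut to 2400.0
    print(q, distribute_inflow(q, {"A": 10, "B": 30}))
```

In the proposed framework the per-signal values would serve only as instructions; the MARL agents, not a fixed split rule, decide how each cordon signal realizes them.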
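The abstract does not detail how the action-value function is reconstructed. One minimal reading, assumed purely for illustration, keeps the trained local values Q(s, a) untouched and adds a weighted PC feedback reward that penalizes deviation from the instructed inflow: Q̃(s, a) = Q(s, a) + β·r_PC(a), with r_PC(a) = −|f(a) − f*|. The function names, the per-phase inflow estimates f(a), and the weight β are hypothetical.

```python
from typing import Sequence


def reconstructed_q(q_local: Sequence[float],
                    phase_inflow: Sequence[float],
                    target_inflow: float,
                    beta: float) -> list:
    """Hypothetical reconstruction Q~(s, a) = Q(s, a) + beta * r_PC(a).

    q_local       -- values of the trained local agent, one per signal phase (never modified)
    phase_inflow  -- estimated PN inflow (veh) if phase a is served next (assumed available)
    target_inflow -- local instruction from the feedback PC layer (veh)
    beta          -- weight of the PC feedback reward; beta = 0 recovers the learned policy
    """
    if len(q_local) != len(phase_inflow):
        raise ValueError("need one inflow estimate per action")
    # PC feedback reward: the further a phase's inflow is from the instruction, the larger the penalty
    return [q - beta * abs(f - target_inflow) for q, f in zip(q_local, phase_inflow)]


def select_phase(q_local, phase_inflow, target_inflow, beta):
    """Greedy action on the reconstructed values (index of the chosen phase)."""
    q_tilde = reconstructed_q(q_local, phase_inflow, target_inflow, beta)
    return max(range(len(q_tilde)), key=q_tilde.__getitem__)


if __name__ == "__main__":
    q = [1.0, 0.8, 0.5]        # learned action values for three phases
    inflow = [12.0, 4.0, 0.0]  # vehicles each phase would admit into the PN
    print(select_phase(q, inflow, target_inflow=3.0, beta=0.0))  # 0: pure learned policy
    print(select_phase(q, inflow, target_inflow=3.0, beta=0.1))  # 1: PC feedback shifts the choice
```

Because the learned values are only offset, never retrained, setting β = 0 returns exactly the original local policy, which is one way to read "without violating the integrity" of that policy.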