Multi-Agent Reinforcement Learning (MARL) has become a classic paradigm for
solving diverse intelligent control tasks, such as autonomous driving in the Internet of
Vehicles (IoV). However, the widely assumed central node for implementing
centralized federated learning-assisted MARL may be impractical in highly
dynamic scenarios, and the excessive communication overhead may overwhelm
the IoV system. Therefore, in this paper, we design a communication-efficient
cooperative MARL algorithm, named RSM-MAPPO, to reduce the communication
overhead in a fully distributed architecture. In particular,
RSM-MAPPO enhances multi-agent Proximal Policy Optimization (PPO) by
incorporating the idea of segment mixture and building multiple model
replicas from policy segments received from neighboring agents. RSM-MAPPO
then adopts a theory-guided metric to regulate the selection of contributive
replicas so as to guarantee policy improvement. Finally, extensive simulations in
a mixed-autonomy traffic control scenario verify the effectiveness of the
RSM-MAPPO algorithm.
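
The segment-mixture and replica-selection steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the flat parameter vector, the mixing weight `mix`, the helper names, and especially `score_fn` (a stand-in for the theory-guided metric, which the abstract does not specify) are all assumptions made for illustration.

```python
import numpy as np

def split_segments(theta, num_segments):
    """Split a flat policy parameter vector into contiguous segments."""
    return np.array_split(theta, num_segments)

def build_replicas(local_theta, received, num_segments, mix=0.5):
    """Build one model replica per received neighbor segment.

    received: list of (segment_index, segment_values) pairs (hypothetical
    message format). Each replica is the local policy with exactly one
    segment mixed with the neighbor's values.
    """
    replicas = []
    for seg_idx, seg_values in received:
        segments = [s.copy() for s in split_segments(local_theta, num_segments)]
        segments[seg_idx] = (1.0 - mix) * segments[seg_idx] + mix * seg_values
        replicas.append(np.concatenate(segments))
    return replicas

def select_replicas(local_theta, replicas, score_fn):
    """Keep only replicas that score strictly better than the local policy.

    score_fn is a placeholder for the selection metric (assumption).
    """
    baseline = score_fn(local_theta)
    return [r for r in replicas if score_fn(r) > baseline]

def aggregate(local_theta, selected):
    """Average the local policy with the selected replicas."""
    return np.mean([local_theta] + selected, axis=0)

# Toy usage: 6 parameters, 3 segments, two neighbor segments received.
local = np.zeros(6)
received = [(0, np.ones(2)), (2, -np.ones(2))]
replicas = build_replicas(local, received, num_segments=3)
selected = select_replicas(local, replicas, score_fn=np.sum)  # toy metric
updated = aggregate(local, selected)
```

Exchanging only one segment per message, rather than a full policy, is what keeps the per-round communication cost low in this sketch; the selection step then filters out mixtures that the chosen metric does not judge to be beneficial.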