Multi-agent applications have recently gained significant popularity. In many
computer vision tasks, a network of agents, such as a team of robots with
cameras, could work collaboratively to perceive the environment for efficient
and accurate situation awareness. However, these agents often have limited
computation, communication, and storage resources. Thus, reducing resource
consumption while still providing an accurate perception of the environment
becomes an important goal when deploying multi-agent systems. To achieve this
goal, we identify and leverage the overlap among different camera views in
multi-agent systems to reduce the processing, transmission, and storage of
redundant or unimportant video frames. Specifically, we have developed two
collaborative multi-agent video fast-forwarding frameworks in distributed and
centralized settings, respectively. In these frameworks, each individual agent
can selectively process or skip video frames at adjustable paces based on
multiple strategies via reinforcement learning. Multiple agents then
collaboratively sense the environment via either 1) a consensus-based
distributed framework called DMVF that periodically updates the fast-forwarding
strategies of agents by establishing communication and consensus among
connected neighbors, or 2) a centralized framework called MFFNet that utilizes
a central controller to decide the fast-forwarding strategies for agents based
on collected data. We demonstrate the efficacy and efficiency of our proposed
frameworks on a real-world surveillance video dataset, VideoWeb, and a new
simulated driving dataset, CarlaSim, through extensive simulations and
deployment on an embedded platform with TCP communication. We show that,
compared with other approaches in the literature, our frameworks achieve better
coverage of important frames, while significantly reducing the number of frames
processed at each agent.

Comment: IEEE Transactions on Multimedia, 2023. arXiv admin note: text overlap
with arXiv:2008.0443