Abstract—We address the problem of repeated coverage, by a team of robots, of the boundaries of a target area and of the structures inside it. Events may occur on any part of the boundaries and may have different importance weights. In addition, the boundaries of the area and of the structures are heterogeneous: events may appear with different probabilities on different parts of the boundary, and these probabilities may change over time. The goal is to maximize the reward by detecting as many events as possible, weighted by their importance, in as little time as possible. The reward a robot receives for detecting an event depends on how early the event is detected. To this end, each robot autonomously and continuously learns the pattern of event occurrence on the boundaries over time, capturing the uncertainties in the target area. Based on the policy it is learning to maximize the reward, each robot then plans, in a decentralized manner, the best path through the target area at that time, so as to visit the most promising parts of the boundary. The learning algorithm is compared with a heuristic algorithm for the Travelling Salesman Problem on the basis of the total reward collected by the team during a finite repeated boundary coverage mission.
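To make the reward structure described above concrete, the following is a minimal sketch of an importance-weighted reward that decreases with detection delay. The exponential decay, the `decay` parameter, and the names `Event`, `detection_reward`, and `total_reward` are illustrative assumptions and are not taken from the paper, which states only that the reward depends on how early an event is detected.

```python
from dataclasses import dataclass


@dataclass
class Event:
    segment: int   # index of the boundary segment where the event occurs
    weight: float  # importance weight of the event
    start: float   # time at which the event appears


def detection_reward(event: Event, detect_time: float, decay: float = 0.9) -> float:
    """Assumed reward model: importance weight discounted by detection delay."""
    delay = detect_time - event.start
    if delay < 0:
        # The segment was visited before the event appeared, so nothing is detected.
        return 0.0
    return event.weight * decay ** delay


def total_reward(events, visit_times, decay: float = 0.9) -> float:
    """Sum rewards over events; visit_times maps segment index -> visit time."""
    return sum(
        detection_reward(e, visit_times[e.segment], decay)
        for e in events
        if e.segment in visit_times
    )
```

Under this assumed model, a path that reaches high-weight segments shortly after their events appear collects more reward than one that visits the same segments later, which is the trade-off the learned policy is meant to exploit.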