Coverage path planning is the problem of finding the shortest path that
covers the entire free space of a given confined area, with applications
ranging from robotic lawn mowing and vacuum cleaning to demining and
search-and-rescue tasks. While offline methods can find provably complete, and
in some cases optimal, paths for known environments, their value is limited in
online scenarios where the environment is not known beforehand, especially in
the presence of non-static obstacles. We propose an end-to-end reinforcement
learning-based approach in continuous state and action space for the online
coverage path planning problem that can handle unknown environments. We
construct the observation space from both global maps and local sensory inputs,
allowing the agent to plan a long-term path and simultaneously act on
short-term obstacle detections. To account for large-scale environments, we
propose to use a multi-scale map input representation. Furthermore, we propose
a novel total variation reward term for eliminating thin strips of uncovered
space in the learned path. To validate the effectiveness of our approach, we
perform extensive experiments in simulation with a distance sensor, surpassing
the performance of a recent reinforcement learning-based approach.
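
As a rough illustration of the multi-scale map input, the sketch below stacks egocentric crops of a global grid map, where each successive scale covers a larger area at a coarser resolution. The function name multi_scale_maps, the number of scales, the crop size, and the nearest-neighbour downsampling are assumptions for illustration only, not the exact representation used in the approach.

import numpy as np

def multi_scale_maps(global_map, agent_rc, num_scales=4, crop_size=32):
    """Stack egocentric crops of exponentially growing extent, each
    reduced to a fixed crop_size x crop_size grid (nearest neighbour).
    Parameter values are illustrative assumptions."""
    r, c = agent_rc
    maps = []
    for s in range(num_scales):
        half = (crop_size // 2) * (2 ** s)    # half-extent of the crop at scale s
        padded = np.pad(global_map, half)     # zero-pad so crops never leave the map
        crop = padded[r:r + 2 * half, c:c + 2 * half]  # crop centred on the agent
        step = 2 ** s                         # coarser sampling at larger scales
        maps.append(crop[::step, ::step].astype(np.float32))
    return np.stack(maps)                     # shape: (num_scales, crop_size, crop_size)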
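
The total variation reward term can be made concrete with the standard discrete (anisotropic) total variation of the coverage map. One plausible form, assuming a binary coverage grid $C_t \in \{0,1\}^{H \times W}$ at time step $t$ and a weight $\lambda$ (notation and weighting are assumptions, not specified in the text above), is

$\mathrm{TV}(C) = \sum_{i,j} \big( |C_{i+1,j} - C_{i,j}| + |C_{i,j+1} - C_{i,j}| \big), \qquad r_t^{\mathrm{TV}} = -\lambda \, \big( \mathrm{TV}(C_t) - \mathrm{TV}(C_{t-1}) \big),$

i.e., the agent is penalized when the boundary between covered and uncovered cells grows, which discourages leaving thin strips of uncovered space in the learned path.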