Machine learning provides a powerful tool for building socially compliant
robotic systems that go beyond simple predictive models of human behavior. By
observing and understanding human interactions from past experiences, learning
can enable effective social navigation behaviors directly from data. However,
collecting navigation data in human-occupied environments may require
teleoperation or continuous monitoring, making the process prohibitively
expensive to scale. In this paper, we present a scalable data collection system
for vision-based navigation, SACSoN, that can autonomously navigate around
pedestrians in challenging real-world environments while encouraging rich
interactions. SACSoN uses visual observations to observe and react to humans in
its vicinity. It couples this visual understanding with continual learning and
an autonomous collision recovery system that limits the involvement of a human
operator, allowing for better dataset scaling. We use this system to collect
the SACSoN dataset, the largest-of-its-kind visual navigation dataset of
autonomous robots operating in human-occupied spaces, spanning over 75 hours
and 4000 rich interactions with humans. Our experiments show that collecting
data with a novel objective that encourages interactions leads to significant
improvements in downstream tasks such as inferring pedestrian dynamics and
learning socially compliant navigation behaviors. We make videos of our
autonomous data collection system and the SACSoN dataset publicly available on
our project page.