Learning from demonstration (LfD) is useful in settings where hand-coding
behaviour or a reward function is impractical. It has succeeded in a wide range
of problems, but it typically relies on manually generated demonstrations or
specially deployed sensors, and so it has not generally been able to leverage
the copious demonstrations available in the wild: behaviours that were
occurring anyway, captured by sensors already deployed for another purpose,
e.g., traffic camera footage of vehicles, cyclists, and pedestrians behaving
naturally. We propose Video to Behaviour (ViBe), a new approach for learning
models of behaviour from unlabelled raw video
data of a traffic scene collected from a single, monocular, initially
uncalibrated camera with ordinary resolution. Our approach calibrates the
camera, detects relevant objects, tracks them through time, and uses the
resulting trajectories to perform LfD, yielding models of naturalistic
behaviour. We apply ViBe to raw videos of a traffic intersection and show that
it can learn purely from videos, without additional expert knowledge.
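
To make the described pipeline concrete, below is a minimal Python sketch of
the four stages the abstract names: camera calibration, object detection,
tracking, and LfD on the resulting trajectories. Every class and function here
is an illustrative placeholder under assumed, simplified interfaces, not the
paper's actual components; ViBe's detector, tracker, and LfD algorithm are not
specified in this abstract.

```python
# Hypothetical sketch of a ViBe-style pipeline. All names are illustrative
# placeholders, not the paper's actual method.
from dataclasses import dataclass, field

@dataclass
class Detection:
    frame: int
    x: float  # image-plane position (pixels)
    y: float

@dataclass
class Track:
    object_id: int
    positions: list = field(default_factory=list)  # ground-plane points

def calibrate(frames):
    """Estimate an image-to-ground mapping from the raw frames.
    Placeholder: returns an identity mapping instead of a real homography."""
    return lambda x, y: (x, y)

def detect(frames):
    """Run an object detector on each frame. Placeholder: emits one synthetic
    detection per frame."""
    return [Detection(frame=i, x=10.0 * i, y=5.0) for i, _ in enumerate(frames)]

def track(detections, to_ground):
    """Associate detections across frames into ground-plane trajectories.
    Placeholder: assumes a single object, so all detections form one track."""
    t = Track(object_id=0)
    for d in detections:
        t.positions.append(to_ground(d.x, d.y))
    return [t]

def learn_from_demonstration(tracks):
    """Fit a behaviour model to the trajectories. Placeholder: the 'model'
    is just the mean per-frame displacement across all tracks."""
    steps = [(b[0] - a[0], b[1] - a[1])
             for t in tracks
             for a, b in zip(t.positions, t.positions[1:])]
    n = max(len(steps), 1)
    return (sum(s[0] for s in steps) / n, sum(s[1] for s in steps) / n)

if __name__ == "__main__":
    frames = range(5)                          # stand-in for raw video frames
    to_ground = calibrate(frames)              # 1. calibrate the camera
    detections = detect(frames)                # 2. detect relevant objects
    tracks = track(detections, to_ground)      # 3. track objects through time
    model = learn_from_demonstration(tracks)   # 4. LfD on the trajectories
    print("mean per-frame displacement:", model)
```

In a real system each placeholder would be replaced by a learned component
(e.g., a trained detector and a multi-object tracker), but the data flow
between the four stages would remain as shown.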