Driving requires reacting to a wide variety of complex environment conditions
and agent behaviors. Explicitly modeling each possible scenario is unrealistic.
In contrast, imitation learning can, in theory, leverage data from large fleets
of human-driven cars. Behavior cloning in particular has been successfully used
to learn simple visuomotor policies end-to-end, but scaling to the full
spectrum of driving behaviors remains an unsolved problem. In this paper, we
propose a new benchmark to experimentally investigate the scalability and
limitations of behavior cloning. We show that behavior cloning leads to
state-of-the-art results, including in unseen environments, executing complex
lateral and longitudinal maneuvers without these reactions being explicitly
programmed. However, we also confirm well-known limitations (due to dataset
bias and overfitting) and identify new generalization issues (due to dynamic
objects and the lack of a causal model) and training instability, all of which
require further research before
behavior cloning can graduate to real-world driving. The code for the studied
behavior cloning approaches is available at
https://github.com/felipecode/coiltraine