The task of driving can sometimes require the processing of large amounts of visual information; such situations can overload the perceptual systems of human drivers, leading to ‘inattentional blindness’, where potentially critical visual information is overlooked. This phenomenon of ‘looking but failing to see’ is the third-largest contributor
to traffic accidents in the UK. In this work we develop a method to identify these particularly demanding driving scenes using an end-to-end driving architecture, imbued with a spatial attention mechanism and trained to mimic ground-truth driving controls from video input.
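A minimal PyTorch sketch of one such attention-equipped imitation network follows; the small convolutional encoder, softmax spatial attention layer, and L2 imitation loss shown are illustrative assumptions, not the exact architecture used here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveDriver(nn.Module):
    """Illustrative end-to-end driver with softmax spatial attention."""
    def __init__(self, n_controls: int = 2):
        super().__init__()
        # Small convolutional encoder producing a spatial feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # A 1x1 conv scores each location; a softmax over locations
        # yields the spatial attention distribution referred to above.
        self.attn_score = nn.Conv2d(64, 1, kernel_size=1)
        self.head = nn.Linear(64, n_controls)  # e.g. steering and speed

    def forward(self, frames):
        feats = self.encoder(frames)                 # (B, C, H, W)
        b, c, h, w = feats.shape
        attn = F.softmax(self.attn_score(feats).view(b, -1), dim=1)
        # Attention-weighted pooling of the feature map.
        pooled = torch.bmm(feats.view(b, c, -1), attn.unsqueeze(2)).squeeze(2)
        return self.head(pooled), attn.view(b, h, w)

# Imitation training step: mimic ground-truth controls from video frames.
model = AttentiveDriver()
frames = torch.randn(4, 3, 96, 160)     # batch of input frames
gt_controls = torch.randn(4, 2)         # ground-truth steering/speed
pred, attn_map = model(frames)
loss = F.mse_loss(pred, gt_controls)
loss.backward()
```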
At test time, the network’s attention distribution is segmented to identify relevant items in the driving scene, which are then used to estimate the attentional demand on the driver according to an established model from cognitive neuroscience.
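The test-time step could be sketched as follows; thresholding the attention map, treating connected components as items, and scoring demand as a simple item count are illustrative stand-ins for the actual segmentation procedure and cognitive model.

```python
import numpy as np
from scipy import ndimage

def estimate_demand(attn_map: np.ndarray, keep_mass: float = 0.9):
    """Segment an attention map into items and score attentional demand."""
    # Keep the smallest set of locations holding `keep_mass` of the attention.
    flat = np.sort(attn_map.ravel())[::-1]
    cutoff = flat[np.searchsorted(np.cumsum(flat), keep_mass)]
    mask = attn_map >= cutoff
    # Connected components of the mask stand in for distinct scene items.
    item_map, n_items = ndimage.label(mask)
    # Illustrative demand score: one unit of demand per attended item.
    return float(n_items), item_map

# Example on a random attention map (in practice, the network's output).
attn = np.random.rand(12, 20)
attn /= attn.sum()
demand, items = estimate_demand(attn)
```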
Without collecting any ground-truth attentional demand data (instead using readily available odometry data in a novel way), our approach is shown to outperform several baselines on a new dataset of 1200 driving scenes labelled for attentional demand.