While most approaches to semantic reasoning have focused
on improving performance, in this paper we argue that
computation time is critical for enabling real-time
applications such as autonomous driving. Towards this goal,
we present an approach to joint classification, detection
and semantic segmentation via a unified
architecture where the encoder is shared amongst the three
tasks. Our approach is very simple, can be trained end-to-end
and performs extremely well on the challenging KITTI dataset,
outperforming the state of the art in the road segmentation
task. Our approach is also very efficient, allowing us to
perform inference at more than 23 frames per second.
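
The efficiency argument rests on the shared encoder: its features are computed once per image and reused by all three task heads. A minimal sketch of that structure follows. Every name in it (SharedEncoderModel, toy_encoder, toy_classifier, toy_detector, toy_segmenter) is a hypothetical placeholder chosen for illustration, and the toy functions are not the layers used in our architecture.

```python
from typing import Callable, Dict, List

Features = List[float]


class SharedEncoderModel:
    """Toy stand-in: one encoder whose output feeds several task heads."""

    def __init__(
        self,
        encoder: Callable[[List[float]], Features],
        heads: Dict[str, Callable[[Features], object]],
    ) -> None:
        self.encoder = encoder
        self.heads = heads
        self.encoder_calls = 0  # counts how often the shared encoder runs

    def forward(self, image: List[float]) -> Dict[str, object]:
        # Run the (expensive) encoder once per input.
        features = self.encoder(image)
        self.encoder_calls += 1
        # Every task-specific head reuses the same features.
        return {name: head(features) for name, head in self.heads.items()}


# Hypothetical placeholder functions, assumed only for this illustration.
def toy_encoder(image: List[float]) -> Features:
    return [2.0 * x for x in image]


def toy_classifier(features: Features) -> str:
    return "bright" if sum(features) / len(features) > 1.0 else "dark"


def toy_detector(features: Features) -> List[int]:
    return [i for i, f in enumerate(features) if f > 1.5]


def toy_segmenter(features: Features) -> List[int]:
    return [1 if f > 1.0 else 0 for f in features]


model = SharedEncoderModel(
    toy_encoder,
    {
        "classification": toy_classifier,
        "detection": toy_detector,
        "segmentation": toy_segmenter,
    },
)
outputs = model.forward([0.2, 0.9, 0.4, 0.8])
```

In this sketch a single call to forward yields all three outputs while the encoder runs only once, which is where the saving over three independent networks would come from.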
Training scripts and trained weights to reproduce our
results can be found here:
https://github.com/MarvinTeichmann/MultiNet

Begabtenstiftung Informatik Karlsruhe, ONR-N00014-14-1-0232,
Qualcomm, Samsung, NVIDIA, Google, EPSRC and NSER