Real-time scene understanding has become crucial in many applications such as
autonomous driving. In this paper, we propose a deep architecture, called
BlitzNet, that jointly performs object detection and semantic segmentation in
one forward pass, allowing real-time computations. Besides the computational
gain of having a single network to perform several tasks, we show that object
detection and semantic segmentation benefit from each other in terms of
accuracy. Experimental results for VOC and COCO datasets show state-of-the-art
performance for object detection and segmentation among real time systems