CBinfer: Exploiting Frame-to-Frame Locality for Faster Convolutional Network Inference on Video Streams
The last few years have brought advances in computer vision at an amazing
pace, grounded in new findings in deep neural network construction and training
as well as the availability of large labeled datasets. Applying these networks
to images demands a high computational effort and pushes the use of
state-of-the-art networks on real-time video data out of reach of embedded
platforms. Many recent works focus on reducing network complexity for real-time
inference on embedded computing platforms. We adopt an orthogonal viewpoint and
propose a novel algorithm exploiting the spatio-temporal sparsity of pixel
changes. For a semantic segmentation application, this optimized inference
procedure achieved an average speed-up of 9.1x over cuDNN on the Tegra X2
platform with a negligible accuracy loss of <0.1% and without retraining the network.
Similarly, an average speed-up of 7.0x has been achieved for a pose detection
DNN, and a 5x reduction in the number of arithmetic operations has been
obtained for object detection on static-camera video surveillance data. These
throughput gains combined with a lower power consumption result in an energy
efficiency of 511 GOp/s/W compared to 70 GOp/s/W for the baseline.
Comment: arXiv admin note: substantial text overlap with arXiv:1704.0431
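
The abstract's core idea, recomputing only where pixels changed between frames, can be illustrated with a minimal sketch. This is not the authors' GPU implementation: it is a single-channel, zero-padded convolution in pure Python, and all names (`conv_at`, `full_conv`, `changed_outputs`, `update_conv`) and the `threshold` parameter are hypothetical choices made for illustration. With `threshold=0.0` the update reproduces a full recomputation exactly; a larger threshold would skip small changes at some cost in accuracy.

```python
def conv_at(frame, kernel, y, x):
    """Zero-padded 2D cross-correlation evaluated at one output pixel."""
    h, w = len(frame), len(frame[0])
    r = len(kernel) // 2  # kernel radius (odd, square kernel assumed)
    acc = 0.0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                acc += kernel[dy + r][dx + r] * frame[yy][xx]
    return acc


def full_conv(frame, kernel):
    """Baseline: recompute every output pixel of the frame."""
    h, w = len(frame), len(frame[0])
    return [[conv_at(frame, kernel, y, x) for x in range(w)] for y in range(h)]


def changed_outputs(prev_in, curr_in, r, threshold):
    """Output positions whose receptive field contains a changed input pixel."""
    h, w = len(curr_in), len(curr_in[0])
    affected = set()
    for y in range(h):
        for x in range(w):
            if abs(curr_in[y][x] - prev_in[y][x]) > threshold:
                # A changed input at (y, x) influences outputs within radius r.
                for yy in range(max(0, y - r), min(h, y + r + 1)):
                    for xx in range(max(0, x - r), min(w, x + r + 1)):
                        affected.add((yy, xx))
    return affected


def update_conv(prev_in, curr_in, prev_out, kernel, threshold=0.0):
    """Reuse cached outputs and recompute only the affected positions.

    Returns the new output map and the number of recomputed pixels.
    """
    r = len(kernel) // 2
    affected = changed_outputs(prev_in, curr_in, r, threshold)
    out = [row[:] for row in prev_out]  # copy cached results from last frame
    for y, x in affected:
        out[y][x] = conv_at(curr_in, kernel, y, x)
    return out, len(affected)
```

As a usage example, take an 8x8 frame in which a single pixel changes and a 3x3 kernel: `update_conv` recomputes 9 of the 64 output pixels and leaves the rest untouched. That per-frame sparsity is the kind of saving the abstract attributes to static-camera video.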