Conventional deep convolutional neural networks (CNNs) apply convolution
operators uniformly in space across all feature maps for hundreds of layers -
this incurs a high computational cost for real-time applications. For many
problems such as object detection and semantic segmentation, we are able to
obtain a low-cost computation mask, either from a priori problem knowledge, or
from a low-resolution segmentation network. We show that such computation masks
can be used to reduce computation in the high-resolution main network. Variants
of sparse activation CNNs have previously been explored on small-scale tasks
and showed no degradation in terms of object classification accuracy, but often
measured gains in terms of theoretical FLOPs without realizing a practical
speed-up when compared to highly optimized dense convolution implementations.
In this work, we leverage the sparsity structure of computation masks and
propose a novel tiling-based sparse convolution algorithm. We verified the
effectiveness of our sparse CNN on LiDAR-based 3D object detection, and we
report significant wall-clock speed-ups compared to dense convolution without
noticeable loss of accuracy.Comment: 10 pages, CVPR 201