Convolutional Neural Networks (CNNs) have been extremely successful in solving
intensive computer vision tasks. The convolutional filters used in CNNs have
played a major role in this success, by extracting useful features from the
inputs. Recently, researchers have tried to boost the performance of CNNs by
re-calibrating the feature maps produced by these filters, for example with
Squeeze-and-Excitation Networks (SENets). These approaches achieve better
performance by exciting the important channels, or feature maps, while
diminishing the rest. However, in the process, architectural complexity has
increased. We propose an architectural block that introduces much lower
complexity than the existing methods of CNN performance boosting while
performing significantly better than them. We carry out experiments on the
CIFAR, ImageNet and MS-COCO datasets, and show that the proposed block can
challenge the state-of-the-art results. Our method boosts the ResNet-50
architecture to perform comparably on classification to the ResNet-152
architecture, a network three times deeper. We also show experimentally that
our method is not limited to classification but also generalizes well to other
tasks such as object detection.

Comment: IEEE Winter Conference on Applications of Computer Vision (WACV), 202