The shift operation was recently introduced as an alternative to spatial
convolutions. The operation moves subsets of activations horizontally and/or
vertically. Spatial convolutions are then replaced with shift operations
followed by point-wise convolutions, significantly reducing computational
costs. In this work, we investigate how shifts should best be applied to
high-accuracy CNNs. We apply shifts of two different neighbourhood groups to ResNet
on ImageNet: the originally introduced 8-connected (8C) neighbourhood shift and
the less-studied 4-connected (4C) neighbourhood shift. We find that when
replacing ResNet's spatial convolutions with shifts, both shift neighbourhoods
give equal ImageNet accuracy, showing the sufficiency of small neighbourhoods
for large images. Interestingly, when incorporating shifts into all point-wise
convolutions in residual networks, 4-connected shifts outperform 8-connected
shifts. Such a 4-connected shift setup gives the same accuracy as full residual
networks while reducing the number of parameters and FLOPs by over 40%. We then
highlight that without spatial convolutions, ResNet's downsampling/upsampling
bottleneck channel structure is no longer needed. We present a new, 4C shift-based
residual network, much shorter than the original ResNet yet with higher
accuracy for the same computational cost. This network is the highest-accuracy
shift-based network yet shown, demonstrating the potential of shifting in deep
neural networks.

Comment: ICCV Neural Architects Workshop 201
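
As an illustration of the shift operation described above, the following is a minimal NumPy sketch, not the implementation used in this work. It assumes that channels are divided into contiguous groups, one per shift direction plus one unshifted group, and that vacated border positions are zero-filled. The names `SHIFTS_4C`, `SHIFTS_8C`, `shift`, and `shift_pointwise` are hypothetical.

```python
import numpy as np

# (dy, dx) offsets for each neighbourhood; the ordering is an arbitrary choice.
SHIFTS_4C = [(-1, 0), (1, 0), (0, -1), (0, 1)]
SHIFTS_8C = SHIFTS_4C + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def shift(x, offsets):
    """Shift channel groups of x (shape C, H, W) by one pixel each.

    Assumption: channels are split into len(offsets) + 1 contiguous groups;
    the last group is left unshifted and borders are zero-filled.
    """
    c, h, w = x.shape
    out = np.zeros_like(x)
    groups = np.array_split(np.arange(c), len(offsets) + 1)
    for idx, (dy, dx) in zip(groups, offsets):
        # Destination and source windows so that out[y, x] = x[y - dy, x - dx].
        dst_y = slice(max(dy, 0), h + min(dy, 0))
        src_y = slice(max(-dy, 0), h + min(-dy, 0))
        dst_x = slice(max(dx, 0), w + min(dx, 0))
        src_x = slice(max(-dx, 0), w + min(-dx, 0))
        out[idx, dst_y, dst_x] = x[idx, src_y, src_x]
    # The final group keeps its original position.
    out[groups[-1]] = x[groups[-1]]
    return out


def shift_pointwise(x, weight, offsets):
    """Shift, then apply a point-wise (1x1) convolution with weight (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, shift(x, offsets))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((10, 8, 8))
    w = rng.standard_normal((16, 10))
    print(shift_pointwise(x, w, SHIFTS_4C).shape)
    print(shift_pointwise(x, w, SHIFTS_8C).shape)
```

In this sketch the shift itself has no parameters and performs no multiplications, so a layer's weights are only the `C_out x C_in` point-wise matrix, compared with `9 * C_out * C_in` for a 3x3 spatial convolution of the same width.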