We introduce the Single Stage Headless (SSH) face detector. Unlike two-stage
proposal-classification detectors, SSH detects faces in a single stage directly
from the early convolutional layers in a classification network. SSH is
headless. That is, it is able to achieve state-of-the-art results while
removing the "head" of its underlying classification network -- i.e. all fully
connected layers of VGG-16, which contain a large number of parameters.
Additionally, instead of relying on an image pyramid to detect faces with
various scales, SSH is scale-invariant by design. We simultaneously detect
faces with different scales in a single forward pass of the network, but from
different layers. These properties make SSH fast and light-weight.
Surprisingly, with a headless VGG-16, SSH beats the ResNet-101-based
state-of-the-art on the WIDER dataset, even though, unlike the current
state-of-the-art, SSH does not use an image pyramid and is 5X faster. Moreover,
if an image pyramid is deployed, our light-weight network achieves
state-of-the-art on all subsets of the WIDER dataset, improving the AP by 2.5%.
SSH also reaches state-of-the-art results on the FDDB and Pascal-Faces datasets
while using a small input size, leading to a runtime of 50 ms/image on a GPU.
The code is available at https://github.com/mahyarnajibi/SSH.
Comment: International Conference on Computer Vision (ICCV) 201
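
To give a sense of how many parameters the "head" holds, the following is a rough sketch that counts the weights in the fully connected layers of the standard ImageNet VGG-16 configuration (224x224 input, 1000 classes, 3x3 convolutions with biases). This configuration and the helper names are assumptions made for illustration. The sketch does not count the detection modules SSH adds on top of the convolutional layers, so it is not a measurement of SSH's own size.

```python
# Sketch: parameter count of the VGG-16 "head" (fully connected layers)
# versus its convolutional backbone, for the standard ImageNet configuration.
# Helper names are hypothetical; SSH's own detection modules are NOT included.

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out

# (in_channels, out_channels) for the 13 convolutional layers of VGG-16.
VGG16_CONVS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]

def vgg16_backbone_params():
    return sum(conv_params(c_in, c_out) for c_in, c_out in VGG16_CONVS)

def vgg16_head_params(num_classes=1000):
    fc6 = dense_params(512 * 7 * 7, 4096)  # flattened 7x7x512 feature map
    fc7 = dense_params(4096, 4096)
    fc8 = dense_params(4096, num_classes)
    return fc6 + fc7 + fc8

if __name__ == "__main__":
    head = vgg16_head_params()
    backbone = vgg16_backbone_params()
    print(f"head params:     {head:,}")
    print(f"backbone params: {backbone:,}")
    print(f"head share:      {head / (head + backbone):.1%}")
```

Under these assumptions the fully connected layers account for most of the classification network's parameters, which is consistent with the claim that removing them makes the detector light-weight.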