Level Playing Field for Million Scale Face Recognition
Face recognition is often perceived as a solved problem; however, when tested
at the million scale, algorithms exhibit dramatic variation in accuracy. Are
the algorithms very different? Is access to good/big training data their
secret weapon? Where should face recognition improve? To address those
questions, we created a benchmark, MF2, that requires all algorithms to be
trained on the same data and tested at the million scale. MF2 is a public
large-scale set with 672K identities and 4.7M photos, created with the goal of
leveling the playing field for large-scale face recognition. We contrast our
results with findings from the other two large-scale benchmarks, MegaFace
Challenge and MS-Celebs-1M, where groups were allowed to train on any
private/public/big/small set. Some key discoveries: 1) algorithms trained on
MF2 were able to achieve state-of-the-art results comparable to those of
algorithms trained on massive private sets, 2) some outperformed themselves
once trained on MF2, 3) invariance to aging suffers from low accuracies, as in
MegaFace, identifying the need for larger age variations, possibly within
identities, or adjustment of algorithms in future testing.
The Devil of Face Recognition is in the Noise
The growing scale of face recognition datasets empowers us to train strong
convolutional networks for face recognition. While a variety of architectures
and loss functions have been devised, we still have a limited understanding of
the source and consequence of label noise inherent in existing datasets. We
make the following contributions: 1) We contribute cleaned subsets of popular
face databases, i.e., MegaFace and MS-Celeb-1M datasets, and build a new
large-scale noise-controlled IMDb-Face dataset. 2) With the original datasets
and cleaned subsets, we profile and analyze label noise properties of MegaFace
and MS-Celeb-1M. We show that a few orders of magnitude more samples are needed to achieve
the same accuracy yielded by a clean subset. 3) We study the association
between different types of noise, i.e., label flips and outliers, and the
accuracy of face recognition models. 4) We investigate ways to improve data
cleanliness, including a comprehensive user study on the influence of data
labeling strategies on annotation accuracy. The IMDb-Face dataset has been
released at https://github.com/fwang91/IMDb-Face.
Comment: accepted to ECCV'1
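
The two noise types named in the abstract above, label flips and outliers, can be illustrated with a toy simulation. This is a minimal sketch and not the authors' procedure. It assumes that a "label flip" means a sample of one known identity carrying another known identity's label, and that an "outlier" means a sample of no known identity carrying a known label. The function name `inject_label_noise` and its parameters are hypothetical.

```python
import random


def inject_label_noise(labels, num_classes, flip_rate, num_outliers, seed=0):
    """Return (noisy_labels, is_outlier) for a toy labelled set.

    Hypothetical helper for illustration only; assumes num_classes >= 2.
    """
    rng = random.Random(seed)
    noisy = list(labels)

    # Label flips (assumed definition): a sample of a known identity is
    # relabelled with a different known identity.
    n_flip = int(round(flip_rate * len(labels)))
    for i in rng.sample(range(len(labels)), n_flip):
        candidates = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(candidates)

    # Outliers (assumed definition): extra samples that belong to none of
    # the known identities but are wrongly given one of their labels.
    is_outlier = [False] * len(noisy)
    for _ in range(num_outliers):
        noisy.append(rng.randrange(num_classes))
        is_outlier.append(True)

    return noisy, is_outlier


if __name__ == "__main__":
    clean = [0, 1, 2, 3] * 25  # 100 samples over 4 toy identities
    noisy, is_outlier = inject_label_noise(clean, 4, flip_rate=0.1, num_outliers=5)
    flipped = sum(a != b for a, b in zip(clean, noisy))
    print(len(noisy), flipped, sum(is_outlier))
```

Keeping the two noise types separate in this way makes it possible to vary one rate while holding the other fixed, which is the kind of controlled comparison the abstract describes.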
- …