The Devil of Face Recognition is in the Noise
The growing scale of face recognition datasets empowers us to train strong
convolutional networks for face recognition. While a variety of architectures
and loss functions have been devised, we still have a limited understanding of
the source and consequence of label noise inherent in existing datasets. We
make the following contributions: 1) We contribute cleaned subsets of popular
face databases, i.e., MegaFace and MS-Celeb-1M datasets, and build a new
large-scale noise-controlled IMDb-Face dataset. 2) With the original datasets
and cleaned subsets, we profile and analyze label noise properties of MegaFace
and MS-Celeb-1M. We show that orders of magnitude more noisy samples are
needed to achieve the same accuracy yielded by a clean subset. 3) We study the association
between different types of noise, i.e., label flips and outliers, with the
accuracy of face recognition models. 4) We investigate ways to improve data
cleanliness, including a comprehensive user study on how data labeling
strategies influence annotation accuracy. The IMDb-Face dataset has been
released at https://github.com/fwang91/IMDb-Face.
Comment: accepted to ECCV'18
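The two noise types studied in point 3 can be illustrated with a small simulation. This is a toy sketch, not code from the paper: the function names, rates, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_label_flips(labels, num_classes, rate, rng):
    # "Label flip" noise: a fraction of samples keep their image/feature
    # but are reassigned to a different (wrong) identity.
    labels = labels.copy()
    n_flip = int(rate * len(labels))
    idx = rng.choice(len(labels), size=n_flip, replace=False)
    for i in idx:
        wrong_classes = [c for c in range(num_classes) if c != labels[i]]
        labels[i] = rng.choice(wrong_classes)
    return labels

def add_outliers(features, rate, rng):
    # "Outlier" noise: a fraction of samples are replaced by data that
    # belongs to no identity in the dataset (here, random vectors).
    features = features.copy()
    n_out = int(rate * len(features))
    idx = rng.choice(len(features), size=n_out, replace=False)
    features[idx] = rng.normal(size=(n_out, features.shape[1]))
    return features

labels = rng.integers(0, 10, size=100)
features = rng.normal(size=(100, 16))
noisy_labels = add_label_flips(labels, num_classes=10, rate=0.2, rng=rng)
noisy_features = add_outliers(features, rate=0.2, rng=rng)
print((noisy_labels != labels).sum())  # exactly 20 labels flipped
```

Controlling the two rates independently, as above, is what lets one associate each noise type with its effect on final recognition accuracy.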
Learn to Propagate Reliably on Noisy Affinity Graphs
Recent works have shown that exploiting unlabeled data through label
propagation can substantially reduce the labeling cost, which has been a
critical issue in developing visual recognition models. Yet, how to propagate
labels reliably, especially on a dataset with unknown outliers, remains an open
question. Conventional methods such as linear diffusion lack the capability of
handling complex graph structures and may perform poorly when the seeds are
sparse. Recent methods based on graph neural networks suffer performance
drops as they scale out to noisy graphs. To overcome these
difficulties, we propose a new framework that allows labels to be propagated
reliably on large-scale real-world data. This framework incorporates (1) a
local graph neural network to predict accurately on varying local structures
while maintaining high scalability, and (2) a confidence-based path scheduler
that identifies outliers and moves forward the propagation frontier in a
prudent way. Experiments on both ImageNet and MS-Celeb-1M show that our
confidence-guided framework significantly improves the overall accuracy of
the propagated labels, especially when the graph is very noisy.
Comment: 14 pages, 7 figures, ECCV 2020
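The idea of a confidence-based scheduler that holds back the propagation frontier at uncertain nodes can be sketched in a few lines. This is a minimal toy version using a majority vote over labeled neighbours as the confidence signal; the paper's actual framework uses a local graph neural network, and the function name, threshold, and graph below are illustrative assumptions.

```python
import numpy as np

def propagate_with_confidence(adj, seed_labels, num_classes,
                              conf_thresh=0.6, steps=10):
    # Toy confidence-guided propagation: an unlabeled node takes the
    # majority label of its labeled neighbours, but only joins the
    # labeled set when the vote confidence (fraction of agreeing
    # neighbours) exceeds conf_thresh. Nodes with mixed neighbourhoods,
    # e.g. outliers bridging two clusters, are left unlabeled (-1).
    labels = np.array(seed_labels)
    for _ in range(steps):
        updated = False
        for v in range(adj.shape[0]):
            if labels[v] != -1:
                continue
            nbrs = np.nonzero(adj[v])[0]
            nbr_labels = labels[nbrs]
            nbr_labels = nbr_labels[nbr_labels != -1]
            if len(nbr_labels) == 0:
                continue
            counts = np.bincount(nbr_labels, minlength=num_classes)
            confidence = counts.max() / len(nbr_labels)
            if confidence >= conf_thresh:
                labels[v] = counts.argmax()
                updated = True
        if not updated:
            break
    return labels

# Two 2-node clusters (seeds at nodes 0 and 2) plus node 4,
# an "outlier" connected equally to both clusters.
adj = np.array([
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0],
])
seeds = [0, -1, 1, -1, -1]
result = propagate_with_confidence(adj, seeds, num_classes=2)
print(result)  # [0 0 1 1 -1]: node 4's 50/50 vote stays below threshold
```

The prudent frontier behaviour is visible in the example: nodes 1 and 3 receive labels with full confidence, while the ambiguous node 4 is never labeled, which is the failure mode that linear diffusion handles poorly.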