ProMix: Combating Label Noise via Maximizing Clean Sample Utility

Dong, Yiwen; Feng, Lei; Wang, Haobo; Xiao, Ruixuan; Zhao, Junbo

Publication date: 22 July 2022

Abstract

The ability to train deep neural networks under label noise is appealing, as imperfectly annotated data are relatively cheap to obtain. State-of-the-art approaches are based on semi-supervised learning (SSL): they select small-loss examples as clean and then apply SSL techniques for boosted performance. However, the selection step mostly yields a medium-sized, merely decent clean subset, which overlooks a rich set of clean samples. In this work, we propose ProMix, a novel noisy-label learning framework that attempts to maximize the utility of clean samples for boosted performance. Key to our method is a matched high-confidence selection technique, which selects examples whose predictions are highly confident and match their given labels. Combined with small-loss selection, our method achieves a precision of 99.27 and a recall of 98.22 in detecting clean samples on the CIFAR-10N dataset. Based on such a large set of clean data, ProMix improves on the best baseline method by +2.67% on the CIFAR-10N dataset and +1.61% on the CIFAR-100N dataset. The code and data are available at https://github.com/Justherozen/ProMix

Comment: Winner of the 1st Learning and Mining with Noisy Labels Challenge in IJCAI-ECAI 2022 (an informal technical report)
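The two selection criteria named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the keep ratio, the confidence threshold, and the choice to combine the two criteria as a plain union are all assumptions made for illustration, and the actual ProMix procedure may differ in its details.

```python
import numpy as np


def select_clean(probs, noisy_labels, loss_keep_ratio=0.5, conf_threshold=0.95):
    """Return a boolean mask of samples treated as clean.

    probs           : (n, c) array of predicted class probabilities
    noisy_labels    : (n,) array of given (possibly noisy) integer labels
    loss_keep_ratio : hypothetical fraction of lowest-loss samples to keep
    conf_threshold  : hypothetical confidence cut-off for the matched rule
    """
    n = probs.shape[0]
    idx = np.arange(n)

    # Cross-entropy of each sample with respect to its given label.
    losses = -np.log(probs[idx, noisy_labels] + 1e-12)

    # Small-loss selection: keep the fraction of samples with the lowest loss.
    k = int(np.ceil(loss_keep_ratio * n))
    small_loss = np.zeros(n, dtype=bool)
    small_loss[np.argsort(losses)[:k]] = True

    # Matched high-confidence selection: the prediction is confident
    # and agrees with the given label.
    pred = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    matched = (pred == noisy_labels) & (conf >= conf_threshold)

    # Assumption: the two criteria are combined as a union.
    return small_loss | matched


def precision_recall(selected, is_clean):
    """Precision and recall of the selected mask against ground-truth cleanliness."""
    tp = np.sum(selected & is_clean)
    precision = tp / max(selected.sum(), 1)
    recall = tp / max(is_clean.sum(), 1)
    return precision, recall
```

Precision and recall can only be computed this way when ground-truth clean labels are available alongside the noisy ones, which is the kind of setting in which detection figures such as those quoted in the abstract are measured.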

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2207.10276

Last time updated on 28/09/2022