We investigate the problem of learning with noisy labels in real-world
annotation scenarios, where noise can be categorized into two types: factual
noise and ambiguity noise. To better distinguish these noise types and utilize
their semantics, we propose a novel sample selection-based approach for noisy
label learning, called Proto-semi. Proto-semi first divides all samples
into confident and unconfident datasets via warm-up. Prototype vectors
are then constructed from the confident dataset to capture class
characteristics. Subsequently, the distances between the unconfident samples
and the prototype vectors are calculated to facilitate noise classification.
Based on these distances, the labels are either corrected or retained,
resulting in the refinement of the confident and unconfident datasets. Finally,
we introduce a semi-supervised learning method to enhance training. Empirical
evaluations on a real-world annotated dataset show that Proto-semi is robust
when learning from noisy labels and that the prototype-based repartitioning
strategy effectively mitigates the adverse impact of label noise. Our code
and data are available
at https://github.com/fuxiAIlab/ProtoSemi
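
The prototype-based repartitioning step described above can be illustrated with a minimal sketch. This is not the released implementation. It assumes that each prototype is the mean feature vector of a class in the confident dataset, that distances are Euclidean, and that an unconfident sample is moved to the confident dataset (with the label of its nearest prototype) when that distance falls below a threshold. The names `build_prototypes`, `repartition`, and `tau` are hypothetical and chosen only for illustration.

```python
import numpy as np


def build_prototypes(features, labels, num_classes):
    """Assumed prototype definition: per-class mean of confident features."""
    return np.stack(
        [features[labels == c].mean(axis=0) for c in range(num_classes)]
    )


def repartition(features, labels, prototypes, tau):
    """Correct or retain labels of unconfident samples (illustrative rule).

    Returns the updated labels and a boolean mask marking the samples that
    are close enough to a prototype to be treated as confident.
    """
    # Euclidean distance from every sample to every prototype: (n, classes).
    dists = np.linalg.norm(
        features[:, None, :] - prototypes[None, :, :], axis=2
    )
    nearest = dists.argmin(axis=1)
    close = dists.min(axis=1) < tau  # hypothetical threshold rule
    # Close samples take the nearest prototype's class (a correction when it
    # differs from the given label); all others keep their original label.
    new_labels = np.where(close, nearest, labels)
    return new_labels, close


# Toy 2-D features standing in for learned representations.
conf_x = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]])
conf_y = np.array([0, 0, 1, 1])
unconf_x = np.array([[0.1, 0.1], [0.9, 1.0], [0.5, 0.5]])
unconf_y = np.array([1, 1, 0])

prototypes = build_prototypes(conf_x, conf_y, num_classes=2)
new_labels, close = repartition(unconf_x, unconf_y, prototypes, tau=0.3)
```

In this toy example the first unconfident sample lies near the class-0 prototype and has its label corrected, the second keeps its label, and the third is far from both prototypes and stays in the unconfident dataset.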