Semi-supervised learning (SSL) has made great progress through various
improvements to the self-training framework with pseudo labeling. The main
challenge is how to select high-quality pseudo labels while avoiding
confirmation bias. However, existing pseudo-label selection strategies are
limited to pre-defined schemes or complex hand-crafted policies designed
specifically for classification, and fail to achieve high-quality labels, fast
convergence, and task versatility simultaneously. To this end, we propose a
Semi-supervised Reward framework (SemiReward) that predicts reward scores to
evaluate and select high-quality pseudo labels, and that is pluggable into
mainstream SSL methods across a wide range of task types and scenarios. To mitigate
confirmation bias, SemiReward is trained online in two stages with a generator
model and subsampling strategy. With classification and regression tasks on 13
standard SSL benchmarks across three modalities, extensive experiments verify
that SemiReward achieves significant performance gains and faster convergence
speeds over Pseudo Label, FlexMatch, and Free/SoftMatch. Code and models are
available at https://github.com/Westlake-AI/SemiReward.

Comment: ICLR 2024 Camera Ready. Preprint V2 (25 pages).