4 research outputs found
Learning Adaptive Loss for Robust Learning with Noisy Labels
Robust loss minimization is an important strategy for robust
learning with noisy labels. Current robust loss functions, however,
inevitably involve hyperparameters that must be tuned, manually or heuristically
through cross-validation, which makes them hard to apply generally
in practice. Moreover, the non-convexity introduced by the loss, together with the
complicated network architecture, makes training prone to getting trapped in an
unexpected solution with poor generalization capability. To address these issues, we
propose a meta-learning method capable of adaptively learning the hyperparameters of
robust loss functions. Specifically, through mutual amelioration between the robust
loss hyperparameters and the network parameters in our method, both can be
learned simultaneously and coordinated to attain solutions with good
generalization capability. Four kinds of SOTA robust loss functions are
integrated into our algorithm, and comprehensive experiments
substantiate the general applicability and effectiveness of the proposed method
in both accuracy and generalization performance, as compared with the
conventional hyperparameter tuning strategy, even with carefully tuned
hyperparameters.
Comment: 10 pages
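The abstract above does not name the four robust losses or their hyperparameters. As a purely illustrative assumption, the sketch below uses the generalized cross entropy loss, (1 - p^q) / q, whose hyperparameter q interpolates between cross entropy (as q approaches 0) and the bounded loss 1 - p (at q = 1). It shows only the kind of hyperparameter such a method might adapt, not the proposed meta-learning procedure; the function name `gce_loss` is hypothetical.

```python
import math


def gce_loss(p_true: float, q: float) -> float:
    """Generalized cross entropy (1 - p^q) / q.

    p_true is the probability the model assigns to the given (possibly
    noisy) label. q is the robustness hyperparameter; using this loss is
    an illustrative assumption, not something stated in the abstract.
    As q -> 0 the loss approaches -log(p_true); at q = 1 it equals 1 - p_true.
    """
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must be in (0, 1]")
    if q <= 0.0:
        raise ValueError("q must be positive")
    # 1 - p^q == -expm1(q * log(p)); expm1 stays accurate for very small q.
    return -math.expm1(q * math.log(p_true)) / q


if __name__ == "__main__":
    # Compare how strongly a low-confidence prediction is penalized.
    for q in (1e-6, 0.5, 1.0):
        print(f"q={q}: loss at p=0.1 is {gce_loss(0.1, q):.4f}")
```

Because the loss is bounded above by 1 / q, a larger q limits how much a single mislabeled example can contribute, while a smaller q behaves more like ordinary cross entropy. Choosing q is exactly the kind of tuning the abstract describes as hard to do by hand.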
A Second-Order Approach to Learning with Instance-Dependent Label Noise
The presence of label noise often misleads the training of deep neural
networks. In contrast to the recent literature, which largely assumes the label
noise rate is determined only by the true label class, errors in
human-annotated labels are more likely to depend on the difficulty levels
of tasks, resulting in settings with instance-dependent label noise. We first
provide evidence that heterogeneous instance-dependent label noise
effectively down-weights the examples with higher noise rates in a
non-uniform way and thus causes imbalances, rendering the strategy of directly
applying methods for class-dependent label noise questionable. Building on the
recent work on peer loss [24], we then propose and study the potential of a
second-order approach that leverages the estimation of several covariance terms
defined between the instance-dependent noise rates and the Bayes optimal label.
We show that this set of second-order statistics successfully captures the
induced imbalances. We further show that, with the help of the
estimated second-order statistics, we can identify a new loss function whose
expected risk for a classifier under instance-dependent label noise is
equivalent to that of a new problem with only class-dependent label noise. This fact
allows us to apply existing solutions to handle this better-studied setting. We
provide an efficient procedure to estimate these second-order statistics
without accessing either ground truth labels or prior knowledge of the noise
rates. Experiments on CIFAR10 and CIFAR100 with synthetic instance-dependent
label noise and Clothing1M with real-world human label noise verify our
approach. Our implementation is available at https://github.com/UCSC-REAL/CAL.
Comment: Learning with label noise. Accepted as an oral paper by CVPR 202
Learning with Instance-Dependent Label Noise: A Sample Sieve Approach
Human-annotated labels are often prone to noise, and the presence of such
noise will degrade the performance of the resulting deep neural network (DNN)
models. Much of the literature on learning with noisy labels (with several
recent exceptions) focuses on the case where the label noise is independent of
features. In practice, annotation errors tend to be instance-dependent and
often depend on the difficulty of the recognition task. Applying
existing results from instance-independent settings would require a significant
amount of estimation of noise rates. Therefore, providing theoretically
rigorous solutions for learning with instance-dependent label noise remains a
challenge. In this paper, we propose CORES (COnfidence REgularized Sample
Sieve), which progressively sieves out corrupted examples. The implementation
of CORES does not require specifying noise rates and yet we are able to
provide theoretical guarantees of CORES in filtering out the corrupted
examples. This high-quality sample sieve allows us to treat clean examples and
the corrupted ones separately in training a DNN solution, and such a separation
is shown to be advantageous in the instance-dependent noise setting. We
demonstrate the performance of CORES on CIFAR10 and CIFAR100 datasets
with synthetic instance-dependent label noise and Clothing1M with real-world
human noise. Of independent interest, our sample sieve provides generic
machinery for anatomizing noisy datasets and a flexible interface for
various robust training techniques to further improve performance. Code is
available at https://github.com/UCSC-REAL/cores.
Comment: ICLR 202
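The idea of sieving out likely-corrupted examples can be illustrated with a deliberately simplified loss-threshold filter. This is a sketch under stated assumptions, not the CORES criterion: the abstract does not give the sieve rule or the confidence regularizer, so the per-example threshold used here (the mean cross-entropy loss over all candidate labels) and the names `cross_entropy` and `sieve` are hypothetical.

```python
import math
from typing import List, Sequence


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Cross-entropy loss of one predicted distribution against one label."""
    return -math.log(probs[label])


def sieve(predictions: List[Sequence[float]], noisy_labels: List[int]) -> List[int]:
    """Return indices of examples kept as 'likely clean'.

    Assumed rule (illustrative only): keep an example when the loss on its
    given label is below the average loss over all candidate labels, i.e.
    when the model finds the given label more plausible than a typical label.
    """
    kept = []
    for i, (probs, label) in enumerate(zip(predictions, noisy_labels)):
        loss = cross_entropy(probs, label)
        threshold = sum(cross_entropy(probs, k) for k in range(len(probs))) / len(probs)
        if loss < threshold:
            kept.append(i)
    return kept


if __name__ == "__main__":
    preds = [
        [0.90, 0.05, 0.05],  # model agrees with label 0
        [0.10, 0.80, 0.10],  # model strongly prefers label 1, but label is 0
    ]
    labels = [0, 0]
    print(sieve(preds, labels))
```

In a training loop, the kept and dropped indices could then be handled by different objectives, which is the separation the abstract describes; how CORES actually treats each group is not specified here.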
Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize
The learning rate (LR) is one of the most important hyper-parameters in
stochastic gradient descent (SGD) for deep neural network (DNN) training and
generalization. However, current hand-designed LR schedules require manually
pre-specifying the schedule as well as its extra hyper-parameters, which limits their
ability to adapt to non-convex optimization problems given the significant
variation in training dynamics. To address this issue, we propose a model
capable of adaptively learning LR schedules from data. We specifically design a
meta-learner with an explicit mapping formulation to parameterize LR schedules,
which can adjust the LR adaptively to comply with the current training dynamics by
leveraging information from past training histories. Image and text
classification benchmark experiments substantiate the capability of our method
to achieve proper LR schedules compared with baseline methods. Moreover, we
transfer the learned LR schedule to various other settings, such as different
training batch sizes, epochs, datasets, and network architectures, in particular the
large-scale ImageNet dataset, showing stronger generalization capability than
related methods. Finally, guided by a small clean validation set, we
show that our method can achieve better generalization error than baseline methods
when the training data is corrupted by noise.
Comment: 21 pages
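For contrast, a hand-designed schedule of the kind the abstract refers to can be sketched as follows. Cosine annealing is used here only as a familiar example of a schedule whose shape and hyper-parameters (`lr_max`, `lr_min`, `total_steps`) must be fixed before training; it is not the proposed meta-learner, and the function name `cosine_lr` is hypothetical.

```python
import math


def cosine_lr(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Cosine-annealed learning rate at a given step.

    All arguments are pre-specified by hand; the schedule cannot react to
    the loss observed during training.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so steps outside [0, total_steps] return the endpoint values.
    t = min(max(step, 0), total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / total_steps))


if __name__ == "__main__":
    for step in (0, 25, 50, 75, 100):
        print(step, round(cosine_lr(step, total_steps=100, lr_max=0.1), 5))
```

A learned schedule, as described in the abstract, would replace this fixed formula with a mapping that takes training information as input, so the rate can change in response to the observed dynamics.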