On the Importance of Feature Separability in Predicting Out-Of-Distribution Error
Estimating generalization performance on out-of-distribution (OOD) data is
practically challenging without ground-truth labels. While previous
methods emphasize the connection between distribution difference and OOD
accuracy, we show that a large domain gap does not necessarily lead to low test
accuracy. In this paper, we investigate this problem from the perspective of
feature separability, and propose a dataset-level score based upon feature
dispersion to estimate the test accuracy under distribution shift. Our method
is inspired by desirable properties of features in representation learning:
high inter-class dispersion and high intra-class compactness. Our analysis
shows that inter-class dispersion is strongly correlated with the model
accuracy, while intra-class compactness does not reflect the generalization
performance on OOD data. Extensive experiments demonstrate the superiority of
our method in both prediction performance and computational efficiency.
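As a rough illustration, the sketch below scores inter-class dispersion as the spread of per-class feature centroids around the global centroid, using the model's own predictions as pseudo-labels since test labels are unavailable. The centroid-based formula, the pseudo-labeling step, and all names are illustrative assumptions, not the paper's exact score.

```python
import torch

def dispersion_score(features: torch.Tensor, pseudo_labels: torch.Tensor,
                     num_classes: int) -> torch.Tensor:
    # Inter-class dispersion: mean squared distance of each (pseudo-)class
    # centroid from the global feature centroid. Per the paper's analysis,
    # higher dispersion should track higher OOD accuracy.
    global_mean = features.mean(dim=0)
    dists = []
    for c in range(num_classes):
        mask = pseudo_labels == c
        if mask.any():
            centroid = features[mask].mean(dim=0)
            dists.append((centroid - global_mean).pow(2).sum())
    return torch.stack(dists).mean()

# Usage: features from the penultimate layer, pseudo-labels from the logits,
# e.g. score = dispersion_score(feats, logits.argmax(dim=1), num_classes=10)
```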
Leveraging Gradients for Unsupervised Accuracy Estimation under Distribution Shift
Estimating test accuracy without access to the ground-truth test labels under
varying test environments is a challenging, yet extremely important problem in
the safe deployment of machine learning algorithms. Existing works rely on the
information from either the outputs or the extracted features of neural
networks to formulate an estimation score correlating with the ground-truth
test accuracy. In this paper, we investigate, both empirically and
theoretically, how the information provided by gradients can be predictive
of the ground-truth test accuracy even under a distribution shift.
Specifically, we use the norm of classification-layer gradients, backpropagated
from the cross-entropy loss after only one gradient step over test data. Our
key idea is that the model should be adjusted with a higher magnitude of
gradients when it does not generalize to the test dataset with a distribution
shift. We provide theoretical insights highlighting the main ingredients of
such an approach that ensure its empirical success. Extensive experiments
conducted on diverse distribution shifts and model structures demonstrate that
our method significantly outperforms state-of-the-art algorithms.
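A minimal sketch of such a gradient-norm score follows. Using classification-layer gradients from one cross-entropy backward pass over test data comes from the abstract; plugging in the model's own hard predictions as targets (labels being unavailable) and the attribute name `model.fc` are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def grad_norm_score(model: torch.nn.Module, x_test: torch.Tensor) -> float:
    # One backward pass over (unlabeled) test data; ground-truth labels are
    # unavailable, so the model's own hard predictions stand in as targets.
    model.zero_grad()
    logits = model(x_test)
    pseudo = logits.argmax(dim=1).detach()  # assumption: self-predicted labels
    F.cross_entropy(logits, pseudo).backward()
    # Norm of the classification-layer gradient (assumed to live in model.fc):
    # a larger norm suggests the model needs more adjustment, i.e. lower accuracy.
    grads = [p.grad.flatten() for p in model.fc.parameters()]
    return torch.cat(grads).norm().item()
```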
ACIL: Analytic Class-Incremental Learning with Absolute Memorization and Privacy Protection
Class-incremental learning (CIL) learns a classification model from training
data whose classes arrive progressively. Existing CIL methods either suffer
serious accuracy loss due to catastrophic forgetting or invade data privacy
by revisiting used exemplars. Inspired by linear learning formulations, we
propose analytic class-incremental learning (ACIL), which attains absolute
memorization of past knowledge while avoiding breaches of data privacy (i.e.,
without storing historical data). Absolute memorization holds in the sense
that class-incremental learning with ACIL on present data alone gives results
identical to those of its joint-learning counterpart, which consumes both
present and historical samples. This equality is theoretically
validated. Data privacy is ensured since no historical data are involved during
the learning process. Empirical validations demonstrate ACIL's competitive
accuracy with near-identical results across various incremental task
settings (e.g., 5-50 phases). This also allows ACIL to outperform
state-of-the-art methods in large-phase scenarios (e.g., 25 and 50 phases).
Comment: published in NeurIPS 2022.
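The equality to joint learning is the classic property of recursive least squares. Below is a toy sketch of that core update, assuming a linear classifier on fixed features trained with a regularized least-squares objective; the class name, the ridge parameter gamma, and the omission of ACIL's frozen backbone and feature expansion are all simplifications.

```python
import torch

class AnalyticClassifier:
    # Recursive least squares: after each phase, W equals the ridge-regression
    # solution over ALL phases seen so far, yet no past data is ever stored.
    def __init__(self, feat_dim: int, num_classes: int, gamma: float = 1.0):
        self.W = torch.zeros(feat_dim, num_classes)
        self.R = torch.eye(feat_dim) / gamma  # inverse regularized autocorrelation

    def fit_phase(self, X: torch.Tensor, Y: torch.Tensor) -> None:
        # X: (n, feat_dim) features, Y: (n, num_classes) one-hot labels.
        # Woodbury-style block update; only the current phase's data appears.
        K = torch.linalg.inv(torch.eye(X.shape[0]) + X @ self.R @ X.T)
        self.R = self.R - self.R @ X.T @ K @ X @ self.R
        self.W = self.W + self.R @ X.T @ (Y - X @ self.W)
```

Feeding phases one by one yields the same W as solving the regularized problem on the concatenated data, which is the sense of "absolute memorization" above.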
Mitigating Memorization of Noisy Labels by Clipping the Model Prediction
In the presence of noisy labels, designing robust loss functions is critical
for securing the generalization performance of deep neural networks. Cross
Entropy (CE) loss has been shown not to be robust to noisy labels because it
is unbounded. To alleviate this issue, existing works typically design
specialized robust losses satisfying a symmetric condition, which usually
leads to underfitting. In this paper, our key idea is to induce a loss bound
at the logit level, thus universally enhancing the noise robustness of existing
losses. Specifically, we propose logit clipping (LogitClip), which clamps the
norm of the logit vector to ensure that it is upper bounded by a constant. In
this manner, CE loss equipped with our LogitClip method is effectively bounded,
mitigating overfitting to examples with noisy labels. Moreover, we present
theoretical analyses to certify the noise-tolerant ability of LogitClip.
Extensive experiments show that LogitClip not only significantly improves the
noise robustness of CE loss, but also broadly enhances the generalization
performance of popular robust losses.
Comment: Accepted by ICML 2023.
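The clamping itself is a one-liner; here is a minimal sketch assuming an L2 norm bound, with the threshold tau as an illustrative hyperparameter rather than the paper's tuned value.

```python
import torch
import torch.nn.functional as F

def logit_clip_loss(logits: torch.Tensor, targets: torch.Tensor,
                    tau: float = 1.0, eps: float = 1e-8) -> torch.Tensor:
    # Rescale each logit vector so its L2 norm is at most tau, then apply
    # plain cross-entropy. Bounding the logit norm bounds the CE loss, which
    # curbs memorization of noisy labels.
    norms = logits.norm(p=2, dim=-1, keepdim=True)
    clipped = logits * torch.clamp(tau / (norms + eps), max=1.0)
    return F.cross_entropy(clipped, targets)
```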
Automatic online multi-source domain adaptation
Knowledge transfer across several streaming processes remains a challenging
problem, not only because each stream follows a different distribution but
also because data-stream environments change rapidly and never end. Despite
growing research achievements in this area, most existing works are developed
for a single source domain, which limits their ability to exploit multiple
source domains; multiple sources are beneficial for recovering quickly from
concept drift and for avoiding the negative transfer problem. This paper
proposes an online domain adaptation technique for multi-source streaming
processes, namely automatic online multi-source domain adaptation (AOMSDA).
The online domain adaptation strategy of AOMSDA is formulated under a coupled
generative and discriminative approach based on a denoising autoencoder (DAE),
into which a central moment discrepancy (CMD)-based regularizer is integrated
to handle multiple source domains and thereby exploit complementary sources of
information. Asynchronous concept drifts taking place at different time
periods are addressed by a self-organizing structure and a node re-weighting
strategy. Our numerical study demonstrates that AOMSDA outperforms its
counterparts in 5 of 8 study cases, while the ablation study shows the
advantage of each learning component. In addition, AOMSDA is general for any
number of source streams. The source code of AOMSDA is shared publicly at
https://github.com/Renchunzi-Xie/AOMSDA.git.
This work is supported by the Ministry of Education, Republic of Singapore,
under a Tier 1 Grant.
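For intuition, the CMD regularizer aligns feature distributions by matching their central moments. A minimal sketch follows, omitting the interval-normalization constants of the original CMD and assuming bounded features (e.g., sigmoid activations); the names and moment count are illustrative.

```python
import torch

def cmd(x: torch.Tensor, y: torch.Tensor, n_moments: int = 5) -> torch.Tensor:
    # Central moment discrepancy between two feature batches x, y of shape
    # (n, d): distance between the means plus distances between higher-order
    # central moments, accumulated up to order n_moments.
    mx, my = x.mean(dim=0), y.mean(dim=0)
    d = (mx - my).norm()
    cx, cy = x - mx, y - my
    for k in range(2, n_moments + 1):
        d = d + (cx.pow(k).mean(dim=0) - cy.pow(k).mean(dim=0)).norm()
    return d
```

In a multi-source setting, such a term would be summed over source/target feature pairs and added to the DAE's reconstruction and classification losses.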
GearNet: Stepwise Dual Learning for Weakly Supervised Domain Adaptation
This paper studies a weakly supervised domain adaptation (WSDA) problem, where
we only have access to a source domain with noisy labels, from which we need
to transfer useful information to an unlabeled target domain. Although there
have been a few studies on this problem, most of them only exploit the
unidirectional relationship from the source domain to the target domain. In
this paper, we propose a universal paradigm called GearNet to exploit
bilateral relationships between the two domains. Specifically, we take the two
domains as different inputs to train two models alternately, and a symmetrical
Kullback-Leibler loss is used to selectively match the predictions of the two
models in the same domain. This interactive learning schema enables implicit
label-noise cancellation and exploits correlations between the source and
target domains. Therefore, GearNet has great potential to boost the
performance of a wide range of existing WSDA methods. Comprehensive
experimental results show that the performance of existing methods can be
significantly improved by equipping them with our GearNet.
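The symmetric KL matching term is straightforward to write down; the sketch below is a minimal version assuming logits from the two models on the same batch, leaving out GearNet's selective-matching rule and alternating training schedule.

```python
import torch
import torch.nn.functional as F

def symmetric_kl(p_logits: torch.Tensor, q_logits: torch.Tensor) -> torch.Tensor:
    # KL(p || q) + KL(q || p) between the two models' predictive distributions
    # on the same domain. F.kl_div(log_input, target) computes KL(target || input).
    p_log = F.log_softmax(p_logits, dim=-1)
    q_log = F.log_softmax(q_logits, dim=-1)
    return (F.kl_div(q_log, p_log.exp(), reduction="batchmean")
            + F.kl_div(p_log, q_log.exp(), reduction="batchmean"))
```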