NCL++: Nested Collaborative Learning for Long-Tailed Visual Recognition
Long-tailed visual recognition has received increasing attention in recent
years. Due to the extremely imbalanced data distribution in long-tailed
learning, the learning process exhibits great uncertainty. For example, the
predictions of different experts on the same image vary remarkably despite
identical training settings. To alleviate this uncertainty, we propose Nested
Collaborative Learning (NCL++), which tackles the long-tailed learning problem
through collaborative learning. Specifically, the collaborative learning has two
components, namely inter-expert collaborative learning (InterCL) and
intra-expert collaborative learning (IntraCL). InterCL trains multiple experts
collaboratively and concurrently, aiming to transfer knowledge among different
experts. IntraCL is similar to InterCL, but it conducts collaborative learning
on multiple augmented copies of the same image within a single expert. To
achieve collaborative learning in long-tailed learning, balanced online
distillation is proposed to enforce consistent predictions among different
experts and augmented copies, which reduces the learning uncertainty. Moreover,
to improve fine-grained discrimination among confusing categories, we further
propose Hard Category Mining (HCM), which selects the negative categories with
high predicted scores as hard categories. The collaborative learning is then
formulated in a nested way, in which learning is conducted not only on all
categories from a full perspective but also on selected hard categories from a
partial perspective. Extensive experiments demonstrate the superiority of our
method, which outperforms the state of the art whether using a single model or
an ensemble. The code will be publicly released. Comment: arXiv admin note:
text overlap with arXiv:2203.1535
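The abstract outlines rather than specifies the losses. As a rough PyTorch
sketch of the two named ingredients, the snippet below pairs a KL-based online
distillation term between two experts' predictions with a simple hard-category
selection that keeps the top-k highest-scoring negative classes. The function
names and the way the two views are combined are illustrative assumptions, not
the authors' released code.

```python
import torch
import torch.nn.functional as F

def hard_category_mask(logits, labels, k=3):
    """Keep the true class plus the k negative categories with the highest scores."""
    scores = logits.clone()
    scores.scatter_(1, labels.unsqueeze(1), float("-inf"))   # exclude the true class
    hard_idx = scores.topk(k, dim=1).indices
    mask = torch.zeros_like(logits, dtype=torch.bool)
    mask.scatter_(1, hard_idx, True)
    mask.scatter_(1, labels.unsqueeze(1), True)               # keep the true class too
    return mask

def consistency_kl(logits_a, logits_b, temperature=2.0):
    """Online distillation term: pull expert A's prediction toward expert B's."""
    p_b = F.softmax(logits_b.detach() / temperature, dim=1)
    log_p_a = F.log_softmax(logits_a / temperature, dim=1)
    return F.kl_div(log_p_a, p_b, reduction="batchmean") * temperature ** 2

# Toy usage: two experts, a batch of 4 images, 10 classes.
logits_e1, logits_e2 = torch.randn(4, 10), torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
mask = hard_category_mask(logits_e1, labels, k=3)
full_view = F.cross_entropy(logits_e1, labels) + consistency_kl(logits_e1, logits_e2)
hard_view = F.cross_entropy(logits_e1.masked_fill(~mask, float("-inf")), labels)
loss = full_view + hard_view   # nested: all categories plus only the hard ones
```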
Towards Long-Tailed Recognition for Graph Classification via Collaborative Experts
Graph classification, which aims to learn graph-level representations for
effective class assignment, has achieved remarkable success, but it relies
heavily on high-quality datasets with balanced class distributions. In fact,
most real-world graph data naturally follows a long-tailed distribution in
which the head classes contain far more samples than the tail classes; it is
therefore essential to study graph-level classification over long-tailed data,
yet this setting remains largely unexplored. Moreover, most existing
long-tailed learning methods in vision fail to jointly optimize representation
learning and classifier training, and they neglect the mining of
hard-to-classify classes. Directly applying existing methods to graphs may lead
to sub-optimal performance, since the model trained on graphs would be more
sensitive to the long-tailed distribution due to the complex topological
characteristics. Hence, in this paper, we propose a novel long-tailed
graph-level classification framework via Collaborative Multi-expert Learning
(CoMe) to tackle the problem. To equilibrate the contributions of head and tail
classes, we first develop balanced contrastive learning from the view of
representation learning, and then design an individual-expert classifier
training based on hard class mining. In addition, we execute gated fusion and
disentangled knowledge distillation among the multiple experts to promote the
collaboration in a multi-expert framework. Comprehensive experiments are
performed on seven widely-used benchmark datasets to demonstrate the
superiority of our method CoMe over state-of-the-art baselines. Comment:
Accepted by IEEE Transactions on Big Data (TBD 2024)
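As a hedged illustration of the "gated fusion" idea, the sketch below combines
per-expert logits with input-conditioned gating weights. The module names,
dimensions, and gating design are placeholders and may differ from the paper's
architecture.

```python
import torch
import torch.nn as nn

class GatedExpertFusion(nn.Module):
    """Fuse per-expert logits with weights predicted from a shared graph embedding."""
    def __init__(self, embed_dim, num_classes, num_experts):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Linear(embed_dim, num_classes) for _ in range(num_experts)
        )
        self.gate = nn.Linear(embed_dim, num_experts)

    def forward(self, h):                       # h: (batch, embed_dim) graph embeddings
        expert_logits = torch.stack([e(h) for e in self.experts], dim=1)  # (B, E, C)
        weights = torch.softmax(self.gate(h), dim=1).unsqueeze(-1)        # (B, E, 1)
        return (weights * expert_logits).sum(dim=1)                       # (B, C)

# Toy usage: fused logits for 8 graph embeddings.
fusion = GatedExpertFusion(embed_dim=64, num_classes=10, num_experts=3)
fused = fusion(torch.randn(8, 64))
```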
Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-Tailed Classification
In real-world scenarios, data tends to exhibit a long-tailed distribution,
which increases the difficulty of training deep networks. In this paper, we
propose a novel self-paced knowledge distillation framework, termed Learning
From Multiple Experts (LFME). Our method is inspired by the observation that
networks trained on less imbalanced subsets of the distribution often yield
better performances than their jointly-trained counterparts. We refer to these
models as 'Experts', and the proposed LFME framework aggregates the knowledge
from multiple 'Experts' to learn a unified student model. Specifically, the
proposed framework involves two levels of adaptive learning schedules:
Self-paced Expert Selection and Curriculum Instance Selection, so that the
knowledge is adaptively transferred to the 'Student'. We conduct extensive
experiments and demonstrate that our method achieves superior performance
compared to state-of-the-art methods. We also show that our method can be
easily plugged into state-of-the-art long-tailed classification algorithms for
further improvements. Comment: ECCV 2020 Spotlight
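A minimal sketch of the distillation skeleton this describes, assuming a
PyTorch setup: cross-entropy plus a weighted KL term per expert. The per-expert
weights here stand in for the paper's self-paced schedule and are simply passed
in rather than computed.

```python
import torch
import torch.nn.functional as F

def lfme_style_loss(student_logits, expert_logits_list, labels,
                    expert_weights, temperature=2.0):
    """Cross-entropy plus a weighted distillation term toward each expert.

    expert_weights is assumed to come from some adaptive schedule
    (e.g. each expert's recent performance); here it is just given.
    """
    loss = F.cross_entropy(student_logits, labels)
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    for w, expert_logits in zip(expert_weights, expert_logits_list):
        p_expert = F.softmax(expert_logits.detach() / temperature, dim=1)
        loss = loss + w * F.kl_div(log_p_student, p_expert,
                                   reduction="batchmean") * temperature ** 2
    return loss

# Toy example: 3 experts trained on less imbalanced subsets of the data.
student_logits = torch.randn(16, 100, requires_grad=True)
expert_logits = [torch.randn(16, 100) for _ in range(3)]
labels = torch.randint(0, 100, (16,))
loss = lfme_style_loss(student_logits, expert_logits, labels,
                       expert_weights=[0.5, 0.3, 0.2])
loss.backward()
```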
MDCS: More Diverse Experts with Consistency Self-distillation for Long-tailed Recognition
Recently, multi-expert methods have led to significant improvements in
long-tail recognition (LTR). We identify two aspects that need further
enhancement to boost LTR: (1) more diverse experts and (2) lower model
variance. However, previous methods did not handle these well. To this
end, we propose More Diverse experts with Consistency Self-distillation (MDCS)
to bridge the gap left by earlier methods. Our MDCS approach consists of two
core components: Diversity Loss (DL) and Consistency Self-distillation (CS). In
detail, DL promotes diversity among experts by controlling their focus on
different categories. To reduce the model variance, we employ KL divergence to
distill the richer knowledge of weakly augmented instances for the experts'
self-distillation. In particular, we design Confident Instance Sampling (CIS)
to select the correctly classified instances for CS to avoid biased/noisy
knowledge. In our analysis and ablation study, we demonstrate that, compared
with previous work, our method effectively increases the diversity of experts,
significantly reduces model variance, and improves recognition accuracy.
Moreover, the roles of our DL and CS are mutually reinforcing and coupled: the
diversity of experts benefits from the CS, and the CS cannot achieve remarkable
results without the DL. Experiments show our MDCS outperforms the state of the
art by 1% to 2% on five popular long-tailed benchmarks, including CIFAR10-LT,
CIFAR100-LT, ImageNet-LT, Places-LT, and iNaturalist 2018. The code is
available at https://github.com/fistyee/MDCS. Comment: Accepted by ICCV 2023.
13 pages
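The consistency self-distillation term can be pictured roughly as below (a
hypothetical PyTorch sketch, not the released implementation): predictions on
weakly augmented views are distilled into strongly augmented ones via KL
divergence, restricted to correctly classified instances as a stand-in for
Confident Instance Sampling.

```python
import torch
import torch.nn.functional as F

def consistency_self_distillation(logits_weak, logits_strong, labels, temperature=2.0):
    """Distill weak-view predictions into strong-view ones, keeping only the
    instances the model classifies correctly on the weak view."""
    with torch.no_grad():
        confident = logits_weak.argmax(dim=1).eq(labels)      # correctly classified
    if not confident.any():
        return logits_strong.sum() * 0.0                       # keep the graph intact
    p_weak = F.softmax(logits_weak[confident].detach() / temperature, dim=1)
    log_p_strong = F.log_softmax(logits_strong[confident] / temperature, dim=1)
    return F.kl_div(log_p_strong, p_weak, reduction="batchmean") * temperature ** 2

# Toy usage: the same batch passed through the model with weak and strong augmentations.
logits_weak = torch.randn(8, 10)
logits_strong = torch.randn(8, 10, requires_grad=True)
labels = torch.randint(0, 10, (8,))
cs_loss = consistency_self_distillation(logits_weak, logits_strong, labels)
```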
Learning Prototype Classifiers for Long-Tailed Recognition
The problem of long-tailed recognition (LTR) has received attention in recent
years due to the fundamental power-law distribution of objects in the real
world. Most recent works in LTR use softmax classifiers that have a
tendency to correlate classifier norm with the amount of training data for a
given class. On the other hand, Prototype classifiers do not suffer from this
shortcoming and can deliver promising results simply using Nearest-Class-Mean
(NCM), a special case where prototypes are empirical centroids. However, the
potential of Prototype classifiers as an alternative to softmax in LTR is
relatively underexplored. In this work, we propose Prototype classifiers, which
jointly learn prototypes by minimizing the average cross-entropy loss computed
from probability scores based on distances to the prototypes. We theoretically
analyze the properties of Euclidean-distance-based prototype classifiers that
lead to stable gradient-based optimization robust to outliers. We further
enhance Prototype classifiers by learning channel-dependent temperature
parameters to enable independent distance scales along each channel. Our
analysis shows that prototypes learned by Prototype classifiers are better
separated than empirical centroids. Results on four long-tailed recognition
benchmarks show that the Prototype classifier outperforms or is comparable to
state-of-the-art methods. Comment: Accepted at IJCAI-2
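A minimal sketch of a prototype classifier with channel-dependent temperatures,
assuming a PyTorch setup; the exact parameterization of prototypes and
temperatures in the paper may differ.

```python
import torch
import torch.nn as nn

class PrototypeClassifier(nn.Module):
    """Classify by (negative) distance to learned class prototypes, with a learnable
    per-channel temperature so each feature channel gets its own distance scale."""
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.log_channel_temp = nn.Parameter(torch.zeros(feat_dim))

    def forward(self, features):                        # features: (B, D)
        scale = torch.exp(self.log_channel_temp)        # positive per-channel scale
        diff = features.unsqueeze(1) - self.prototypes.unsqueeze(0)  # (B, C, D)
        sq_dist = (scale * diff.pow(2)).sum(dim=-1)     # channel-scaled squared distance
        return -sq_dist                                 # logits: closer prototype = higher

# Toy usage: cross-entropy over distance-based logits.
clf = PrototypeClassifier(feat_dim=128, num_classes=100)
logits = clf(torch.randn(32, 128))
loss = nn.functional.cross_entropy(logits, torch.randint(0, 100, (32,)))
```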
Supervised Contrastive Learning on Blended Images for Long-tailed Recognition
Real-world data often have a long-tailed distribution, where the number of
samples per class is not equal across training classes. The imbalanced data
form a biased feature space, which degrades the performance of the recognition
model. In this paper, we propose a novel long-tailed recognition method to
balance the latent feature space. First, we introduce a MixUp-based data
augmentation technique to reduce the bias of the long-tailed data. Furthermore,
we propose a new supervised contrastive learning method, named Supervised
contrastive learning on Mixed Classes (SMC), for blended images. SMC creates a
set of positives based on the class labels of the original images. The
positives are then weighted in the training loss according to their combination
ratios. With the class-mixture-based loss, SMC explores a more diverse data
space, enhancing the generalization capability of the model. Extensive
experiments on various benchmarks show the effectiveness of our one-stage
training method.
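One way to picture the combination-ratio-weighted positives is the simplified,
single-view PyTorch sketch below; the label-pairing convention and the loss
normalization are illustrative assumptions rather than the paper's exact
formulation.

```python
import torch
import torch.nn.functional as F

def smc_style_loss(features, labels_a, labels_b, lam, temperature=0.1):
    """Supervised contrastive loss for MixUp-blended anchors: samples matching the
    anchor's first source class are positives with weight lam, those matching the
    second source class with weight (1 - lam)."""
    z = F.normalize(features, dim=1)                    # (B, D) projected embeddings
    eye = torch.eye(z.size(0), dtype=torch.bool)
    sim = (z @ z.t() / temperature).masked_fill(eye, float("-inf"))  # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(eye, 0.0)           # avoid 0 * (-inf) below

    pos_a = labels_a.unsqueeze(0).eq(labels_a.unsqueeze(1)).float()
    pos_b = labels_a.unsqueeze(0).eq(labels_b.unsqueeze(1)).float()
    weights = lam.unsqueeze(1) * pos_a + (1 - lam).unsqueeze(1) * pos_b
    weights = weights.masked_fill(eye, 0.0)

    per_anchor = -(weights * log_prob).sum(1) / weights.sum(1).clamp(min=1e-8)
    return per_anchor.mean()

# Toy usage: 8 blended images, each mixing a sample of class labels_a with one of labels_b.
feats = torch.randn(8, 64, requires_grad=True)
labels_a, labels_b = torch.randint(0, 5, (8,)), torch.randint(0, 5, (8,))
lam = torch.rand(8)
loss = smc_style_loss(feats, labels_a, labels_b, lam)
```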
Class Instance Balanced Learning for Long-Tailed Classification
The long-tailed image classification task remains important in the
development of deep neural networks as it explicitly deals with large
imbalances in the class frequencies of the training data. While uncommon in
engineered datasets, this imbalance is almost always present in real-world
data. Previous approaches have shown that combining cross-entropy and
contrastive learning can improve performance on the long-tailed task, but they
do not explore the tradeoff between head and tail classes. We propose a novel
class instance balanced loss (CIBL), which reweights the relative contributions
of a cross-entropy loss and a contrastive loss as a function of the frequency of
class instances in the training batch. This balancing favours the contrastive
loss for more common classes, leading to a learned classifier with a more
balanced performance across all class frequencies. Furthermore, increasing the
relative weight on the contrastive head shifts performance from common (head)
to rare (tail) classes, allowing the user to skew the performance towards these
classes if desired. We also show that replacing the linear classifier head with
a cosine classifier yields a network that can be trained to similar performance
in substantially fewer epochs. We obtain competitive results on both
CIFAR-100-LT and ImageNet-LT. Comment: 8 pages, 2 figures, presented at the
Conference on Robots and Vision 202
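As a hedged sketch of the reweighting idea, the snippet below blends a
per-sample cross-entropy term with a simple supervised contrastive term, with
the contrastive weight growing with the class's frequency in the current batch;
the specific weighting function is a placeholder, not the paper's.

```python
import torch
import torch.nn.functional as F

def cibl_style_loss(logits, features, labels, contrastive_bias=1.0, temperature=0.1):
    """Blend cross-entropy and a supervised contrastive term per sample, weighting the
    contrastive part more for classes that are frequent in the current batch."""
    # Per-sample frequency of its class within the batch, normalized to [0, 1].
    counts = torch.bincount(labels, minlength=logits.size(1)).float()
    freq = counts[labels] / labels.numel()
    w_con = (contrastive_bias * freq).clamp(max=1.0)    # common classes lean contrastive
    w_ce = 1.0 - w_con

    ce = F.cross_entropy(logits, labels, reduction="none")

    # Simple supervised contrastive term (single view, same-class positives).
    z = F.normalize(features, dim=1)
    eye = torch.eye(z.size(0), dtype=torch.bool)
    sim = (z @ z.t() / temperature).masked_fill(eye, float("-inf"))
    log_prob = (sim - torch.logsumexp(sim, dim=1, keepdim=True)).masked_fill(eye, 0.0)
    pos = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float().masked_fill(eye, 0.0)
    con = -(pos * log_prob).sum(1) / pos.sum(1).clamp(min=1.0)

    return (w_ce * ce + w_con * con).mean()

# Toy usage on a batch of 16 samples with 10 classes and 64-d features.
logits = torch.randn(16, 10, requires_grad=True)
feats = torch.randn(16, 64)
labels = torch.randint(0, 10, (16,))
loss = cibl_style_loss(logits, feats, labels)
```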