Uncertainty-Aware Multi-Shot Knowledge Distillation for Image-Based Object Re-Identification
Object re-identification (re-id) aims to identify a specific object across
time or camera views, with person re-id and vehicle re-id as the most
widely studied applications. Re-id is challenging because of variations in
viewpoints, (human) poses, and occlusions. Multi-shots of the same object can
cover diverse viewpoints/poses and thus provide more comprehensive information.
In this paper, we propose exploiting the multi-shots of the same identity to
guide the feature learning of each individual image. Specifically, we design an
Uncertainty-aware Multi-shot Teacher-Student (UMTS) Network. It consists of a
teacher network (T-net) that learns the comprehensive features from multiple
images of the same object, and a student network (S-net) that takes a single
image as input. In particular, we take into account the data-dependent
heteroscedastic uncertainty to effectively transfer knowledge from the
T-net to the S-net. To the best of our knowledge, we are the first to exploit
multi-shots of an object in a teacher-student learning manner to effectively
boost single-image-based re-id. We validate the effectiveness of our
approach on popular vehicle re-id and person re-id datasets. At inference
time, the S-net alone significantly outperforms the baselines and achieves
state-of-the-art performance.
Comment: Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20).
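The abstract states that data-dependent heteroscedastic uncertainty weights the knowledge transfer but does not give its exact form. Below is a minimal PyTorch sketch of one standard formulation (precision-weighted feature distillation with a log-variance regularizer, in the spirit of Kendall and Gal); the class name, shapes, and the assumption that the S-net predicts a per-dimension log-variance are illustrative, not the published UMTS loss.

```python
import torch
import torch.nn as nn

class UncertaintyDistillLoss(nn.Module):
    """Feature distillation weighted by predicted heteroscedastic uncertainty.

    Dimensions the student is uncertain about (high log-variance) contribute
    less to the transfer loss; the additive log_var term keeps the predicted
    uncertainty from growing without bound.
    """
    def forward(self, student_feat, teacher_feat, log_var):
        precision = torch.exp(-log_var)           # 1 / sigma^2
        sq_err = (student_feat - teacher_feat) ** 2
        return (precision * sq_err + log_var).mean()

# Toy usage: a batch of 4 single-image (student) vs. multi-shot (teacher) features.
s_feat = torch.randn(4, 128, requires_grad=True)
t_feat = torch.randn(4, 128)                      # teacher output, frozen
log_var = torch.zeros(4, 128, requires_grad=True)
loss = UncertaintyDistillLoss()(s_feat, t_feat.detach(), log_var)
loss.backward()
```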
Recent Advances of Continual Learning in Computer Vision: An Overview
In contrast to batch learning where all training data is available at once,
continual learning represents a family of methods that accumulate knowledge and
learn continuously from data that arrive in sequential order. Like the
human learning process, which learns, fuses, and accumulates new knowledge
arriving at different time steps, continual learning is considered
to have high practical significance. Hence, continual learning has been studied
in various artificial intelligence tasks. In this paper, we present a
comprehensive review of the recent progress of continual learning in computer
vision. In particular, the works are grouped by their representative
techniques, including regularization, knowledge distillation, memory,
generative replay, parameter isolation, and a combination of the above
techniques. For each category of these techniques, both its characteristics and
applications in computer vision are presented. At the end of this overview,
several subareas where continuous knowledge accumulation is potentially
helpful but continual learning has not yet been well studied are discussed.
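As a concrete instance of the knowledge-distillation family grouped above, here is a minimal PyTorch sketch of a Learning-without-Forgetting-style objective: cross-entropy on the current task plus a soft-target distillation term computed against a frozen copy of the model. The temperature and mixing weight are illustrative defaults, not values prescribed by the overview.

```python
import torch
import torch.nn.functional as F

def lwf_loss(new_logits, old_logits, targets, temperature=2.0, alpha=0.5):
    """Learning-without-Forgetting-style continual learning objective:
    cross-entropy on the current task plus a distillation term that keeps
    the updated model's predictions close to those of a frozen pre-update
    copy, discouraging forgetting without storing old data."""
    ce = F.cross_entropy(new_logits, targets)
    kd = F.kl_div(
        F.log_softmax(new_logits / temperature, dim=1),
        F.softmax(old_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2                          # standard KD temperature scaling
    return (1 - alpha) * ce + alpha * kd

# Toy usage with random logits for 10 classes.
new = torch.randn(8, 10, requires_grad=True)
old = torch.randn(8, 10)                          # frozen model's outputs
loss = lwf_loss(new, old, torch.randint(0, 10, (8,)))
```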
Few-shot Class-incremental Learning: A Survey
Few-shot Class-Incremental Learning (FSCIL) presents a unique challenge in
machine learning, as it necessitates the continuous learning of new classes
from sparse labeled training samples without forgetting previous knowledge.
While this field has seen recent progress, it remains an active area of
exploration. This paper aims to provide a comprehensive and systematic review
of FSCIL. In our in-depth examination, we delve into various facets of FSCIL,
encompassing the problem definition, the primary challenges of unreliable
empirical risk minimization and the stability-plasticity dilemma, general
schemes, and the related problems of incremental learning and few-shot
learning. In addition, we offer an overview of benchmark datasets and
evaluation metrics. Furthermore, we introduce the classification methods in
FSCIL, grouped into data-based, structure-based, and optimization-based
approaches, and the object detection methods in FSCIL, grouped into
anchor-free and anchor-based approaches. Beyond these, we illuminate several
promising research directions within FSCIL that merit further investigation.
NCL++: Nested Collaborative Learning for Long-Tailed Visual Recognition
Long-tailed visual recognition has received increasing attention in recent
years. Due to the extremely imbalanced data distribution in long-tailed
learning, the learning process exhibits great uncertainty: for example, the
predictions of different experts on the same image vary remarkably despite
identical training settings. To alleviate this uncertainty, we propose Nested
Collaborative Learning (NCL++), which tackles the long-tailed learning problem
through collaborative learning. Specifically, the collaborative learning
comprises two parts, namely inter-expert collaborative learning (InterCL) and
intra-expert collaborative learning (IntraCL). InterCL trains multiple experts
collaboratively and concurrently, aiming to transfer knowledge among
different experts. IntraCL is similar to InterCL, but it conducts the
collaborative learning on multiple augmented copies of the same image within
the single expert. To achieve collaborative learning in the long-tailed
setting, balanced online distillation is proposed to enforce consistent
predictions among the different experts and augmented copies, which reduces
the learning uncertainty. Moreover, to improve fine-grained discrimination
of confusing categories, we further propose Hard Category Mining (HCM),
which selects the negative categories with high predicted scores as hard
categories. The collaborative learning is then formulated in a nested way,
conducted not just on all categories from a full perspective but also on the
hard categories from a partial perspective. Extensive experiments demonstrate
the superiority of our method, which outperforms the state of the art whether
using a single model or an ensemble. The code will be publicly released.
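The abstract describes balanced online distillation and HCM only at a high level; the sketch below illustrates the two underlying ideas in PyTorch: pairwise prediction consistency among experts (the InterCL direction) and selecting high-scoring negative classes as hard categories. The function names, the plain (unbalanced) KL form, and the top-k selection are assumptions for illustration, not the exact NCL++ formulation.

```python
import torch
import torch.nn.functional as F

def mutual_distill(expert_logits, temperature=1.0):
    """Pairwise KL consistency among experts' predictions on the same image,
    sketching the online-distillation idea behind InterCL (IntraCL would use
    predictions on augmented copies instead of separate experts)."""
    log_p = [F.log_softmax(l / temperature, dim=1) for l in expert_logits]
    p = [F.softmax(l / temperature, dim=1) for l in expert_logits]
    n, loss = len(expert_logits), 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                # Targets are detached so each expert only mimics the others.
                loss = loss + F.kl_div(log_p[i], p[j].detach(), reduction="batchmean")
    return loss / (n * (n - 1))

def hard_category_mask(logits, targets, k=10):
    """Hard Category Mining sketch: keep the ground-truth class plus the k
    negative classes with the highest predicted scores."""
    masked = logits.clone()
    masked.scatter_(1, targets.unsqueeze(1), float("-inf"))  # exclude positives
    hard_neg = masked.topk(k, dim=1).indices
    return torch.cat([targets.unsqueeze(1), hard_neg], dim=1)

# Toy usage: three experts, 8 images, 100 classes.
experts = [torch.randn(8, 100) for _ in range(3)]
consistency = mutual_distill(experts)
hard_cats = hard_category_mask(experts[0], torch.randint(0, 100, (8,)))
```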
V2X-AHD: Vehicle-to-Everything Cooperation Perception via Asymmetric Heterogeneous Distillation Network
Object detection is the central issue of intelligent traffic systems, and
recent advancements in single-vehicle lidar-based 3D detection indicate that it
can provide accurate position information for intelligent agents to make
decisions and plan. Compared with single-vehicle perception, multi-view
vehicle-road cooperation perception has fundamental advantages, such as the
elimination of blind spots and a broader range of perception, and has become a
research hotspot. However, current cooperative perception work focuses on
increasingly complex fusion while ignoring the fundamental problems
caused by the absence of single-view outlines. We propose a multi-view
vehicle-road cooperation perception system, vehicle-to-everything cooperative
perception (V2X-AHD), in order to enhance the identification capability,
particularly for predicting a vehicle's shape. First, we propose an
asymmetric heterogeneous distillation network fed with different training data
to improve the accuracy of contour recognition, transferring multi-view
teacher features to single-view student features. Because point cloud data
are sparse, we propose Spara Pillar, a sparse-convolution-based plug-in
feature extraction backbone, to reduce the number of parameters and enhance
feature extraction. Moreover, we leverage multi-head self-attention (MSA) to
fuse the single-view features, and its lightweight design yields a smooth
fused representation. Results on the large open dataset V2Xset demonstrate
that our method achieves state-of-the-art performance. This study shows that
V2X-AHD can effectively improve the accuracy of 3D object detection while
reducing the number of network parameters, and it can serve as a benchmark
for cooperative perception. The code for this article is available at
https://github.com/feeling0414-lab/V2X-AHD
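The MSA fusion step lends itself to a short sketch: per-agent features at each spatial location attend to the other agents' features at the same location. This is only a generic multi-head self-attention fusion under assumed shapes, not the exact V2X-AHD module; the class name and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class MSAFusion(nn.Module):
    """Fuse per-agent BEV feature maps with multi-head self-attention:
    at every spatial cell, the agents' feature vectors attend to each other."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, agent_feats):
        # agent_feats: (num_agents, dim, H, W)
        a, d, h, w = agent_feats.shape
        tokens = agent_feats.flatten(2).permute(2, 0, 1)  # (H*W, agents, dim)
        fused, _ = self.attn(tokens, tokens, tokens)      # attend across agents
        # Return the ego vehicle's (index 0) context-enriched feature map.
        return fused[:, 0, :].permute(1, 0).reshape(d, h, w)

# Toy usage: ego vehicle plus two cooperating agents on a 16x16 BEV grid.
bev = torch.randn(3, 64, 16, 16)
fused_bev = MSAFusion()(bev)                              # (64, 16, 16)
```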
Label-Efficient Deep Learning in Medical Image Analysis: Challenges and Future Directions
Deep learning has seen rapid growth in recent years and achieved
state-of-the-art performance in a wide range of applications. However, training
models typically requires expensive and time-consuming collection of large
quantities of labeled data. This is particularly true within the scope of
medical image analysis (MIA), where data are limited and labels are expensive
to acquire. Thus, label-efficient deep learning methods have been developed to
make comprehensive use of the labeled data as well as the abundance of
unlabeled and weakly labeled data. In this survey, we extensively investigate
over 300 recent papers to provide a comprehensive overview of recent progress
on label-efficient learning strategies in MIA. We first present the background
of label-efficient learning and categorize the approaches into different
schemes. Next, we examine the current state-of-the-art methods in detail
through each scheme. Specifically, we provide an in-depth investigation,
covering not only canonical semi-supervised, self-supervised, and
multi-instance learning schemes, but also recently emerged active and
annotation-efficient learning strategies. Moreover, as a comprehensive
contribution to the field, this survey not only elucidates the commonalities
and unique features of the surveyed methods but also presents a detailed
analysis of the current challenges in the field and suggests potential avenues
for future research.
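Of the canonical semi-supervised schemes surveyed, confidence-thresholded pseudo-labeling is among the simplest to state; a minimal PyTorch sketch follows. The 0.95 threshold and the single-model, two-pass formulation are illustrative assumptions (FixMatch-style methods add weak/strong augmentation pairs), not specifics taken from the survey.

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(model, unlabeled, threshold=0.95):
    """Semi-supervised pseudo-labeling sketch: predict on unlabeled images,
    keep only high-confidence predictions as training targets, and apply
    cross-entropy against those pseudo-labels."""
    with torch.no_grad():
        probs = F.softmax(model(unlabeled), dim=1)
        conf, pseudo = probs.max(dim=1)
        keep = conf >= threshold
    if keep.sum() == 0:                      # nothing confident enough yet
        return unlabeled.new_zeros(())
    logits = model(unlabeled[keep])          # second pass, with gradients
    return F.cross_entropy(logits, pseudo[keep])

# Toy usage with a tiny classifier over 32x32 images and 10 classes.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
loss = pseudo_label_loss(model, torch.randn(16, 3, 32, 32))
```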
Transformer for Object Re-Identification: A Survey
Object Re-Identification (Re-ID) aims to identify and retrieve specific
objects from varying viewpoints. For a prolonged period, this field has been
predominantly driven by deep convolutional neural networks. In recent years,
the Transformer has witnessed remarkable advancements in computer vision,
prompting an increasing body of research to delve into the application of
Transformers to Re-ID. This paper provides a comprehensive review and in-depth
analysis of Transformer-based Re-ID. Categorizing existing works into
Image/Video-Based Re-ID, Re-ID with limited data/annotations, Cross-Modal
Re-ID, and Special Re-ID Scenarios, we thoroughly elucidate the advantages
demonstrated by the Transformer in addressing a multitude of challenges across
these domains. Considering the trending unsupervised Re-ID, we propose a new
Transformer baseline, UntransReID, achieving state-of-the-art performance on
both single-/cross-modal tasks. In addition, this survey covers a wide range
of Re-ID research objects, including progress in animal Re-ID. Given the
diversity of species in animal Re-ID, we devise a standardized experimental
benchmark and conduct extensive experiments to explore the applicability of
Transformers to this task and facilitate future research. Finally, we discuss
some important yet under-investigated open issues in the era of big foundation
models; we believe this survey will serve as a new handbook for researchers in this field.
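UntransReID itself is not specified in this abstract; as background for the survey's scope, the sketch below shows the generic Transformer-for-Re-ID recipe such baselines build on: a ViT encoder producing an image embedding, with the gallery ranked by cosine similarity to the query. The backbone choice, input size, and the use of an untrained model are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vit_b_16

# A plain ViT as the Re-ID feature extractor; replace the classification
# head with identity so the forward pass returns the class-token embedding.
encoder = vit_b_16(weights=None)
encoder.heads = nn.Identity()
encoder.eval()

@torch.no_grad()
def rank_gallery(query, gallery):
    """Return gallery indices sorted by cosine similarity to the query."""
    q = F.normalize(encoder(query), dim=1)    # (1, 768)
    g = F.normalize(encoder(gallery), dim=1)  # (N, 768)
    return (q @ g.T).squeeze(0).argsort(descending=True)

# Toy usage: one query image against a gallery of four.
ranking = rank_gallery(torch.randn(1, 3, 224, 224), torch.randn(4, 3, 224, 224))
```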
Low-Power Computer Vision: Improve the Efficiency of Artificial Intelligence
Energy efficiency is critical for running computer vision on battery-powered systems, such as mobile phones or UAVs (unmanned aerial vehicles, or drones). This book collects the methods that have won the annual IEEE Low-Power Computer Vision Challenges since 2015. The winners share their solutions and provide insight into how to improve the efficiency of machine learning systems.