FDCNet: Feature Drift Compensation Network for Class-Incremental Weakly Supervised Object Localization
This work addresses the task of class-incremental weakly supervised object
localization (CI-WSOL). The goal is to incrementally learn object localization
for novel classes using only image-level annotations while retaining the
ability to localize previously learned classes. This task is important because annotating bounding boxes for all newly incoming data is expensive, even though object localization is crucial in many applications. To the best of our
knowledge, we are the first to address this task. Thus, we first present a
strong baseline method for CI-WSOL by adapting the strategies of
class-incremental classifiers to mitigate catastrophic forgetting. These
strategies include applying knowledge distillation, maintaining a small data
set from previous tasks, and using cosine normalization. We then propose the
feature drift compensation network to compensate for the effects of feature
drifts on class scores and localization maps. Since updating network parameters to learn new tasks causes feature drift, the final outputs must be compensated accordingly. Finally, we evaluate the proposed method through experiments
on two publicly available datasets (ImageNet-100 and CUB-200). The experimental
results demonstrate that the proposed method outperforms other baseline
methods.

Comment: ACM Multimedia 202
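The baseline strategies named in the abstract (knowledge distillation, a small exemplar set from previous tasks, and cosine normalization) are standard tools from class-incremental classification. As a rough illustration only, not the authors' implementation, the sketch below shows a cosine-normalized classifier head and a distillation term against the frozen previous-task model; the network shape, temperature, and loss weighting are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Classifier head with cosine normalization: logits are scaled cosine
    similarities between L2-normalized features and class weight vectors."""
    def __init__(self, feat_dim, num_classes, scale=16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim) * 0.01)
        self.scale = scale

    def forward(self, feats):
        feats = F.normalize(feats, dim=1)
        weights = F.normalize(self.weight, dim=1)
        return self.scale * feats @ weights.t()

def incremental_loss(new_logits, old_logits, labels,
                     temperature=2.0, kd_weight=1.0):
    """Cross-entropy on current-task labels plus knowledge distillation
    against the frozen previous-task model, applied to the logits of the
    previously learned classes (assumed to occupy the first columns)."""
    ce = F.cross_entropy(new_logits, labels)
    n_old = old_logits.size(1)
    kd = F.kl_div(
        F.log_softmax(new_logits[:, :n_old] / temperature, dim=1),
        F.softmax(old_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return ce + kd_weight * kd
```

In such a setup, the small exemplar set kept from previous tasks would simply be mixed into the current-task batches so that both loss terms also see old-class samples.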
Data- and compute-efficient visual recognition and generation
The remarkable advancements in deep learning for visual recognition and generation have often been accompanied by a significant computational burden. As the complexity of deep learning models escalates, achieving efficiency in both architecture construction and data utilization becomes paramount. This dissertation examines two fundamental categories of efficiency: model efficiency and data efficiency.

1. Model Efficiency: This facet of the study focuses on reducing the computational cost of deep neural networks without compromising performance. Through neural architecture search (NAS), we discover highly efficient models tailored for video action recognition. Our novel multi-stream, multivariate search space has led to the discovery of two-stream models such as Auto-TSNet, dramatically reducing FLOPs while improving accuracy on standard benchmarks.

2. Data Efficiency: Data efficiency in deep learning refers to a model's capacity to learn effectively from a limited dataset. This characteristic is particularly valuable when gathering or labeling extensive data is prohibitive or infeasible. Specifically, the dissertation focuses on data efficiency for the downstream task generalization of pre-trained models, recognizing their significant role in advancing the field. Our study addresses two main challenges within this domain:

(a) Incremental Few-shot Learning (IFL): IFL represents a nuanced challenge in deep learning, requiring a model to learn new categories from few examples without forgetting previously learned information. In this dissertation, we investigate IFL in two essential domains: object detection and image generation. For object detection, we introduce a weakly supervised approach, WS-iFSD, that substantially augments meta-training and outperforms existing methods across key benchmarks. For image generation, we propose EI-GAN, an efficient generative model that incrementally registers new categories without revisiting extra data or suffering catastrophic forgetting. Together, these contributions demonstrate significant advances in the ability to learn and generalize from limited data.

(b) Multimodal Generalization (MMG): MMG is a novel focus of this dissertation, addressing how systems adapt when certain modalities are limited or absent. Specifically, we introduce two evaluation protocols: 1) Missing Modality Evaluation, which tests a system's ability to function when some of the modalities available during training are absent at inference, and 2) Cross-modal Zero-shot Evaluation, which measures performance when the training and inference modalities are entirely disjoint. Our exploration of these challenges, along with new models and a dataset, MMG-Ego4D, highlights our emphasis on the efficiency of generalization, contributing vital insights to the field of multimodal learning and adaptation.

The intertwined exploration of model and data efficiency contributes new methodologies and constructs a deeper understanding of efficiency in deep learning. By bridging the gap between high performance and computational frugality, this dissertation paves the way for more sustainable and adaptable deep learning applications in visual recognition and generation.

Electrical and Computer Engineering
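To make the MMG evaluation protocols above concrete, the following minimal sketch illustrates the missing-modality setting with a toy late-fusion classifier. The modality names, feature dimensions, and fusion scheme are hypothetical placeholders, not the models or the MMG-Ego4D setup from the dissertation.

```python
import torch
import torch.nn as nn

class LateFusionModel(nn.Module):
    """Toy multimodal classifier: one encoder per modality, averaged fusion.
    Purely illustrative; real systems are far richer."""
    def __init__(self, dims, num_classes, hidden=64):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {name: nn.Linear(dim, hidden) for name, dim in dims.items()}
        )
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, inputs):
        # Fuse only the modalities actually provided; a missing modality
        # is simply dropped from the average.
        feats = [torch.relu(self.encoders[name](x)) for name, x in inputs.items()]
        return self.head(torch.stack(feats).mean(dim=0))

# Hypothetical modality dimensions (e.g., video and audio features).
model = LateFusionModel({"video": 128, "audio": 32}, num_classes=10)

# Missing-modality evaluation: the model is trained with both modalities,
# but at inference only one is available.
full = {"video": torch.randn(4, 128), "audio": torch.randn(4, 32)}
missing_audio = {"video": torch.randn(4, 128)}
print(model(full).shape, model(missing_audio).shape)
```

Cross-modal zero-shot evaluation pushes this further: the modalities seen at training time and those seen at inference time are entirely disjoint, so the comparison above would use, say, only audio at test time for a model trained only on video.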