FDCNet: Feature Drift Compensation Network for Class-Incremental Weakly Supervised Object Localization
This work addresses the task of class-incremental weakly supervised object
localization (CI-WSOL). The goal is to incrementally learn object localization
for novel classes using only image-level annotations while retaining the
ability to localize previously learned classes. This task is important because annotating bounding boxes for all newly incoming data is expensive, even though object localization is crucial in many applications. To the best of our
knowledge, we are the first to address this task. Thus, we first present a
strong baseline method for CI-WSOL by adapting the strategies of
class-incremental classifiers to mitigate catastrophic forgetting. These
strategies include applying knowledge distillation, maintaining a small data
set from previous tasks, and using cosine normalization. We then propose the
feature drift compensation network to compensate for the effects of feature
drifts on class scores and localization maps. Since updating network parameters to learn new tasks causes feature drift, the final outputs must be compensated accordingly. Finally, we evaluate the proposed method through experiments
on two publicly available datasets (ImageNet-100 and CUB-200). The experimental
results demonstrate that the proposed method outperforms other baseline
methods.

Comment: ACM Multimedia 202
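The baseline strategies named in the abstract (knowledge distillation, a small exemplar set from previous tasks, and cosine normalization) are standard tools from class-incremental classification. As a rough illustration only, not the authors' implementation, the sketch below shows a cosine-normalized classifier head and a distillation term against the frozen previous-task model; the network shape, temperature, and loss weighting are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Classifier head with cosine normalization: logits are scaled cosine
    similarities between L2-normalized features and class weight vectors."""
    def __init__(self, feat_dim, num_classes, scale=16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim) * 0.01)
        self.scale = scale

    def forward(self, feats):
        feats = F.normalize(feats, dim=1)
        weights = F.normalize(self.weight, dim=1)
        return self.scale * feats @ weights.t()

def incremental_loss(new_logits, old_logits, labels,
                     temperature=2.0, kd_weight=1.0):
    """Cross-entropy on current-task labels plus knowledge distillation
    against the frozen previous-task model, applied to the logits of the
    previously learned classes (assumed to occupy the first columns)."""
    ce = F.cross_entropy(new_logits, labels)
    n_old = old_logits.size(1)
    kd = F.kl_div(
        F.log_softmax(new_logits[:, :n_old] / temperature, dim=1),
        F.softmax(old_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return ce + kd_weight * kd
```

In such a setup, the small exemplar set kept from previous tasks would simply be mixed into the current-task batches so that both loss terms also see old-class samples.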
Data- and compute-efficient visual recognition and generation
The remarkable advancements in deep learning for visual recognition and generation have often been accompanied by a significant computational burden. As the complexity of deep learning models escalates, achieving efficiency in both architecture construction and data utilization becomes paramount. This dissertation examines two fundamental categories of efficiency: model efficiency and data efficiency.

1. Model Efficiency: This facet of the study focuses on reducing the computational cost of deep neural networks without compromising performance. Through neural architecture search (NAS), we discover highly efficient models tailored for video action recognition. Our novel multi-stream, multivariate search space has led to the discovery of two-stream models such as Auto-TSNet, dramatically reducing FLOPs while improving accuracy on standard benchmarks.

2. Data Efficiency: Data efficiency in deep learning refers to a model's capacity to learn effectively from a limited dataset. This characteristic is particularly valuable when gathering or labeling extensive data is prohibitive or infeasible. Specifically, the dissertation focuses on data efficiency for the downstream task generalization of pre-trained models, recognizing their significant role in advancing the field. Our study addresses two main challenges within this domain:

(a) Incremental Few-shot Learning (IFL): IFL represents a nuanced challenge in deep learning, requiring a model to learn new categories from few examples without forgetting previously learned information. In this dissertation, we investigate IFL in two essential domains: object detection and image generation. For object detection, we introduce a weakly supervised approach, WS-iFSD, that substantially augments meta-training and outperforms existing methods across key benchmarks. For image generation, we propose EI-GAN, an efficient generative model that incrementally registers new categories without revisiting extra data or suffering catastrophic forgetting. Together, these contributions demonstrate significant advances in the ability to learn and generalize from limited data.

(b) Multimodal Generalization (MMG): MMG is a novel focus of this dissertation, addressing how systems adapt when certain modalities are limited or absent. Specifically, we introduce two evaluation protocols: 1) Missing Modality Evaluation, which tests a system's ability to function when some of the modalities available during training are absent at inference, and 2) Cross-modal Zero-shot Evaluation, which measures performance when the training and inference modalities are entirely disjoint. Our exploration of these challenges, along with new models and a dataset, MMG-Ego4D, highlights our emphasis on the efficiency of generalization, contributing vital insights to the field of multimodal learning and adaptation.

The intertwined exploration of model and data efficiency contributes new methodologies and constructs a deeper understanding of efficiency in deep learning. By bridging the gap between high performance and computational frugality, this dissertation paves the way for more sustainable and adaptable deep learning applications in visual recognition and generation.

Electrical and Computer Engineering
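To make the MMG evaluation protocols above concrete, the following minimal sketch illustrates the missing-modality setting with a toy late-fusion classifier. The modality names, feature dimensions, and fusion scheme are hypothetical placeholders, not the models or the MMG-Ego4D setup from the dissertation.

```python
import torch
import torch.nn as nn

class LateFusionModel(nn.Module):
    """Toy multimodal classifier: one encoder per modality, averaged fusion.
    Purely illustrative; real systems are far richer."""
    def __init__(self, dims, num_classes, hidden=64):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {name: nn.Linear(dim, hidden) for name, dim in dims.items()}
        )
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, inputs):
        # Fuse only the modalities actually provided; a missing modality
        # is simply dropped from the average.
        feats = [torch.relu(self.encoders[name](x)) for name, x in inputs.items()]
        return self.head(torch.stack(feats).mean(dim=0))

# Hypothetical modality dimensions (e.g., video and audio features).
model = LateFusionModel({"video": 128, "audio": 32}, num_classes=10)

# Missing-modality evaluation: the model is trained with both modalities,
# but at inference only one is available.
full = {"video": torch.randn(4, 128), "audio": torch.randn(4, 32)}
missing_audio = {"video": torch.randn(4, 128)}
print(model(full).shape, model(missing_audio).shape)
```

Cross-modal zero-shot evaluation pushes this further: the modalities seen at training time and those seen at inference time are entirely disjoint, so the comparison above would use, say, only audio at test time for a model trained only on video.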