Exemplar-supported representation for effective class-incremental learning
Catastrophic forgetting is a key challenge for class-incremental learning with deep neural networks, where performance decreases considerably when dealing with long sequences of new classes. To tackle this issue, we propose a new exemplar-supported representation for incremental learning (ESRIL) approach that consists of three components. First, we use memory aware synapses (MAS) pre-trained on ImageNet to retain the ability of robust representation learning and classification for old classes from the perspective of the model. Second, exemplar-based subspace clustering (ESC) is used to construct the exemplar set, which preserves performance from various views of the data. Third, the nearest class multiple centroids (NCMC) classifier is used to save the training cost of the fully connected layer of MAS when the criterion is met. Extensive experiments and analyses show the influence of various backbone structures and the effectiveness of the different components in our model. Experiments on several general-purpose and fine-grained image recognition datasets demonstrate the efficacy of the proposed methodology.
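The nearest-class-multiple-centroids idea named in this abstract can be illustrated with a minimal sketch. It assumes each class is already summarized by several centroid vectors (for example, obtained by clustering exemplar features); the abstract does not specify how the centroids are built, and the function name `nearest_centroid_label` and the toy data are hypothetical.

```python
import math

def nearest_centroid_label(x, centroids_by_class):
    """Return the label of the class that owns the centroid closest to x."""
    best_label, best_dist = None, math.inf
    for label, centroids in centroids_by_class.items():
        for c in centroids:
            d = math.dist(x, c)  # Euclidean distance (Python 3.8+)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label

# Toy 2-D feature space; two centroids per class (illustrative values only).
centroids = {
    "cat": [(0.0, 0.0), (1.0, 0.0)],
    "dog": [(5.0, 5.0), (6.0, 4.0)],
}

print(nearest_centroid_label((0.9, 0.2), centroids))
print(nearest_centroid_label((5.5, 4.4), centroids))
```

Because classification only compares distances to stored centroids, adding a class in this sketch means adding an entry to the dictionary rather than retraining a fully connected layer.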
PointMap: A real-time memory-based learning system with on-line and post-training pruning
Also published in the International Journal of Hybrid Intelligent Systems, Volume 1, January, 2004. A memory-based learning system called PointMap is a simple and computationally efficient extension of Condensed Nearest Neighbor that allows the user to limit the number of exemplars stored during incremental learning. PointMap evaluates the information value of coding nodes during training, and uses this index to prune uninformative nodes either on-line or after training. These pruning methods allow the user to control both a priori code size and sensitivity to detail in the training data, as well as to determine the code size necessary for accurate performance on a given data set. Coding and pruning computations are local in space, with only the nearest coded neighbor available for comparison with the input; and in time, with only the current input available during coding. Pruning helps solve common problems of traditional memory-based learning systems: large memory requirements, their accompanying slow on-line computations, and sensitivity to noise. PointMap copes with the curse of dimensionality by considering multiple nearest neighbors during testing without increasing the complexity of the training process or the stored code. The performance of PointMap is compared to that of a group of sixteen nearest-neighbor systems on benchmark problems. This research was supported by grants from the Air Force Office of Scientific Research (AFOSR F49620-98-1-0108, F49620-01-1-0397, and F49620-01-1-0423)
and the Office of Naval Research (ONR N00014-01-1-0624).
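As a rough illustration of the general idea (this is not PointMap itself), the sketch below runs one incremental pass in the style of Condensed Nearest Neighbor: an input is stored only when its nearest stored exemplar has the wrong label. The cap `max_exemplars` and the pruning rule (drop the exemplar with the fewest correct matches) are assumptions made for the example; the abstract does not define PointMap's information-value index.

```python
import math

def nearest(store, x):
    """Index of the stored exemplar closest to x (store must be non-empty)."""
    return min(range(len(store)), key=lambda i: math.dist(store[i][0], x))

def condense(stream, max_exemplars):
    """Single pass over (point, label) pairs, keeping at most max_exemplars."""
    store = []  # each entry is [point, label, hits]
    for x, y in stream:
        if not store:
            store.append([x, y, 0])
            continue
        i = nearest(store, x)
        if store[i][1] == y:
            store[i][2] += 1  # nearest exemplar predicted correctly
        else:
            store.append([x, y, 0])  # misclassified, so memorize the input
            if len(store) > max_exemplars:
                # Hypothetical pruning rule: remove the least-used exemplar.
                j = min(range(len(store)), key=lambda k: store[k][2])
                store.pop(j)
    return store

stream = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((5, 6), "b"), ((0, 0.5), "a")]
store = condense(stream, max_exemplars=2)
print([(point, label) for point, label, _ in store])
```

Each step looks only at the current input and its nearest stored neighbor, which mirrors the locality in space and time described in the abstract.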
Enhancement of ELDA Tracker Based on CNN Features and Adaptive Model Update
Appearance representation and the observation model are the most important components in designing a robust visual tracking algorithm for video-based sensors. Additionally, the exemplar-based linear discriminant analysis (ELDA) model has shown good performance in object tracking. Based on that, we improve the ELDA tracking algorithm with deep convolutional neural network (CNN) features and adaptive model update. Deep CNN features have been successfully used in various computer vision tasks, but extracting CNN features on all of the candidate windows is time consuming. To address this problem, a two-step CNN feature extraction method is proposed that computes the convolutional layers and the fully-connected layers separately. Due to the strong discriminative ability of CNN features and the exemplar-based model, we update both object and background models to improve their adaptivity and to deal with the tradeoff between discriminative ability and adaptivity. An object updating method is proposed to select the “good” models (detectors), which are quite discriminative and uncorrelated to other selected models. Meanwhile, we build the background model as a Gaussian mixture model (GMM) to adapt to complex scenes, which is initialized offline and updated online. The proposed tracker is evaluated on a benchmark dataset of 50 video sequences with various challenges. It achieves the best overall performance among the compared state-of-the-art trackers, which demonstrates the effectiveness and robustness of our tracking algorithm.
A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning
Real-world applications require the classification model to adapt to new
classes without forgetting old ones. Correspondingly, Class-Incremental
Learning (CIL) aims to train a model with limited memory size to meet this
requirement. Typical CIL methods tend to save representative exemplars from
former classes to resist forgetting, while recent works find that storing
models from history can substantially boost the performance. However, the
stored models are not counted into the memory budget, which implicitly results
in unfair comparisons. We find that when counting the model size into the total
budget and comparing methods with aligned memory size, saving models does not
consistently work, especially for the case with limited memory budgets. As a
result, we need to holistically evaluate different CIL methods at different
memory scales and simultaneously consider accuracy and memory size for
measurement. On the other hand, we dive deeply into the construction of the
memory buffer for memory efficiency. By analyzing the effect of different
layers in the network, we find that shallow and deep layers have different
characteristics in CIL. Motivated by this, we propose a simple yet effective
baseline, denoted as MEMO for Memory-efficient Expandable MOdel. MEMO extends
specialized layers based on the shared generalized representations, efficiently
extracting diverse representations with modest cost and maintaining
representative exemplars. Extensive experiments on benchmark datasets validate
MEMO's competitive performance. Code is available at:
https://github.com/wangkiw/ICLR23-MEMO
Comment: Accepted to ICLR 2023 as a Spotlight Presentation.
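The aligned-memory comparison described in this abstract can be made concrete with a small calculation: convert the bytes needed to store a model into the number of raw images that would fit in the same budget. The figures below are illustrative assumptions not stated in the abstract (a backbone of 463,504 float32 parameters, roughly a CIFAR-style ResNet-32, and 32×32 RGB images at one byte per pixel value); the function name `exemplar_equivalent` is hypothetical.

```python
def exemplar_equivalent(num_params, bytes_per_param, image_shape):
    """Number of whole raw images that fit in the memory one stored model uses."""
    channels, height, width = image_shape
    model_bytes = num_params * bytes_per_param
    image_bytes = channels * height * width  # assumes one byte per pixel value
    return model_bytes // image_bytes

# Assumed values: 463,504 float32 parameters, 3x32x32 uint8 images.
equivalent = exemplar_equivalent(463_504, 4, (3, 32, 32))
print(equivalent)  # 603
```

Under these assumptions, keeping one extra model costs about as much memory as keeping 603 exemplars, which is the kind of trade-off the title alludes to.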
Does Continual Learning = Catastrophic Forgetting?
Continual learning is known for suffering from catastrophic forgetting, a
phenomenon where earlier learned concepts are forgotten at the expense of more
recent samples. In this work, we challenge the assumption that continual
learning is inevitably associated with catastrophic forgetting by presenting a
set of tasks that surprisingly do not suffer from catastrophic forgetting when
learned continually. We provide evidence that these reconstruction-type tasks
exhibit positive forward transfer and that single-view 3D shape reconstruction
improves the performance on learned and novel categories over time. We provide
a novel analysis of knowledge transfer ability by looking at the output
distribution shift across sequential learning tasks. Finally, we show that the
robustness of these tasks leads to the potential of having a proxy
representation learning task for continual classification. The codebase,
dataset, and pre-trained models released with this article can be found at
https://github.com/rehg-lab/CLRec
Watch and Learn: Semi-Supervised Learning of Object Detectors from Videos
We present a semi-supervised approach that localizes multiple unknown object
instances in long videos. We start with a handful of labeled boxes and
iteratively learn and label hundreds of thousands of object instances. We
propose criteria for reliable object detection and tracking for constraining
the semi-supervised learning process and minimizing semantic drift. Our
approach does not assume exhaustive labeling of each object instance in any
single frame, or any explicit annotation of negative data. Working in such a
generic setting allows us to tackle multiple object instances in video, many of
which are static. In contrast, existing approaches either do not consider
multiple object instances per video, or rely heavily on the motion of the
objects present. The experiments demonstrate the effectiveness of our approach
by evaluating the automatically labeled data on a variety of metrics like
quality, coverage (recall), diversity, and relevance to training an object
detector.
Comment: To appear in CVPR 201