Are Labels Needed for Incremental Instance Learning?
In this paper, we learn to classify visual object instances, incrementally
and via self-supervision (self-incremental). Our learner observes a single
instance at a time, which is then discarded from the dataset. Incremental
instance learning is challenging, since longer learning sessions exacerbate
forgetfulness, and labeling instances is cumbersome. We overcome these
challenges via three contributions: i. We propose VINIL, a self-incremental
learner that can learn object instances sequentially, ii. We equip VINIL with
self-supervision to bypass the need for instance labeling, iii. We compare
VINIL to label-supervised variants on two large-scale benchmarks, and show that
VINIL significantly improves accuracy while reducing forgetfulness. Comment: Accepted at CVPRW on CLVISION (Oral)
What Can AutoML Do For Continual Learning?
This position paper outlines the potential of AutoML for incremental
(continual) learning to encourage more research in this direction. Incremental
learning involves incorporating new data from a stream of tasks and
distributions to learn enhanced deep representations and adapt better to new
tasks. However, a significant limitation of incremental learners is that most
current techniques freeze the backbone architecture, hyperparameters, and the
order & structure of the learning tasks throughout the learning and adaptation
process. We strongly believe that AutoML offers promising solutions to address
these limitations, enabling incremental learning to adapt to more diverse
real-world tasks. Therefore, instead of directly proposing a new method, this
paper takes a step back by posing the question: "What can AutoML do for
incremental learning?" We outline three key areas of research that can
contribute to making incremental learners more dynamic, highlighting concrete
opportunities to apply AutoML methods in novel ways as well as entirely new
challenges for AutoML research.
Diagnosing Rarity in Human-Object Interaction Detection
Human-object interaction (HOI) detection is a core task in computer vision.
The goal is to localize all human-object pairs and recognize their
interactions. An interaction, defined by a human-verb-object tuple, leads to a
long-tailed visual recognition challenge, since many combinations are rarely
represented. The performance of proposed models is especially limited for
the tail categories, yet little has been done to understand why. To that
end, in this paper, we propose to diagnose rarity in HOI detection. We propose
a three-step strategy, namely Detection, Identification and Recognition where
we carefully analyse the limiting factors by studying state-of-the-art models.
Our findings indicate that the detection and identification steps are affected by
interaction signals such as occlusion and relative location, which in turn
limits recognition accuracy. Comment: Accepted at CVPR'20 Workshop on Learning from Limited Label
Continual Learning of Object Instances
We propose continual instance learning - a method that applies the concept of
continual learning to the task of distinguishing instances of the same object
category. We specifically focus on the car object, and incrementally learn to
distinguish car instances from each other with metric learning. We begin our
paper by evaluating current techniques. Establishing that catastrophic
forgetting is evident in existing methods, we then propose two remedies.
Firstly, we regularise metric learning via Normalised Cross-Entropy. Secondly,
we augment existing models with synthetic data transfer. Our extensive
experiments on three large-scale datasets, using two different architectures
for five different continual learning methods, reveal that Normalised
Cross-Entropy and synthetic transfer lead to less forgetting in existing
techniques. Comment: Accepted to CVPR 2020: Workshop on Continual Learning in Computer
Vision
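The normalised cross-entropy regulariser mentioned above can be pictured as a cross-entropy loss computed over cosine similarities between an L2-normalised embedding and L2-normalised class centres. The sketch below illustrates that idea only; the scale parameter and the exact formulation are assumptions, not the paper's definition.

```python
import math

def normalised_cross_entropy(embedding, class_centres, target, scale=10.0):
    """Cross-entropy over cosine similarities between an L2-normalised
    embedding and L2-normalised class centres. Illustrative sketch of a
    'normalised cross-entropy' term for metric learning; the scale and
    exact form are assumptions, not the paper's formulation."""
    def l2norm(v):
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / n for x in v]

    e = l2norm(embedding)
    # Cosine similarity to each class centre, scaled to act as a logit.
    logits = [scale * sum(a * b for a, b in zip(e, l2norm(c)))
              for c in class_centres]
    # Numerically stable softmax cross-entropy.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return -math.log(exps[target] / sum(exps))
```

The loss is small when the embedding points towards its own class centre and large otherwise, which is the property a regulariser of this kind exploits.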
Locality-Aware Hyperspectral Classification
Hyperspectral image classification is gaining popularity for high-precision
vision tasks in remote sensing, thanks to the ability of hyperspectral images
to capture visual information across a wide continuum of spectra. Researchers
have been working on automating hyperspectral image classification, with recent
efforts leveraging Vision Transformers. However, most research models only
spectral information and neglects locality (i.e., neighbouring pixels),
which may not be sufficiently discriminative, resulting in performance
limitations. To address this, we present three contributions: i) We introduce
the Hyperspectral Locality-aware Image TransformEr (HyLITE), a vision
transformer that models both local and spectral information, ii) A novel
regularization function that promotes the integration of local-to-global
information, and iii) Our proposed approach outperforms competing baselines by
a significant margin, achieving up to 10% gains in accuracy. The trained models
and the code are available at HyLITE. Comment: The paper is accepted at BMVC202
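One way to picture the locality the abstract refers to: instead of treating each pixel's spectrum in isolation, gather the spectra of a small spatial neighbourhood as the input tokens for that pixel. The sketch below shows only this gathering step; HyLITE's actual tokenisation and architecture are not reproduced here.

```python
def local_tokens(cube, i, j, k=1):
    """Collect the spectra of the (2k+1) x (2k+1) spatial neighbourhood
    around pixel (i, j) of an H x W x B hyperspectral cube, one token per
    neighbouring pixel. Minimal sketch of feeding locality to a
    transformer; not HyLITE's actual tokenisation."""
    H, W = len(cube), len(cube[0])
    tokens = []
    for di in range(-k, k + 1):
        for dj in range(-k, k + 1):
            r = min(max(i + di, 0), H - 1)  # clamp rows at image borders
            c = min(max(j + dj, 0), W - 1)  # clamp columns likewise
            tokens.append(cube[r][c])       # full spectrum of neighbour
    return tokens
```

Each token keeps the full spectral vector, so a downstream attention layer can mix local spatial context with spectral information.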
Adaptive Regularization for Class-Incremental Learning
Class-Incremental Learning updates a deep classifier with new categories
while maintaining the previously observed class accuracy. Regularizing the
neural network weights is a common method to prevent forgetting previously
learned classes while learning novel ones. However, existing regularizers use a
constant magnitude throughout the learning sessions, which may not reflect the
varying levels of difficulty of the tasks encountered during incremental
learning. This study investigates the necessity of adaptive regularization in
Class-Incremental Learning, which dynamically adjusts the regularization
strength according to the complexity of the task at hand. We propose a Bayesian
Optimization-based approach to automatically determine the optimal
regularization magnitude for each learning task. Our experiments on two
datasets via two regularizers demonstrate the importance of adaptive
regularization for achieving accurate and less forgetful visual incremental
learning.
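The core loop is: for each incoming task, search over regularization magnitudes and keep the one that maximises validation accuracy. The paper uses Bayesian Optimization for this search; in the sketch below a plain search over a candidate list stands in for it, and `train_and_eval` is a hypothetical callback that trains with a given magnitude and returns validation accuracy.

```python
def pick_reg_strength(train_and_eval, candidates):
    """Choose the regularization magnitude for the current incremental
    task by maximising validation accuracy. Plain search stands in for
    the paper's Bayesian Optimization; `train_and_eval` is a
    hypothetical callback, not an API from the paper."""
    best_lam, best_acc = None, float("-inf")
    for lam in candidates:
        acc = train_and_eval(lam)  # train with this magnitude, score it
        if acc > best_acc:
            best_lam, best_acc = lam, acc
    return best_lam
```

Running this selection once per task is what makes the regularization adaptive: easy tasks can settle on a weaker penalty, hard tasks on a stronger one.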
Data-Driven Image Captioning with Meta-Class Based Retrieval
Automatic image captioning, the process of producing a description for an image, is a very challenging problem which has only recently received interest from the computer vision and natural language processing communities. In this study, we present a novel data-driven image captioning strategy, which, for a given image, finds the most visually similar image in a large dataset of image-caption pairs and transfers its caption as the description of the input image. Our novelty lies in employing a recently proposed high-level global image representation, named the meta-class descriptor, to better capture the semantic content of the input image for use in the retrieval process. Our experiments show that as compared to the baseline Im2Text model, our meta-class guided approach produces more accurate descriptions.
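The retrieval-and-transfer step described above reduces to a nearest-neighbour lookup in feature space. The sketch below shows that step only; feature extraction (e.g. computing the meta-class descriptor) is assumed to have happened upstream, and the Euclidean distance here is an assumption rather than the paper's similarity measure.

```python
def transfer_caption(query_feat, database):
    """Return the caption of the database image whose global feature is
    closest to the query image's feature -- the retrieval-and-transfer
    step of data-driven captioning. Euclidean distance is an assumed
    stand-in for the paper's similarity measure."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # database: list of (feature_vector, caption) pairs
    feat, caption = min(database, key=lambda pair: sq_dist(query_feat, pair[0]))
    return caption
```

The quality of the transferred caption then hinges entirely on how well the feature captures semantics, which is why the abstract's contribution is the descriptor, not the transfer mechanics.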
Data-driven image captioning via salient region discovery
In the past few years, automatically generating descriptions for images has attracted a lot of attention in computer vision and natural language processing research. Among the existing approaches, data-driven methods have been proven to be highly effective. These methods compare the given image against a large set of training images to determine a set of relevant images, then generate a description using the associated captions. In this study, the authors propose to integrate an object-based semantic image representation into a deep features-based retrieval framework to select the relevant images. Moreover, they present a novel phrase selection paradigm and a sentence generation model which depends on a joint analysis of salient regions in the input and retrieved images within a clustering framework. The authors demonstrate the effectiveness of their proposed approach on Flickr8K and Flickr30K benchmark datasets and show that their model gives highly competitive results compared with the state-of-the-art models.
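Once several relevant captions are retrieved, some form of consensus is needed to generate a single description. The sketch below is a much-simplified stand-in for the paper's clustering-based analysis: it picks the retrieved caption most similar on average to the rest, using word-overlap Jaccard similarity (both the selection rule and the similarity measure are assumptions for illustration).

```python
def consensus_caption(captions):
    """Pick, from the captions of retrieved images, the one most similar
    on average to the others (word-overlap Jaccard). Simplified stand-in
    for the paper's clustering-based phrase selection over salient
    regions; not the authors' actual model."""
    def jaccard(a, b):
        sa, sb = set(a.lower().split()), set(b.lower().split())
        union = sa | sb
        return len(sa & sb) / len(union) if union else 0.0
    return max(captions,
               key=lambda c: sum(jaccard(c, o) for o in captions if o is not c))
```

Outlier captions describing unrelated content score low against the rest and are discarded, which is the intuition behind consensus-style selection in retrieval-based captioning.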