OpenSceneVLAD: Appearance Invariant, Open Set Scene Classification
Scene classification is a well-established area of computer vision research that aims to classify a scene image into pre-defined categories such as playground, beach and airport. Recent work has focused on increasing the variety of pre-defined categories for classification, but has so far failed to consider two major challenges: changes in scene appearance due to lighting, and open set classification (the ability to classify unknown scene data as not belonging to the trained classes). Our first contribution, SceneVLAD, fuses scene classification and visual place recognition CNNs for appearance-invariant scene classification that outperforms state-of-the-art scene classification by a mean F1 score of up to 0.1. Our second contribution, OpenSceneVLAD, extends the first to an open set classification scenario using intra-class splitting, achieving a mean increase in F1 scores of up to 0.06 compared to using a state-of-the-art OpenMax layer. We achieve these results on three scene class datasets extracted from large-scale outdoor visual localisation datasets, one of which we collected ourselves.
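The fusion idea can be illustrated with a minimal PyTorch sketch: concatenate an image's scene-classification feature with an appearance-robust place-recognition (VLAD-style) descriptor and classify the fused vector. The feature dimensions, the two-layer head, and fusion by simple concatenation are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class FusedSceneClassifier(nn.Module):
    """Sketch: classify a scene from the concatenation of a scene-CNN feature
    and a place-recognition (VLAD-style) descriptor of the same image.
    Dimensions and head size are assumptions, not the published design."""
    def __init__(self, scene_dim: int = 512, vlad_dim: int = 4096, num_classes: int = 10):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(scene_dim + vlad_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, num_classes),
        )

    def forward(self, scene_feat: torch.Tensor, vlad_feat: torch.Tensor) -> torch.Tensor:
        # scene_feat: (B, scene_dim) feature from a scene-classification backbone
        # vlad_feat:  (B, vlad_dim) appearance-robust place-recognition descriptor
        return self.head(torch.cat([scene_feat, vlad_feat], dim=-1))
```

The appeal of this design is that the place-recognition descriptor is trained to be stable across lighting and viewpoint changes, so the fused representation inherits some of that appearance invariance.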
Class Anchor Clustering: a Loss for Distance-based Open Set Recognition
In open set recognition, deep neural networks encounter object classes that were unknown during training. Existing open set classifiers distinguish between known and unknown classes by measuring distance in a network's logit space, assuming that known classes cluster closer to the training data than unknown classes. However, this approach is applied post hoc to networks trained with cross-entropy loss, which does not guarantee this clustering behaviour. To overcome this limitation, we introduce the Class Anchor Clustering (CAC) loss. CAC is a distance-based loss that explicitly trains known classes to form tight clusters around anchored, class-dependent centres in the logit space. We show that training with CAC achieves state-of-the-art performance for distance-based open set classifiers on all six standard benchmark datasets, with a 15.2% AUROC increase on the challenging TinyImageNet, without sacrificing classification accuracy. We also show that our anchored class centres achieve higher open set performance than learnt class centres, particularly on object-based datasets and with large numbers of training classes.
Comment: Published at the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).
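The clustering idea can be conveyed with a short PyTorch sketch: each known class gets a fixed anchor in logit space, and training both pulls samples towards their own class anchor and keeps that anchor the nearest one. The scaled one-hot anchors, the anchor scale `alpha`, and the weighting `lam` are assumptions for illustration, not the published CAC formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassAnchorLoss(nn.Module):
    """Sketch of a distance-based loss with fixed, class-dependent anchor
    centres in logit space (anchor form and weights are assumptions)."""
    def __init__(self, num_classes: int, alpha: float = 10.0, lam: float = 0.1):
        super().__init__()
        # Anchors: scaled one-hot vectors, one per known class.
        self.register_buffer("anchors", alpha * torch.eye(num_classes))
        self.lam = lam

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # Euclidean distance from each sample's logits to every class anchor.
        dists = torch.cdist(logits, self.anchors)                  # (B, C)
        # Pull term: distance to the ground-truth anchor should be small.
        pull = dists.gather(1, targets.unsqueeze(1)).mean()
        # Push term: softmax over negative distances makes the true anchor
        # the closest one relative to all other anchors.
        push = F.cross_entropy(-dists, targets)
        return push + self.lam * pull
```

At test time, a sample can be assigned to its nearest anchor and rejected as unknown when that minimum distance exceeds a threshold, which is how the clustering behaviour described above supports open set rejection.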
Task-Adaptive Negative Class Envision for Few-Shot Open-Set Recognition
Recent works seek to endow recognition systems with the ability to handle the open world. Few-shot learning aims for fast learning of new classes from limited examples, while open-set recognition considers unknown negative classes from the open world. In this paper, we study the problem of few-shot open-set recognition (FSOR), which learns a recognition system robust to queries from new sources with few examples and from unknown open sources. To achieve this, we mimic the human capability of envisioning new concepts from prior knowledge, and propose a novel task-adaptive negative class envision method (TANE) to model the open world. Essentially, we use an external memory to estimate a negative class representation. Moreover, we introduce a novel conjugate episode training strategy that strengthens the learning process. Extensive experiments on four public benchmarks show that our approach significantly improves the state-of-the-art performance on few-shot open-set recognition. Besides, we extend our method to generalized few-shot open-set recognition (GFSOR), where we also achieve performance gains on MiniImageNet.
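The external-memory idea can be sketched in PyTorch: prototypes computed from the few-shot support set attend over a learnable memory to synthesise a task-adaptive "negative" prototype, which is then scored as an extra unknown class alongside the known prototypes. The module name, memory size, and single attention step are illustrative assumptions rather than the exact TANE architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NegativePrototypeHead(nn.Module):
    """Sketch: estimate a negative-class representation from an external
    memory, conditioned on the current few-shot task (sizes are assumptions)."""
    def __init__(self, feat_dim: int = 640, memory_slots: int = 64):
        super().__init__()
        # Learnable external memory of candidate "unknown" concepts.
        self.memory = nn.Parameter(torch.randn(memory_slots, feat_dim))

    def forward(self, support_protos: torch.Tensor, queries: torch.Tensor) -> torch.Tensor:
        # support_protos: (N, D) class prototypes from the support set
        # queries:        (Q, D) query embeddings
        scale = support_protos.size(-1) ** 0.5
        # Attend from the task's prototypes to the memory, then pool into a
        # single task-adaptive negative prototype.
        attn = F.softmax(support_protos @ self.memory.t() / scale, dim=-1)   # (N, M)
        neg_proto = (attn @ self.memory).mean(dim=0, keepdim=True)           # (1, D)
        # Score queries against known prototypes plus the negative prototype;
        # the last column plays the role of the "unknown" class.
        return queries @ torch.cat([support_protos, neg_proto], dim=0).t()   # (Q, N + 1)
```

A query can then be rejected as open-set when the unknown column wins, mirroring the abstract's idea of envisioning negative classes rather than relying only on distances to known classes.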
A General Framework for Model Adaptation to Meet Practical Constraints in Computer Vision
Recent advances in deep learning models have shown impressive capabilities in various computer vision tasks, which encourages the integration of these models into real-world vision systems such as smart devices. This integration presents new challenges, as models need to meet complex real-world requirements. This thesis is dedicated to building practical deep learning models, focusing on two main challenges in vision systems: data efficiency and variability. We address these issues by providing a general model adaptation framework that extends models with practical capabilities.
In the first part of the thesis, we explore model adaptation approaches for efficient representation. We illustrate the benefits of different types of efficient data representations, including compressed video modalities from video codecs, low-bit features, and sparsified frames and texts. By using such efficient representations, system costs such as data storage, processing and computation can be greatly reduced. We systematically study various methods to extract, learn and utilize these representations, presenting new methods to adapt machine learning models to them. The proposed methods include a compressed-domain video recognition model with a coarse-to-fine distillation training strategy, a task-specific feature compression framework for low-bit video-and-language understanding, and a learnable token sparsification approach for sparsifying human-interpretable video inputs. We demonstrate new perspectives on representing vision data in a more practical and efficient way across various applications.
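One of the listed components, learnable token sparsification, can be sketched in a few lines: a lightweight scorer ranks the input tokens and only the top fraction is passed to the downstream model. The linear scorer, the keep ratio, and the hard top-k selection (which in training would typically need a differentiable relaxation) are assumptions for illustration, not the thesis's exact method.

```python
import torch
import torch.nn as nn

class TokenSparsifier(nn.Module):
    """Sketch: keep only the top-k highest-scoring tokens so downstream
    layers process a shorter sequence (scorer and ratio are assumptions)."""
    def __init__(self, dim: int = 768, keep_ratio: float = 0.25):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)   # per-token importance score
        self.keep_ratio = keep_ratio

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) frame or patch tokens
        scores = self.scorer(tokens).squeeze(-1)              # (B, N)
        k = max(1, int(tokens.size(1) * self.keep_ratio))
        idx = scores.topk(k, dim=1).indices                   # (B, k)
        # Gather the selected tokens; the rest are dropped entirely.
        return tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
```

Reducing the token count this way cuts the quadratic attention cost of a transformer backbone roughly in proportion to the square of the keep ratio.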
The second part of the thesis focuses on open environment challenges, where we explore model adaptation for new, unseen classes and domains. We examine the practical limitations of current recognition models and introduce various methods to empower models in addressing open recognition scenarios. These include a negative envisioning framework for managing new classes and outliers, and a multi-domain translation approach for dealing with unseen domain data. Our study shows a promising trajectory towards models capable of navigating diverse data environments in real-world applications.