Knowledge Transfer in Object Recognition.
PhD Thesis. Abstract
Object recognition is a fundamental and long-standing problem in computer vision. Since
the latest resurgence of deep learning, thousands of techniques have been proposed and brought
to commercial products that facilitate people's daily lives. Despite these remarkable achievements
in object recognition, existing machine learning approaches remain far from the human visual
system, especially in learning new concepts and in Knowledge Transfer (KT)
across scenarios. One main reason is that current learning approaches address isolated tasks
by independently training predefined models, without considering any knowledge learned from
previous tasks or models. In contrast, humans have an inherent ability to transfer knowledge
acquired from earlier tasks or other people to new scenarios. Therefore, to scale object recognition
to realistic deployment, effective KT schemes are required.
This thesis studies several aspects of KT for scaling object recognition systems. Specifically,
to facilitate the KT process, several mechanisms are analysed on fine-grained and coarse-grained object
recognition tasks, including 1) cross-class KT in person re-identification (re-id);
2) cross-domain KT in person re-identification; 3) cross-model KT in image classification;
and 4) cross-task KT in image classification. In summary, four types of knowledge transfer schemes
are discussed as follows:
Chapter 3 Cross-class KT in person re-identification (re-id), one of the representative fine-grained object
recognition tasks, is first investigated. Person identity classes in person
re-id are entirely disjoint between training and testing (a zero-shot learning problem), creating
a strong demand for cross-class KT. To meet this demand, existing person re-id approaches aim
to derive a feature representation for pairwise similarity based matching and ranking that is
able to generalise to the test classes. However, current person re-id methods assume accurately
cropped person bounding boxes, each at the same resolution, ignoring the
impact of background noise and varying image scale on cross-class KT. This is more severe
in practice, where person bounding boxes must be detected automatically from a very large
number of images and/or videos of unconstrained scenes. To address these challenges,
this chapter provides two novel approaches, aiming to promote cross-class KT and boost
re-id performance. 1) This chapter alleviates inaccurate person bounding boxes by developing a
jointly learned deep model that optimises person re-id attention selection within any auto-detected
person bounding box via reinforcement learning that minimises background clutter. Specifically,
this chapter formulates a novel unified re-id architecture called Identity DiscriminativE
Attention reinforcement Learning (IDEAL) to accurately select re-id attention in auto-detected
bounding boxes for optimising re-id performance. 2) This chapter addresses the multi-scale problem
by proposing a Cross-Level Semantic Alignment (CLSA) deep learning approach capable of
learning more discriminative identity feature representations in a unified end-to-end model. This
is realised by exploiting the in-network feature pyramid structure of a deep neural network, enhanced
by a novel cross-pyramid-level semantic alignment loss function. Extensive experiments
show the modelling advantages and performance superiority of both IDEAL and CLSA over
state-of-the-art re-id methods on widely used benchmark datasets.
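The cross-pyramid-level alignment idea can be sketched as follows. The abstract does not give the exact loss, so the temperature T, the choice of the top pyramid level as the alignment target, and the function name are illustrative assumptions, not the thesis's formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_level_alignment_loss(level_logits, T=2.0):
    """Align each lower pyramid level's class prediction with the top
    level's softened prediction via KL divergence (sketch).

    level_logits: list of [batch, num_classes] arrays, ordered from the
    lowest pyramid level to the top level (last entry).
    """
    target = softmax(level_logits[-1] / T)   # top level acts as the target
    loss = 0.0
    for logits in level_logits[:-1]:
        p = softmax(logits / T)
        # KL(target || p), averaged over the batch, scaled by T^2
        kl = np.sum(target * (np.log(target + 1e-12) - np.log(p + 1e-12)), axis=1)
        loss += np.mean(kl) * T * T
    return loss / max(len(level_logits) - 1, 1)
```

When all levels agree the loss is zero, so minimising it drives low-level semantics toward the top level's.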
Chapter 4 In this chapter, we address the problem of cross-domain KT in unsupervised
domain adaptation for person re-id. Specifically, this chapter considers cross-domain KT in two
settings: 1) Unsupervised domain adaptation: a "train once, run once" pattern, transferring knowledge
from a source domain to a specific target domain, with the resulting model restricted
to that target domain only; 2) Universal re-id: a "train once, run everywhere" pattern, transferring
knowledge from a source domain to any target domain, and therefore deployable to any
re-id domain. This chapter first develops a novel Hierarchical Unsupervised Domain
Adaptation (HUDA) method for re-id. It can automatically
transfer the labelled information of an existing dataset (a source domain) to an unlabelled target
domain for unsupervised person re-id. Specifically, HUDA is designed to jointly model global
distribution alignment and local instance alignment in a two-level hierarchy for discovering transferable
source knowledge in unsupervised domain adaptation. Crucially, this approach aims to
overcome the under-constrained learning problem of existing unsupervised domain adaptation
methods, which lack a local instance alignment constraint. The consequence is more effective
cross-domain KT from the labelled source domain to the unlabelled target domain. This
chapter further addresses the limitation of "train once, run once" in existing domain adaptation
person re-id approaches by presenting a novel "train once, run everywhere" pattern. The
conventional "train once, run once" pattern is unscalable to the large number of target domains
typically encountered in real-world deployments, since, like supervised learning methods, it requires
training a separate model for each target domain. To mitigate this weakness, a novel
"Universal Model Learning" (UML) approach is formulated to enable domain-generic person
re-id using only limited training data from a single "seed" domain. Specifically, UML trains a universal
re-id model to discriminate between a set of transformed person identity classes. Each
such class is formed by applying a variety of random appearance transformations to the images
of that class, where the transformations simulate the camera viewing conditions of arbitrary domains,
making the model domain-generic.
Chapter 5 The third problem considered in this thesis is cross-model KT in coarse-grained
object recognition. This chapter discusses knowledge distillation in image classification. Knowledge
distillation is an effective approach to transferring knowledge from a large teacher neural network
to a small student (target) network, satisfying low-memory and fast-inference requirements.
Whilst able to create stronger target networks than the vanilla, non-teacher-based
learning strategy, this scheme additionally requires training a large teacher model at expensive
computational cost, via complex multi-stage training. This chapter first presents
a Self-Referenced Deep Learning (SRDL) strategy to accelerate the training process. Unlike
both vanilla optimisation and knowledge distillation, SRDL distils the knowledge discovered
by the in-training target model back into itself to regularise the subsequent learning procedure,
thereby eliminating the need to train a large teacher model. Secondly, an On-the-fly Native
Ensemble (ONE) learning strategy for one-stage knowledge distillation is proposed to overcome the
weakness of complex multi-stage training. Specifically, ONE trains only a single multi-branch
network while simultaneously establishing a strong teacher on the fly to enhance the learning of
the target network.
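The teacher-student objective that both SRDL and ONE build on can be illustrated with the standard distillation loss below. The temperature, the weighting alpha, and the function name are assumptions; the "teacher" logits would come from an earlier snapshot of the same model in SRDL, or from the on-the-fly branch ensemble in ONE:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=3.0, alpha=0.5):
    """Generic distillation objective (sketch): cross-entropy on the
    ground-truth labels plus KL divergence to the teacher's
    temperature-softened predictions."""
    # hard-label cross-entropy term
    p_s = softmax(student_logits)
    ce = -np.mean(np.log(p_s[np.arange(len(labels)), labels] + 1e-12))
    # soft-label KL term, scaled by T^2 as in standard distillation
    q_t = softmax(teacher_logits / T)
    q_s = softmax(student_logits / T)
    kd = np.mean(np.sum(q_t * (np.log(q_t + 1e-12) - np.log(q_s + 1e-12)), axis=1)) * T * T
    return (1 - alpha) * ce + alpha * kd
```

SRDL's appeal is that the teacher term costs no extra model; ONE's is that the teacher is built in the same single training stage.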
Chapter 6 Fourth, this thesis studies cross-task KT in coarse-grained object recognition.
This chapter focuses on the few-shot classification problem, which aims to train models capable
of recognising new, previously unseen categories of a novel task from only limited training
samples. Metric learning approaches constitute a highly popular strategy, learning
discriminative representations such that images of different classes are well separated
in an embedding space. The commonly held assumption that each class is summarised by a single,
global representation (referred to as a prototype), used as a reference to infer class
labels, brings significant drawbacks. This formulation fails to capture the complex multi-modal
latent distributions that often exist in real-world problems, and yields models that are highly
sensitive to prototype quality. To address these limitations, this chapter proposes a novel
Mixture of Prototypes (MP) approach that learns multi-modal class representations and can be
integrated into existing metric based methods. MP models class prototypes as a group of feature
representations carefully designed to be highly diverse and to maximise ensembling performance.
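One simple reading of multi-prototype classification can be sketched as follows. The per-class aggregation by minimum distance is an illustrative choice, not necessarily the ensembling scheme MP actually uses:

```python
import numpy as np

def mp_classify(queries, prototypes):
    """Nearest-prototype classification with K prototypes per class
    (sketch of multi-modal class representations).

    queries:    [Q, D] query embeddings.
    prototypes: [C, K, D] array, K prototypes for each of C classes.
    Returns the predicted class index for each query.
    """
    # squared distances from every query to every prototype: [Q, C, K]
    d = ((queries[:, None, None, :] - prototypes[None, :, :, :]) ** 2).sum(axis=-1)
    # distance to the closest prototype within each class: [Q, C]
    class_dist = d.min(axis=2)
    return class_dist.argmin(axis=1)
```

With K = 1 this reduces to the single-prototype baseline the chapter criticises; K > 1 lets one class occupy several modes of the embedding space.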
Furthermore, this thesis investigates the benefit of incorporating unlabelled data in cross-task
KT, focusing on the problem of Semi-Supervised Few-shot Learning (SS-FSL). Recent SS-FSL
work has relied on popular Semi-Supervised Learning (SSL) concepts involving iterative
pseudo-labelling, yet often yields models that are susceptible to error propagation and sensitive
to initialisation. To address this limitation, this chapter introduces a novel prototype-based approach
(Fewmatch) for SS-FSL that exploits model Consistency Regularization (CR) in a robust
manner and promotes cross-task knowledge transfer from unlabelled data. Fewmatch exploits unlabelled
data via a Dynamic Prototype Refinement (DPR) approach, where novel class prototypes
are alternately refined 1) explicitly, using unlabelled data with high-confidence class predictions,
and 2) implicitly, by model fine-tuning using a data-selective model CR loss. DPR affords
CR convergence, with the explicit refinement providing an increasingly stronger initialisation
and alleviating the error propagation that the application of CR can otherwise suffer.
Chapter 7 draws conclusions and suggests future work extending the ideas and methods
developed in this thesis.
Few-shot classification in Named Entity Recognition Task
For many natural language processing (NLP) tasks the amount of annotated data
is limited. This motivates the use of semi-supervised learning techniques,
such as transfer learning or meta-learning. In this work we tackle the Named Entity
Recognition (NER) task using a Prototypical Network, a metric learning
technique. It learns intermediate representations of words which cluster well
into named entity classes. This property of the model allows classifying words
with an extremely limited number of training examples, and can potentially be used
as a zero-shot learning method. By coupling this technique with transfer
learning we achieve well-performing classifiers trained on only 20 instances of
a target class.

Comment: In proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing
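The Prototypical Network inference this abstract describes, with class prototypes as mean support embeddings and nearest-prototype labelling, can be sketched as follows (the embeddings here stand in for the learned word representations):

```python
import numpy as np

def prototypical_predict(support_emb, support_labels, query_emb):
    """Prototypical-network inference (sketch): each class prototype is
    the mean of its support embeddings; each query word takes the label
    of its nearest prototype.

    support_emb:    [S, D] labelled support embeddings.
    support_labels: [S] integer class labels.
    query_emb:      [Q, D] embeddings to classify.
    """
    classes = np.unique(support_labels)
    # one prototype per class: the mean of that class's support points
    protos = np.stack([support_emb[support_labels == c].mean(axis=0)
                       for c in classes])
    # squared distance of every query to every prototype: [Q, C]
    d = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[d.argmin(axis=1)]
```

Because only the support means change when a new entity class appears, the same embedding network can classify classes with very few, or in the limit zero, task-specific training examples.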
Information Extraction in Illicit Domains
Extracting useful entities and attribute values from illicit domains such as
human trafficking is a challenging problem with the potential for widespread
social impact. Such domains employ atypical language models, have 'long tails'
and suffer from the problem of concept drift. In this paper, we propose a
lightweight, feature-agnostic Information Extraction (IE) paradigm specifically
designed for such domains. Our approach uses raw, unlabeled text from an
initial corpus, and a few (12-120) seed annotations per domain-specific
attribute, to learn robust IE models for unobserved pages and websites.
Empirically, we demonstrate that our approach can outperform feature-centric
Conditional Random Field baselines by over 18% F-measure on five annotated
sets of real-world human trafficking data in both low-supervision and
high-supervision settings. We also show that our approach is demonstrably
robust to concept drift, and can be efficiently bootstrapped even in a serial
computing environment.

Comment: 10 pages, ACM WWW 201
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. This comprehensive problem-oriented review
of the advances in transfer learning has revealed not only the challenges in
transfer learning for visual recognition, but also the problems (e.g. eight of
the seventeen) that have been scarcely studied. This survey not only presents
an up-to-date technical review for researchers, but also provides a systematic
approach and a reference for machine learning practitioners to categorise a
real problem and to look up a possible solution accordingly.