152 research outputs found
Exploiting Cross Domain Relationships for Target Recognition
Cross domain recognition extracts knowledge from one domain to recognize samples from another domain of interest. The key to solving problems under this umbrella is to uncover the latent connections between different domains. In this dissertation, three different cross domain recognition problems are studied by explicitly exploiting the relationships between domains according to the specific real-world problems.
First, the problem of cross-view action recognition is studied. The same action may look quite different when observed from different viewpoints; the key question is how to use training samples from a given camera view to perform recognition in a new view. In this work, reconstructable paths between different views are built to mirror labeled actions from a source view into a target view for learning an adaptable classifier. The path learning takes advantage of joint dictionary learning techniques while exploiting hidden information in seemingly useless samples, making the recognition performance robust and effective.
Second, the problem of person re-identification is studied, which tries to match pedestrian images across non-overlapping camera views based on appearance features. In this work, we propose to learn a random kernel forest that discriminatively assigns a specific distance metric to each pair of local patches from the two images being matched. The forest is composed of multiple decision trees, designed to partition the overall space of local patch-pairs into subspaces, in each of which a simple but effective local metric kernel can be defined to minimize the distance of true matches.
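The core idea of a per-subspace local metric can be sketched as follows. This is a minimal illustration, not the paper's method: the decision forest that partitions the patch-pair space is stood in for by a given cell assignment, and `learn_local_metrics` / `patch_pair_distance` are hypothetical names. Each cell fits a Mahalanobis metric from the difference vectors of true-match patch pairs.

```python
import numpy as np

def learn_local_metrics(match_diffs, cell_ids, reg=1e-3):
    """For each cell of the partitioned patch-pair space, fit a Mahalanobis
    metric from the difference vectors of true-match patch pairs."""
    metrics = {}
    for c in np.unique(cell_ids):
        d = match_diffs[cell_ids == c]
        cov = d.T @ d / len(d) + reg * np.eye(d.shape[1])
        metrics[c] = np.linalg.inv(cov)  # shrinks distances along match directions
    return metrics

def patch_pair_distance(x, y, cell, metrics):
    """Distance of one local patch pair under the metric of its cell."""
    d = x - y
    return float(d @ metrics[cell] @ d)
```

In the actual approach, the trees would route each patch pair to a leaf; here the `cell` index plays that role.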
Third, the problem of multi-event detection and recognition in the smart grid is studied. The signal of a multi-event might not be a straightforward combination of single-event signals because of the correlation among devices. In this work, the concept of a "root-pattern" is proposed: root-patterns can be extracted from a collection of single-event signals, and are also transferable to analyse the constituent components of multi-cascading-event signals via an over-complete dictionary, designed according to the root-patterns with temporal information subtly embedded.
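The dictionary idea can be illustrated with a toy sparse decomposition. This is a sketch under assumptions, not the thesis's algorithm: root-patterns are embedded at every time shift to form an over-complete dictionary, and plain matching pursuit (a standard greedy sparse coder) recovers the constituent single events of a multi-event signal. The function names are hypothetical.

```python
import numpy as np

def build_dictionary(root_patterns, length):
    """Over-complete dictionary: every root-pattern at every time shift,
    embedding temporal information in the atom positions."""
    atoms = []
    for p in root_patterns:
        for t in range(length - len(p) + 1):
            a = np.zeros(length)
            a[t:t + len(p)] = p
            atoms.append(a / np.linalg.norm(a))
    return np.array(atoms)

def matching_pursuit(signal, D, n_iter=10):
    """Greedily decompose a multi-event signal into dictionary atoms."""
    residual = signal.astype(float).copy()
    coeffs = np.zeros(len(D))
    for _ in range(n_iter):
        corr = D @ residual
        k = np.argmax(np.abs(corr))
        coeffs[k] += corr[k]
        residual -= corr[k] * D[k]
    return coeffs, residual
```

A signal built from two shifted occurrences of the same root-pattern decomposes back into exactly those two atoms, which is the transferability the abstract describes.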
The correctness and effectiveness of the proposed approaches have been evaluated by extensive experiments.
PAC-GAN: An Effective Pose Augmentation Scheme for Unsupervised Cross-View Person Re-identification
Person re-identification (person Re-Id) aims to retrieve pedestrian images of the same person captured by disjoint, non-overlapping cameras. Many researchers have recently focused on this problem and proposed deep-learning-based methods to enhance the recognition rate in a supervised or unsupervised manner. However, two limitations cannot be ignored: firstly, compared with other image retrieval benchmarks, the size of existing person Re-Id datasets falls far short of the requirement and cannot provide sufficient pedestrian samples for training deep models; secondly, the samples in existing datasets do not cover enough human motions or postures to provide rich prior knowledge for learning. In this paper, we introduce a novel unsupervised pose-augmentation cross-view person Re-Id scheme called PAC-GAN to overcome these limitations. We first present a formal definition of cross-view pose augmentation and then propose the PAC-GAN framework, a novel conditional generative adversarial network (CGAN) based approach to improve the performance of unsupervised cross-view person Re-Id. Specifically, the pose generation model in PAC-GAN, called CPG-Net, generates a large quantity of pose-rich samples from original image and skeleton samples. The pose-augmented dataset is produced by combining the synthesized pose-rich samples with the original samples, and is fed into the cross-view person Re-Id model named Cross-GAN. In addition, we use a weight-sharing strategy in CPG-Net to improve the quality of the newly generated samples. To the best of our knowledge, we are the first to enhance unsupervised cross-view person Re-Id by pose augmentation, and extensive experiments show that the proposed scheme is competitive with the state of the art in recognition rate.
Learning Discriminative Features for Person Re-Identification
To fulfil the requirements of public safety in modern cities, more and more large-scale surveillance camera systems are deployed, resulting in an enormous amount of visual data. Automatically processing and interpreting these data promotes the development and application of visual data analytics technologies. As one of the important research topics in surveillance systems, person re-identification (re-id) aims at retrieving a target person across non-overlapping camera-views deployed at a number of distributed space-time locations. It is a fundamental problem for many practical surveillance applications, e.g., person search, cross-camera tracking, and multi-camera human behavior analysis and prediction, and it has received considerable attention from both academia and industry.
Learning discriminative feature representations is an essential task in person re-id. Although many methodologies have been proposed, discriminative re-id feature extraction remains challenging due to: (1) Intra- and inter-personal variations. The intrinsic properties of camera deployment in surveillance systems lead to various changes in person poses, viewpoints, illumination conditions, etc. This may result in large intra-personal variations and/or small inter-personal variations, causing problems in matching person images. (2) Domain variations. The domain variations between different datasets give rise to the problem of the generalization capability of the re-id model. Directly applying a re-id model trained on one dataset to another usually causes a large performance degradation. (3) Difficulties in data creation and annotation. Existing person re-id methods, especially deep re-id methods, rely mostly on a large set of inter-camera identity-labelled training data, requiring a tedious data collection and annotation process. This leads to poor scalability in practical person re-id applications.
Corresponding to the challenges in learning discriminative re-id features, this thesis contributes to the re-id domain by proposing three related methodologies and one new re-id setting:
(1) Gaussian mixture importance estimation. Handcrafted features are usually not discriminative enough for person re-id because of noisy information such as background clutter. To precisely evaluate the similarities between person images, the main task of distance metric learning is to filter out this noisy information. The Keep It Simple and Straightforward MEtric (KISSME) is an effective method in person re-id; however, it is sensitive to the feature dimensionality and cannot capture the multiple modes in a dataset. To this end, a Gaussian Mixture Importance Estimation re-id approach is proposed, which exploits Gaussian Mixture Models to estimate the observed commonalities of similar and dissimilar person pairs in the feature space.
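The KISSME baseline the paragraph builds on has a closed form: the metric is the difference of the inverse covariances of similar-pair and dissimilar-pair difference vectors. The sketch below shows that single-Gaussian baseline only; the thesis's contribution replaces each Gaussian with a mixture, which this minimal version does not implement. The regularisation parameter is an assumption for numerical stability.

```python
import numpy as np

def kissme_metric(X1, X2, similar, reg=1e-3):
    """KISSME: M = inv(Cov_S) - inv(Cov_D), estimated from difference
    vectors of similar (true-match) and dissimilar pairs."""
    d = X1 - X2
    dS, dD = d[similar], d[~similar]
    covS = dS.T @ dS / len(dS) + reg * np.eye(d.shape[1])
    covD = dD.T @ dD / len(dD) + reg * np.eye(d.shape[1])
    return np.linalg.inv(covS) - np.linalg.inv(covD)

def kissme_distance(x, y, M):
    """Mahalanobis-style distance under the learned KISSME metric."""
    diff = x - y
    return float(diff @ M @ diff)
```

Under this metric, true matches score systematically lower than mismatches, which is exactly the sensitivity-to-dimensionality regime the GMM extension aims to stabilise.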
(2) Unsupervised domain-adaptive person re-id based on pedestrian attributes. In person re-id, person identities usually do not overlap among different domains (or datasets), which raises difficulties in generalizing re-id models. Unlike person identity, pedestrian attributes, e.g., hair length, clothes type and color, are consistent across different domains (or datasets). However, most re-id datasets lack attribute annotations. On the other hand, in the field of pedestrian attribute recognition, there are a number of datasets labeled with attributes. Exploiting such data for re-id purposes can alleviate the shortage of attribute annotations in the re-id domain and improve the generalization capability of re-id models. To this end, an unsupervised domain-adaptive re-id feature learning framework is proposed to make full use of attribute annotations. Specifically, an existing unsupervised domain adaptation method is extended to transfer attribute-based features from the attribute recognition domain to the re-id domain. With the proposed re-id feature learning framework, domain-invariant feature representations can be effectively extracted.
(3) Intra-camera supervised person re-id. Annotating large-scale re-id datasets requires a tedious data collection and annotation process and therefore leads to poor scalability in practical person re-id applications. To overcome this fundamental limitation, a new person re-id setting is considered without inter-camera identity association but only with identity labels independently annotated within each camera-view. This eliminates the most time-consuming and tedious inter-camera identity association annotation process and thus significantly reduces the amount of human effort required during annotation. It hence gives rise to a more scalable and more feasible learning scenario, which is named Intra-Camera Supervised (ICS) person re-id. Under this ICS setting, a new re-id method, i.e., the Multi-task Multi-label (MATE) learning method, is formulated. Given no inter-camera association,
MATE is specially designed for self-discovering the inter-camera identity correspondence. This is achieved by inter-camera multi-label learning under a joint multi-task inference framework. In addition, MATE can also efficiently learn discriminative re-id feature representations using the available identity labels within each camera-view.
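The self-discovery of inter-camera correspondence can be illustrated with a deliberately simplified stand-in. This is not MATE's multi-label inference; it is a minimal sketch under assumptions: per-camera class centroids are L2-normalised, and two classes are associated only if they are mutually nearest neighbours with similarity above a threshold. The function name and threshold are hypothetical.

```python
import numpy as np

def associate_across_cameras(cents_a, cents_b, thr=0.7):
    """Self-discover identity correspondence between two camera-views:
    accept mutually-nearest class centroids whose cosine similarity
    clears a threshold (centroids assumed L2-normalised)."""
    sim = cents_a @ cents_b.T
    nn_ab = sim.argmax(axis=1)   # nearest camera-B class for each A class
    nn_ba = sim.argmax(axis=0)   # nearest camera-A class for each B class
    return [(i, int(j)) for i, j in enumerate(nn_ab)
            if nn_ba[j] == i and sim[i, j] >= thr]
```

The mutual-nearest constraint is the standard guard against one-sided matches when identity sets only partially overlap between cameras.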
Self-supervised Metric Learning
Metric learning is an important paradigm for a variety of problems in machine learning
and computer vision. It has been successfully employed for fine-grained classification, retrieval,
face recognition, person re-identification and few-shot learning, among other tasks.
Metric learning is an approach based on a distance metric that aims to determine similarities or dissimilarities between samples. The goal is to reduce the distance between similar samples and at the same time increase the distance between dissimilar ones. It is therefore crucial that the distance measure be learnable, so that it adapts to data from different domains.
Training a Convolutional Neural Network to distinguish similar from dissimilar images requires some kind of supervision. In the era of big data, due to the limited amount of human-annotated data, deep learning methods have recently been adapted to work without supervision.
Self-supervised methods can be considered as a special form of unsupervised
learning methods with a supervised form, where supervision is induced by self-supervised
tasks rather than predetermined prior knowledge. Unlike a completely unsupervised setting,
self-supervised learning uses information from the dataset itself to generate pseudo-labels.
In this work we consider several self-supervised metric learning methods that use different sample mining techniques and loss functions, and we investigate their effectiveness both with a network pre-trained on ImageNet and with a network initialized from scratch. The evaluation is performed on four benchmark metric learning and retrieval datasets. It appears that soft loss functions, which exploit contextual similarities between samples, outperform hard ones that use pairwise similarities. Furthermore, it seems that augmented versions of the original images can be used as positive pairs to initiate the self-supervised training process.
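The hard/soft distinction drawn above can be made concrete with two toy losses. These are generic textbook forms, not the specific losses evaluated in the thesis: a margin-based contrastive loss scores each pair in isolation, while a softmax-style loss scores an anchor against its whole candidate pool, capturing neighbourhood context. Names and hyperparameters are illustrative.

```python
import numpy as np

def contrastive_loss(za, zb, same, margin=0.5):
    """Hard pairwise loss: pulls positive pairs together and pushes
    negative pairs beyond a margin, one pair at a time."""
    d = np.linalg.norm(za - zb, axis=1)
    return float(np.mean(same * d**2 + (1 - same) * np.maximum(0.0, margin - d)**2))

def soft_neighbour_loss(anchor, candidates, pos_idx, tau=0.1):
    """Soft loss: cross-entropy over similarities to the whole candidate
    pool, so the anchor is judged against its neighbourhood context."""
    sims = candidates @ anchor / tau
    m = sims.max()
    lse = m + np.log(np.sum(np.exp(sims - m)))   # stable log-sum-exp
    return float(lse - sims[pos_idx])
```

Augmented copies of an image, as in the abstract's last observation, would simply serve as the positive entry (`pos_idx`) for their source image.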
Knowledge Transfer in Object Recognition.
PhD Thesis. Abstract
Object recognition is a fundamental and long-standing problem in computer vision. Since
the latest resurgence of deep learning, thousands of techniques have been proposed and brought
to commercial products to facilitate people’s daily life. Although remarkable achievements in
object recognition have been witnessed, existing machine learning approaches remain far from the human visual system, especially in learning new concepts and in Knowledge Transfer (KT)
across scenarios. One main reason is that current learning approaches address isolated tasks
by independently training predefined models, without considering any knowledge learned from
previous tasks or models. In contrast, humans have an inherent ability to transfer the knowledge
acquired from earlier tasks or people to new scenarios. Therefore, to scale object recognition to realistic deployments, effective KT schemes are required.
This thesis studies several aspects of KT for scaling object recognition systems. Specifically,
to facilitate the KT process, several mechanisms on fine-grained and coarse-grained object recognition
tasks are analyzed and studied, including 1) cross-class KT on person re-identification (re-id);
2) cross-domain KT on person re-identification; 3) cross-model KT on image classification;
4) cross-task KT on image classification. In summary, four types of knowledge transfer schemes
are discussed as follows:
Chapter 3 Cross-class KT in person re-identification, one of the representative fine-grained object recognition tasks, is investigated first. Person identity classes in person re-id are totally disjoint between training and testing (a zero-shot learning problem), resulting in a high demand for cross-class KT. To address this, existing person re-id approaches aim to derive a feature representation for pairwise-similarity-based matching and ranking that is able to generalise to the test classes. However, current person re-id methods assume the provision of accurately cropped person bounding boxes, each at the same resolution, ignoring the impact of background noise and variable image scale on cross-class KT. This is more severe in practice, when person bounding boxes must be detected automatically from a very large number of images and/or videos (unconstrained scene images). To address these challenges,
this chapter provides two novel approaches, aiming to promote cross-class KT and boost
re-id performance. 1) This chapter alleviates inaccurate person bounding boxes by developing a joint learning deep model that optimises person re-id attention selection within any auto-detected person bounding box by reinforcement learning of background clutter minimisation. Specifically,
this chapter formulates a novel unified re-id architecture called Identity DiscriminativE
Attention reinforcement Learning (IDEAL) to accurately select re-id attention in auto-detected
bounding boxes for optimising re-id performance. 2) This chapter addresses the multi-scale problem
by proposing a Cross-Level Semantic Alignment (CLSA) deep learning approach capable of
learning more discriminative identity feature representations in a unified end-to-end model. This
is realised by exploiting the in-network feature pyramid structure of a deep neural network enhanced
by a novel cross pyramid-level semantic alignment loss function. Extensive experiments
show the modelling advantages and performance superiority of both IDEAL and CLSA over the
state-of-the-art re-id methods on widely used benchmarking datasets.
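The cross-level semantic alignment idea in CLSA can be sketched as a KL term between class posteriors predicted from different pyramid levels. The exact CLSA loss is not given in this abstract, so the form below is an assumption: it aligns a lower level's posterior with the semantically strongest top level, which is the standard way such alignment losses are written.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax with the usual max-shift for stability."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_level_alignment_loss(low_logits, top_logits):
    """KL(top || low): pushes class posteriors predicted from a lower
    pyramid level towards those of the top pyramid level."""
    p_top, p_low = softmax(top_logits), softmax(low_logits)
    return float(np.mean(np.sum(p_top * (np.log(p_top) - np.log(p_low)), axis=-1)))
```

The loss is zero exactly when the two levels agree, so minimising it transfers top-level semantics into the lower-level features.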
Chapter 4 In this chapter, we address the problem of cross-domain KT in unsupervised
domain adaptation for person re-id. Specifically, this chapter considers cross-domain KT as
follows: 1) Unsupervised domain adaptation: a “train once, run once” pattern, transferring knowledge from the source domain to a specific target domain, with the model restricted to being applied
on the target domain only; 2) Universal re-id: a “train once, run everywhere” pattern, transferring knowledge from the source domain to any target domain, making the model deployable on any re-id domain. This chapter first develops a novel Hierarchical Unsupervised Domain
Adaptation (HUDA) method for unsupervised domain adaptation for re-id. It can automatically
transfer labelled information of an existing dataset (a source domain) to an unlabelled target
domain for unsupervised person re-id. Specifically, HUDA is designed to jointly model global
distribution alignment and local instance alignment in a two-level hierarchy for discovering transferable
source knowledge in unsupervised domain adaptation. Crucially, this approach aims to
overcome the under-constrained learning problem of existing unsupervised domain adaptation
methods, which lack a local instance alignment constraint. The consequence is more effective cross-domain KT from the labelled source domain to the unlabelled target domain. This
chapter further addresses the limitation of “train once, run once” in existing domain adaptation
person re-id approaches by presenting a novel “train once, run everywhere” pattern. This
conventional “train once, run once” pattern is unscalable to a large number of target domains
typically encountered in real-world deployments, due to the requirement of training a separate
model for each target domain, as supervised learning methods do. To mitigate this weakness, a novel
“Universal Model Learning” (UML) approach is formulated to enable domain-generic person
re-id using only limited training data of a “single” seed domain. Specifically, UML trains a universal
re-id model to discriminate between a set of transformed person identity classes. Each of
such classes is formed by applying a variety of random appearance transformations to the images
of that class, where the transformations simulate camera viewing conditions of any domains for
making the model domain generic.
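The random appearance transformations described for UML can be sketched with simple photometric perturbations. This is an illustrative stand-in, not the thesis's transformation set: random gamma, per-channel gain, and brightness shifts are a common minimal way to simulate differing camera viewing conditions, and the function names are hypothetical.

```python
import numpy as np

def random_camera_transform(img, rng):
    """Randomly perturb gamma, per-channel gain and brightness to mimic
    photometric differences between camera views (img in [0, 1], HxWxC)."""
    gamma = rng.uniform(0.5, 2.0)
    gain = rng.uniform(0.7, 1.3, size=(1, 1, img.shape[2]))
    bias = rng.uniform(-0.1, 0.1)
    return np.clip(img ** gamma * gain + bias, 0.0, 1.0)

def transformed_identity_class(images, n_views, rng):
    """Expand one identity class with transformed copies, forming a
    'transformed person identity class' for domain-generic training."""
    return [random_camera_transform(im, rng) for im in images for _ in range(n_views)]
```

Training a classifier to discriminate these expanded classes is what makes the learned features insensitive to camera-specific appearance, per the paragraph above.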
Chapter 5 The third problem considered in this thesis is cross-model KT in coarse-grained
object recognition. This chapter discusses knowledge distillation in image classification. Knowledge
distillation is an effective approach to transfer knowledge from a large teacher neural network
to a small student (target) network, satisfying low-memory and fast-execution requirements.
Whilst being able to create stronger target networks compared to the vanilla non-teacher
based learning strategy, this scheme additionally needs to train a large teacher model at expensive computational cost and requires complex multi-stage training. This chapter first presents
a Self-Referenced Deep Learning (SRDL) strategy to accelerate the training process. Unlike
both vanilla optimisation and knowledge distillation, SRDL distils the knowledge discovered
by the in-training target model back into itself to regularise the subsequent learning procedure, thereby eliminating the need to train a large teacher model. Secondly, an On-the-fly Native
Ensemble (ONE) learning strategy for one-stage knowledge distillation is proposed to solve the
weakness of complex multi-stage training. Specifically, ONE trains only a single multi-branch
network while simultaneously establishing a strong teacher on-the-fly to enhance the learning of
target network.
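The on-the-fly teacher idea can be sketched as a distillation loss. The details below are assumptions where the abstract is silent: the teacher is taken as a gated mixture of the branches' logits, and each branch minimises a temperature-scaled KL divergence towards it, the standard knowledge-distillation form. Function names and the temperature are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    """Softmax with max-shift for numerical stability."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def one_distillation_loss(branch_logits, gate, T=3.0):
    """On-the-fly native ensemble: a gated mixture of the branches' logits
    acts as the teacher; each branch is distilled towards it with a
    temperature-scaled KL term (branch_logits: (B, N, C), gate: (B,))."""
    teacher = np.tensordot(gate, branch_logits, axes=(0, 0))  # (N, C)
    p_t = softmax(teacher / T)
    kl = 0.0
    for logits in branch_logits:
        p_s = softmax(logits / T)
        kl += np.mean(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1))
    return float(T * T * kl / len(branch_logits))
```

Because the teacher is assembled from the student's own branches each step, no separate teacher model or second training stage is needed, matching the one-stage claim above.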
Chapter 6 Fourth, this thesis studies cross-task KT in coarse-grained object recognition.
This chapter focuses on the few-shot classification problem, which aims to train models capable
of recognising new, previously unseen categories from the novel task by using only limited training
samples. Existing metric learning approaches constitute a highly popular strategy, learning
discriminative representations such that images containing different classes are well separated
in an embedding space. The commonly held assumption that each class is summarised by a single, global representation (referred to as a prototype) that is then used as a reference to infer class
labels brings significant drawbacks. This formulation fails to capture the complex multi-modal
latent distributions that often exist in real-world problems, and yields models that are highly
sensitive to the prototype quality. To address these limitations, this chapter proposes a novel
Mixture of Prototypes (MP) approach that learns multi-modal class representations, and can be
integrated into existing metric based methods. MP models class prototypes as a group of feature
representations carefully designed to be highly diverse and maximise ensembling performance.
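The difference between a single global prototype and a mixture of prototypes can be sketched at inference time. This is a minimal illustration under assumptions, not MP's training procedure: each class holds K prototypes, and its score is a soft-min over the distances to them, so a query near any one mode of a multi-modal class still scores well.

```python
import numpy as np

def mp_class_scores(query, class_prototypes, tau=0.1):
    """Score each class by a soft-min over its K prototypes rather than
    the distance to a single global prototype (lower = better match)."""
    scores = []
    for P in class_prototypes:          # P: (K, D) prototypes of one class
        d = np.linalg.norm(P - query, axis=1)
        m = d.min()                     # stabilised soft-min ~ min(d)
        scores.append(m - tau * np.log(np.sum(np.exp(-(d - m) / tau))))
    return np.array(scores)
```

The soft-min keeps the score differentiable while being dominated by the closest prototype, which is why such mixtures are less sensitive to any single prototype's quality.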
Furthermore, this thesis investigates the benefit of incorporating unlabelled data in cross-task
KT, and focuses on the problem of Semi-Supervised Few-shot Learning (SS-FSL). Recent SS-FSL work has relied on popular Semi-Supervised Learning (SSL) concepts, involving iterative
pseudo-labelling, yet often yields models that are susceptible to error propagation and sensitive
to initialisation. To address this limitation, this chapter introduces a novel prototype-based approach
(Fewmatch) for SS-FSL that exploits model Consistency Regularization (CR) in a robust
manner and promotes cross-task knowledge transfer from unlabelled data. Fewmatch exploits unlabelled data via a Dynamic Prototype Refinement (DPR) approach, where novel class prototypes
are alternately refined: 1) explicitly, using unlabelled data with high-confidence class predictions, and 2) implicitly, by model fine-tuning with a data-selective model CR loss. DPR affords CR convergence, with the explicit refinement providing an increasingly stronger initialisation, and alleviates the issue of error propagation that can arise from applying CR.
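The explicit refinement step can be sketched as follows. This is a simplified stand-in for DPR, with assumed details: class posteriors come from soft-min distances to the prototypes, only unlabelled points above a confidence threshold are used, and each prototype is interpolated towards the mean of its high-confidence points rather than overwritten. All names and hyperparameters are illustrative.

```python
import numpy as np

def refine_prototypes(protos, unlabelled, conf_thr=0.8, tau=0.5, alpha=0.5):
    """Explicit refinement: move each class prototype towards the mean of
    unlabelled features predicted for that class with high confidence."""
    d = np.linalg.norm(unlabelled[:, None, :] - protos[None, :, :], axis=-1)
    p = np.exp(-d / tau)
    p /= p.sum(axis=1, keepdims=True)           # soft class posteriors
    conf, pred = p.max(axis=1), p.argmax(axis=1)
    out = protos.copy()
    for c in range(len(protos)):
        sel = unlabelled[(pred == c) & (conf >= conf_thr)]
        if len(sel):                            # interpolate, don't overwrite
            out[c] = (1 - alpha) * protos[c] + alpha * sel.mean(axis=0)
    return out
```

The confidence gate is the guard against error propagation mentioned above: low-confidence pseudo-labels simply do not move the prototypes.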
Chapter 7 draws conclusions and suggests future work that extends the ideas and methods developed in this thesis.
Entity-Oriented Search
This open access book covers all facets of entity-oriented search—where “search” can be interpreted in the broadest sense of information access—from a unified point of view, and provides a coherent and comprehensive overview of the state of the art. It represents the first synthesis of research in this broad and rapidly developing area. Selected topics are discussed in-depth, the goal being to establish fundamental techniques and methods as a basis for future research and development. Additional topics are treated at a survey level only, containing numerous pointers to the relevant literature. A roadmap for future research, based on open issues and challenges identified along the way, rounds out the book. The book is divided into three main parts, sandwiched between introductory and concluding chapters. The first two chapters introduce readers to the basic concepts, provide an overview of entity-oriented search tasks, and present the various types and sources of data that will be used throughout the book. Part I deals with the core task of entity ranking: given a textual query, possibly enriched with additional elements or structural hints, return a ranked list of entities. This core task is examined in a number of different variants, using both structured and unstructured data collections, and numerous query formulations. In turn, Part II is devoted to the role of entities in bridging unstructured and structured data. Part III explores how entities can enable search engines to understand the concepts, meaning, and intent behind the query that the user enters into the search box, and how they can provide rich and focused responses (as opposed to merely a list of documents)—a process known as semantic search. The final chapter concludes the book by discussing the limitations of current approaches, and suggesting directions for future research. Researchers and graduate students are the primary target audience of this book. 
A general background in information retrieval is sufficient to follow the material, including an understanding of basic probability and statistics concepts as well as a basic knowledge of machine learning concepts and supervised learning algorithms.