
    Exploiting Cross Domain Relationships for Target Recognition

    Cross-domain recognition extracts knowledge from one domain to recognize samples from another domain of interest. The key to solving problems under this umbrella is to uncover the latent connections between different domains. In this dissertation, three different cross-domain recognition problems are studied by explicitly exploiting the relationships between domains according to the specific real-world problems. First, the problem of cross-view action recognition is studied. The same action might look quite different when observed from different viewpoints, so the key question is how to use training samples from a given camera view to perform recognition in a new view. In this work, reconstructable paths between different views are built to mirror labeled actions from a source view into a target view for learning an adaptable classifier. The path learning takes advantage of joint dictionary learning techniques while exploiting hidden information in seemingly useless samples, making the recognition performance robust and effective. Second, the problem of person re-identification is studied, which tries to match pedestrian images in non-overlapping camera views based on appearance features. In this work, we propose to learn a random kernel forest to discriminatively assign a specific distance metric to each pair of local patches from the two images being matched. The forest is composed of multiple decision trees, which are designed to partition the overall space of local patch-pairs into subspaces, where a simple but effective local metric kernel can be defined to minimize the distance of true matches. Third, the problem of multi-event detection and recognition in the smart grid is studied. A multi-event signal might not be a straightforward combination of single-event signals because of the correlation among devices. In this work, the concept of a "root pattern" is proposed: root patterns can be extracted from a collection of single-event signals and are also transferable to analyse the constituent components of multi-cascading-event signals, based on an over-complete dictionary designed according to the root patterns with temporal information subtly embedded. The correctness and effectiveness of the proposed approaches have been evaluated by extensive experiments.
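    The third part of this abstract describes decomposing multi-event signals over an over-complete dictionary built from single-event "root patterns". The sketch below is only a generic illustration of that dictionary-decomposition idea under stated assumptions: the greedy matching-pursuit solver, the toy patterns and all function names are invented here and are not taken from the dissertation.

```python
# Minimal sketch (not the dissertation's implementation): decompose a
# multi-event signal over an over-complete dictionary whose atoms are
# time-shifted copies of single-event "root patterns". The greedy
# matching-pursuit solver and all names here are illustrative assumptions.
import numpy as np

def build_dictionary(root_patterns, signal_len):
    """Stack time-shifted, normalized copies of each root pattern as atoms."""
    atoms = []
    for p in root_patterns:
        for shift in range(signal_len - len(p) + 1):
            atom = np.zeros(signal_len)
            atom[shift:shift + len(p)] = p
            norm = np.linalg.norm(atom)
            if norm > 0:
                atoms.append(atom / norm)
    return np.stack(atoms, axis=1)          # shape: (signal_len, n_atoms)

def matching_pursuit(signal, D, n_events):
    """Greedy sparse coding: repeatedly pick the atom most correlated with the residual."""
    residual = signal.astype(float).copy()
    codes = np.zeros(D.shape[1])
    for _ in range(n_events):
        correlations = D.T @ residual
        k = int(np.argmax(np.abs(correlations)))
        codes[k] += correlations[k]
        residual -= correlations[k] * D[:, k]
    return codes, residual

# Toy usage: two root patterns, and a multi-event signal containing both.
root_patterns = [np.array([0.0, 1.0, 0.5, 0.0]), np.array([0.0, -1.0, -1.0, 0.0])]
signal = np.zeros(32)
signal[3:7] += root_patterns[0]
signal[15:19] += root_patterns[1]
D = build_dictionary(root_patterns, len(signal))
codes, residual = matching_pursuit(signal, D, n_events=2)
print("active atoms:", np.nonzero(codes)[0], "residual norm:", np.linalg.norm(residual))
```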

    PAC-GAN: An Effective Pose Augmentation Scheme for Unsupervised Cross-View Person Re-identification

    Person re-identification (person Re-Id) aims to retrieve pedestrian images of the same person captured by disjoint, non-overlapping cameras. Many researchers have recently focused on this problem and proposed deep-learning-based methods to enhance the recognition rate in a supervised or unsupervised manner. However, there are two limitations that cannot be ignored: firstly, compared with other image retrieval benchmarks, the size of existing person Re-Id datasets falls far short of what is required and cannot provide sufficient pedestrian samples for training deep models; secondly, the samples in existing datasets do not cover enough human motions and postures to provide sufficient prior knowledge for learning. In this paper, we introduce a novel unsupervised pose augmentation cross-view person Re-Id scheme called PAC-GAN to overcome these limitations. We first present a formal definition of cross-view pose augmentation and then propose the PAC-GAN framework, a novel conditional generative adversarial network (CGAN) based approach to improve the performance of unsupervised cross-view person Re-Id. Specifically, the pose generation model in PAC-GAN, called CPG-Net, generates a sufficient quantity of pose-rich samples from the original image and skeleton samples. The pose-augmented dataset is produced by combining the synthesized pose-rich samples with the original samples, and is fed into the cross-view person Re-Id model named Cross-GAN. Besides, we use a weight-sharing strategy in CPG-Net to improve the quality of the newly generated samples. To the best of our knowledge, we are the first to enhance unsupervised cross-view person Re-Id by pose augmentation, and the results of extensive experiments show that the proposed scheme is competitive with the state of the art in recognition rate.
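    CPG-Net is only described at a high level in this abstract, so the following is merely a minimal, assumption-laden sketch of a pose-conditioned generator in that spirit: an image and a target skeleton map are concatenated channel-wise and mapped through an encoder-decoder to a new pose-rich sample. The layer sizes, the layout and every name below are illustrative guesses, not the paper's actual CPG-Net architecture.

```python
# Minimal PyTorch sketch of a pose-conditioned generator in the spirit of
# CPG-Net: the original image and a target skeleton map are concatenated
# channel-wise and mapped to a new pose-rich sample. Layer sizes, the
# encoder-decoder layout and all names are assumptions, not the paper's
# actual architecture.
import torch
import torch.nn as nn

class PoseConditionedGenerator(nn.Module):
    def __init__(self, img_channels=3, skeleton_channels=1, base=64):
        super().__init__()
        in_ch = img_channels + skeleton_channels   # condition by concatenation
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.BatchNorm2d(base * 2), nn.LeakyReLU(0.2),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1),
            nn.BatchNorm2d(base), nn.ReLU(),
            nn.ConvTranspose2d(base, img_channels, 4, stride=2, padding=1),
            nn.Tanh(),                               # output image in [-1, 1]
        )

    def forward(self, image, skeleton):
        x = torch.cat([image, skeleton], dim=1)      # (B, 3+1, H, W)
        return self.decoder(self.encoder(x))

# Toy usage: one 128x64 pedestrian image plus a single-channel skeleton map.
g = PoseConditionedGenerator()
image = torch.randn(1, 3, 128, 64)
skeleton = torch.randn(1, 1, 128, 64)
fake = g(image, skeleton)
print(fake.shape)  # torch.Size([1, 3, 128, 64])
```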

    Learning Discriminative Features for Person Re-Identification

    To fulfil the requirements of public safety in modern cities, more and more large-scale surveillance camera systems are deployed, resulting in an enormous amount of visual data. Automatically processing and interpreting these data promotes the development and application of visual data analytics technologies. As one of the important research topics in surveillance systems, person re-identification (re-id) aims at retrieving the target person across non-overlapping camera views deployed at a number of distributed space-time locations. It is a fundamental problem for many practical surveillance applications, e.g., person search, cross-camera tracking, and multi-camera human behavior analysis and prediction, and it has received considerable attention from both academia and industry. Learning discriminative feature representations is an essential task in person re-id. Although many methodologies have been proposed, discriminative re-id feature extraction is still a challenging problem due to: (1) Intra- and inter-personal variations. The intrinsic properties of camera deployment in surveillance systems lead to various changes in person poses, viewpoints, illumination conditions, etc. This may result in large intra-personal variations and/or small inter-personal variations, thus causing problems in matching person images. (2) Domain variations. The domain variations between different datasets give rise to the problem of the generalization capability of re-id models. Directly applying a re-id model trained on one dataset to another usually causes a large performance degradation. (3) Difficulties in data creation and annotation. Existing person re-id methods, especially deep re-id methods, rely mostly on a large set of inter-camera identity-labelled training data, requiring a tedious data collection and annotation process. This leads to poor scalability in practical person re-id applications. Corresponding to these challenges in learning discriminative re-id features, this thesis contributes to the re-id domain by proposing three related methodologies and one new re-id setting: (1) Gaussian mixture importance estimation. Handcrafted features are usually not discriminative enough for person re-id because of noisy information, such as background clutter. To precisely evaluate the similarities between person images, the main task of distance metric learning is to filter out this noisy information. Keep It Simple and Straightforward MEtric (KISSME) is an effective method in person re-id; however, it is sensitive to the feature dimensionality and cannot capture the multiple modes in a dataset. To this end, a Gaussian Mixture Importance Estimation re-id approach is proposed, which exploits Gaussian Mixture Models to estimate the observed commonalities of similar and dissimilar person pairs in the feature space. (2) Unsupervised domain-adaptive person re-id based on pedestrian attributes. In person re-id, person identities usually do not overlap among different domains (or datasets), and this raises difficulties in generalizing re-id models. Unlike person identity, pedestrian attributes, e.g., hair length, clothes type and color, are consistent across different domains (or datasets). However, most re-id datasets lack attribute annotations. On the other hand, in the field of pedestrian attribute recognition, there are a number of datasets labeled with attributes. Exploiting such data for re-id purposes can alleviate the shortage of attribute annotations in the re-id domain and improve the generalization capability of re-id models. To this end, an unsupervised domain-adaptive re-id feature learning framework is proposed to make full use of attribute annotations. Specifically, an existing unsupervised domain adaptation method has been extended to transfer attribute-based features from the attribute recognition domain to the re-id domain. With the proposed re-id feature learning framework, domain-invariant feature representations can be effectively extracted. (3) Intra-camera supervised person re-id. Annotating large-scale re-id datasets requires a tedious data collection and annotation process and therefore leads to poor scalability in practical person re-id applications. To overcome this fundamental limitation, a new person re-id setting is considered, without inter-camera identity association but only with identity labels independently annotated within each camera view. This eliminates the most time-consuming and tedious part of the annotation, the inter-camera identity association, and thus significantly reduces the amount of human effort required. It hence gives rise to a more scalable and more feasible learning scenario, named Intra-Camera Supervised (ICS) person re-id. Under this ICS setting, a new re-id method, the Multi-task Multi-label (MATE) learning method, is formulated. Given no inter-camera association, MATE is specially designed to self-discover the inter-camera identity correspondence. This is achieved by inter-camera multi-label learning under a joint multi-task inference framework. In addition, MATE can also efficiently learn discriminative re-id feature representations using the available identity labels within each camera view.
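    The first contribution above, Gaussian Mixture Importance Estimation, is described only at a high level, so the snippet below is a rough sketch of the general idea under assumptions of my own: fit one Gaussian Mixture Model to the difference vectors of similar pairs and one to those of dissimilar pairs, and score a candidate match by the log-likelihood ratio. The toy data, component counts and function names are illustrative and not the thesis's actual formulation.

```python
# Minimal sketch (an assumption-laden illustration, not the thesis's exact
# method): model the difference vectors of similar and dissimilar person-image
# pairs with Gaussian Mixture Models and rank candidate matches by the
# log-likelihood ratio of the two models.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
dim = 16

# Toy features: similar pairs differ by small noise, dissimilar pairs by a lot.
similar_diffs = rng.normal(scale=0.3, size=(500, dim))
dissimilar_diffs = rng.normal(scale=2.0, size=(500, dim))

gmm_sim = GaussianMixture(n_components=3, covariance_type="full",
                          random_state=0).fit(similar_diffs)
gmm_dis = GaussianMixture(n_components=3, covariance_type="full",
                          random_state=0).fit(dissimilar_diffs)

def match_score(feat_a, feat_b):
    """Higher score means the pair is more likely to show the same person."""
    diff = (feat_a - feat_b).reshape(1, -1)
    return float(gmm_sim.score_samples(diff)[0] - gmm_dis.score_samples(diff)[0])

query = rng.normal(size=dim)
true_match = query + rng.normal(scale=0.3, size=dim)
impostor = rng.normal(size=dim)
print(match_score(query, true_match), ">", match_score(query, impostor))
```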

    Self-supervised Metric Learning

    Metric learning is an important paradigm for a variety of problems in machine learning and computer vision. It has been successfully employed for fine-grained classification, retrieval, face recognition, person re-identification and few-shot learning, among other tasks. Metric learning is an approach based on a distance metric that aims to determine similarities or dissimilarities between samples. The goal is to reduce the distance between similar samples and at the same time to increase the distance between dissimilar ones. Therefore, it is crucial that the distance measure is learnable, so that it can adapt to data from different domains. Training a Convolutional Neural Network to distinguish similar from dissimilar images requires some kind of supervision. In the era of big data, due to the limited amount of human-annotated data, deep learning methods have recently been adapted to work without supervision. Self-supervised methods can be considered a special form of unsupervised learning with a supervised structure, where supervision is induced by self-supervised tasks rather than predetermined prior knowledge. Unlike a completely unsupervised setting, self-supervised learning uses information from the dataset itself to generate pseudo-labels. In this work we consider several self-supervised metric learning methods that use different sample mining techniques and loss functions, and investigate their effectiveness both when starting from a network pre-trained on ImageNet and when training from a randomly initialized network. The evaluation is performed on four benchmark metric learning and retrieval datasets. It appears that soft loss functions that exploit contextual similarities between samples outperform hard ones that use pairwise similarities. Furthermore, it seems that augmented versions of the original images can be used as positive pairs to initiate the self-supervised training process.
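    As a concrete illustration of the last point, the sketch below shows the common pattern of treating two augmented views of the same image as a positive pair and training with a generic NT-Xent-style contrastive loss. It is not any specific loss or mining strategy evaluated in this thesis; the temperature, batch size and all names are assumptions.

```python
# Minimal PyTorch sketch of the idea that augmented views of the same image
# serve as positive pairs for self-supervised metric learning. The simple
# NT-Xent-style contrastive loss below is a generic stand-in, not a specific
# loss studied in the thesis; all names here are assumptions.
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.1):
    """z1, z2: (B, D) embeddings of two augmented views of the same B images."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)          # (2B, D)
    sim = z @ z.t() / temperature                                # cosine similarities
    n = z.shape[0]
    sim.fill_diagonal_(float("-inf"))                            # ignore self-pairs
    # The positive of sample i is its other augmented view, offset by B.
    targets = torch.arange(n, device=z.device).roll(n // 2)
    return F.cross_entropy(sim, targets)

# Toy usage with random "embeddings" standing in for encoder outputs.
torch.manual_seed(0)
batch = torch.randn(8, 128)
view1 = batch + 0.05 * torch.randn_like(batch)   # stands in for one augmentation
view2 = batch + 0.05 * torch.randn_like(batch)   # stands in for another augmentation
print(contrastive_loss(view1, view2).item())
```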

    Knowledge Transfer in Object Recognition.

    Object recognition is a fundamental and long-standing problem in computer vision. Since the latest resurgence of deep learning, thousands of techniques have been proposed and brought to commercial products to facilitate people's daily life. Although remarkable achievements in object recognition have been witnessed, existing machine learning approaches remain far from the human vision system, especially in learning new concepts and in Knowledge Transfer (KT) across scenarios. One main reason is that current learning approaches address isolated tasks by independently training predefined models, without considering any knowledge learned from previous tasks or models. In contrast, humans have an inherent ability to transfer the knowledge acquired from earlier tasks or people to new scenarios. Therefore, to scale object recognition to realistic deployments, effective KT schemes are required. This thesis studies several aspects of KT for scaling object recognition systems. Specifically, to facilitate the KT process, several mechanisms on fine-grained and coarse-grained object recognition tasks are analyzed and studied, including 1) cross-class KT in person re-identification (re-id); 2) cross-domain KT in person re-identification; 3) cross-model KT in image classification; and 4) cross-task KT in image classification. In summary, four types of knowledge transfer schemes are discussed as follows.
    Chapter 3: Cross-class KT in person re-identification, one of the representative fine-grained object recognition tasks, is investigated first. Person identity classes in re-id are totally disjoint between training and testing (a zero-shot learning problem), resulting in a high demand for cross-class KT. To meet it, existing person re-id approaches aim to derive a feature representation for pairwise similarity-based matching and ranking that is able to generalise to the test classes. However, current person re-id methods assume the provision of accurately cropped person bounding boxes, each at the same resolution, ignoring the impact of background noise and varying image scales on cross-class KT. This is more severe in practice, when person bounding boxes must be detected automatically given a very large number of images and/or videos (unconstrained scene images) to be processed. To address these challenges, this chapter provides two novel approaches aiming to promote cross-class KT and boost re-id performance. 1) The chapter alleviates inaccurate person bounding boxes by developing a joint learning deep model that optimises person re-id attention selection within any auto-detected person bounding box through reinforcement learning of background clutter minimisation. Specifically, it formulates a novel unified re-id architecture called Identity DiscriminativE Attention reinforcement Learning (IDEAL) to accurately select re-id attention in auto-detected bounding boxes for optimising re-id performance. 2) The chapter addresses the multi-scale problem by proposing a Cross-Level Semantic Alignment (CLSA) deep learning approach capable of learning more discriminative identity feature representations in a unified end-to-end model. This is realised by exploiting the in-network feature pyramid structure of a deep neural network enhanced by a novel cross pyramid-level semantic alignment loss function. Extensive experiments show the modelling advantages and performance superiority of both IDEAL and CLSA over state-of-the-art re-id methods on widely used benchmark datasets.
    Chapter 4: This chapter addresses the problem of cross-domain KT in unsupervised domain adaptation for person re-id. Specifically, it considers cross-domain KT in two forms: 1) unsupervised domain adaptation, a “train once, run once” pattern that transfers knowledge from a source domain to a specific target domain, with the model restricted to that target domain only; and 2) universal re-id, a “train once, run everywhere” pattern that transfers knowledge from a source domain to any target domain and is therefore deployable on any re-id domain. The chapter first develops a novel Hierarchical Unsupervised Domain Adaptation (HUDA) method for unsupervised domain adaptation in re-id. It can automatically transfer the labelled information of an existing dataset (a source domain) to an unlabelled target domain for unsupervised person re-id. Specifically, HUDA is designed to jointly model global distribution alignment and local instance alignment in a two-level hierarchy for discovering transferable source knowledge in unsupervised domain adaptation. Crucially, this approach aims to overcome the under-constrained learning problem of existing unsupervised domain adaptation methods, which lack the local instance alignment constraint. The consequence is more effective cross-domain KT from the labelled source domain to the unlabelled target domain. The chapter further addresses the limitation of the “train once, run once” pattern of existing domain adaptation person re-id approaches by presenting a novel “train once, run everywhere” pattern. The conventional “train once, run once” pattern is unscalable to the large number of target domains typically encountered in real-world deployments, due to the requirement of training a separate model for each target domain, as in supervised learning methods. To mitigate this weakness, a novel “Universal Model Learning” (UML) approach is formulated to enable domain-generic person re-id using only limited training data from a “single” seed domain. Specifically, UML trains a universal re-id model to discriminate between a set of transformed person identity classes. Each such class is formed by applying a variety of random appearance transformations to the images of that class, where the transformations simulate the camera viewing conditions of arbitrary domains, making the model domain-generic.
    Chapter 5: The third problem considered in this thesis is cross-model KT in coarse-grained object recognition; this chapter discusses knowledge distillation in image classification. Knowledge distillation is an effective approach to transferring knowledge from a large teacher neural network to a small student (target) network in order to satisfy low-memory and fast-running requirements. Whilst able to create stronger target networks than the vanilla, non-teacher-based learning strategy, this scheme needs to additionally train a large teacher model at an expensive computational cost and requires complex multi-stage training. The chapter first presents a Self-Referenced Deep Learning (SRDL) strategy to accelerate the training process. Unlike both vanilla optimisation and knowledge distillation, SRDL distils the knowledge discovered by the in-training target model back into itself to regularise the subsequent learning procedure, thereby eliminating the need to train a large teacher model. Secondly, an On-the-fly Native Ensemble (ONE) learning strategy for one-stage knowledge distillation is proposed to remove the need for complex multi-stage training. Specifically, ONE trains only a single multi-branch network while simultaneously establishing a strong teacher on-the-fly to enhance the learning of the target network.
    Chapter 6: Fourth, this thesis studies cross-task KT in coarse-grained object recognition. The chapter focuses on the few-shot classification problem, which aims to train models capable of recognising new, previously unseen categories of a novel task using only limited training samples. Existing metric learning approaches constitute a highly popular strategy, learning discriminative representations such that images of different classes are well separated in an embedding space. The commonly held assumption that each class is summarised by a single, global representation (referred to as a prototype), which is then used as a reference to infer class labels, brings significant drawbacks: this formulation fails to capture the complex multi-modal latent distributions that often exist in real-world problems and yields models that are highly sensitive to prototype quality. To address these limitations, this chapter proposes a novel Mixture of Prototypes (MP) approach that learns multi-modal class representations and can be integrated into existing metric-based methods. MP models class prototypes as a group of feature representations carefully designed to be highly diverse and to maximise ensembling performance. Furthermore, the thesis investigates the benefit of incorporating unlabelled data in cross-task KT and focuses on the problem of Semi-Supervised Few-shot Learning (SS-FSL). Recent SS-FSL work has relied on popular Semi-Supervised Learning (SSL) concepts involving iterative pseudo-labelling, yet often yields models that are susceptible to error propagation and sensitive to initialisation. To address this limitation, this chapter introduces a novel prototype-based approach (Fewmatch) for SS-FSL that exploits model Consistency Regularization (CR) in a robust manner and promotes cross-task knowledge transfer from unlabelled data. Fewmatch exploits unlabelled data via a Dynamic Prototype Refinement (DPR) approach, in which novel class prototypes are alternately refined 1) explicitly, using unlabelled data with high-confidence class predictions, and 2) implicitly, by model fine-tuning with a data-selective model CR loss. DPR affords CR convergence, with the explicit refinement providing an increasingly stronger initialisation and alleviating the issue of error propagation due to the application of CR.
    Chapter 7 draws conclusions and suggests future work that extends the ideas and methods developed in this thesis.
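    Among the methods above, the On-the-fly Native Ensemble (ONE) idea lends itself to a compact sketch: several classifier branches are trained together, their aggregated logits act as an on-the-fly teacher, and each branch receives a cross-entropy loss plus a KL term towards that teacher. The sketch below simplifies heavily (a plain average instead of a learned gate, no shared low-level layers, invented layer sizes and names), so it should be read as an illustration of the general scheme rather than the thesis's implementation.

```python
# Simplified PyTorch sketch of the on-the-fly ensemble-teacher idea behind ONE:
# several classifier branches share an input, their averaged logits act as the
# teacher, and each branch is trained with cross-entropy plus a KL term towards
# the teacher. The real method uses a learned gating module and shared low-level
# layers; the plain average, layer sizes and names below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiBranchNet(nn.Module):
    def __init__(self, in_dim=64, hidden=128, n_classes=10, n_branches=3):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, n_classes))
            for _ in range(n_branches)
        ])

    def forward(self, x):
        logits = [b(x) for b in self.branches]                 # per-branch logits
        teacher = torch.stack(logits, dim=0).mean(dim=0)       # on-the-fly teacher
        return logits, teacher

def one_style_loss(logits, teacher, labels, T=3.0):
    loss = 0.0
    for l in logits:
        ce = F.cross_entropy(l, labels)
        kl = F.kl_div(F.log_softmax(l / T, dim=1),
                      F.softmax(teacher.detach() / T, dim=1),
                      reduction="batchmean") * T * T
        loss = loss + ce + kl
    return loss + F.cross_entropy(teacher, labels)             # also train the teacher head

# Toy usage.
torch.manual_seed(0)
net = MultiBranchNet()
x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
logits, teacher = net(x)
print(one_style_loss(logits, teacher, y).item())
```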

    Entity-Oriented Search

    This open access book covers all facets of entity-oriented search—where “search” can be interpreted in the broadest sense of information access—from a unified point of view, and provides a coherent and comprehensive overview of the state of the art. It represents the first synthesis of research in this broad and rapidly developing area. Selected topics are discussed in-depth, the goal being to establish fundamental techniques and methods as a basis for future research and development. Additional topics are treated at a survey level only, containing numerous pointers to the relevant literature. A roadmap for future research, based on open issues and challenges identified along the way, rounds out the book. The book is divided into three main parts, sandwiched between introductory and concluding chapters. The first two chapters introduce readers to the basic concepts, provide an overview of entity-oriented search tasks, and present the various types and sources of data that will be used throughout the book. Part I deals with the core task of entity ranking: given a textual query, possibly enriched with additional elements or structural hints, return a ranked list of entities. This core task is examined in a number of different variants, using both structured and unstructured data collections, and numerous query formulations. In turn, Part II is devoted to the role of entities in bridging unstructured and structured data. Part III explores how entities can enable search engines to understand the concepts, meaning, and intent behind the query that the user enters into the search box, and how they can provide rich and focused responses (as opposed to merely a list of documents)—a process known as semantic search. The final chapter concludes the book by discussing the limitations of current approaches, and suggesting directions for future research. Researchers and graduate students are the primary target audience of this book. A general background in information retrieval is sufficient to follow the material, including an understanding of basic probability and statistics concepts as well as a basic knowledge of machine learning concepts and supervised learning algorithms
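    As a minimal, purely illustrative example of the core entity-ranking task described above (given a textual query, return a ranked list of entities), the sketch below ranks a tiny invented entity catalogue by TF-IDF cosine similarity between the query and each entity's description. This is a stand-in scorer, not one of the retrieval models covered in the book.

```python
# Minimal illustration of the entity-ranking task: rank entities by the textual
# relevance of their descriptions to a query. Plain TF-IDF cosine similarity is
# a stand-in scorer, and the toy entity catalogue is invented for the example.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

entities = {
    "Ann Arbor": "city in the state of Michigan, home of the University of Michigan",
    "Michigan": "state in the Great Lakes region of the United States",
    "University of Michigan": "public research university located in Ann Arbor, Michigan",
}

names = list(entities)
vectorizer = TfidfVectorizer()
entity_matrix = vectorizer.fit_transform([entities[n] for n in names])

def rank_entities(query, k=3):
    """Return the top-k entities with their similarity scores."""
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, entity_matrix).ravel()
    order = scores.argsort()[::-1][:k]
    return [(names[i], float(scores[i])) for i in order]

print(rank_entities("research university in Michigan"))
```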

    Asymmetric Projection and Dictionary Learning With Listwise and Identity Consistency Constraints for Person Re-Identification
