
    Person Re-identification in Identity Regression Space

    This work was partially supported by the China Scholarship Council, Vision Semantics Ltd, the Royal Society Newton Advanced Fellowship Programme (NA150459), and the Innovate UK Industrial Challenge Project on Developing and Commercialising Intelligent Video Analytics Solutions for Public Safety (98111-571149).

    Re-Identification of Zebrafish using Metric Learning


    Representation Learning with Adversarial Latent Autoencoders

    A large number of deep learning methods applied to computer vision problems require encoder-decoder maps. These methods include, but are not limited to, self-representation learning, generalization, few-shot learning, and novelty detection. Encoder-decoder maps are also useful for photo manipulation, photo editing, super-resolution, etc. Encoder-decoder maps are typically learned using autoencoder networks. Traditionally, autoencoder reciprocity is achieved in the image space using a pixel-wise similarity loss, which has a widely known flaw of producing non-realistic reconstructions. This flaw is typical for the Variational Autoencoder (VAE) family and is not limited to pixel-wise similarity losses; it is common to all methods relying upon the explicit maximum likelihood training paradigm, as opposed to an implicit one. Likelihood maximization, coupled with a poor decoder distribution, leads to poor or blurry reconstructions at best. Generative Adversarial Networks (GANs), on the other hand, perform an implicit maximization of the likelihood by solving a minimax game, thus bypassing the issues derived from explicit maximization. This provides GAN architectures with remarkable generative power, enabling the generation of high-resolution images of humans that are indistinguishable from real photos to the naked eye. However, GAN architectures lack inference capabilities, which makes them unsuitable for training encoder-decoder maps, effectively limiting their application space. We introduce an autoencoder architecture that (a) is free from the consequences of maximizing the likelihood directly, (b) produces reconstructions competitive in quality with state-of-the-art GAN architectures, and (c) allows learning disentangled representations, which makes it useful in a variety of problems. We show that the proposed architecture and training paradigm significantly improve the state of the art in novelty and anomaly detection, enable novel kinds of image manipulation, and have significant potential for other applications.
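
    The contrast between explicit pixel-wise reconstruction and implicit adversarial matching is easy to sketch. Below is a minimal, hypothetical PyTorch sketch (the toy network sizes and training loop are illustrative assumptions, not the thesis's actual architecture) in which reconstructions are scored by a discriminator instead of a pixel-wise loss:

```python
# Minimal adversarial-autoencoder sketch (hypothetical, illustrative only):
# the autoencoder is trained to fool a discriminator rather than to
# minimize a pixel-wise reconstruction loss.
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
dec = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))
disc = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))

opt_g = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(x):                       # x: (batch, 1, 28, 28) images in [0, 1]
    x = x.view(x.size(0), -1)
    # --- discriminator: tell real images from reconstructions ---
    with torch.no_grad():
        fake = dec(enc(x))
    d_loss = bce(disc(x), torch.ones(x.size(0), 1)) + \
             bce(disc(fake), torch.zeros(x.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # --- autoencoder: fool the discriminator (implicit likelihood) ---
    fake = dec(enc(x))
    g_loss = bce(disc(fake), torch.ones(x.size(0), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

    Replacing the pixel-wise loss with the discriminator's judgment is what makes the likelihood maximization implicit, which is the property the abstract argues avoids blurry reconstructions.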

    Biometric Recognition by Multimodal Face Fusion

    Biometric systems are considered to be one of the most effective methods of protecting and securing private or public life against all types of theft. Facial recognition is one of the most widely used methods, not because it is the most efficient and reliable, but because it is natural, non-intrusive, and relatively well accepted compared to other biometrics such as fingerprint and iris. Developing biometric applications such as facial recognition has recently become important in smart cities. Over the past decades, many techniques have been proposed to recognize a face in a 2D or 3D image, with applications including videoconferencing systems, facial reconstruction, and security. Generally, changes in lighting and variations in pose and facial expression make 2D facial recognition less than reliable. 3D models may be able to overcome these constraints, except that most 3D facial recognition methods still treat the human face as a rigid object, which means they cannot handle facial expressions. In this thesis, we propose a new approach for automatic face verification that encodes the local information of 2D and 3D facial images as a high-order tensor. First, histograms of two local multiscale descriptors (LPQ and BSIF) are used to characterize both 2D and 3D facial images. Next, a tensor-based facial representation is designed to combine all the features extracted from 2D and 3D faces. To improve the discrimination of the proposed tensor face representation, we use two multilinear subspace methods (MWPCA, and MDA combined with WCCN). The WCCN technique is applied to the face tensors to reduce the effect of intra-class directions through a normalization transform and to improve the discriminating power of MDA. Our experiments were carried out on three large databases, FRGC v2.0, Bosphorus, and CASIA 3D, under varying facial expressions, pose variations, and occlusions. The experimental results show the superiority of the proposed approach in terms of verification rate compared to recent state-of-the-art methods.
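
    As a rough illustration of the tensor construction (a sketch under assumed dimensions; the toy block histogram below stands in for real LPQ/BSIF descriptors), per-modality local histograms can be stacked into a higher-order tensor whose mode-n unfoldings feed multilinear subspace methods:

```python
# Sketch: stack per-modality local histograms into a third-order tensor.
# All dimensions are hypothetical; a real pipeline would compute LPQ/BSIF
# histograms at several scales instead of the plain block histogram here.
import numpy as np

def block_histograms(img, blocks=4, bins=16):
    """Split an image into blocks x blocks patches; one histogram per patch."""
    h, w = img.shape
    feats = []
    for i in range(blocks):
        for j in range(blocks):
            patch = img[i*h//blocks:(i+1)*h//blocks, j*w//blocks:(j+1)*w//blocks]
            hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0), density=True)
            feats.append(hist)
    return np.stack(feats)            # (blocks*blocks, bins)

rng = np.random.default_rng(0)
face_2d = rng.random((64, 64))        # stand-ins for a 2D image and a 3D depth map
face_3d = rng.random((64, 64))

# Third-order tensor: spatial patches x histogram bins x modality channels.
T = np.stack([block_histograms(m) for m in (face_2d, face_3d)], axis=-1)  # (16, 16, 2)

# A mode-0 unfolding (one row per spatial patch) is what multilinear
# methods such as MPCA/MDA operate on.
mode0 = T.reshape(T.shape[0], -1)     # (16, 32)
print(T.shape, mode0.shape)
```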

    Deep Visual Feature Learning for Vehicle Detection, Recognition and Re-identification

    Along with the ever-increasing number of motor vehicles in current transportation systems, intelligent video surveillance and management, an important field of artificial intelligence, is becoming increasingly necessary. Vehicle-related problems are being widely explored and applied practically. Among various techniques, computer vision and machine learning algorithms have been the most popular, since a vast amount of video/image surveillance data is available for research nowadays. In this thesis, vision-based approaches for vehicle detection, recognition, and re-identification are extensively investigated. Moreover, to address different challenges, several novel methods are proposed to overcome the weaknesses of previous works and achieve compelling performance. Deep visual feature learning has been widely researched in the past five years and has made huge progress in many applications, including image classification, image retrieval, object detection, image segmentation, and image generation. Compared with traditional machine learning methods, which consist of hand-crafted feature extraction and shallow model learning, deep neural networks can learn hierarchical feature representations from low-level to high-level features to achieve more robust recognition precision. For some specific tasks, researchers prefer to embed feature learning and classification/regression methods into end-to-end models, which can benefit both accuracy and efficiency. In this thesis, deep models are mainly investigated to study the research problems. Vehicle detection is the most fundamental task in intelligent video surveillance but faces many challenges, such as severe illumination and viewpoint variations, occlusions, and multi-scale problems. Moreover, learning vehicles' diverse attributes is also an interesting and valuable problem. To address these tasks and their difficulties, a fast framework of Detection and Annotation for Vehicles (DAVE) is presented, which effectively combines vehicle detection and attribute annotation. DAVE consists of two convolutional neural networks (CNNs): a fast vehicle proposal network (FVPN) for extracting vehicle-like objects, and an attributes learning network (ALN) that verifies each proposal and infers each vehicle's pose, color, and type simultaneously. These two nets are jointly optimized so that the abundant latent knowledge learned from the ALN can be exploited to guide FVPN training. Once the model is trained, it can achieve efficient vehicle detection and annotation for real-world traffic surveillance data. The second research problem of the thesis focuses on vehicle re-identification (re-ID). Vehicle re-ID aims to identify a target vehicle in different cameras with non-overlapping views. It has received far less attention in the computer vision community than the prevalent person re-ID problem. Possible reasons for this slow progress are the lack of appropriate research data and the special 3D structure of a vehicle. Previous works have generally focused on some specific views (e.g. front), but these methods are less effective in realistic scenarios, where vehicles usually appear at arbitrary viewpoints to cameras.
    In this thesis, I focus on the uncertainty of vehicle viewpoint in re-ID, proposing four different approaches to address the multi-view vehicle re-ID problem: (1) The Spatially Concatenated ConvNet (SCCN), in an encoder-decoder architecture, is proposed to learn transformations across different viewpoints of a vehicle and then spatially concatenate all the feature maps for further fusion into a multi-view feature representation. (2) A Cross-View Generative Adversarial Network (XVGAN) is designed to take an input image's feature as a conditional embedding to effectively infer cross-view images; the features of the inferred and original images are combined to learn distance metrics for re-ID. (3) The advantages of a bi-directional Long Short-Term Memory (LSTM) loop for modeling transformations across continuous view variation of a vehicle are investigated. (4) A Viewpoint-aware Attentive Multi-view Inference (VAMI) model is proposed, which adopts a viewpoint-aware attention model to select core regions at different viewpoints and then performs multi-view feature inference via an adversarial training architecture.
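
    Whichever of the four models produces the features, the re-ID step itself reduces to embedding and ranking. The following is a generic sketch (the toy CNN and all sizes are assumptions; none of the SCCN/XVGAN/LSTM/VAMI specifics are modeled here) of matching a query vehicle against a cross-camera gallery:

```python
# Sketch of cross-camera re-ID matching with a generic CNN embedding
# (a toy stand-in for any of the four proposed models).
import torch
import torch.nn as nn
import torch.nn.functional as F

embed = nn.Sequential(                 # toy embedding network
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 64),
)

def rank_gallery(query, gallery):
    """Return gallery indices sorted by distance to the query embedding."""
    with torch.no_grad():
        q = F.normalize(embed(query), dim=1)      # (1, 64)
        g = F.normalize(embed(gallery), dim=1)    # (N, 64)
    dists = torch.cdist(q, g).squeeze(0)          # (N,) Euclidean distances
    return torch.argsort(dists)                   # best match first

# Training would typically pull same-identity pairs together, e.g. with
# nn.TripletMarginLoss(margin=0.3) over (anchor, positive, negative) triplets.
query = torch.rand(1, 3, 128, 128)
gallery = torch.rand(10, 3, 128, 128)
print(rank_gallery(query, gallery)[:5])
```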

    Learning from Incomplete and Heterogeneous Data

    Deep convolutional neural networks (DCNNs) have shown impressive performance improvements for object detection and recognition problems. However, a vast majority of DCNN-based recognition methods are designed with two key assumptions in mind: 1) all categories are known a priori, and 2) training and test data are drawn from a similar distribution. In many real-world applications, these assumptions do not hold, which limits the generalization capability of a recognition model. Generally, incomplete knowledge of the world is present at training time, and unknown classes can be submitted to an algorithm during testing. If the visual system is trained assuming that all categories are known a priori, it will fail to identify these unknown classes during testing. Ideally, a visual recognition system should reject samples from unknown classes and classify samples from known classes. In this thesis, we consider this constraint and evaluate visual recognition systems under two problem settings: one-class and multi-class novelty detection. In the one-class setting, the goal is to learn a visual recognition system from a single category and reject samples from any other category as unknown during testing. In the multi-class setting, the visual recognition system learns from multiple categories and rejects as unknown any sample that does not belong to the training category set. With experiments on multiple benchmark datasets, we show that the proposed recognition systems perform better than existing approaches. Furthermore, we recognize that in many real-world conditions training and testing data distributions often differ, causing the performance of a visual recognition system to drop significantly. This is commonly referred to as dataset bias or domain shift, and can be addressed using domain adaptation. In particular, we address unsupervised domain adaptation, in which an additional set of unlabeled data sampled from a particular domain is utilized to help improve performance in that domain. Various experiments on multiple domain adaptation benchmarks show that the proposed strategy generalizes better than existing methods in the literature.
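
    A common baseline for the reject-unknowns behaviour described above, shown here only as a generic sketch rather than the detectors proposed in the thesis, is to reject any test sample whose maximum softmax confidence falls below a threshold:

```python
# Sketch: confidence-threshold open-set rejection (a generic baseline,
# not the thesis's specific novelty detectors).
import torch
import torch.nn.functional as F

def classify_or_reject(logits, threshold=0.9):
    """Return the predicted class per sample, or -1 for 'unknown'."""
    probs = F.softmax(logits, dim=1)
    conf, pred = probs.max(dim=1)
    pred[conf < threshold] = -1       # low confidence -> reject as unknown
    return pred

logits = torch.tensor([[4.0, 0.1, 0.2],    # confident -> class 0
                       [1.0, 0.9, 1.1]])   # ambiguous -> rejected
print(classify_or_reject(logits))          # tensor([ 0, -1])
```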

    Technology 2001: The Second National Technology Transfer Conference and Exposition, volume 2

    Proceedings of the workshop are presented. The mission of the conference was to transfer advanced technologies developed by the Federal government, its contractors, and other high-tech organizations to U.S. industries for their use in developing new or improved products and processes. Volume two presents papers on the following topics: materials science, robotics, test and measurement, advanced manufacturing, artificial intelligence, biotechnology, electronics, and software engineering.

    Cognitive Foundations for Visual Analytics
