109,512 research outputs found

    Person Re-Identification by Deep Learning Multi-Scale Representations

    Multi-scale Deep Learning Architectures for Person Re-identification

    Person Re-identification (re-id) aims to match people across non-overlapping camera views in a public space. It is a challenging problem because many people captured in surveillance videos wear similar clothes. Consequently, the differences in their appearance are often subtle and only detectable at the right locations and scales. Existing re-id models, particularly the recently proposed deep learning based ones, match people at a single scale. In contrast, in this paper, a novel multi-scale deep learning model is proposed. Our model is able to learn deep discriminative feature representations at different scales and automatically determine the most suitable scales for matching. The importance of different spatial locations for extracting discriminative features is also learned explicitly. Experiments are carried out to demonstrate that the proposed model outperforms the state-of-the-art on a number of benchmarks. Comment: 9 pages, 3 figures, accepted by ICCV 2017
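
    The key mechanism in this abstract is learning per-scale feature representations and letting the model weight them for matching. The following is a minimal PyTorch sketch of that idea, not the authors' architecture: the branch design, the softmax scale weighting, and all module names are illustrative assumptions.

```python
# Minimal sketch of multi-scale representation learning with a learnable
# per-scale weighting (an assumption, not the paper's exact design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleEmbed(nn.Module):
    def __init__(self, scales=(1.0, 0.75, 0.5), feat_dim=256):
        super().__init__()
        self.scales = scales
        # One small conv branch per input scale; a real model would use
        # deeper (possibly partially shared) backbones.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, feat_dim),
            ) for _ in scales
        ])
        # Learnable scale-importance logits, normalised by softmax, so the
        # model can "determine the most suitable scales for matching".
        self.scale_logits = nn.Parameter(torch.zeros(len(scales)))

    def forward(self, x):
        feats = []
        for s, branch in zip(self.scales, self.branches):
            xs = x if s == 1.0 else F.interpolate(
                x, scale_factor=s, mode='bilinear', align_corners=False)
            feats.append(branch(xs))
        w = torch.softmax(self.scale_logits, dim=0)
        # Weighted fusion of per-scale embeddings.
        return sum(wi * f for wi, f in zip(w, feats))

emb = MultiScaleEmbed()
out = emb(torch.randn(2, 3, 256, 128))  # typical re-id input aspect ratio
print(out.shape)  # torch.Size([2, 256])
```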

    Intra-Camera Supervised Person Re-Identification: A New Benchmark

    Existing person re-identification (re-id) methods rely mostly on a large set of inter-camera identity labelled training data, requiring a tedious data collection and annotation process and therefore leading to poor scalability in practical re-id applications. To overcome this fundamental limitation, we consider person re-identification without inter-camera identity association but only with identity labels independently annotated within each individual camera-view. This eliminates the most time-consuming and tedious inter-camera identity labelling process and significantly reduces the amount of human effort required during annotation. It hence gives rise to a more scalable and more feasible learning scenario, which we call Intra-Camera Supervised (ICS) person re-id. Under this ICS setting with weaker label supervision, we formulate a Multi-Task Multi-Label (MTML) deep learning method. Given no inter-camera association, MTML is specially designed for self-discovering the inter-camera identity correspondence. This is achieved by inter-camera multi-label learning under a joint multi-task inference framework. In addition, MTML can also efficiently learn discriminative re-id feature representations by fully using the available identity labels within each camera-view. Extensive experiments demonstrate the performance superiority of our MTML model over state-of-the-art alternative methods on three large-scale person re-id datasets in the proposed intra-camera supervised learning setting. Comment: 9 pages, 3 figures, accepted by the ICCV Workshop on Real-World Recognition from Low-Quality Images and Videos, 2019
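
    A hedged sketch of the intra-camera supervised setup described above: a shared backbone with one identity classifier per camera-view, each trained only on that camera's independently annotated labels. The cross-camera multi-label association step at the heart of MTML is omitted, and all names here are illustrative assumptions.

```python
# Sketch of per-camera multi-task identity heads over a shared backbone
# (the ICS training signal); the inter-camera association of MTML is
# deliberately left out.
import torch
import torch.nn as nn

class IntraCameraModel(nn.Module):
    def __init__(self, backbone, feat_dim, ids_per_camera):
        super().__init__()
        self.backbone = backbone  # any CNN mapping images -> feat_dim vectors
        # One independent classifier per camera-view (multi-task heads);
        # labels are only comparable within a single camera.
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, n) for n in ids_per_camera])

    def forward(self, x, cam_id):
        f = self.backbone(x)
        return f, self.heads[cam_id](f)

# Training step: each mini-batch comes from one camera, so only that
# camera's head (plus the shared backbone) receives gradients.
def train_step(model, images, labels, cam_id, optimizer):
    criterion = nn.CrossEntropyLoss()
    _, logits = model(images, cam_id)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```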

    Deep Representation Learning for Vehicle Re-Identification

    With the widespread use of surveillance cameras in cities and on motorways, computer vision based intelligent systems are becoming a standard in the industry. Vehicle related problems such as Automatic License Plate Recognition have been addressed by computer vision systems, albeit in controlled settings (e.g. cameras installed at toll gates). With research data becoming freely available in the last few years, surveillance footage analysis for vehicle related problems is being studied with a computer vision focus. In this thesis, vision-based approaches to the problem of vehicle re-identification are investigated and original approaches are presented for various challenges of the problem. Computer vision based systems have advanced considerably in the last decade due to rapid improvements in machine learning with the advent of deep learning and convolutional neural networks (CNNs). At the core of the paradigm shift that deep learning has brought to machine learning is feature learning by multiple stacked neural network layers. Compared to traditional machine learning methods that utilise hand-crafted feature extraction and shallow model learning, deep neural networks can learn hierarchical feature representations as input data are transformed from low-level to high-level representations through consecutive neural network layers. Furthermore, machine learning tasks are trained in an end-to-end fashion that integrates feature extraction and model learning into a combined framework using neural networks. This thesis focuses on visual feature learning with deep convolutional neural networks for the vehicle re-identification problem. The problem of re-identification has attracted attention from the computer vision community, especially in the person re-identification domain, whereas vehicle re-identification is relatively understudied. Re-identification is the problem of matching the identities of subjects in images. The images come from non-overlapping viewing angles and are captured at varying locations, illuminations, etc. Compared to person re-identification, vehicle re-identification is particularly challenging as vehicles are manufactured to have the same visual appearance and shape, which makes different instances visually indistinguishable. This thesis investigates solutions to the aforementioned challenges and makes the following contributions, improving the accuracy and robustness of recent approaches: (1) Exploring the man-made nature of vehicles, that is, their hierarchical categories such as type (e.g. sedan, SUV) and model (e.g. Audi-2011-A4), and their usefulness in identity matching when pairwise identity labelling is not present. (2) A new vehicle re-identification benchmark, Vehicle Re-Identification in Context (VRIC), is introduced to enable the design and evaluation of vehicle re-id methods under conditions that more closely reflect real-world applications than existing benchmarks. VRIC is uniquely characterised by unconstrained vehicle images in low resolution, taken from wide field-of-view traffic scene videos exhibiting variations in illumination, motion blur, and occlusion. (3) We evaluate the advantages of Multi-Scale Visual Representation (MSVR) in multi-scale cross-camera matching by training a multi-branch CNN model for vehicle re-identification, enabled by the availability of low resolution images in VRIC. Experimental results indicate that this approach is useful in real-world settings where image resolution is low and varies across cameras. (4) With Multi-Task Mutual Learning (MTML) we propose a multi-modal representation learning approach, e.g. using orientation as well as identity labels in training. We utilise deep convolutional neural networks with multiple branches to facilitate the learning of multi-modal and multi-scale deep features that increase re-identification performance, as well as orientation-invariant feature learning.
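
    As one concrete reading of contribution (4), the sketch below pairs an identity head with an orientation head on a shared backbone and sums their losses. The thesis' actual multi-branch design and mutual-learning losses differ; the 8-bin orientation discretisation, the loss weighting, and all names are assumptions.

```python
# Sketch of joint identity + orientation supervision for vehicle re-id
# (a simplified stand-in for the thesis' multi-task design).
import torch
import torch.nn as nn

class VehicleMultiTaskNet(nn.Module):
    def __init__(self, backbone, feat_dim, num_ids, num_orientations=8):
        super().__init__()
        self.backbone = backbone                       # shared feature extractor
        self.id_head = nn.Linear(feat_dim, num_ids)    # identity classification
        self.orient_head = nn.Linear(feat_dim, num_orientations)  # viewpoint bins

    def forward(self, x):
        f = self.backbone(x)
        return f, self.id_head(f), self.orient_head(f)

def multi_task_loss(id_logits, orient_logits, id_labels, orient_labels,
                    orient_weight=0.5):
    ce = nn.CrossEntropyLoss()
    # The identity loss drives re-id discrimination; the auxiliary
    # orientation loss encourages viewpoint-aware features, which the
    # joint objective can exploit for viewpoint-robust matching.
    return (ce(id_logits, id_labels)
            + orient_weight * ce(orient_logits, orient_labels))
```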

    Learning Discriminative Features for Person Re-Identification

    For fulfilling the requirements of public safety in modern cities, more and more large-scale surveillance camera systems are deployed, resulting in an enormous amount of visual data. Automatically processing and interpreting these data promote the development and application of visual data analytic technologies. As one of the important research topics in surveillance systems, person re-identification (re-id) aims at retrieving the target person across non-overlapping camera-views deployed at a number of distributed space-time locations. It is a fundamental problem for many practical surveillance applications, e.g., person search, cross-camera tracking, and multi-camera human behavior analysis and prediction, and it has received considerable attention from both academic and industrial domains. Learning discriminative feature representations is an essential task in person re-id. Although many methodologies have been proposed, discriminative re-id feature extraction remains a challenging problem due to: (1) Intra- and inter-personal variations. The intrinsic properties of camera deployment in surveillance systems lead to various changes in person poses, view-points, illumination conditions, etc. This may result in large intra-personal variations and/or small inter-personal variations, thus incurring problems in matching person images. (2) Domain variations. The domain variations between different datasets give rise to the problem of the generalization capability of re-id models. Directly applying a re-id model trained on one dataset to another usually causes a large performance degradation. (3) Difficulties in data creation and annotation. Existing person re-id methods, especially deep re-id methods, rely mostly on a large set of inter-camera identity labelled training data, requiring a tedious data collection and annotation process. This leads to poor scalability in practical person re-id applications. Corresponding to these challenges in learning discriminative re-id features, this thesis contributes to the re-id domain by proposing three related methodologies and one new re-id setting: (1) Gaussian mixture importance estimation. Hand-crafted features are usually not discriminative enough for person re-id because of noisy information, such as background clutter. To precisely evaluate the similarities between person images, the main task of distance metric learning is to filter out the noisy information. Keep It Simple and Straightforward MEtric (KISSME) is an effective method in person re-id. However, it is sensitive to the feature dimensionality and cannot capture multiple modes in the data. To this end, a Gaussian Mixture Importance Estimation re-id approach is proposed, which exploits Gaussian Mixture Models to estimate the observed commonalities of similar and dissimilar person pairs in the feature space. (2) Unsupervised domain-adaptive person re-id based on pedestrian attributes. In person re-id, person identities usually do not overlap across different domains (or datasets), which raises difficulties in generalizing re-id models. Different from person identity, pedestrian attributes, e.g., hair length, clothes type and color, are consistent across different domains (or datasets). However, most re-id datasets lack attribute annotations. On the other hand, in the field of pedestrian attribute recognition, there are a number of datasets labeled with attributes. Exploiting such data for re-id purposes can alleviate the shortage of attribute annotations in the re-id domain and improve the generalization capability of re-id models. To this end, an unsupervised domain-adaptive re-id feature learning framework is proposed to make full use of attribute annotations. Specifically, an existing unsupervised domain adaptation method is extended to transfer attribute-based features from the attribute recognition domain to the re-id domain. With the proposed re-id feature learning framework, domain-invariant feature representations can be effectively extracted. (3) Intra-camera supervised person re-id. Annotating large-scale re-id datasets requires a tedious data collection and annotation process and therefore leads to poor scalability in practical person re-id applications. To overcome this fundamental limitation, a new person re-id setting is considered without inter-camera identity association but only with identity labels independently annotated within each camera-view. This eliminates the most time-consuming and tedious inter-camera identity association annotation process and thus significantly reduces the amount of human effort required during annotation. It hence gives rise to a more scalable and more feasible learning scenario, which is named Intra-Camera Supervised (ICS) person re-id. Under this ICS setting, a new re-id method, the Multi-Task Multi-Label (MATE) learning method, is formulated. Given no inter-camera association, MATE is specially designed for self-discovering the inter-camera identity correspondence. This is achieved by inter-camera multi-label learning under a joint multi-task inference framework. In addition, MATE can also efficiently learn discriminative re-id feature representations using the available identity labels within each camera-view.
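
    To make the KISSME baseline and its Gaussian-mixture extension concrete: KISSME scores a pair by a likelihood-ratio test between single Gaussians fitted to the feature differences of similar and dissimilar pairs, and the mixture variant replaces each Gaussian with a GMM to capture multiple modes. The sketch below is a simplified reading of the proposed Gaussian Mixture Importance Estimation, not the thesis' exact formulation; function names and the component count are assumptions.

```python
# KISSME and a GMM-based likelihood-ratio variant (simplified sketch).
# diffs_sim / diffs_dis: arrays of pairwise differences x_i - x_j for
# similar / dissimilar training pairs, shape (num_pairs, dim).
import numpy as np
from sklearn.mixture import GaussianMixture

def kissme_metric(diffs_sim, diffs_dis):
    # Single-Gaussian likelihood ratio reduces to a Mahalanobis-like
    # metric M = inv(Cov_sim) - inv(Cov_dis). Assumes non-singular
    # covariances; in practice one would regularise.
    cov_s = np.cov(diffs_sim, rowvar=False)
    cov_d = np.cov(diffs_dis, rowvar=False)
    return np.linalg.inv(cov_s) - np.linalg.inv(cov_d)

def kissme_distance(M, x_i, x_j):
    d = x_i - x_j
    return d @ M @ d  # smaller => more likely the same identity

def gmm_pair_scorer(diffs_sim, diffs_dis, n_components=3):
    # Replace each single Gaussian with a mixture so the pair-difference
    # distribution can have multiple modes.
    gmm_s = GaussianMixture(n_components).fit(diffs_sim)
    gmm_d = GaussianMixture(n_components).fit(diffs_dis)
    # Log-likelihood ratio: higher => pair more likely the same identity.
    return lambda x_i, x_j: (gmm_s.score_samples((x_i - x_j)[None])[0]
                             - gmm_d.score_samples((x_i - x_j)[None])[0])
```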

    Learning Deep Context-aware Features over Body and Latent Parts for Person Re-identification

    Person Re-identification (ReID) aims to identify the same person across different cameras. It is a challenging task due to the large variations in person pose, occlusion, background clutter, etc. How to extract powerful features is a fundamental problem in ReID and is still open today. In this paper, we design a Multi-Scale Context-Aware Network (MSCAN) to learn powerful features over the full body and body parts, which can well capture local context knowledge by stacking multi-scale convolutions in each layer. Moreover, instead of using predefined rigid parts, we propose to learn and localize deformable pedestrian parts using Spatial Transformer Networks (STN) with novel spatial constraints. The learned body parts can alleviate some difficulties, e.g., pose variations and background clutter, in part-based representation. Finally, we integrate the representation learning processes of the full body and body parts into a unified framework for person ReID through multi-class person identification tasks. Extensive evaluations on current challenging large-scale person ReID datasets, including the image-based Market1501 and CUHK03 and the sequence-based MARS dataset, show that the proposed method achieves state-of-the-art results. Comment: Accepted by CVPR 2017
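
    The part-localisation step described above builds on Spatial Transformer Networks: a small network predicts an affine transform and the feature map is resampled accordingly, so part regions are learned rather than predefined. Below is a minimal generic STN sketch in PyTorch; the paper's novel spatial constraints on the predicted parameters are omitted, and all names are illustrative assumptions.

```python
# Generic spatial-transformer part localiser (sketch, not MSCAN itself).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartLocaliser(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        # Localisation network: predicts a 2x3 affine matrix per sample.
        self.loc = nn.Sequential(
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(in_channels * 16, 32), nn.ReLU(),
            nn.Linear(32, 6),
        )
        # Initialise to the identity transform so training starts from
        # the full-body region.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(
            torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, feat):
        theta = self.loc(feat).view(-1, 2, 3)
        # Build a sampling grid from the predicted affine parameters and
        # resample the feature map onto the localised part region.
        grid = F.affine_grid(theta, feat.size(), align_corners=False)
        return F.grid_sample(feat, grid, align_corners=False)

part = PartLocaliser(in_channels=64)
y = part(torch.randn(2, 64, 32, 16))
print(y.shape)  # torch.Size([2, 64, 32, 16])
```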