
    What-and-Where to Match: Deep Spatially Multiplicative Integration Networks for Person Re-identification

    Matching pedestrians across disjoint camera views, known as person re-identification (re-id), is a challenging problem of importance to visual recognition and surveillance. Most existing methods exploit local regions through spatial manipulation to perform matching on local correspondences. However, they essentially extract fixed representations from pre-divided regions of each image and then perform matching based on the extracted representations. Models in this pipeline cannot capture the finer local patterns that are crucial for distinguishing positive pairs from negative ones, and they underperform as a result. In this paper, we propose a novel deep multiplicative integration gating function, which answers the question of what-and-where to match for effective person re-id. To address what to match, our deep network emphasizes common local patterns by learning joint representations in a multiplicative way. The network comprises two Convolutional Neural Networks (CNNs) that extract convolutional activations, and it generates relevant descriptors for pedestrian matching. This leads to flexible representations for pair-wise images. To address where to match, we combat spatial misalignment by performing spatially recurrent pooling via a four-directional recurrent neural network, which imposes spatial dependency over all positions with respect to the entire image. The proposed network is designed to be end-to-end trainable and to characterize local pairwise feature interactions in a spatially aligned manner. To demonstrate the superiority of our method, extensive experiments are conducted on three benchmark data sets: VIPeR, CUHK03 and Market-1501. Comment: Published at Pattern Recognition, Elsevie
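
To make the idea of a multiplicative joint representation concrete, here is a minimal sketch in plain Python. It assumes the simplest possible form, an element-wise product of two activation maps at every spatial position; the abstract does not specify the actual gating function, so the function name `multiplicative_joint` and the toy feature maps are purely illustrative.

```python
def multiplicative_joint(feat_a, feat_b):
    """Element-wise product of two H x W x C activation maps (nested lists).

    Assumption: a plain Hadamard product stands in for the paper's gating
    function, which the abstract does not describe in detail.
    """
    return [
        [[a * b for a, b in zip(chan_a, chan_b)]
         for chan_a, chan_b in zip(row_a, row_b)]
        for row_a, row_b in zip(feat_a, feat_b)
    ]


# Toy 1 x 2 x 2 maps standing in for the activations of the two CNNs.
feat_a = [[[1.0, 0.0], [0.5, 2.0]]]
feat_b = [[[2.0, 3.0], [0.0, 1.0]]]

joint = multiplicative_joint(feat_a, feat_b)
# A channel stays non-zero only where BOTH images activate it, which is
# one way to read "emphasizes common local patterns".
```

Under this assumption, a pattern present in only one image of the pair is suppressed to zero, while a pattern present in both is amplified.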

    Triplet-based Deep Similarity Learning for Person Re-Identification

    In recent years, person re-identification (re-id) has attracted great attention in both the computer vision community and industry. In this paper, we propose a new framework for person re-identification with triplet-based deep similarity learning using convolutional neural networks (CNNs). The network is trained with triplet input: two of the images have the same class label and the third has a different one. It aims to learn a deep feature representation in which the distance within the same class is decreased, while the distance between different classes is increased as much as possible. Moreover, we trained the model jointly on six different datasets, which differs from the common practice of training a model on one dataset and testing it on the same one. However, the enormous number of possible triplets among the large number of training samples makes the training impossible. To address this challenge, a double-sampling scheme is proposed to generate triplets of images as effectively as possible. The proposed framework is evaluated on several benchmark datasets. The experimental results show that our method is effective for the task of person re-identification and that it is comparable to or even outperforms the state-of-the-art methods. Comment: ICCV Workshops 201
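
The triplet objective described above can be sketched as follows. This is a minimal illustration that assumes the common margin-based hinge formulation with Euclidean distance; the abstract does not state the exact loss, and the `margin` value and the toy embeddings are assumptions. The double-sampling scheme is not shown.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on embedding vectors (assumed formulation).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    d_pos = math.dist(anchor, positive)  # same-identity distance
    d_neg = math.dist(anchor, negative)  # different-identity distance
    return max(0.0, d_pos - d_neg + margin)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

loss_satisfied = triplet_loss(anchor, positive, negative)
# Swapping the roles gives a triplet that violates the margin.
loss_violated = triplet_loss(anchor, negative, positive)
```

In this toy case the first triplet already satisfies the margin and contributes no loss, while the swapped triplet is penalized; minimizing such a loss pulls same-class samples together and pushes different-class samples apart.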

    Temporal Continuity Based Unsupervised Learning for Person Re-Identification

    Person re-identification (re-id) aims to match the same person across images taken by multiple cameras. Most existing person re-id methods require a large amount of identity-labeled data to act as a discriminative guideline for representation learning. The difficulty of manually collecting identity-labeled data leads to poor adaptability in practical scenarios. To overcome this problem, we propose an unsupervised center-based clustering approach capable of progressively learning and exploiting the underlying re-id discriminative information from temporal continuity within a camera. We call our framework Temporal Continuity based Unsupervised Learning (TCUL). Specifically, TCUL simultaneously performs center-based clustering of the unlabeled (target) dataset and fine-tunes a convolutional neural network (CNN) pre-trained on an irrelevant labeled (source) dataset to enhance the discriminative capability of the CNN for the target dataset. Furthermore, it exploits the temporally continuous nature of images within a camera jointly with the spatial similarity of feature maps across cameras to generate reliable pseudo-labels for training a re-identification model. As training progresses, the number of reliable samples keeps growing adaptively, which in turn boosts the representation ability of the CNN. Extensive experiments on three large-scale person re-id benchmark datasets are conducted to compare our framework with state-of-the-art techniques, and they demonstrate the superiority of TCUL over existing methods.
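
One way to picture center-based pseudo-labeling with a reliability filter is the sketch below. It is not TCUL itself: the distance threshold `max_dist`, the function name, and the toy features are assumptions made for illustration, and the temporal-continuity cue and CNN fine-tuning are omitted.

```python
import math


def assign_pseudo_labels(features, centers, max_dist):
    """Label each feature with its nearest center, or None if unreliable.

    Assumption: a sample counts as "reliable" when its distance to the
    nearest center is at most `max_dist`.
    """
    labels = []
    for feat in features:
        dists = [math.dist(feat, center) for center in centers]
        best = min(range(len(centers)), key=dists.__getitem__)
        labels.append(best if dists[best] <= max_dist else None)
    return labels


centers = [[0.0, 0.0], [10.0, 0.0]]
features = [[0.0, 1.0], [9.0, 0.0], [5.0, 5.0]]

labels = assign_pseudo_labels(features, centers, max_dist=2.0)
# The third sample is far from both centers and is left unlabeled.
```

With a scheme like this, relaxing the threshold or improving the features over training rounds would let more samples pass the filter, which matches the abstract's remark that the set of reliable samples grows as training progresses.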

    A Pose-Sensitive Embedding for Person Re-Identification with Expanded Cross Neighborhood Re-Ranking

    Person re-identification is a challenging retrieval task that requires matching a person's acquired image across non-overlapping camera views. In this paper we propose an effective approach that incorporates both the fine and coarse pose information of the person to learn a discriminative embedding. In contrast to the recent direction of explicitly modeling body parts or correcting for misalignment based on these, we show that a rather straightforward inclusion of the acquired camera view and/or the detected joint locations into a convolutional neural network helps to learn a very effective representation. To increase retrieval performance, re-ranking techniques based on computed distances have recently gained much attention. We propose a new unsupervised and automatic re-ranking framework that achieves state-of-the-art re-ranking performance. We show that, in contrast to the current state-of-the-art re-ranking methods, our approach does not require computing new rank lists for each image pair (e.g., based on reciprocal neighbors) and performs well by using a simple direct rank-list-based comparison or even by just using the already computed Euclidean distances between the images. We show that both our learned representation and our re-ranking method achieve state-of-the-art performance on a number of challenging surveillance image and video datasets. The code is available online at: https://github.com/pse-ecn/pose-sensitive-embedding Comment: CVPR 2018: v2 (fixes, added new results on PRW dataset

    CMTR: Cross-modality Transformer for Visible-infrared Person Re-identification

    Visible-infrared cross-modality person re-identification is a challenging ReID task, which aims to retrieve and match the same identity's images across the heterogeneous visible and infrared modalities. The core of this task is therefore to bridge the huge gap between these two modalities. Existing convolutional neural network-based methods mainly suffer from insufficient perception of the modalities' information and cannot learn good discriminative, modality-invariant embeddings for identities, which limits their performance. To solve these problems, we propose a cross-modality transformer-based method (CMTR) for the visible-infrared person re-identification task, which can explicitly mine the information of each modality and generate more discriminative features based on it. Specifically, to capture the modalities' characteristics, we design novel modality embeddings, which are fused with token embeddings to encode the modalities' information. Furthermore, to enhance the representation of the modality embeddings and adjust the distribution of the matching embeddings, we propose a modality-aware enhancement loss based on the learned modalities' information, reducing intra-class distance and enlarging inter-class distance. To our knowledge, this is the first work to apply a transformer network to the cross-modality re-identification task. We conduct extensive experiments on the public SYSU-MM01 and RegDB datasets, and the performance of our proposed CMTR model significantly surpasses that of existing outstanding CNN-based methods. Comment: 11 pages, 7 figures, 7 table
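
As a rough illustration of fusing modality embeddings with token embeddings, the sketch below adds one vector per modality to every token of an image. Fusion by addition is an assumption (the abstract only says "fused"), and the table values, dimensions, and function name are hypothetical; in a real model the modality vectors would be learned.

```python
def add_modality_embedding(tokens, modality, modality_table):
    """Add the embedding of `modality` to every token embedding.

    Assumption: fusion is a simple element-wise sum, analogous to how
    position embeddings are commonly added in transformers.
    """
    mod_vec = modality_table[modality]
    return [[t + m for t, m in zip(token, mod_vec)] for token in tokens]


# Hypothetical fixed vectors standing in for learned modality embeddings.
modality_table = {
    "visible": [0.5, 0.0, -0.5],
    "infrared": [-0.5, 0.25, 0.0],
}

tokens = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]  # two 3-d patch tokens

visible_tokens = add_modality_embedding(tokens, "visible", modality_table)
infrared_tokens = add_modality_embedding(tokens, "infrared", modality_table)
```

The same patch tokens end up at different points in the embedding space depending on the source modality, which gives later layers an explicit signal about whether a token came from a visible or an infrared image.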

    Deep Representation Learning for Vehicle Re-Identification

    With the widespread use of surveillance cameras in cities and on motorways, computer vision based intelligent systems are becoming a standard in the industry. Vehicle related problems such as Automatic License Plate Recognition have been addressed by computer vision systems, albeit in controlled settings (e.g., cameras installed at toll gates). As research data have become freely available in the last few years, surveillance footage analysis for vehicle related problems is being studied with a computer vision focus. In this thesis, vision-based approaches for the problem of vehicle re-identification are investigated and original approaches are presented for various challenges of the problem. Computer vision based systems have advanced considerably in the last decade due to rapid improvements in machine learning with the advent of deep learning and convolutional neural networks (CNNs). At the core of the paradigm shift that deep learning has brought to machine learning is feature learning by multiple stacked neural network layers. Compared to traditional machine learning methods that utilise hand-crafted feature extraction and shallow model learning, deep neural networks can learn hierarchical feature representations as input data are transformed from low-level to high-level representations through consecutive neural network layers. Furthermore, models are trained in an end-to-end fashion that integrates feature extraction and machine learning methods into a combined framework using neural networks. This thesis focuses on visual feature learning with deep convolutional neural networks for the vehicle re-identification problem. The problem of re-identification has attracted attention from the computer vision community, especially in the person re-identification domain, whereas vehicle re-identification is relatively understudied. Re-identification is the problem of matching identities of subjects in images.
The images come from non-overlapping viewing angles captured at varying locations, illuminations, etc. Compared to person re-identification, vehicle re-identification is particularly challenging because vehicles are manufactured to have the same visual appearance and shape, which makes different instances visually indistinguishable. This thesis investigates solutions for the aforementioned challenges and makes the following contributions, improving the accuracy and robustness of recent approaches. (1) Exploring the man-made nature of vehicles, that is, their hierarchical categories such as type (e.g., sedan, SUV) and model (e.g., Audi-2011-A4), and its usefulness in identity matching when pairwise identity labelling is not present. (2) A new vehicle re-identification benchmark, Vehicle Re-Identification in Context (VRIC), is introduced to enable the design and evaluation of vehicle re-id methods that more closely reflect real-world application conditions compared to existing benchmarks. VRIC is uniquely characterised by unconstrained, low-resolution vehicle images from wide field of view traffic scene videos exhibiting variations in illumination, motion blur, and occlusion. (3) We evaluate the advantages of Multi-Scale Visual Representation (MSVR) for multi-scale cross-camera matching performance by training a multi-branch CNN model for vehicle re-identification, enabled by the availability of low-resolution images in VRIC. Experimental results indicate that this approach is useful in real-world settings where image resolution is low and varies across cameras. (4) With Multi-Task Mutual Learning (MTML) we propose a multi-modal learning representation, e.g., using orientation as well as identity labels in training. We utilise deep convolutional neural networks with multiple branches to facilitate the learning of multi-modal and multi-scale deep features that increase re-identification performance, as well as orientation-invariant feature learning.