Semantics-Aligned Representation Learning for Person Re-identification
Person re-identification (reID) aims to match person images to retrieve the
ones with the same identity. This is a challenging task, as the images to be
matched are generally semantically misaligned due to the diversity of human
poses and capture viewpoints, incompleteness of the visible bodies (due to
occlusion), etc. In this paper, we propose a framework that drives the reID
network to learn semantics-aligned feature representation through delicate
supervision designs. Specifically, we build a Semantics Aligning Network (SAN),
which consists of a base network as an encoder (SA-Enc) for reID and a decoder
(SA-Dec) for reconstructing/regressing the densely semantically aligned full
texture image. We jointly train the SAN under the supervision of person
re-identification and aligned texture generation. Moreover, at the decoder,
besides the reconstruction loss, we add Triplet ReID constraints over the
feature maps as perceptual losses. The decoder is discarded at inference, so
our scheme is computationally efficient. Ablation studies
demonstrate the effectiveness of our design. We achieve state-of-the-art
performance on the benchmark datasets CUHK03, Market1501, MSMT17, and the
partial person reID dataset Partial REID. Code for our proposed method is
available at:
https://github.com/microsoft/Semantics-Aligned-Representation-Learning-for-Person-Re-identification
Comment: Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20); code has been released.
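The joint training described above (re-ID supervision, reconstruction at the decoder, and triplet constraints over decoder feature maps as perceptual losses) can be sketched as follows; the loss weights and helper names are illustrative, not the paper's exact formulation:

```python
import numpy as np

def triplet_loss(anchor, pos, neg, margin=0.3):
    # Standard triplet loss on feature vectors: pull same-ID features
    # together, push different-ID features at least `margin` apart.
    d_ap = np.linalg.norm(anchor - pos)
    d_an = np.linalg.norm(anchor - neg)
    return max(0.0, d_ap - d_an + margin)

def san_joint_loss(reid_ce, recon_l1, dec_triplets, margin=0.3,
                   w_rec=1.0, w_tri=0.1):
    # Hypothetical combination of the SAN objectives: the re-ID
    # classification loss, the texture-reconstruction loss, and
    # triplet constraints applied to decoder feature maps as
    # perceptual losses (the weights are illustrative).
    tri = sum(triplet_loss(a, p, n, margin) for a, p, n in dec_triplets)
    return reid_ce + w_rec * recon_l1 + w_tri * tri
```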
An investigation into automatic people counting and person re-identification
We study two video surveillance problems in this thesis: people counting and person re-identification.
To address the problem of people counting, we first propose a method called Random Projection Forest to utilise rich hand-crafted features.
To achieve computational efficiency and scalability, we use a random forest as the regression model, whose tree structure is intrinsically fast and scalable. Unlike traditional approaches to random forest construction, we embed random projections in the tree nodes to simultaneously combat the curse of dimensionality and introduce randomness into the tree construction, making our new method very efficient and effective.
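The node-splitting idea can be sketched as follows; the candidate count, median threshold, and variance criterion are assumptions for illustration, not the thesis's exact procedure:

```python
import numpy as np

def random_projection_split(X, y, n_proj=8, seed=0):
    # At each tree node, project the high-dimensional hand-crafted
    # features onto a few random directions, then keep the
    # (direction, threshold) pair that best reduces regression
    # variance -- combating the curse of dimensionality while
    # injecting randomness into tree construction.
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_proj):
        w = rng.standard_normal(X.shape[1])
        z = X @ w
        thr = np.median(z)
        left, right = y[z <= thr], y[z > thr]
        if len(left) == 0 or len(right) == 0:
            continue
        score = len(left) * left.var() + len(right) * right.var()
        if best is None or score < best[0]:
            best = (score, w, thr)
    return best  # (weighted variance, projection vector, threshold)
```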
We have also developed a deep learning model for people counting. We propose a multi-task deep learning model that simultaneously predicts the number of people and the level of crowd density, which makes our method invariant to image scale. To deal with the problem of insufficient training data, we propose an "ambiguous labelling" strategy to create various labels for the training images. In a series of experiments, we show that creating "ambiguous labels" is a simple but effective way to improve not only the deep learning model but also the Random Projection Forest model based on hand-crafted features.
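A minimal sketch of the "ambiguous labelling" idea, assuming labels are created by sampling around the ground-truth count (the spread and label count are hypothetical choices):

```python
import numpy as np

def ambiguous_labels(count, n_labels=5, spread=0.1, seed=0):
    # Turn a single ground-truth people count into several plausible
    # labels by sampling around it, so each training image is seen
    # with slightly different targets (`spread` and `n_labels` are
    # illustrative choices, not the thesis's exact settings).
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-spread, spread, size=n_labels)
    return np.clip(np.round(count * (1.0 + noise)), 0, None).astype(int)
```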
For the problem of person re-identification, we have developed a novel deep learning framework called Deep Augmented Attribute Network (DAAN) to learn augmented attribute features for person re-identification. We first manually label two large datasets with pre-defined mid-level semantic attributes. We then construct a deep neural network with two output branches. The first branch predicts the attributes of the input image, while the second branch generates complement features that are fused with the output of the first branch to form the augmented attributes of the input image. We optimize the attribute branch with a multi-label classification loss and apply a "Siamese" network structure to ensure that the augmented attributes of images from the same person are close to each other whilst those from different persons are far apart. The final learned augmented attribute features are then used for person re-identification based on Euclidean distance. As manually labelling images is a time-consuming process, we have also extended our method to datasets with only person ID information but without attribute labels. We have conducted comprehensive experiments, and the results show that our method outperforms state-of-the-art methods.
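The two-branch fusion and Siamese objective might look as follows; the sigmoid attribute scores and the contrastive formulation are common choices assumed here for illustration, not necessarily DAAN's exact losses:

```python
import numpy as np

def augmented_attributes(attr_logits, complement):
    # Fuse the predicted attribute scores (first branch) with the
    # learned complement features (second branch) into one augmented
    # attribute vector used for Euclidean-distance matching.
    attrs = 1.0 / (1.0 + np.exp(-attr_logits))  # sigmoid per attribute
    return np.concatenate([attrs, complement])

def siamese_loss(f1, f2, same_person, margin=1.0):
    # Contrastive objective of the Siamese structure: augmented
    # attributes of the same person are pulled together, and those of
    # different persons are pushed at least `margin` apart.
    d = np.linalg.norm(f1 - f2)
    return d ** 2 if same_person else max(0.0, margin - d) ** 2
```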
As labelling identities and attributes for person images is time-consuming, we further propose an unsupervised method for person re-identification and apply it to a more challenging problem called partial person re-identification. We first use an established image segmentation method to generate superpixels and construct an Attributed Region Adjacency Graph (ARAG), in which nodes correspond to superpixels and edges represent correlations between superpixels. We then apply region-based Normalized Cut to the graph to merge similar neighbouring superpixels, forming natural image regions corresponding to various body parts and backgrounds. To extract features from the segmented patches, we apply a Denoising Autoencoder to learn a discriminative representation of the image patches in each node of the graph. Finally, the similarity of an image pair is measured by the Earth Mover's Distance (EMD) between the robust image signatures of the nodes in the corresponding ARAGs.
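As a simplified illustration of the final matching step, EMD between two one-dimensional signatures over shared bins reduces to an L1 distance of cumulative sums; the general region-signature EMD used above requires a transportation solver:

```python
import numpy as np

def emd_1d(sig_a, sig_b):
    # Earth Mover's Distance between two 1-D signatures (histograms
    # over the same bins): in one dimension, EMD reduces to the L1
    # distance between the cumulative distributions -- a simplified
    # stand-in for general region-signature matching.
    a = np.asarray(sig_a, float)
    b = np.asarray(sig_b, float)
    a, b = a / a.sum(), b / b.sum()  # normalize to unit mass
    return np.abs(np.cumsum(a) - np.cumsum(b)).sum()
```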
Part-Alignment Learning for Image-Based Person Re-identification
Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, College of Engineering, February 2019. Advisor: Kyoung Mu Lee.
Person re-identification is the problem of identifying the same individual among persons captured by different cameras. It is a challenging problem because the same person captured by non-overlapping cameras usually shows dramatic appearance changes due to viewpoint, pose, and illumination variations. Since it is an essential tool for many surveillance applications, various research directions have been explored; however, the problem is far from being solved.
The goal of this thesis is to solve the person re-identification problem under a surveillance system. In particular, we focus on two critical components: designing 1) a better image representation model using human poses and 2) a better training method using hard sample mining. First, we propose a part-aligned representation model which represents an image as the bilinear pooling between appearance and part maps. Since the image similarity is calculated independently of the locations of body parts, it addresses the body-part misalignment issue and effectively distinguishes different people by discriminating fine-grained local differences. Second, we propose a stochastic hard sample mining method that exploits class information to generate diverse and hard examples for training. It efficiently explores the training samples while avoiding getting stuck in a small subset of hard samples, thereby training the model effectively. Finally, we propose an integrated system that combines the two approaches and benefits from both components. Experimental results show that the proposed method works robustly on five datasets with diverse conditions and indicate its potential extension to more general conditions.
Abstract
Contents
List of Tables
List of Figures
1. Introduction
1.1 Part-Aligned Bilinear Representations
1.2 Stochastic Class-Based Hard Sample Mining
1.3 Integrated System for Person Re-identification
2. Part-Aligned Bilinear Representations
2.1 Introduction
2.2 Related Work
2.3 Our Approach
2.3.1 Two-Stream Network
2.3.2 Bilinear Pooling
2.3.3 Loss
2.4 Analysis
2.4.1 Part-Aware Image Similarity
2.4.2 Relationship to the Baseline Models
2.4.3 Decomposition of Appearance and Part Maps
2.4.4 Part-Alignment Effects on Reducing the Misalignment Issue
2.5 Implementation Details
2.6 Experiments
2.6.1 Datasets
2.6.2 Evaluation Metrics
2.6.3 Comparison with the Baselines
2.6.4 Comparison with State-of-the-Art Methods
2.7 Summary
3. Stochastic Class-Based Hard Sample Mining
3.1 Introduction
3.2 Related Works
3.3 Deep Metric Learning with Triplet Loss
3.3.1 Triplet Loss
3.3.2 Efficient Learning with Triplet Loss
3.4 Batch Construction for Metric Learning
3.4.1 Neighbor Class Mining by Class Signatures
3.4.2 Batch Construction
3.4.3 Scalable Extension to the Number of Classes
3.5 Loss
3.6 Feature Extractor
3.7 Experiments
3.7.1 Datasets
3.7.2 Implementation Details
3.7.3 Evaluation Metrics
3.7.4 Effect of the Stochastic Hard Example Mining
3.7.5 Comparison with the Existing Methods on Image Retrieval Datasets
3.8 Summary
4. Integrated System for Person Re-identification
4.1 Introduction
4.2 Hard Positive Mining
4.3 Integrated System for Person Re-identification
4.4 Experiments
4.4.1 Comparison with the Baselines
4.4.2 Comparison with the Existing Works
4.5 Summary
5. Conclusion
5.1 Contributions
5.2 Future Works
Abstract (In Korean)
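The part-aligned bilinear pooling at the heart of Chapter 2 can be sketched as the aggregation of appearance features weighted by each part map; the shapes and normalization below are illustrative choices:

```python
import numpy as np

def part_aligned_pooling(appearance, parts):
    # Bilinear pooling between an appearance map (H x W x C) and a
    # part map (H x W x P): aggregating appearance features weighted
    # by each part's spatial map yields a P x C representation whose
    # comparison no longer depends on where the body parts appear.
    H, W, C = appearance.shape
    P = parts.shape[2]
    A = appearance.reshape(H * W, C)
    M = parts.reshape(H * W, P)
    return M.T @ A / (H * W)  # (P, C) part-pooled features
```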
Deep learning with very few and no labels
Deep neural networks have achieved remarkable performance in many computer vision applications such as image classification, object detection, instance segmentation, image retrieval, and person re-identification. However, to achieve the desired performance, deep neural networks often need a tremendously large set of labeled training samples to learn their huge network models. Labeling a large dataset is labor-intensive, time-consuming, and sometimes requires expert knowledge. In this research, we study the following important question: how can deep neural networks be trained with very few or even no labeled samples? This leads to research tasks in two major areas: semi-supervised and unsupervised learning. Specifically, for semi-supervised learning, we developed two major approaches. The first is the Snowball approach, which learns a deep neural network from very few samples based on iterative model evolution and confident sample discovery. The second is the learned model composition approach, which composes more efficient master networks from student models of past iterations through a network learning process. Critical sample discovery is developed to discover new critical unlabeled samples near the model decision boundary and provide the master model with lookahead access to these samples to enhance its guidance capability. For unsupervised learning, we have explored two major ideas. The first is transformed attention consistency, where the network is learned based on self-supervision information across images instead of within one single image. The second is spatial assembly networks for image representation learning. We introduce a new learnable module, called the spatial assembly network (SAN), which performs a learned re-organization and assembly of feature points and improves the network's capabilities in handling spatial variations and structural changes of the image scene.
Our experimental results on benchmark datasets demonstrate that our proposed methods have significantly improved the state of the art in semi-supervised and unsupervised learning, outperforming existing methods by large margins.
Includes bibliographical references.
Person Re-identification by Local Maximal Occurrence Representation and Metric Learning
Person re-identification is an important technique towards automatic search
of a person's presence in a surveillance video. Two fundamental problems are
critical for person re-identification: feature representation and metric
learning. An effective feature representation should be robust to illumination
and viewpoint changes, and a discriminant metric should be learned to match
various person images. In this paper, we propose an effective feature
representation called Local Maximal Occurrence (LOMO), and a subspace and
metric learning method called Cross-view Quadratic Discriminant Analysis
(XQDA). The LOMO feature analyzes the horizontal occurrence of local features,
and maximizes the occurrence to make a stable representation against viewpoint
changes. Besides, to handle illumination variations, we apply the Retinex
transform and a scale invariant texture operator. To learn a discriminant
metric, we propose to learn a discriminant low dimensional subspace by
cross-view quadratic discriminant analysis, and simultaneously, a QDA metric is
learned on the derived subspace. We also present a practical computation method
for XQDA, as well as its regularization. Experiments on four challenging person
re-identification databases, VIPeR, QMUL GRID, CUHK Campus, and CUHK03, show
that the proposed method improves the state-of-the-art rank-1 identification
rates by 2.2%, 4.88%, 28.91%, and 31.55% on the four databases, respectively.
Comment: This paper has been accepted by CVPR 2015. For source code and
extracted features, please visit
http://www.cbsr.ia.ac.cn/users/scliao/projects/lomo_xqda
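The horizontal max-occurrence step at the core of LOMO can be sketched as follows, assuming local histograms have already been computed per patch; the array layout is an assumption for illustration:

```python
import numpy as np

def lomo_horizontal_max(hist_maps):
    # Core LOMO idea: within each horizontal stripe, take the maximal
    # occurrence of every local-histogram bin across all patches in
    # that stripe, making the descriptor stable against viewpoint
    # changes that shift a person horizontally.
    # `hist_maps` has shape (stripes, patches_per_stripe, bins).
    return np.asarray(hist_maps).max(axis=1).reshape(-1)
```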
Person re-identification via efficient inference in fully connected CRF
In this paper, we address the person re-identification problem, i.e.,
retrieving the instances from the gallery that belong to the same person
as the given probe image. This is very challenging because a person's
appearance usually undergoes significant variations due to changes in
illumination, camera angle and view, background clutter, and occlusion over the
camera network. In this paper, we assume that the matched gallery images should
not only be similar to the probe, but also be similar to each other, under
suitable metric. We express this assumption with a fully connected CRF model in
which each node corresponds to a gallery image and every pair of nodes is
connected by an edge. A label variable is associated with each node to indicate
whether the corresponding image is from the target person. We define the unary
potential for each node using existing feature calculation and matching
techniques, reflecting the similarity between the probe and a gallery image,
and define the pairwise potential for each edge as a weighted combination of
Gaussian kernels, which encodes the appearance similarity between a pair of
gallery images. This specific form of pairwise potential allows us to exploit
an efficient inference algorithm to calculate the marginal distribution of each
label variable in this densely connected CRF. We show the superiority of our
method by applying it to public datasets and comparing with the state of the art.
Comment: 7 pages, 4 figures
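A single-kernel mean-field sketch of the inference described above, assuming binary labels and one Gaussian appearance kernel; the paper's weighted kernel combination and exact update rules may differ:

```python
import numpy as np

def mean_field_dense_crf(unary, feats, w=1.0, sigma=1.0, iters=10):
    # Mean-field inference for a fully connected binary CRF over
    # gallery images: `unary` holds per-node scores for label 1
    # ("matches the probe"); the pairwise potential is a Gaussian
    # kernel on appearance features, encouraging similar gallery
    # images to take the same label.
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    K = w * np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(K, 0.0)                # no self-messages
    q = 1.0 / (1.0 + np.exp(-unary))        # initialize from unaries
    for _ in range(iters):
        msg = K @ (2 * q - 1)               # neighbours vote for/against
        q = 1.0 / (1.0 + np.exp(-(unary + msg)))
    return q  # marginal probability that each gallery image matches
```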
…