Diverse Knowledge Distillation for End-to-End Person Search
Person search aims to localize and identify a specific person from a gallery
of images. Recent methods can be categorized into two groups, i.e., two-step
and end-to-end approaches. The former views person search as two independent
tasks and achieves dominant results using separately trained person detection
and re-identification (Re-ID) models. The latter performs person search in an
end-to-end fashion. Although the end-to-end approaches yield higher inference
efficiency, they lag well behind their two-step counterparts in terms of
accuracy. In this paper, we argue that the gap between the two kinds of methods
is mainly caused by the Re-ID sub-networks of end-to-end methods. To this end,
we propose a simple yet strong end-to-end network with diverse knowledge
distillation to break the bottleneck. We also design a spatial-invariant
augmentation that helps the model stay invariant to inaccurate detection results.
Experimental results on the CUHK-SYSU and PRW datasets demonstrate the
superiority of our method over existing approaches -- it achieves accuracy on
par with state-of-the-art two-step methods while maintaining high
efficiency due to the single joint model. Code is available at:
https://git.io/DKD-PersonSearch.
Comment: Accepted to AAAI, 2021.
Person Identification With Convolutional Neural Networks
Person identification aims at matching persons across images or videos captured by different cameras, without requiring the presence of persons’ faces. It is an important problem in the computer vision community and has many important real-world applications, such as person search, security surveillance, and no-checkout stores. However, this problem is very challenging due to various factors, such as illumination variation, view changes, human pose deformation, and occlusion. Traditional approaches generally focus on hand-crafting features and/or learning distance metrics for matching to tackle these challenges. With Convolutional Neural Networks (CNNs), feature extraction and metric learning can be combined in a unified framework.
In this work, we study two important sub-problems of person identification: cross-view person identification and visible-thermal person re-identification. Cross-view person identification aims to match persons from temporally synchronized videos taken by wearable cameras. Visible-thermal person re-identification aims to match persons between images taken by visible cameras under normal illumination conditions and thermal cameras under poor illumination conditions, such as at night.
For cross-view person identification, we focus on addressing the challenge of view changes between cameras. Since the videos are taken by wearable cameras, the underlying 3D motion pattern of the same person should be consistent and thus can be used for effective matching. In light of this, we propose to extract view-invariant motion features to match persons. Specifically, we propose a CNN-based triplet network to learn view-invariant features by establishing correspondences between 3D human MoCap data and the projected 2D optical flow data. After training, the triplet network is used to extract view-invariant features from 2D optical flows of videos for matching persons. We collect three datasets for evaluation. The experimental results demonstrate the effectiveness of this method.
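The abstract does not state which triplet objective the network is trained with. As a rough illustration only, the following sketch shows the standard hinge-style triplet loss on plain feature vectors. The function names, the Euclidean distance, and the `margin` value are assumptions made for this example and are not taken from the work itself.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge triplet loss (an assumed formulation, not the author's).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`; otherwise it grows linearly.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D features: the anchor and positive would describe the same
# motion seen from different views, the negative a different motion.
well_separated = triplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])
badly_ordered = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0])
```

In this toy case the first triplet already satisfies the margin and contributes no loss, while the second is penalized because the positive lies farther from the anchor than the negative.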
For visible-thermal person re-identification, we focus on the challenge of domain discrepancy between visible images and thermal images. We propose to address this issue at a class level with a CNN-based two-stream network. Specifically, our idea is to learn a center for features of each person in each domain (visible and thermal domains), using a new relaxed center loss. Instead of imposing constraints between pairs of samples, we enforce the centers of the same person in visible and thermal domains to be close, and the centers of different persons to be distant. We also enforce the feature vector from the center of one person to another in visible feature space to be similar to that in thermal feature space. Using this network, we can learn domain-independent features for visible-thermal person re-identification. Experiments on two public datasets demonstrate the effectiveness of this method.
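The exact form of the relaxed center loss is not given in the abstract. The sketch below is only one plausible reading of the three constraints described above: per-person centers in each domain, a pull term for same-person centers, a margin-based push term for different-person centers, and a term that matches center-to-center vectors across domains. All function names, the squared Euclidean distance, the equal weighting of the terms, and the `margin` parameter are assumptions for illustration.

```python
def mean_vec(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sub(a, b):
    """Vector difference a - b."""
    return [x - y for x, y in zip(a, b)]


def class_centers(features, labels):
    """Map each person label to the mean of that person's features."""
    grouped = {}
    for feat, label in zip(features, labels):
        grouped.setdefault(label, []).append(feat)
    return {label: mean_vec(feats) for label, feats in grouped.items()}


def cross_domain_center_loss(vis_feats, vis_labels, thr_feats, thr_labels,
                             margin=1.0):
    """Illustrative center-level loss (assumed form, not the author's)."""
    cv = class_centers(vis_feats, vis_labels)   # visible-domain centers
    ct = class_centers(thr_feats, thr_labels)   # thermal-domain centers
    ids = sorted(set(cv) & set(ct))             # persons seen in both domains

    # Pull: centers of the same person in the two domains should coincide.
    pull = sum(sq_dist(cv[i], ct[i]) for i in ids)

    push = 0.0
    structure = 0.0
    for i in ids:
        for j in ids:
            if i == j:
                continue
            # Push: centers of different persons should be at least `margin` apart.
            push += max(0.0, margin - sq_dist(cv[i], ct[j]))
            # Structure: the vector from person j's center to person i's center
            # should look the same in both feature spaces.
            structure += sq_dist(sub(cv[i], cv[j]), sub(ct[i], ct[j]))
    return pull + push + structure


# Hypothetical 2-D features for two persons (labels 0 and 1).
vis = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0]]
vis_ids = [0, 0, 1]
aligned = cross_domain_center_loss(vis, vis_ids, [[0.0, 1.0], [10.0, 0.0]], [0, 1])
shifted = cross_domain_center_loss(vis, vis_ids, [[0.0, 2.0], [10.0, 0.0]], [0, 1])
```

Because every term is computed on class centers rather than on sample pairs, the cost grows with the number of persons in a batch instead of the number of sample pairs, which is the class-level idea the paragraph describes.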