Horizontal Pyramid Matching for Person Re-identification
Despite the remarkable recent progress, person re-identification (Re-ID)
approaches are still suffering from the failure cases where the discriminative
body parts are missing. To mitigate such cases, we propose a simple yet
effective Horizontal Pyramid Matching (HPM) approach to fully exploit various
partial information of a given person, so that correct person candidates can
still be identified even if some key parts are missing. Within the HPM, we make
the following contributions to produce a more robust feature representation for
the Re-ID task: 1) we learn to classify using partial feature representations
at different horizontal pyramid scales, which successfully enhance the
discriminative capabilities of various person parts; 2) we exploit average and
max pooling strategies to account for person-specific discriminative
information in a global-local manner. To validate the effectiveness of the
proposed HPM, extensive experiments are conducted on three popular benchmarks,
including Market-1501, DukeMTMC-ReID and CUHK03. In particular, we achieve mAP
scores of 83.1%, 74.5% and 59.7% on these benchmarks, setting a new
state of the art. Our code is available on GitHub.
Comment: Accepted by AAAI 201
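The pooling idea in the HPM abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `horizontal_pyramid_pool`, the bin counts `(1, 2, 4)`, and the choice to sum the average-pooled and max-pooled values are all assumptions made for illustration. The feature map is a plain nested list indexed as `[channel][row][col]`.

```python
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def horizontal_pyramid_pool(
    fmap: FeatureMap, bins_per_scale: Sequence[int] = (1, 2, 4)
) -> List[List[float]]:
    """Pool a feature map into horizontal strips at several scales.

    Returns one vector (length = number of channels) per strip.
    The bin counts and the avg + max combination are illustrative assumptions.
    """
    height = len(fmap[0])
    parts: List[List[float]] = []
    for bins in bins_per_scale:
        if height % bins != 0:
            raise ValueError(f"height {height} is not divisible by {bins}")
        strip = height // bins
        for b in range(bins):
            vec = []
            for channel in fmap:
                # Flatten all activations that fall inside this horizontal strip.
                values = [v for row in channel[b * strip:(b + 1) * strip] for v in row]
                avg = sum(values) / len(values)
                # Combine global (average) and most salient (max) responses.
                vec.append(avg + max(values))
            parts.append(vec)
    return parts


# One channel, four rows, two columns.
toy_map = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]]
part_vectors = horizontal_pyramid_pool(toy_map)
```

With three scales of 1, 2 and 4 bins, the toy input yields seven part vectors. In a classification setup of the kind the abstract describes, each of them would feed its own classifier.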
STA: Spatial-Temporal Attention for Large-Scale Video-based Person Re-Identification
In this work, we propose a novel Spatial-Temporal Attention (STA) approach to
tackle the large-scale person re-identification task in videos. Unlike
most existing methods, which simply compute representations of video clips
using frame-level aggregation (e.g. average pooling), the proposed STA adopts a
more effective way for producing robust clip-level feature representation.
Concretely, our STA fully exploits those discriminative parts of one target
person in both spatial and temporal dimensions, which results in a 2-D
attention score matrix via inter-frame regularization to measure the
importance of spatial parts across different frames. Thus, a more robust
clip-level feature representation can be generated according to a weighted sum
operation guided by the mined 2-D attention score matrix. In this way, the
challenging cases for video-based person re-identification such as pose
variation and partial occlusion can be well tackled by the STA. We conduct
extensive experiments on two large-scale benchmarks, i.e. MARS and
DukeMTMC-VideoReID. In particular, the mAP reaches 87.7% on MARS, which
outperforms the state of the art by a margin of more than 11.6%.
Comment: Accepted as a conference paper at AAAI 201
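The weighted-sum step in the STA abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the function name `clip_feature` is hypothetical, the raw attention scores are taken as given and non-negative, and they are normalized across frames for each spatial part so that the weights sum to one.

```python
from typing import List

Vector = List[float]


def clip_feature(part_feats: List[List[Vector]], scores: List[List[float]]) -> List[Vector]:
    """Fuse per-frame part features into one clip-level feature per part.

    part_feats[t][k] is the feature vector of spatial part k in frame t.
    scores[t][k] is a non-negative raw attention score for that part and frame.
    Normalizing over frames is an assumption made for this sketch.
    """
    num_frames = len(part_feats)
    num_parts = len(part_feats[0])
    dim = len(part_feats[0][0])
    fused_parts = []
    for k in range(num_parts):
        total = sum(scores[t][k] for t in range(num_frames))
        if total > 0:
            weights = [scores[t][k] / total for t in range(num_frames)]
        else:
            # Fall back to plain average pooling when all scores are zero.
            weights = [1.0 / num_frames] * num_frames
        fused = [
            sum(weights[t] * part_feats[t][k][d] for t in range(num_frames))
            for d in range(dim)
        ]
        fused_parts.append(fused)
    return fused_parts


# Two frames, two spatial parts, two feature dimensions.
feats = [
    [[1.0, 0.0], [2.0, 2.0]],
    [[3.0, 4.0], [0.0, 0.0]],
]
attention = [
    [1.0, 3.0],
    [1.0, 1.0],
]
fused = clip_feature(feats, attention)
```

In the toy input the second part of the second frame is all zeros, as it might be under occlusion, and receives a low score. The weighted sum therefore leans on the first frame for that part, whereas plain average pooling would weight both frames equally.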
Semantics-Aligned Representation Learning for Person Re-identification
Person re-identification (reID) aims to match person images to retrieve the
ones with the same identity. This is a challenging task, as the images to be
matched are generally semantically misaligned due to the diversity of human
poses and capture viewpoints, incompleteness of the visible bodies (due to
occlusion), etc. In this paper, we propose a framework that drives the reID
network to learn semantics-aligned feature representation through delicate
supervision designs. Specifically, we build a Semantics Aligning Network (SAN)
which consists of a base network as encoder (SA-Enc) for re-ID, and a decoder
(SA-Dec) for reconstructing/regressing the densely semantics aligned full
texture image. We jointly train the SAN under the supervisions of person
re-identification and aligned texture generation. Moreover, at the decoder,
besides the reconstruction loss, we add Triplet ReID constraints over the
feature maps as the perceptual losses. The decoder is discarded in the
inference and thus our scheme is computationally efficient. Ablation studies
demonstrate the effectiveness of our design. We achieve state-of-the-art
performance on the benchmark datasets CUHK03, Market1501, MSMT17, and the
partial person reID dataset Partial REID. Code for our proposed method is
available at:
https://github.com/microsoft/Semantics-Aligned-Representation-Learning-for-Person-Re-identification.
Comment: Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20),
code has been released
Real-time Person Re-identification at the Edge: A Mixed Precision Approach
A critical part of multi-person multi-camera tracking is the person
re-identification (re-ID) algorithm, which recognizes and retains identities of
all detected unknown people throughout the video stream. Many re-ID algorithms
today achieve state-of-the-art results, but not much work has been done to
explore the deployment of such algorithms for computation and power constrained
real-time scenarios. In this paper, we study the effect of using a lightweight
model, MobileNet-v2, for re-ID and investigate the impact of single (FP32)
precision versus half (FP16) precision for training on the server and inference
on the edge nodes. We further compare the results with a baseline model that
uses ResNet-50 on state-of-the-art benchmarks including CUHK03, Market-1501,
and Duke-MTMC. The MobileNet-V2 mixed-precision training method improves
both inference throughput on the edge node and training time on the server,
reaching 27.77 fps and [...], respectively, and decreases
power consumption on the edge node by [...], while reducing
accuracy by only 5.6% relative to single-precision ResNet-50, averaged over
the three datasets. The code and pre-trained networks are publicly
available at https://github.com/TeCSAR-UNCC/person-reid.
Comment: This is a pre-print of an article published in International
Conference on Image Analysis and Recognition (ICIAR 2019), Lecture Notes in
Computer Science. The final authenticated version is available online at
https://doi.org/10.1007/978-3-030-27272-2_
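
The FP32 versus FP16 trade-off mentioned in the abstract above comes down to how many bits each format keeps. The sketch below is not taken from the paper; it only uses Python's standard `struct` module to round values to IEEE 754 half and single precision and compare the rounding error. The helper names and the sample values are hypothetical.

```python
import struct
from typing import Callable, Iterable


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def max_rounding_error(values: Iterable[float], cast: Callable[[float], float]) -> float:
    """Largest absolute difference between each value and its rounded form."""
    return max(abs(cast(v) - v) for v in values)


# Hypothetical activation values, chosen only to show the effect.
sample = [0.1, 0.25, 1.0 / 3.0, 2049.0]
err_fp16 = max_rounding_error(sample, to_half)
err_fp32 = max_rounding_error(sample, to_single)
```

FP16 has an 11-bit significand, so it cannot represent every integer above 2048, while FP32 stores 2049.0 exactly. Half precision saves memory and bandwidth at the cost of this kind of rounding error.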