1,180 research outputs found
Semantics-Aligned Representation Learning for Person Re-identification
Person re-identification (reID) aims to match person images to retrieve the
ones with the same identity. This is a challenging task, as the images to be
matched are generally semantically misaligned due to the diversity of human
poses and capture viewpoints, incompleteness of the visible bodies (due to
occlusion), etc. In this paper, we propose a framework that drives the reID
network to learn semantics-aligned feature representation through delicate
supervision designs. Specifically, we build a Semantics Aligning Network (SAN)
which consists of a base network as encoder (SA-Enc) for re-ID, and a decoder
(SA-Dec) for reconstructing/regressing the densely semantics aligned full
texture image. We jointly train the SAN under the supervisions of person
re-identification and aligned texture generation. Moreover, at the decoder,
besides the reconstruction loss, we add Triplet ReID constraints over the
feature maps as the perceptual losses. The decoder is discarded in the
inference and thus our scheme is computationally efficient. Ablation studies
demonstrate the effectiveness of our design. We achieve the state-of-the-art
performances on the benchmark datasets CUHK03, Market1501, MSMT17, and the
partial person reID dataset Partial REID. Code for our proposed method is
available at:
https://github.com/microsoft/Semantics-Aligned-Representation-Learning-for-Person-Re-identification.Comment: Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20),
code has been release
What-and-Where to Match: Deep Spatially Multiplicative Integration Networks for Person Re-identification
Matching pedestrians across disjoint camera views, known as person
re-identification (re-id), is a challenging problem that is of importance to
visual recognition and surveillance. Most existing methods exploit local
regions within spatial manipulation to perform matching in local
correspondence. However, they essentially extract \emph{fixed} representations
from pre-divided regions for each image and perform matching based on the
extracted representation subsequently. For models in this pipeline, local finer
patterns that are crucial to distinguish positive pairs from negative ones
cannot be captured, and thus making them underperformed. In this paper, we
propose a novel deep multiplicative integration gating function, which answers
the question of \emph{what-and-where to match} for effective person re-id. To
address \emph{what} to match, our deep network emphasizes common local patterns
by learning joint representations in a multiplicative way. The network
comprises two Convolutional Neural Networks (CNNs) to extract convolutional
activations, and generates relevant descriptors for pedestrian matching. This
thus, leads to flexible representations for pair-wise images. To address
\emph{where} to match, we combat the spatial misalignment by performing
spatially recurrent pooling via a four-directional recurrent neural network to
impose spatial dependency over all positions with respect to the entire image.
The proposed network is designed to be end-to-end trainable to characterize
local pairwise feature interactions in a spatially aligned manner. To
demonstrate the superiority of our method, extensive experiments are conducted
over three benchmark data sets: VIPeR, CUHK03 and Market-1501.Comment: Published at Pattern Recognition, Elsevie
- …