Video-based Person Re-identification Using Spatial-Temporal Attention Networks
We consider the problem of video-based person re-identification. The goal is
to identify a person across videos captured by different cameras. In this
paper, we propose an efficient spatial-temporal attention-based model for
person re-identification from videos. Our method generates an attention score
for each frame based on frame-level features. The attention scores of all
frames in a video are used to produce a weighted feature vector for the input
video. Unlike most existing deep learning methods that use a single global
representation, our approach focuses on frame-level attention scores. Extensive
experiments on two benchmark datasets demonstrate that our method achieves
state-of-the-art performance. This is a technical report.
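The temporal pooling described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's actual model: the scoring vector `w` stands in for the learned attention scorer, which the paper trains end-to-end on frame-level features.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(frame_features, w):
    """Aggregate per-frame features into one video-level vector.

    frame_features: (T, D) array of frame-level features
    w: (D,) scoring vector (hypothetical stand-in for the
       learned attention scorer in the paper)
    """
    scores = frame_features @ w      # one attention score per frame
    alphas = softmax(scores)         # normalize scores into weights
    return alphas @ frame_features   # weighted sum -> (D,) video feature

# toy usage: a 4-frame video with 8-dimensional frame features
feats = np.random.default_rng(0).normal(size=(4, 8))
video_feat = attention_pool(feats, np.ones(8))
print(video_feat.shape)  # (8,)
```

Frames that score higher under the attention function contribute more to the video-level feature, while uninformative frames are down-weighted rather than averaged in uniformly.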
Ordered or Orderless: A Revisit for Video based Person Re-Identification
Is a recurrent network really necessary for learning a good visual
representation for video-based person re-identification (VPRe-id)? In this
paper, we first show that the common practice of employing recurrent neural
networks (RNNs) to aggregate spatial-temporal features may not be optimal.
Specifically, through a diagnostic analysis, we show that the recurrent
structure may be less effective at learning temporal dependencies than expected
and implicitly yields an orderless representation. Based on this observation, we
then present a simple yet surprisingly powerful approach to VPRe-id, in which
we treat VPRe-id as an efficient orderless ensemble of image-based person
re-identification problems. More specifically, we divide videos into individual
images and re-identify the person with an ensemble of image-based rankers. Under
the i.i.d. assumption, we provide an error bound that sheds light on how we
could improve VPRe-id. Our work also presents a promising way to bridge the gap
between video-based and image-based person re-identification. Comprehensive
experimental evaluations demonstrate that the proposed solution achieves
state-of-the-art performance on multiple widely used datasets (iLIDS-VID, PRID
2011, and MARS). Comment: Under minor revision in IEEE TPAMI.
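The orderless ensemble idea can be sketched in a few lines. This is a toy illustration under assumed simplifications, not the paper's pipeline: cosine similarity stands in for an image-based re-id matcher, and each gallery identity is represented by a single feature vector.

```python
import numpy as np

def image_similarity(q, g):
    """Cosine similarity between a query frame feature and a gallery
    feature (hypothetical stand-in for an image-based re-id matcher)."""
    return float(q @ g / (np.linalg.norm(q) * np.linalg.norm(g) + 1e-12))

def ensemble_rank(query_frames, gallery):
    """Orderless ensemble: split the query video into individual frames,
    score each gallery identity with every frame independently, and
    average the image-level similarities. Frame order never enters.

    query_frames: list of (D,) frame feature vectors from one video
    gallery: dict mapping identity -> (D,) representative feature
    Returns identities sorted by descending ensemble score.
    """
    scores = {pid: np.mean([image_similarity(f, g) for f in query_frames])
              for pid, g in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)

# toy usage: two query frames, two gallery identities
query = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]
gallery = {"A": np.array([1.0, 0.0]), "B": np.array([0.0, 1.0])}
print(ensemble_rank(query, gallery))  # ['A', 'B']
```

Because the aggregation is a plain average over frames, any permutation of the input frames yields the same ranking, which is exactly the orderless property the abstract argues RNN aggregation implicitly converges to anyway.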