25 research outputs found
Clip-level feature aggregation : a key factor for video-based person re-identification
In the task of video-based person re-identification, features of persons in the query and gallery sets are compared to search for the best match. Most existing methods aggregate frame-level features with a temporal method to generate clip-level features, rather than sequence-level representations. In this paper, we propose a new method that aggregates clip-level features to obtain sequence-level representations of persons, consisting of two parts: the Average Aggregation Strategy (AAS) and Raw Feature Utilization (RFU). AAS makes use of all frames in a video sequence to generate a better representation of a person, while RFU investigates how the batch normalization operation influences feature representations in person re-identification. The experimental results demonstrate that our method boosts the accuracy of existing models. In particular, we achieve 87.7% rank-1 and 82.3% mAP on the MARS dataset without any post-processing, outperforming the existing state of the art.
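The two-level aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the clip length, feature dimension, and the use of plain averaging at both levels are assumptions for demonstration.

```python
import numpy as np

def aggregate_sequence(frame_features: np.ndarray, clip_len: int = 4) -> np.ndarray:
    """Average frame-level features into clip-level features, then average
    the clip-level features into a single sequence-level representation.

    frame_features: (num_frames, feat_dim) array of per-frame embeddings.
    """
    num_frames, feat_dim = frame_features.shape
    # Split the sequence into non-overlapping clips (drop any remainder).
    num_clips = num_frames // clip_len
    clips = frame_features[: num_clips * clip_len].reshape(num_clips, clip_len, feat_dim)
    clip_features = clips.mean(axis=1)   # clip-level aggregation
    return clip_features.mean(axis=0)    # sequence-level representation

# Hypothetical example: 8 frames with 4-dimensional embeddings.
feats = np.arange(32, dtype=float).reshape(8, 4)
seq = aggregate_sequence(feats, clip_len=4)  # one 4-dim vector per sequence
```

Because all frames contribute to the sequence-level vector, no frame information is discarded, which is the motivation behind AAS.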
Person Re-identification in Videos by Analyzing Spatio-temporal Tubes
Typical person re-identification frameworks search for the k best matches in a gallery of images that are often collected under varying conditions. For video re-identification applications, the gallery usually contains image sequences, but such a process is time consuming because video re-identification carries out the matching process multiple times. In this paper, we propose a new method that extracts spatio-temporal frame sequences, or tubes, of moving persons and performs re-identification quickly. Initially, we apply a binary classifier to remove noisy images from the input query tube. Next, we use a key-pose detection-based query minimization technique. Finally, a hierarchical re-identification framework is proposed and used to rank the output tubes. Experiments with publicly available video re-identification datasets reveal that our framework outperforms existing methods, ranking the tubes with an average increase in CMC accuracy of 6-8% across multiple datasets. Our method also significantly reduces the number of false positives. A new video re-identification dataset, named the Tube-based Re-identification Video Dataset (TRiViD), has been prepared with the aim of helping the re-identification research community.
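The first stage of the pipeline above, filtering noisy frames out of a query tube with a binary classifier, can be sketched generically. The frame representation, the blur-score field, and the threshold are hypothetical stand-ins for whatever classifier the framework actually uses.

```python
from typing import Callable, List

def filter_query_tube(tube: List[dict],
                      is_noisy: Callable[[dict], bool]) -> List[dict]:
    """Drop frames that the binary classifier flags as noisy; the cleaned
    tube would then go on to key-pose query minimization and ranking."""
    return [frame for frame in tube if not is_noisy(frame)]

# Hypothetical usage: frames carry a precomputed blur score, and frames
# blurrier than a threshold are treated as noise.
tube = [{"id": 0, "blur": 0.1}, {"id": 1, "blur": 0.9}, {"id": 2, "blur": 0.2}]
clean = filter_query_tube(tube, is_noisy=lambda f: f["blur"] > 0.5)
```

Minimizing the query tube before matching is what lets the hierarchical framework run the expensive re-identification step on fewer frames.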
Rethinking Temporal Fusion for Video-based Person Re-identification on Semantic and Time Aspect
Recently, the research interest of person re-identification (ReID) has gradually turned to video-based methods, which acquire a person representation by aggregating the frame features of an entire video. However, existing video-based ReID methods do not consider the semantic differences among the outputs of different network stages, which potentially compromises the information richness of the person features. Furthermore, traditional methods ignore important relationships among frames, which causes information redundancy in fusion along the time axis. To address these issues, we propose a novel general temporal fusion framework that aggregates frame features on both the semantic aspect and the time aspect. For the semantic aspect, a multi-stage fusion network fuses richer frame features at multiple semantic levels, which effectively reduces the information loss caused by traditional single-stage fusion. For the time axis, the existing intra-frame attention method is improved by adding a novel inter-frame attention module, which effectively reduces information redundancy in temporal fusion by taking the relationships among frames into consideration. The experimental results show that our approach effectively improves video-based re-identification accuracy, achieving state-of-the-art performance.
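The inter-frame attention idea, weighting frames by their pairwise relationships before fusing along the time axis, can be sketched with a plain dot-product attention. This is an illustrative simplification under assumed shapes, not the module proposed in the paper.

```python
import numpy as np

def inter_frame_attention_fusion(frame_features: np.ndarray) -> np.ndarray:
    """Fuse frame features along the time axis using attention weights
    derived from pairwise frame similarity, so that highly similar
    (redundant) frames share weight instead of dominating the fusion.

    frame_features: (T, D) per-frame embeddings.
    """
    # Pairwise similarity between frames, shape (T, T).
    sim = frame_features @ frame_features.T
    # Row-wise softmax gives each frame's attention over all frames.
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)
    # Each frame attends to every frame; average the attended outputs.
    attended = attn @ frame_features            # (T, D)
    return attended.mean(axis=0)                # fused (D,) representation
```

If all frames were identical, the attention weights would be uniform and the fusion would reduce to a simple average; the attention only departs from averaging when frames actually differ.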