Spatial-Temporal Person Re-identification
Most current person re-identification (ReID) methods neglect a
spatial-temporal constraint. Given a query image, conventional methods compute
the feature distances between the query image and all the gallery images and
return a similarity ranked table. When the gallery database is very large in
practice, these approaches fail to obtain a good performance due to appearance
ambiguity across different camera views. In this paper, we propose a novel
two-stream spatial-temporal person ReID (st-ReID) framework that mines both
visual semantic information and spatial-temporal information. To this end, a
joint similarity metric with Logistic Smoothing (LS) is introduced to integrate
two kinds of heterogeneous information into a unified framework. To approximate
a complex spatial-temporal probability distribution, we develop a fast
Histogram-Parzen (HP) method. With the help of the spatial-temporal constraint,
the st-ReID model eliminates many irrelevant images and thus narrows the
gallery database. Without bells and whistles, our st-ReID method achieves
rank-1 accuracy of 98.1% on Market-1501 and 94.4% on DukeMTMC-reID, improving
on the baselines of 91.2% and 83.8%, respectively, and outperforming all previous
state-of-the-art methods by a large margin.
Comment: AAAI 201
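The two ingredients named in this abstract, a histogram-then-Parzen-window estimate of the spatial-temporal distribution and a logistic-smoothed joint score, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the parameters `lam`, `gamma`, `bin_width`, and `sigma`, the Gaussian kernel, and the product form of the joint score are all assumptions made for illustration.

```python
import math

def logistic_smooth(x, lam, gamma):
    """Logistic function 1 / (1 + lam * exp(-gamma * x)); lam and gamma are assumed knobs."""
    return 1.0 / (1.0 + lam * math.exp(-gamma * x))

def histogram_parzen(time_gaps, bin_width, num_bins, sigma):
    """Estimate a distribution over time-gap bins for one camera pair.

    Step 1: build a histogram of observed time gaps.
    Step 2: smooth the histogram with a Gaussian (Parzen) window and normalize.
    """
    counts = [0.0] * num_bins
    for gap in time_gaps:
        k = int(gap // bin_width)
        if 0 <= k < num_bins:  # gaps outside the histogram range are ignored
            counts[k] += 1.0
    smoothed = [
        sum(counts[j] * math.exp(-((k - j) ** 2) / (2.0 * sigma ** 2))
            for j in range(num_bins))
        for k in range(num_bins)
    ]
    total = sum(smoothed)
    return [v / total for v in smoothed] if total > 0 else smoothed

def joint_score(visual_sim, st_prob, lam=1.0, gamma=5.0):
    """Combine visual similarity and spatial-temporal probability (assumed product form)."""
    return logistic_smooth(visual_sim, lam, gamma) * logistic_smooth(st_prob, lam, gamma)

# Hypothetical time gaps (in seconds) observed between two cameras.
probs = histogram_parzen([10, 12, 11, 95], bin_width=10, num_bins=10, sigma=1.0)
likely = joint_score(0.8, probs[1])    # gap falls in the most common bin
unlikely = joint_score(0.8, probs[5])  # gap falls in a rarely observed bin
```

With equal visual similarity, the gallery image whose time gap falls in a frequently observed bin receives the higher joint score, which is the sense in which a spatial-temporal constraint can down-weight implausible matches.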
Multi-camera trajectory forecasting : pedestrian trajectory prediction in a network of cameras
We introduce the task of multi-camera trajectory forecasting (MCTF), where the future trajectory of an object is predicted in a network of cameras. Prior work considers forecasting trajectories in a single camera view. Our work is the first to consider the challenging scenario of forecasting across multiple non-overlapping camera views. This has wide applicability in tasks such as re-identification and multi-target multi-camera tracking. To facilitate research in this new area, we release the Warwick-NTU Multi-camera Forecasting Database (WNMF), a unique dataset of multi-camera pedestrian trajectories from a network of 15 synchronized cameras. To accurately label this large dataset (600 hours of video footage), we also develop a semi-automated annotation method. An effective MCTF model should proactively anticipate where and when a person will re-appear in the camera network. In this paper, we consider the task of predicting the next camera in which a pedestrian will re-appear after leaving the view of another camera, and present several baseline approaches for this. The labeled database is available online at https://github.com/olly-styles/Multi-Camera-Trajectory-Forecastin
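To make the next-camera prediction task concrete, here is a minimal sketch of one simple baseline: predict the camera that most often followed the exit camera in training data. The abstract does not specify its baselines, so this frequency-based approach, the function names, and the `(exit_cam, next_cam)` data layout are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def fit_transition_counts(transitions):
    """Count how often each next camera follows each exit camera.

    `transitions` is an iterable of (exit_cam, next_cam) pairs (assumed layout).
    """
    counts = defaultdict(Counter)
    for exit_cam, next_cam in transitions:
        counts[exit_cam][next_cam] += 1
    return counts

def predict_next_camera(counts, exit_cam, top_k=1):
    """Return the top_k most frequent next cameras for exit_cam (empty if unseen)."""
    ranked = counts.get(exit_cam, Counter()).most_common(top_k)
    return [cam for cam, _ in ranked]

# Hypothetical training transitions between camera IDs.
train = [(1, 2), (1, 2), (1, 3), (2, 5)]
model = fit_transition_counts(train)
prediction = predict_next_camera(model, exit_cam=1)
```

A baseline like this ignores the trajectory observed inside the exit camera, so it gives a reference point against which trajectory-aware models can be compared.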
Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting
For person re-identification, existing deep networks often focus on
representation learning. However, without transfer learning, the learned model
is fixed as is, which is not adaptable for handling various unseen scenarios.
In this paper, beyond representation learning, we consider how to formulate
person image matching directly in deep feature maps. We treat image matching as
finding local correspondences in feature maps, and construct query-adaptive
convolution kernels on the fly to achieve local matching. In this way, the
matching process and results are interpretable, and this explicit matching is
more generalizable than representation features to unseen scenarios, such as
unknown misalignments, pose or viewpoint changes. To facilitate end-to-end
training of this architecture, we further build a class memory module to cache
feature maps of the most recent samples of each class, so as to compute image
matching losses for metric learning. Through direct cross-dataset evaluation,
the proposed Query-Adaptive Convolution (QAConv) method gains large
improvements over popular learning methods (about 10%+ mAP), and achieves
comparable results to many transfer learning methods. In addition, a model-free,
temporal co-occurrence-based score weighting method called TLift is proposed,
which further improves performance, achieving state-of-the-art
results in cross-dataset person re-identification. Code is available at
https://github.com/ShengcaiLiao/QAConv.
Comment: This is the ECCV 2020 version, including the appendi
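The idea of matching by local correspondences in feature maps can be sketched as follows: each local feature of the query acts as a 1x1 kernel applied to every gallery location, and the best response per location is kept. This is a simplified illustration, not the QAConv code from the repository above. It assumes feature maps are given as flat lists of per-location vectors, uses cosine similarity, and omits any learned layers and the class memory module; the function names are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def local_matching_score(query_map, gallery_map):
    """Score two feature maps by their best local correspondences.

    Each map is a list of per-location feature vectors (assumed layout).
    Every query location is correlated with every gallery location, and the
    maximum response is kept in both directions, then averaged.
    """
    q = [l2_normalize(v) for v in query_map]
    g = [l2_normalize(v) for v in gallery_map]
    sims = [[sum(a * b for a, b in zip(qv, gv)) for gv in g] for qv in q]
    best_for_query = [max(row) for row in sims]          # best gallery match per query location
    best_for_gallery = [max(col) for col in zip(*sims)]  # best query match per gallery location
    matches = best_for_query + best_for_gallery
    return sum(matches) / len(matches)

# Two toy maps with the same local content at swapped locations.
query = [[1.0, 0.0], [0.0, 1.0]]
shifted = [[0.0, 2.0], [3.0, 0.0]]
score = local_matching_score(query, shifted)
```

Because each location is matched to its best counterpart anywhere in the other map, the score is unaffected by the swapped locations in the toy example, which hints at why explicit local matching can tolerate misalignment.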