Tracklet Self-Supervised Learning for Unsupervised Person Re-Identification
Existing unsupervised person re-identification (re-id) methods mainly focus on cross-domain adaptation or one-shot learning. Although they are more scalable than their supervised learning counterparts, relying on a relevant labelled source domain or an initialisation with one labelled tracklet per person still restricts their scalability in real-world deployments. To alleviate these problems, some recent studies develop unsupervised tracklet association and bottom-up image clustering methods, but they still rely on explicit camera annotation or merely utilise suboptimal global clustering. In this work, we formulate a novel tracklet self-supervised learning (TSSL) method, which is capable of learning directly from abundant unlabelled tracklet data, to optimise a feature embedding space for both video and image unsupervised re-id. This is achieved by designing a comprehensive unsupervised learning objective that accounts for tracklet frame coherence, tracklet neighbourhood compactness, and tracklet cluster structure in a unified formulation. As a purely unsupervised re-id model, TSSL is end-to-end trainable in the absence of source data annotation, person identity labels, and camera prior knowledge. Extensive experiments demonstrate the superiority of TSSL over a wide variety of state-of-the-art alternative methods on four large-scale person re-id benchmarks: Market-1501, DukeMTMC-ReID, MARS and DukeMTMC-VideoReID.
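The unified objective described in this abstract (frame coherence, neighbourhood compactness, cluster structure) could be sketched roughly as below. This is a minimal illustration, not the paper's actual formulation: the function name, the cosine-distance form of each term, and the weighting parameters `lam_neigh`/`lam_clust` are all assumptions introduced here for clarity.

```python
import torch
import torch.nn.functional as F

def tssl_style_objective(frame_feats, tracklet_ids, cluster_ids,
                         lam_neigh=1.0, lam_clust=1.0):
    """Hypothetical sketch of a three-term tracklet objective.

    frame_feats : (N, D) frame embeddings
    tracklet_ids: (N,)   tracklet index of each frame
    cluster_ids : (T,)   pseudo-cluster index of each tracklet
    """
    feats = F.normalize(frame_feats, dim=1)

    # Per-tracklet mean embeddings (renormalised onto the unit sphere).
    n_track = int(tracklet_ids.max()) + 1
    dim = feats.size(1)
    means = torch.zeros(n_track, dim).index_add_(0, tracklet_ids, feats)
    counts = torch.bincount(tracklet_ids, minlength=n_track).clamp(min=1)
    means = F.normalize(means / counts.unsqueeze(1).float(), dim=1)

    # (1) Frame coherence: each frame stays close to its own tracklet mean.
    coherence = (1.0 - (feats * means[tracklet_ids]).sum(dim=1)).mean()

    # (2) Neighbourhood compactness: each tracklet is pulled toward its
    #     nearest *other* tracklet (diagonal masked out with a low value).
    sim = means @ means.t()
    sim.fill_diagonal_(-2.0)
    compactness = (1.0 - sim.max(dim=1).values).mean()

    # (3) Cluster structure: tracklets sharing a pseudo-cluster are pulled
    #     together via their pairwise cosine distance.
    pair = means @ means.t()
    same = (cluster_ids.unsqueeze(0) == cluster_ids.unsqueeze(1)).float()
    same.fill_diagonal_(0)
    cluster_term = ((1.0 - pair) * same).sum() / same.sum().clamp(min=1)

    return coherence + lam_neigh * compactness + lam_clust * cluster_term
```

Each term is non-negative (cosine similarity is bounded by 1), so the combined loss is bounded below by zero; the relative weights would in practice be tuned on a validation split.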
Camera Alignment and Weighted Contrastive Learning for Domain Adaptation in Video Person ReID
Systems for person re-identification (ReID) can achieve a high accuracy when
trained on large fully-labeled image datasets. However, the domain shift
typically associated with diverse operational capture conditions (e.g., camera
viewpoints and lighting) may translate to a significant decline in performance.
This paper focuses on unsupervised domain adaptation (UDA) for video-based ReID
- a relevant scenario that is less explored in the literature. In this
scenario, the ReID model must adapt to a complex target domain defined by a
network of diverse video cameras based on tracklet information. State-of-the-art
methods cluster unlabeled target data, yet domain shifts across target cameras
(sub-domains) can lead to poor initialization of the clustering, which
propagates noise across epochs and prevents the ReID model from accurately
associating samples of the same identity. In this paper, a UDA method is introduced
for video person ReID that leverages knowledge on video tracklets, and on the
distribution of frames captured over target cameras to improve the performance
of CNN backbones trained using pseudo-labels. Our method relies on an
adversarial approach, where a camera-discriminator network is introduced to
extract discriminant camera-independent representations, facilitating the
subsequent clustering. In addition, a weighted contrastive loss is proposed to
leverage the confidence of clusters, and mitigate the risk of incorrect
identity associations. Experimental results obtained on three challenging
video-based person ReID datasets - PRID2011, iLIDS-VID, and MARS - indicate
that our proposed method can outperform related state-of-the-art methods. Our
code is available at: \url{https://github.com/dmekhazni/CAWCL-ReID}
Comment: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
202
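The weighted contrastive loss this abstract describes, which leverages cluster confidence to soften the impact of unreliable pseudo-labels, might look roughly like the supervised-contrastive-style sketch below. The function name, the per-cluster confidence vector, and the exact weighting scheme are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def weighted_contrastive_loss(features, pseudo_labels, cluster_conf,
                              temperature=0.1):
    """Hypothetical cluster-confidence-weighted contrastive loss.

    features     : (N, D) embeddings from the CNN backbone
    pseudo_labels: (N,)   cluster assignment of each sample
    cluster_conf : (K,)   confidence score in [0, 1] per cluster
    """
    features = F.normalize(features, dim=1)
    n = features.size(0)
    sim = features @ features.t() / temperature

    # Positives: samples sharing a pseudo-label (self excluded).
    mask_pos = (pseudo_labels.unsqueeze(0) == pseudo_labels.unsqueeze(1)).float()
    mask_pos.fill_diagonal_(0)

    # Log-probability of each pair, normalised over all other samples.
    logits_mask = torch.ones(n, n) - torch.eye(n)
    exp_sim = torch.exp(sim) * logits_mask
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True) + 1e-12)

    # Per-sample loss, averaged over its positives (singletons contribute 0).
    pos_count = mask_pos.sum(dim=1).clamp(min=1)
    loss_i = -(mask_pos * log_prob).sum(dim=1) / pos_count

    # Down-weight samples from low-confidence clusters.
    weights = cluster_conf[pseudo_labels]
    return (weights * loss_i).sum() / weights.sum().clamp(min=1e-12)
```

Samples from high-confidence clusters then dominate the gradient, mitigating the noise propagation from poorly initialised clusters that the abstract highlights.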