In this paper, we are interested in learning a generalizable person
re-identification (re-ID) representation from unlabeled videos. Compared with
1) the popular unsupervised re-ID setting where the training and test sets are
typically under the same domain, and 2) the popular domain generalization (DG)
re-ID setting where the training samples are labeled, our novel scenario
combines their key challenges: the training samples are unlabeled and are
collected from various domains that do not align with the test domain. In other
words, we aim to learn a representation in an unsupervised manner and directly
use the learned representation for re-ID in novel domains. To fulfill this
goal, we make two main contributions: First, we propose Cycle Association
(CycAs), a scalable self-supervised learning method for re-ID with low training
complexity; and second, we construct a large-scale unlabeled re-ID dataset
named LMP-video, tailored for the proposed method. Specifically, CycAs learns
re-ID features by enforcing cycle consistency of instance association between
temporally successive video frame pairs, and the training cost is merely linear
in the data size, making large-scale training possible. On the other hand, the
LMP-video dataset is extremely large, containing 50 million unlabeled person
images cropped from over 10K YouTube videos, and is therefore sufficient to serve
as fertile soil for self-supervised learning. We show that CycAs, trained on
LMP-video, generalizes well to novel domains. The achieved
results sometimes even outperform supervised domain generalizable models.
Remarkably, CycAs achieves 82.2% Rank-1 on Market-1501 and 49.0% Rank-1 on
MSMT17 with zero human annotation, surpassing state-of-the-art supervised DG
re-ID methods. Moreover, we demonstrate the superiority of CycAs under the
canonical unsupervised re-ID and pretrain-and-finetune scenarios.
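
To make the idea of cycle consistency of instance association concrete, the following is a minimal NumPy sketch, not the exact CycAs objective. It assumes that person embeddings from two successive frames are given as matrices, that association is a temperature-scaled softmax over cosine similarities, and that the loss is the negative log of the diagonal of the forward-backward association matrix. The function name `cycle_association_loss`, the `temperature` value, and the log-diagonal form of the loss are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cycle_association_loss(f1, f2, temperature=0.1):
    """Hypothetical cycle-consistency loss between two frames.

    f1: (n, d) embeddings of person crops in frame t.
    f2: (m, d) embeddings of person crops in frame t+1.
    Returns (loss, cycle), where cycle is the (n, n) matrix obtained by
    associating frame t -> frame t+1 -> frame t.
    """
    # Cosine similarity between every pair of instances (assumed metric).
    f1 = f1 / np.linalg.norm(f1, axis=1, keepdims=True)
    f2 = f2 / np.linalg.norm(f2, axis=1, keepdims=True)
    sim = f1 @ f2.T

    # Soft association in both directions (rows sum to 1).
    a12 = softmax(sim / temperature, axis=1)    # frame t   -> frame t+1
    a21 = softmax(sim.T / temperature, axis=1)  # frame t+1 -> frame t

    # Going forward and back should return each instance to itself,
    # i.e. the cycle matrix should be close to the identity.
    cycle = a12 @ a21
    loss = float(-np.log(np.diag(cycle) + 1e-12).mean())
    return loss, cycle


# Usage: four distinct identities that appear in a different order
# in the second frame; no identity labels are used.
frame_t = np.eye(4)
frame_t1 = frame_t[[2, 0, 3, 1]]
loss, cycle = cycle_association_loss(frame_t, frame_t1)
```

In this sketch each frame pair is processed once and independently of all other pairs, which is consistent with a training cost that grows linearly with the amount of data.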