Humans can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks for different scenarios separately, which limits their real-world applications. This paper strives to resolve this problem by proposing a new instruct-ReID task that requires the model to retrieve images according to a given image or language instruction. Our instruct-ReID is a more general ReID setting, in which existing ReID tasks can be viewed as special cases obtained by designing different instructions.
We propose a large-scale OmniReID benchmark and an adaptive triplet loss as a
baseline method to facilitate research in this new setting. Experimental results show that the baseline model trained on our OmniReID benchmark improves mAP by +0.5% and +3.3% on Market1501 and CUHK03 for traditional ReID, by +2.1%, +0.2%, and +15.3% on PRCC, VC-Clothes, and LTCC for clothes-changing ReID, by +12.5% on COCAS+ real2 for clothes-template-based clothes-changing ReID when using only RGB images, and by +25.5% on COCAS+ real2 for our newly defined language-instructed ReID. The dataset, model, and code will be available at
https://github.com/hwz-zju/Instruct-ReID
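
As a rough illustration only: the adaptive triplet loss is not detailed in this abstract, so the sketch below shows a generic triplet loss with a sample-dependent margin in PyTorch. The function name adaptive_triplet_loss, the margin rule based on anchor-negative similarity, and the embedding shapes are assumptions for illustration, not the authors' implementation.

    # Minimal sketch of a triplet loss with a per-triplet (adaptive) margin.
    # The margin rule below is a hypothetical placeholder; the paper's
    # adaptive triplet loss may compute its margin differently.
    import torch
    import torch.nn.functional as F

    def adaptive_triplet_loss(anchor, positive, negative, base_margin=0.3):
        """anchor/positive/negative: (B, D) L2-normalized embeddings."""
        d_ap = F.pairwise_distance(anchor, positive)   # anchor-positive distance
        d_an = F.pairwise_distance(anchor, negative)   # anchor-negative distance
        # Hypothetical adaptive margin: enlarge the base margin when the
        # anchor and negative are already similar (i.e., harder negatives).
        sim_an = F.cosine_similarity(anchor, negative)
        margin = base_margin * (1.0 + sim_an.clamp(min=0.0))
        return F.relu(d_ap - d_an + margin).mean()

    # Usage: embeddings produced by a ReID model conditioned on an instruction.
    B, D = 32, 768
    a, p, n = (F.normalize(torch.randn(B, D), dim=1) for _ in range(3))
    loss = adaptive_triplet_loss(a, p, n)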