Recent person re-identification research has achieved great success by
learning from large numbers of labeled person images. However, the
learned models often suffer significant performance drops when applied to
images collected in a different environment. Unsupervised domain adaptation
(UDA) has been investigated to mitigate this constraint, but most existing
methods adapt images at the pixel level only and ignore obvious discrepancies at the
spatial level. This paper presents an innovative UDA-based person
re-identification network that adapts images at both the spatial and pixel
levels simultaneously. A novel disentangled cycle-consistency loss is
designed that guides the learning of spatial-level and pixel-level adaptation
in a collaborative manner. In addition, a novel multi-modal mechanism is
incorporated that generates images of different geometric views and
augments the training images effectively. Extensive experiments over a number
of public datasets show that the proposed UDA network achieves superior person
re-identification performance compared with the state of the art.

Comment: Accepted to ICPR202
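As context for the disentangled cycle-consistency loss mentioned above, the following is a minimal PyTorch sketch of the standard cycle-consistency term that such losses typically extend. The generators G_s2t and G_t2s and the helper name are hypothetical placeholders; the paper's disentangled spatial-level/pixel-level formulation is not reproduced here.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(x_src: torch.Tensor, G_s2t, G_t2s) -> torch.Tensor:
    """Standard CycleGAN-style cycle-consistency term (a hypothetical helper,
    not the paper's disentangled variant): translate a source-domain image to
    the target domain and back, then penalize the L1 reconstruction error."""
    x_tgt_fake = G_s2t(x_src)           # source -> target translation
    x_src_rec = G_t2s(x_tgt_fake)       # target -> source reconstruction
    return F.l1_loss(x_src_rec, x_src)  # || G_t2s(G_s2t(x)) - x ||_1
```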