Occluded person re-identification (ReID) is challenging because of
occlusion interference and incomplete target information. Leveraging
external cues such as human pose or parsing to locate and align part features
has proven effective in occluded person ReID. Meanwhile, recent
Transformer structures have a strong capacity for long-range modeling.
Considering the above facts, we propose a Teacher-Student Decoder (TSD)
framework for occluded person ReID, which utilizes the Transformer decoder with
the help of human parsing. More specifically, our proposed TSD consists of a
Parsing-aware Teacher Decoder (PTD) and a Standard Student Decoder (SSD). PTD
employs human parsing cues to restrict the Transformer's attention and imparts this
information to SSD through feature distillation. SSD can thereby learn from
PTD to aggregate body-part information automatically. Moreover, a mask
generator is designed to provide discriminative regions for better ReID. In
addition, existing occluded person ReID benchmarks use occluded samples as
queries, which amplifies the role of alleviating occlusion interference and
underestimates the impact of the feature absence issue. In contrast, we
propose a new benchmark with non-occluded queries, serving as a complement to
the existing benchmark. Extensive experiments demonstrate that our proposed
method is superior and the new benchmark is essential. The source code is
available at https://github.com/hh23333/TSD.

Comment: Accepted by ICASSP202
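
The teacher-student idea described above (a teacher decoder whose attention is restricted by parsing cues, and a student decoder that learns from it through feature distillation) can be illustrated with a minimal sketch. This is not the authors' implementation: the single-head attention without learned projections, the mean-squared-error distillation loss, and every name here (`cross_attention`, `distillation_loss`, `parsing_mask`, the toy feature values) are assumptions chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, tokens, mask=None):
    """Single-head cross-attention without learned projections (assumption).

    queries: K part queries, each a list of D floats.
    tokens:  N patch features, each a list of D floats.
    mask:    optional K x N matrix of 0/1; 0 blocks attention to that token.
    """
    dim = len(tokens[0])
    outputs = []
    for k, q in enumerate(queries):
        if mask is not None and not any(mask[k]):
            raise ValueError(f"part {k} has no visible tokens")
        scores = []
        for n, t in enumerate(tokens):
            s = sum(qi * ti for qi, ti in zip(q, t)) / math.sqrt(dim)
            if mask is not None and mask[k][n] == 0:
                s = -math.inf  # masked tokens receive zero attention weight
            scores.append(s)
        weights = softmax(scores)
        outputs.append(
            [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]
        )
    return outputs


def distillation_loss(student, teacher):
    """Mean squared error between student and teacher part features (assumption)."""
    diffs = [
        (s - t) ** 2
        for s_row, t_row in zip(student, teacher)
        for s, t in zip(s_row, t_row)
    ]
    return sum(diffs) / len(diffs)


# Toy data: three body tokens and one occluder token with large activations.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
part_queries = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical parsing mask: each part may only attend to its own body tokens.
parsing_mask = [[1, 0, 1, 0], [0, 1, 1, 0]]

teacher = cross_attention(part_queries, tokens, parsing_mask)  # parsing-aware
student = cross_attention(part_queries, tokens)                # unrestricted
loss = distillation_loss(student, teacher)
```

In this toy setting the unrestricted student attends heavily to the occluder token, so its part features drift away from the teacher's and the loss is positive. Minimizing such a loss during training would push the student toward the teacher's part-focused features without needing parsing masks at inference time.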