This paper targets the problem of image set-based face verification and
identification. Unlike the traditional single-medium setting (one image or one video), we
encounter a set of heterogeneous content containing orderless images and
videos. The importance of each image is usually treated either as equal or as
determined by an independent quality assessment of that image. How to model the
relationships among orderless images within a set remains a challenge. We address this problem by
formulating it as a Markov Decision Process (MDP) in the latent space.
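To make the MDP view concrete, the following is a minimal sketch under stated assumptions. It is not the DAC formulation itself. The state at step t is the running attention-weighted aggregate together with the t-th image embedding, the action is an attention weight in [0, 1], and the only reward is a terminal cosine similarity to a reference embedding of the claimed identity. The class `SetAttentionMDP`, the helpers `cosine`, `run_episode` and `similarity_policy`, and the terminal-reward choice are all hypothetical. A learned actor-critic agent would replace the hand-written `similarity_policy`.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
# State: (running weighted-mean aggregate, embedding of the image to decide on next)
State = Tuple[Vector, Optional[Vector]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class SetAttentionMDP:
    """Hypothetical MDP over an orderless set of image embeddings.

    Images are visited in the given order (the MDP imposes a sequence on the set).
    At each step the agent picks an attention weight for the current image; the
    episode ends after the last image, with reward = cosine(aggregate, reference).
    """

    def __init__(self, embeddings: Sequence[Sequence[float]], reference: Sequence[float]):
        if not embeddings:
            raise ValueError("the image set must be non-empty")
        self.embeddings = [list(e) for e in embeddings]
        self.reference = list(reference)
        self.reset()

    def reset(self) -> State:
        self.t = 0
        self.weighted_sum = [0.0] * len(self.reference)
        self.total_weight = 0.0
        return self._state()

    def aggregate(self) -> Vector:
        # Attention-weighted mean of the embeddings seen so far.
        if self.total_weight == 0.0:
            return [0.0] * len(self.reference)
        return [s / self.total_weight for s in self.weighted_sum]

    def _state(self) -> State:
        current = self.embeddings[self.t] if self.t < len(self.embeddings) else None
        return self.aggregate(), current

    def step(self, weight: float) -> Tuple[State, float, bool]:
        if self.t >= len(self.embeddings):
            raise RuntimeError("episode finished; call reset()")
        w = min(max(weight, 0.0), 1.0)  # clip the action to [0, 1]
        e = self.embeddings[self.t]
        self.weighted_sum = [s + w * x for s, x in zip(self.weighted_sum, e)]
        self.total_weight += w
        self.t += 1
        done = self.t == len(self.embeddings)
        reward = cosine(self.aggregate(), self.reference) if done else 0.0
        return self._state(), reward, done


def run_episode(env: SetAttentionMDP, policy: Callable[[State], float]) -> float:
    """Roll out one episode and return the terminal reward."""
    state = env.reset()
    done, reward = False, 0.0
    while not done:
        state, reward, done = env.step(policy(state))
    return reward


def similarity_policy(state: State) -> float:
    """Hand-written stand-in for a learned actor: weight each image by its
    (non-negative) agreement with the aggregate built so far."""
    aggregate, current = state
    if not any(aggregate):  # no image accepted yet: take the first one fully
        return 1.0
    return max(0.0, cosine(current, aggregate))


if __name__ == "__main__":
    env = SetAttentionMDP([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], reference=[1.0, 0.0])
    print(run_episode(env, similarity_policy))
```

Because the weight for each image depends on the aggregate of earlier decisions, even this toy policy makes dependent rather than independent per-image choices, which is the property the MDP formulation is meant to capture.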
Specifically, we first present a dependency-aware attention control (DAC)
network, which uses actor-critic reinforcement learning to make a sequential
attention decision for each image embedding, fully exploiting the rich
correlation cues among the unordered images. Moreover, we introduce a
sample-efficient variant of DAC with off-policy experience replay to speed up
learning. Finally, a pose-guided representation scheme can further boost
performance under extreme pose variations.

Comment: Fixed the unreadable code in CVF version. arXiv admin note: text
overlap with arXiv:1707.00130 by other authors