Face attribute evaluation plays an important role in video surveillance and
face analysis. Although methods based on convolutional neural networks have made
great progress, their convolutions inevitably process only one local
neighborhood at a time. Moreover, existing methods mostly treat face attribute
evaluation as an independent multi-label classification task, ignoring the
inherent relationship between semantic attributes and face identity
information. In this paper, we propose a novel \textbf{trans}former-based
representation method for \textbf{f}ace \textbf{a}ttribute evaluation
(\textbf{TransFA}), which effectively enhances attribute-discriminative
representation learning through the attention mechanism.
A multi-branch transformer is employed to explore the inter-correlation
between different attributes in similar semantic regions for attribute feature
learning. Specifically, a hierarchical identity-constraint attribute loss is
designed to train the architecture end to end; it further integrates
discriminative face identity information to boost performance. Experimental
results on multiple face attribute benchmarks demonstrate that the proposed
TransFA achieves superior performance compared with state-of-the-art methods.