Blind Image Quality Assessment (BIQA) is a fundamental task in computer
vision that remains unresolved because of complex distortion
conditions and diverse image content. To address this challenge, we
propose a novel BIQA pipeline based on the Transformer architecture,
which achieves an efficient quality-aware feature representation with much
less data. More specifically, we treat the traditional fine-tuning in BIQA
as an interpretation of the pre-trained model. Under this view, we further
introduce a Transformer decoder to refine the perceptual information of the CLS
token from different perspectives. This enables our model to establish the
quality-aware feature manifold efficiently while attaining a strong
generalization capability. Meanwhile, inspired by human subjective evaluation
behavior, we introduce a novel attention panel mechanism, which
improves the model performance and reduces the prediction uncertainty
simultaneously. The proposed BIQA method maintains a lightweight design with
only one layer of the decoder, yet extensive experiments on eight standard BIQA
datasets (both synthetic and authentic) demonstrate that it outperforms
state-of-the-art BIQA methods, e.g., achieving SRCC values of 0.875 on LIVEC
(vs. 0.859) and 0.980 on LIVE (vs. 0.969).

Comment: Accepted by AAAI 202
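
The SRCC values quoted above are Spearman rank-order correlation coefficients between predicted quality scores and subjective scores. The following is a minimal sketch of how such a value could be computed, not the evaluation code used in this work. The function names `average_ranks` and `srcc`, and the example scores, are hypothetical and chosen only for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srcc(predicted, subjective):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(predicted) != len(subjective) or len(predicted) < 2:
        raise ValueError("need two equally sized sequences of length >= 2")
    rx, ry = average_ranks(predicted), average_ranks(subjective)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("SRCC is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# Hypothetical model predictions and subjective scores for four images.
predicted_scores = [0.2, 0.5, 0.4, 0.9]
subjective_scores = [30, 55, 60, 80]
print(round(srcc(predicted_scores, subjective_scores), 3))
```

Because SRCC depends only on rank order, it measures how monotonically the predictions follow the subjective scores and is unaffected by differences in score scale between datasets.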