Stereoscopic image quality assessment (SIQA) plays a crucial role in
evaluating and improving the visual experience of 3D content. Existing SIQA
methods based on binocular properties and attention have achieved promising
performance. However, these bottom-up approaches do not adequately exploit
the inherent characteristics of the human visual system (HVS). This
paper presents a novel network for SIQA via stereo attention, employing a
top-down perspective to guide the quality assessment process. The proposed
method lets high-level binocular signals guide low-level monocular signals,
while the binocular and monocular information is calibrated progressively
throughout the processing pipeline. We design a
generalized Stereo AttenTion (SAT) block to implement the top-down philosophy
in stereo perception. This block utilizes the fusion-generated attention map as
a high-level binocular modulator, influencing the representation of two
low-level monocular features. Additionally, we introduce an Energy Coefficient
(EC) to account for recent findings indicating that binocular responses in the
primate primary visual cortex are less than the sum of monocular responses. The
adaptive EC can tune the magnitude of binocular response flexibly, thus
enhancing the formation of robust binocular features within our framework. To
extract the most discriminative quality information from the summation and
subtraction of the two branches of monocular features, we utilize a
dual-pooling strategy that applies min-pooling and max-pooling operations to
the respective branches. Experimental results highlight the superiority of our
top-down method in simulating the properties of visual perception and advancing
the state of the art in SIQA. The code for this work is available at
https://github.com/Fanning-Zhang/SATNet.

Comment: 13 pages, 4 figures
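
As a rough illustration of the energy coefficient and the dual-pooling strategy described above, the following minimal sketch operates on toy 2D feature maps. It is not taken from the SATNet repository: the function names (`fuse_with_energy_coefficient`, `dual_pooling`), the fixed scalar coefficient, the 2x2 pooling window, and the assignment of min-pooling to the summation branch and max-pooling to the subtraction branch are all assumptions made for illustration only.

```python
def elementwise(op, a, b):
    """Apply a binary op element by element to two equally sized 2D maps."""
    return [[op(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def pool2x2(fmap, reduce_fn):
    """Non-overlapping 2x2 pooling; reduce_fn is min or max.

    A trailing odd row or column is dropped.
    """
    rows, cols = len(fmap), len(fmap[0])
    return [
        [
            reduce_fn(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def fuse_with_energy_coefficient(left, right, ec):
    """Hypothetical fusion: scale the summed monocular responses by ec.

    With ec < 1 the binocular response is smaller than the plain sum of the
    monocular responses. In the paper the coefficient is adaptive; here it
    is a fixed scalar (assumption).
    """
    return elementwise(lambda x, y: ec * (x + y), left, right)


def dual_pooling(left, right):
    """Assumed pairing: min-pool the summation map, max-pool the subtraction map."""
    summation = elementwise(lambda x, y: x + y, left, right)
    subtraction = elementwise(lambda x, y: x - y, left, right)
    return pool2x2(summation, min), pool2x2(subtraction, max)


# Toy left-view and right-view feature maps (made-up values).
left_view = [[1, 2], [3, 4]]
right_view = [[4, 3], [2, 1]]

fused = fuse_with_energy_coefficient(left_view, right_view, ec=0.5)
pooled_sum, pooled_diff = dual_pooling(left_view, right_view)
```

In a real network these operations would act on learned multi-channel tensors, and the coefficient would presumably be learned or predicted rather than fixed; the sketch only shows the arithmetic shape of the two ideas.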