Blind image quality assessment (BIQA) aims to automatically and accurately predict objective quality scores for visual signals, and has been widely used to monitor product and service quality in low-light applications such as smartphone photography, video surveillance, and autonomous driving. Recent developments in this field are dominated by unimodal solutions, which are inconsistent with human subjective rating patterns, in which visual perception is shaped simultaneously by multiple sources of sensory information. In this article, we present a unique blind multimodal quality assessment (BMQA) framework for low-light images, spanning from subjective evaluation to objective scoring. To investigate the multimodal
mechanism, we first establish a multimodal low-light image quality (MLIQ)
database with authentic low-light distortions, containing image-text modality
pairs. Furthermore, we carefully design the key modules of BMQA, covering multimodal quality representation, latent feature alignment and fusion, and hybrid self-supervised and supervised learning. Extensive experiments show that
our BMQA yields state-of-the-art accuracy on the proposed MLIQ benchmark
database. In addition, we build an independent single-image-modality Dark-4K database, which is used to verify the applicability and generalization of BMQA in mainstream unimodal applications. Qualitative and quantitative results on Dark-4K show that BMQA achieves superior performance to existing BIQA approaches, provided that a pre-trained model is available to generate text descriptions. The proposed framework and the two databases, as well as the collected BIQA methods and evaluation metrics, are made publicly available.