Image aesthetic quality assessment (AQA) aims to assign numerical aesthetic
ratings to images, whilst image aesthetic captioning (IAC) aims to generate
textual descriptions of the aesthetic aspects of images. In this paper, we
study image AQA and IAC together and present a new IAC method termed
Aesthetically Relevant Image Captioning (ARIC). Based on the observation that
most textual comments of an image are about objects and their interactions
rather than aspects of aesthetics, we first introduce the concept of the
Aesthetic Relevance Score (ARS) of a sentence and develop a model to
automatically label a sentence with its ARS. We then use the ARS to design the
ARIC model, which includes an ARS-weighted IAC loss function and an ARS-based
diverse aesthetic caption selector (DACS). We present extensive experimental results to
show the soundness of the ARS concept and the effectiveness of the ARIC model
by demonstrating that texts with higher ARS values can predict the aesthetic ratings
more accurately and that the new ARIC model can generate more accurate,
aesthetically more relevant and more diverse image captions. Furthermore, a
large new research database containing 510K images with over 5 million comments
and 350K aesthetic scores, and code for implementing ARIC are available at
https://github.com/PengZai/ARIC.
Comment: Accepted by AAAI 2023. Code and results available at
https://github.com/PengZai/ARI
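
To make the idea of an ARS-weighted IAC loss concrete, the following is a minimal sketch rather than the ARIC implementation. It assumes that each training caption contributes its length-normalised negative log-likelihood, scaled by that caption's ARS, and that the result is normalised by the total ARS in the batch. The function name `ars_weighted_caption_loss`, its arguments, and the normalisation are illustrative assumptions that do not come from the paper.

```python
import math


def ars_weighted_caption_loss(token_log_probs, ars_scores):
    """Hypothetical ARS-weighted captioning loss (illustrative only).

    token_log_probs: one list per caption, holding the model's log-probability
        of each ground-truth token in that caption.
    ars_scores: one non-negative aesthetic relevance score per caption.
    Returns the ARS-weighted mean of per-caption negative log-likelihoods.
    """
    if len(token_log_probs) != len(ars_scores):
        raise ValueError("need exactly one ARS per caption")
    total_weight = sum(ars_scores)
    if total_weight <= 0:
        raise ValueError("at least one caption must have a positive ARS")

    weighted_sum = 0.0
    for log_probs, ars in zip(token_log_probs, ars_scores):
        if not log_probs:
            raise ValueError("captions must contain at least one token")
        # Length-normalised negative log-likelihood of this caption.
        nll = -sum(log_probs) / len(log_probs)
        # Assumption: captions with higher ARS contribute more to the loss.
        weighted_sum += ars * nll
    return weighted_sum / total_weight


# Two toy captions: the second is treated as less aesthetically relevant.
batch_log_probs = [[math.log(0.5), math.log(0.5)], [math.log(0.25)]]
loss = ars_weighted_caption_loss(batch_log_probs, [1.0, 0.2])
```

Under these assumptions, a caption with an ARS of zero has no influence on the loss, so training is steered towards comments that discuss aesthetics rather than object content.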