Is aesthetic impact different from beauty? Is an image's visual salience a reflection of
its capacity for effective communication? We present Impressions, a novel
dataset through which to investigate the semiotics of images, and how specific
visual features and design choices can elicit specific emotions, thoughts and
beliefs. We posit that the impactfulness of an image extends beyond formal
definitions of aesthetics, to its success as a communicative act, where style
contributes as much to meaning formation as the subject matter. However, prior
image captioning datasets are not designed to empower state-of-the-art
architectures to model potential human impressions or interpretations of
images. To fill this gap, we design an annotation task heavily inspired by
image analysis techniques in the Visual Arts to collect 1,440 image-caption
pairs and 4,320 unique annotations exploring impact, pragmatic image
description, impressions, and aesthetic design choices. We show that existing
multimodal image captioning and conditional generation models struggle to
simulate plausible human responses to images. However, this dataset
significantly improves their ability to model impressions and aesthetic
evaluations of images through fine-tuning and few-shot adaptation.

Comment: To be published in EMNLP 202