Visual Emotion Analysis (VEA) aims to predict people's emotional responses
to visual stimuli. This is a promising, yet challenging, task in affective
computing, which has drawn increasing attention in recent years. Most of the
existing work in this area focuses on feature design, while little attention
has been paid to dataset construction. In this work, we introduce EmoSet, the
first large-scale visual emotion dataset annotated with rich attributes, which
is superior to existing datasets in four aspects: scale, annotation richness,
diversity, and data balance. EmoSet comprises 3.3 million images in total, with
118,102 of these images carefully labeled by human annotators, making it five
times larger than the largest existing dataset. EmoSet includes images from
social networks, as well as artistic images, and it is well balanced between
different emotion categories. Motivated by psychological studies, each image
is annotated not only with its emotion category but also with a set of
describable emotion attributes: brightness, colorfulness, scene type, object
class, facial expression, and human action, which together support a precise
and interpretable understanding of visual emotions. The relevance of these
emotion attributes is
validated by analyzing the correlations between them and visual emotion, as
well as by designing an attribute module that aids visual emotion recognition.
We believe EmoSet will provide key insights and encourage further research in
visual emotion analysis and understanding. Project page:
https://vcc.tech/EmoSet.

Comment: Accepted to ICCV 2023; similar to the final version.