Multimodal-aware recommender systems (MRSs) exploit multimodal content (e.g.,
product images or descriptions) as items' side information to improve
recommendation accuracy. While most such methods rely on factorization
models (e.g., MFBPR) as their base architecture, it has been shown that MFBPR may be
affected by popularity bias, meaning that it inherently tends to boost the
recommendation of popular (i.e., short-head) items to the detriment of niche
(i.e., long-tail) items from the catalog. Motivated by this observation, we
provide one of the first analyses of how multimodality in
recommendation could further amplify popularity bias. Concretely, we evaluate
the performance of four state-of-the-art MRS algorithms (i.e., VBPR, MMGCN,
GRCN, LATTICE) on three datasets from Amazon by assessing, along with
recommendation accuracy metrics, performance measures accounting for the
diversity of recommended items and the portion of retrieved niche items. To
better investigate this aspect, we study the separate influence of
each modality (i.e., visual and textual) on popularity bias across different
evaluation dimensions. The results, which show how a single modality may
amplify the negative effect of popularity bias, highlight the importance of
providing a more rigorous analysis of the performance of such models.
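
As an illustration of the kind of beyond-accuracy measures mentioned above, the following minimal sketch computes catalog coverage (a simple proxy for the diversity of recommended items) and the average share of long-tail items in users' recommendation lists. The function names, the input layout, and the 20% short-head cutoff are assumptions made for this example and are not taken from the paper's experimental setup.

```python
from collections import Counter


def long_tail_items(interactions, head_fraction=0.2):
    """Return the niche (long-tail) items of a catalog.

    `interactions` is an iterable of (user, item) pairs. The `head_fraction`
    most-interacted items form the short head (the 20% default is an
    assumption for this sketch); every other item is long-tail.
    """
    counts = Counter(item for _, item in interactions)
    ranked = [item for item, _ in counts.most_common()]
    head_size = max(1, int(len(ranked) * head_fraction))
    return set(ranked[head_size:])


def item_coverage(recommendations, catalog_size):
    """Fraction of the catalog appearing in at least one recommendation list."""
    recommended = {item for items in recommendations.values() for item in items}
    return len(recommended) / catalog_size


def long_tail_share(recommendations, tail):
    """Average, over users, of the fraction of recommended items that are long-tail."""
    shares = [
        sum(item in tail for item in items) / len(items)
        for items in recommendations.values()
        if items
    ]
    return sum(shares) / len(shares) if shares else 0.0


# Toy data: item "a" is popular, the remaining items are niche.
interactions = (
    [(f"u{i}", "a") for i in range(5)]
    + [(f"u{i}", "b") for i in range(3)]
    + [("u0", "c"), ("u1", "d"), ("u2", "e")]
)
recommendations = {"u0": ["a", "b"], "u1": ["a", "c"]}

tail = long_tail_items(interactions)
coverage = item_coverage(recommendations, catalog_size=5)
tail_share = long_tail_share(recommendations, tail)
```

Under these assumptions, a model that amplifies popularity bias would tend to show lower values for both measures even when its accuracy metrics improve.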