Modern recommender systems model people and items by discovering or "teasing
apart" the underlying dimensions that encode the properties of items and users'
preferences toward them. Critically, such dimensions are uncovered based on
user feedback, often in implicit form (such as purchase histories, browsing
logs, etc.); in addition, some recommender systems make use of side
information, such as product attributes, temporal information, or review text.
However, one important feature that is typically ignored by existing
personalized recommendation and ranking methods is the visual appearance of the
items being considered. In this paper we propose a scalable factorization model
to incorporate visual signals into predictors of people's opinions, which we
apply to a selection of large, real-world datasets. We make use of visual
features extracted from product images using (pre-trained) deep networks, on
top of which we learn an additional layer that uncovers the visual dimensions
that best explain the variation in people's feedback. This not only leads to
significantly more accurate personalized ranking methods, but also helps to
alleviate cold-start issues and to qualitatively analyze the visual dimensions
that influence people's opinions.
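As an informal illustration of the kind of model described above, the sketch below scores a user-item pair by combining standard latent factors with a visual term obtained by projecting a fixed, pre-extracted image feature vector through a learned embedding. All names, dimensions, and the random initialization are our own assumptions for illustration, not the paper's notation or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions: K latent factors, D visual factors,
# F = size of the pre-extracted deep-network image features.
n_users, n_items, K, D, F = 100, 50, 10, 10, 4096

# Standard latent factors for users and items.
gamma_user = rng.normal(scale=0.1, size=(n_users, K))
gamma_item = rng.normal(scale=0.1, size=(n_items, K))

# Visual part: users hold preferences over D visual dimensions;
# an item's visual dimensions come from projecting its fixed CNN
# feature vector through a learned embedding matrix E.
theta_user = rng.normal(scale=0.1, size=(n_users, D))
E = rng.normal(scale=0.1, size=(D, F))        # learned projection layer
cnn_features = rng.normal(size=(n_items, F))  # pre-trained image features (held fixed)

# Item bias terms.
beta_item = np.zeros(n_items)

def score(u, i):
    """Predicted preference of user u for item i:
    item bias + latent interaction + visual interaction."""
    visual_item = E @ cnn_features[i]         # item's visual dimensions
    return (beta_item[i]
            + gamma_user[u] @ gamma_item[i]
            + theta_user[u] @ visual_item)

# Example: compare two items for user 0; pairwise (BPR-style) training
# would push scores of observed items above those of unobserved ones.
print(score(0, 3) > score(0, 7))
```

In a trained system the parameters (latent factors, biases, and the projection E) would be fit to the implicit-feedback data, e.g. with a pairwise ranking objective optimized by stochastic gradient descent; the sketch only shows the form of the scoring function.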