Evaluating Recommender Systems Qualitatively: A Survey and Comparative Analysis

Abstract

Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics.

Recommender systems have improved users' online quality of life by helping them find interesting and valuable items within large item sets. Most research on recommender system validation has focused on accuracy metrics, which measure the differences between predicted and actual user ratings. However, recent research has found that accuracy underperforms once systems go live, mainly because it cannot validate a recommendation list as a single entity, and has shifted toward evaluating recommender systems with "beyond-accuracy" metrics, such as novelty and diversity. In this dissertation, we summarize and organize the leading research on the definitions and objectives of beyond-accuracy metrics, including coverage, diversity, novelty, serendipity, unexpectedness, utility, and fairness. The behaviors and relationships of these metrics are analyzed using four different models, two based on item characteristics (item-based) and two based on user behavior (user-based). Furthermore, a new metric is proposed that allows different models to be compared on their overall beyond-accuracy performance. Using this metric, a reranking approach is designed to improve a system's performance and achieve better recommendations. The impact of the reranking technique on each metric and algorithm is studied, and the accuracy and beyond-accuracy performance of each system are compared. We found that, although the reranking technique can increase most beyond-accuracy metrics, it degrades the system's accuracy due to the negative correlation between these two dimensions. We also found that item-based models tend to achieve much lower coverage and diversity than user-based models.
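To make two of these beyond-accuracy metrics concrete, the Python sketch below computes catalog coverage and intra-list diversity under common textbook definitions. It is an illustrative sketch, not the dissertation's exact formulation: the function names, the data shapes, and the item_similarity callback are assumptions chosen for the example.

    from itertools import combinations

    def catalog_coverage(recommendation_lists, catalog_size):
        # Fraction of the full item catalog that appears in at least
        # one user's recommendation list.
        recommended = {item for rec in recommendation_lists for item in rec}
        return len(recommended) / catalog_size

    def intra_list_diversity(rec_list, item_similarity):
        # Average pairwise dissimilarity (1 - similarity) among the
        # items of a single recommendation list.
        pairs = list(combinations(rec_list, 2))
        if not pairs:
            return 0.0
        return sum(1.0 - item_similarity(a, b) for a, b in pairs) / len(pairs)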
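The reranking idea can likewise be sketched as a greedy trade-off between predicted relevance and a beyond-accuracy objective, in the spirit of maximal marginal relevance. The weight lam, the relevance dictionary, and the objective callback are hypothetical placeholders rather than the dissertation's actual method; the sketch only illustrates why raising beyond-accuracy scores tends to lower accuracy.

    def rerank(candidates, relevance, objective, k, lam=0.7):
        # Greedily build a list of k items. At each step, pick the
        # candidate with the best weighted sum of predicted relevance
        # (the accuracy side) and a beyond-accuracy gain, e.g. the
        # diversity the item would add to those already selected.
        # Lowering lam favors beyond-accuracy metrics at the cost of
        # accuracy, mirroring the negative correlation noted above.
        selected = []
        pool = list(candidates)
        while pool and len(selected) < k:
            best = max(pool, key=lambda item: lam * relevance[item]
                       + (1 - lam) * objective(item, selected))
            selected.append(best)
            pool.remove(best)
        return selected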
