Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics

Recommender systems have improved users' online quality of life by helping them find interesting and
valuable items within a large item set. Most recommender system validation research has focused on
accuracy metrics, studying the differences between the predicted and actual user ratings. However,
recent research has found accuracy to underperform when systems go live, mainly due to accuracy's
inability to validate recommendation lists as a single entity, and has shifted to evaluating recommender
systems using "beyond-accuracy" metrics, such as novelty and diversity.
In this dissertation, we summarize and organize the leading research regarding the definitions and
objectives of the beyond-accuracy metrics. Such metrics include coverage, diversity, novelty,
serendipity, unexpectedness, utility, and fairness. The behaviors and relationships of these metrics are
analyzed using four different models, two concerning the items' characteristics (item-based) and two
regarding the user behaviors (user-based). Furthermore, a new metric is proposed that allows the
comparison of different models considering their overall beyond-accuracy performance. Using this
metric, a reranking approach is designed to improve the performance of a system, aiming to achieve
better recommendations. The impact of the reranking technique on each metric and algorithm is
studied, and the accuracy and beyond-accuracy performance of each system is compared. We found
that, although the reranking technique can increase most beyond-accuracy metrics, it degrades the
accuracy of the system, owing to the negative correlation between these two dimensions. We also
found that item-based models tend to achieve much lower coverage and diversity than user-based models.