Recommender systems fairness evaluation via generalized cross entropy
Fairness in recommender systems has been considered with respect
to sensitive attributes of users (e.g., gender, race) or items (e.g., revenue
in a multistakeholder setting). Regardless, the concept has been
commonly interpreted as some form of equality – i.e., the degree to
which the system is meeting the information needs of all its users in
an equal sense. In this paper, we argue that fairness in recommender
systems does not necessarily imply equality, but instead it should
consider a distribution of resources based on merits and needs. We
present a probabilistic framework based on generalized cross entropy
to evaluate fairness of recommender systems under this perspective,
where we show that the proposed framework is flexible and explanatory
by allowing the incorporation of domain knowledge (through an ideal
fair distribution) that can help to understand which item or user aspects
a recommendation algorithm is over- or under-representing.
Results on two real-world datasets show the merits of the proposed
evaluation framework in terms of both user and item fairness.
This work was supported in part by the Center for Intelligent Information Retrieval and in part by project TIN2016-80630-P (MINECO).
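To make the evaluation concrete, the following is a minimal sketch of a generalized cross entropy between an ideal fair distribution and the distribution a recommender induces over groups. It follows the Tsallis-style divergence family such a metric builds on; the beta value, the uniform ideal distribution, and the exact normalization are illustrative assumptions, not the paper's published definition.

    import numpy as np

    def generalized_cross_entropy(p_fair, p_model, beta=0.5):
        """Generalized cross entropy between an ideal fair distribution
        p_fair and the distribution p_model a recommender induces over
        user or item groups. For 0 < beta < 1 it equals 0 iff the two
        distributions coincide and is strictly negative otherwise, with
        larger magnitudes indicating stronger over-/under-representation.
        Illustrative sketch only: the published definition may
        normalize differently."""
        p_fair = np.asarray(p_fair, dtype=float)
        p_model = np.asarray(p_model, dtype=float)
        return (np.sum(p_fair**beta * p_model**(1 - beta)) - 1) / (beta * (1 - beta))

    # Toy example: three user groups, a uniform ideal fair distribution
    # (domain knowledge: equal merit and need), but the recommender
    # concentrates its utility on the first group.
    p_fair = np.array([1/3, 1/3, 1/3])
    p_model = np.array([0.6, 0.3, 0.1])
    print(generalized_cross_entropy(p_fair, p_model))  # != 0: deviates from the ideal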
Workshop on Learning and Evaluating Recommendations with Impressions (LERI)
Recommender systems typically rely on past user interactions as the primary source of information for making predictions. However, although highly informative, past user interactions are strongly biased. Impressions, on the other hand, are a newer source of information: they record the items displayed on screen when the user interacted (or did not interact) with them, and they have the potential to impact the field of recommender systems in several ways. Early research on impressions was constrained by the limited availability of public datasets, but this is rapidly changing and, as a consequence, interest in impressions has increased. Impressions raise new research questions and opportunities, but also bring new challenges. Several works propose to use impressions as part of recommender models in various ways and discuss their information content. Others explore their potential in off-policy estimation and reinforcement learning. Overall, the interest of the community is growing, but efforts in this direction remain disconnected. Therefore, we believe that a workshop would be useful in bringing the community together.
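To make the distinction concrete, here is a hypothetical sketch of the two signal types: a classic interaction record versus an impression record that also captures what was shown but not acted upon (all field names are illustrative, not from any specific dataset).

    from dataclasses import dataclass

    @dataclass
    class Interaction:
        """Classic implicit feedback: only what the user acted on."""
        user_id: str
        item_id: str
        timestamp: int

    @dataclass
    class Impression:
        """Impression: everything displayed on screen, plus which of the
        displayed items (if any) the user interacted with."""
        user_id: str
        shown_items: list[str]
        clicked_items: list[str]
        timestamp: int

    imp = Impression("u1", ["i1", "i2", "i3"], ["i2"], 1700000000)
    skipped = [i for i in imp.shown_items if i not in imp.clicked_items]
    print(skipped)  # exposed-but-skipped items: a signal plain interactions lack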
How to Perform Reproducible Experiments in the ELLIOT Recommendation Framework: Data Processing, Model Selection, and Performance Evaluation
Recommender Systems have been shown to be an effective way to alleviate the over-choice problem and provide
accurate and tailored recommendations. However, the impressive number of proposed recommendation
algorithms, splitting strategies, evaluation protocols, metrics, and tasks, has made rigorous experimental
evaluation particularly challenging. ELLIOT is a comprehensive recommendation framework that aims
to run and reproduce an entire experimental pipeline by processing a simple configuration file. The
framework loads, filters, and splits the data considering a vast set of strategies. Then, it optimizes
hyperparameters for several recommendation algorithms, selects the best models, compares them with
the baselines, computes metrics spanning from accuracy to beyond-accuracy, bias, and fairness, and
conducts statistical analysis. The aim is to provide researchers with a tool that eases all the
experimental evaluation phases (and makes them reproducible), from data reading to results collection. ELLIOT is
freely available on GitHub at https://github.com/sisinflab/elliot.
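A minimal usage sketch: per the repository's documentation, an entire experiment is launched from a single entry point and a YAML configuration file (the path below is a placeholder).

    from elliot.run import run_experiment

    # One call drives the whole pipeline described above: loading,
    # filtering, splitting, hyperparameter optimization, model selection,
    # metric computation, and statistical tests, all specified in the
    # YAML configuration file.
    run_experiment("config_files/my_experiment.yml")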
Privacy-Aware Recommender Systems Challenge on Twitter's Home Timeline
Recommender systems constitute the core engine of most social network
platforms nowadays, aiming to maximize user satisfaction along with other key
business objectives. Twitter is no exception. Despite the fact that Twitter
data has been extensively used to understand socioeconomic and political
phenomena and user behaviour, the implicit feedback provided by users on Tweets
through their engagements on the Home Timeline has only been explored to a
limited extent. At the same time, there is a lack of large-scale public social
network datasets that would enable the scientific community to both benchmark
and build more powerful and comprehensive models that tailor content to user
interests. By releasing an original dataset of 160 million Tweets along with
engagement information, Twitter aims to address exactly that. During this
release, special attention is drawn to maintaining compliance with existing
privacy laws. Apart from user privacy, this paper touches on the key challenges
faced by researchers and professionals striving to predict user engagements. It
further describes the key aspects of the RecSys 2020 Challenge that was
organized by ACM RecSys in partnership with Twitter using this dataset.
Comment: 16 pages, 2 tables
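In simplified form, the prediction task behind the challenge can be framed as one binary classifier per engagement type; the features, synthetic data, and engagement names below are illustrative stand-ins, not the challenge's actual schema.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical features: one row per (user, tweet) pair (e.g. text,
    # author, and user-history features). Synthetic data for illustration.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 16))
    engagements = {
        "like": X[:, 0] + rng.normal(size=1000) > 0.0,
        "retweet": X[:, 1] + rng.normal(size=1000) > 1.0,
    }

    # Baseline: one independent binary classifier per engagement type.
    models = {name: LogisticRegression().fit(X, y) for name, y in engagements.items()}
    for name, model in models.items():
        print(name, model.predict_proba(X[:3])[:, 1])  # engagement probabilities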
Offline Evaluation of Reward-Optimizing Recommender Systems: The Case of Simulation
Both in academic and industry-based research, online evaluation methods are
seen as the gold standard for interactive applications like recommendation
systems. Naturally, the reason for this is that we can directly measure utility
metrics that rely on interventions, namely the recommendations shown to users.
shown to users. Nevertheless, online evaluation methods are costly for a number
of reasons, and a clear need remains for reliable offline evaluation
procedures. In industry, offline metrics are often used as a first-line
evaluation to generate promising candidate models to evaluate online. In
academic work, limited access to online systems makes offline metrics the de
facto approach to validating novel methods. Two classes of offline metrics
exist: proxy-based methods, and counterfactual methods. The first class is
often poorly correlated with the online metrics we care about, and the latter
class only provides theoretical guarantees under assumptions that cannot be
fulfilled in real-world environments. Here, we make the case that
simulation-based comparisons provide ways forward beyond offline metrics, and
argue that they are a preferable means of evaluation.
Comment: Accepted at the ACM RecSys 2021 Workshop on Simulation Methods for Recommender Systems
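For reference, the counterfactual class mentioned above typically covers estimators such as inverse propensity scoring (IPS); here is a minimal sketch, whose unbiasedness rests on exactly the kind of assumptions the authors argue cannot be fulfilled in real-world environments.

    import numpy as np

    def ips_estimate(rewards, logging_propensities, target_propensities):
        """Inverse propensity scoring (IPS) estimate of the reward a target
        policy would collect, from logs gathered under a logging policy.

        Unbiased only if the logged propensities are correct and every
        action the target policy can take had nonzero probability under
        logging -- assumptions that are hard to guarantee in production."""
        weights = np.asarray(target_propensities) / np.asarray(logging_propensities)
        return float(np.mean(weights * np.asarray(rewards)))

    # Toy log: clicks observed under a logging policy, evaluated for a
    # target policy that shows the same items with different probabilities.
    rewards = [1, 0, 0, 1, 1]
    p_log = [0.5, 0.2, 0.3, 0.5, 0.4]
    p_tgt = [0.7, 0.1, 0.1, 0.7, 0.6]
    print(ips_estimate(rewards, p_log, p_tgt))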
Beyond Optimizing for Clicks: Incorporating Editorial Values in News Recommendation
With the uptake of algorithmic personalization in the news domain, news
organizations increasingly trust automated systems with responsibilities
previously considered editorial, e.g., prioritizing news for readers. In this paper
we study an automated news recommender system in the context of a news
organization's editorial values. We conduct and present two online studies with
a news recommender system, which span one and a half months and involve over
1,200 users. In our first study we explore how our news recommender steers
reading behavior in the context of editorial values such as serendipity,
dynamism, diversity, and coverage. Next, we present an intervention study where
we extend our news recommender to steer our readers to more dynamic reading
behavior. We find that (i) our recommender system yields more diverse reading
behavior and yields a higher coverage of articles compared to non-personalized
editorial rankings, and (ii) we can successfully incorporate dynamism in our
recommender system as a re-ranking method, effectively steering our readers to
more dynamic articles without hurting our recommender system's accuracy.
Comment: To appear in UMAP 2021
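Generically, such a re-ranking can be sketched as a blend of the model's relevance score with an editorial signal; the linear blend and the weight below are illustrative assumptions, not the study's actual method.

    def rerank(candidates, dynamism, weight=0.3):
        """Re-rank model outputs by blending relevance with an editorial
        'dynamism' score in [0, 1]. candidates: list of (item_id, relevance).
        Purely illustrative: the study's actual re-ranking may differ."""
        blended = [(item, (1 - weight) * rel + weight * dynamism[item])
                   for item, rel in candidates]
        return sorted(blended, key=lambda x: x[1], reverse=True)

    candidates = [("a1", 0.9), ("a2", 0.8), ("a3", 0.7)]
    dynamism = {"a1": 0.1, "a2": 0.9, "a3": 0.8}
    print(rerank(candidates, dynamism))
    # A small weight nudges readers toward dynamic articles without
    # discarding the relevance ordering entirely.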
Improving accountability in recommender systems research through reproducibility
Reproducibility is a key requirement for scientific progress. It allows the work of others to be reproduced and, as a consequence, the reported claims and results to be fully trusted. In this work, we argue that, by facilitating the reproducibility of recommender systems experimentation, we indirectly address the issues of accountability and transparency in recommender systems research from the perspectives of practitioners, designers, and engineers aiming to assess the capabilities of published research works. These issues have become increasingly prevalent in recent literature. Reasons for this include societal movements around intelligent systems and artificial intelligence striving toward the fair and objective use of human behavioral data (as in Machine Learning, Information Retrieval, or Human–Computer Interaction). Society has grown to expect explanations and transparency standards regarding the underlying algorithms making automated decisions for and around us. This work surveys existing definitions of these concepts and proposes a coherent terminology for recommender systems research, with the goal of connecting reproducibility to accountability. We achieve this by introducing several guidelines and steps that lead to reproducible and, hence, accountable experimental workflows and research. We additionally analyze several instantiations of recommender system implementations available in the literature and discuss the extent to which they fit the introduced framework. With this work, we aim to shed light on this important problem and facilitate progress in the field by increasing the accountability of research.
This work has been funded by the Ministerio de Ciencia, Innovación y Universidades (reference: PID2019-108965GB-I00).
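In practice, the most basic of such guidelines amount to controlling nondeterminism and persisting the exact experimental configuration; a generic sketch (not taken from the paper, all names hypothetical):

    import json
    import random

    import numpy as np

    def make_reproducible(config, seed=42):
        """Fix random seeds and persist the exact configuration so a third
        party can re-run the experiment and audit the reported claims."""
        random.seed(seed)
        np.random.seed(seed)
        config = {**config, "seed": seed}
        with open("experiment_config.json", "w") as f:
            json.dump(config, f, indent=2, sort_keys=True)
        return config

    cfg = make_reproducible({"model": "BPRMF", "factors": 64, "lr": 0.01})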