31 research outputs found
Offline and online evaluation of news recommender systems at swissinfo.ch
We report on the live evaluation of various news recommender systems conducted on the website swissinfo.ch. We demonstrate that there is a major difference between offline and online accuracy evaluations. In an offline setting, recommending the most popular stories is the best strategy, while in a live environment this strategy is the poorest. In the online setting, context-tree recommender systems, which profile users in real time, improve the click-through rate by up to 35%. The visit length also increases by a factor of 2.5. Our experience holds important lessons for the evaluation of recommender systems with offline data, as well as for the use of the click-through rate as a performance indicator. Copyright © 2014 ACM
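The contrast between the two online indicators the paper relies on is easy to make concrete. Below is a minimal sketch, not the paper's code, of computing click-through rate and mean visit length per variant from an impression log; the `Impression` fields are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Impression:
    variant: str         # e.g. "most-popular" vs. "context-tree"
    clicked: bool
    visit_length: float  # articles read in the session (hypothetical field)

def online_metrics(log):
    """Return {variant: (click-through rate, mean visit length)}."""
    by_variant = {}
    for imp in log:
        by_variant.setdefault(imp.variant, []).append(imp)
    return {
        v: (sum(i.clicked for i in imps) / len(imps),
            sum(i.visit_length for i in imps) / len(imps))
        for v, imps in by_variant.items()
    }
```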
Bridging Offline-Online Evaluation with a Time-dependent and Popularity Bias-free Offline Metric for Recommenders
The evaluation of recommendation systems is a complex task. The offline and
online evaluation metrics for recommender systems are ambiguous in their true
objectives. The majority of recently published papers benchmark their methods
using ill-posed offline evaluation methodology that often fails to predict true
online performance. Because of this, the impact that academic research has on
the industry is reduced. The aim of our research is to investigate and compare
the online performance of offline evaluation metrics. We show that penalizing
popular items and considering the time of transactions during the evaluation
significantly improves our ability to choose the best recommendation model for
a live recommender system. Our results, averaged over five large-scale real-world datasets procured from live recommender systems, aim to help the academic community better understand the offline evaluation and optimization criteria that are more relevant for real applications of recommender systems.
Comment: Accepted to EvalRS 2023@KDD
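The abstract names two ingredients, a popularity penalty and time-dependence, without giving the formula here; the sketch below illustrates one plausible combination under assumed definitions, not the paper's exact metric:

```python
import math

def time_aware_penalized_hitrate(recommendations, held_out, item_counts, k=10):
    """
    Offline hit rate where each hit is discounted by item popularity and
    only time-respecting interactions count. Every definition here is an
    illustrative assumption, not the paper's exact metric.

    recommendations: {user: [(item, rec_time), ...]}  ranked lists
    held_out:        {user: [(item, ts), ...]}        test transactions
    item_counts:     {item: number of training interactions}
    """
    score, users = 0.0, 0
    for user, recs in recommendations.items():
        interactions = held_out.get(user, [])
        if not interactions:
            continue
        users += 1
        for item, rec_time in recs[:k]:
            # A hit counts only if the user interacted with the item
            # *after* it was recommended (time-dependent evaluation) ...
            if any(i == item and ts >= rec_time for i, ts in interactions):
                # ... and popular items earn less credit (popularity penalty).
                score += 1.0 / math.log1p(item_counts.get(item, 0) + 1)
    return score / users if users else 0.0
```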
ArXivDigest: A Living Lab for Personalized Scientific Literature Recommendation
Providing personalized recommendations that are also accompanied by
explanations as to why an item is recommended is a research area of growing
importance. At the same time, progress is limited by the availability of open
evaluation resources. In this work, we address the task of scientific
literature recommendation. We present arXivDigest, which is an online service
providing personalized arXiv recommendations to end users and operates as a
living lab for researchers wishing to work on explainable scientific literature
recommendations.
Comment: Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM'20), Oct 2020
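As a rough illustration of how a living lab turns user feedback into system-level scores, here is a generic sketch; the scoring rule and data shapes are assumptions for illustration, not arXivDigest's documented protocol:

```python
def credit_systems(system_rankings, clicked):
    """
    Score each experimental system by the reciprocal rank of the first
    of its suggestions the user actually clicked. Hypothetical scoring
    rule and names, not arXivDigest's actual protocol.

    system_rankings: {system_id: [paper_id, ...]} submitted rankings
    clicked:         set of paper_ids the user clicked in the digest
    """
    scores = {}
    for system, ranking in system_rankings.items():
        scores[system] = next(
            (1.0 / rank for rank, pid in enumerate(ranking, start=1)
             if pid in clicked),
            0.0,
        )
    return scores
```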
A Common Misassumption in Online Experiments with Machine Learning Models
Online experiments such as Randomised Controlled Trials (RCTs) or A/B-tests
are the bread and butter of modern platforms on the web. They are conducted
continuously to allow platforms to estimate the causal effect of replacing
system variant "A" with variant "B", on some metric of interest. These variants
can differ in many aspects. In this paper, we focus on the common use-case
where they correspond to machine learning models. The online experiment then
serves as the final arbiter to decide which model is superior, and should thus
be shipped.
The statistical literature on causal effect estimation from RCTs has a
substantial history, which contributes deservedly to the level of trust
researchers and practitioners have in this "gold standard" of evaluation
practices. Nevertheless, in the particular case of machine learning
experiments, we remark that certain critical issues remain. Specifically, the
assumptions required to ascertain that A/B-tests yield unbiased
estimates of the causal effect are seldom met in practical applications. We
argue that, because variants typically learn using pooled data, a lack of model
interference cannot be guaranteed. This undermines the conclusions we can draw
from online experiments with machine learning models. We discuss the
implications this has for practitioners and for the research literature.
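For concreteness, the estimator whose unbiasedness is at stake is the usual difference in means between arms; a minimal sketch with assumed per-user metric lists:

```python
from math import sqrt
from statistics import mean, stdev

def ab_effect(metric_a, metric_b):
    """
    Difference-in-means estimate of the causal effect of variant B over
    variant A, with an approximate standard error. Inputs are per-unit
    (e.g. per-user) metric values from each arm of the experiment.
    """
    effect = mean(metric_b) - mean(metric_a)
    se = sqrt(stdev(metric_a) ** 2 / len(metric_a)
              + stdev(metric_b) ** 2 / len(metric_b))
    return effect, se
```

Under no interference between variants (part of SUTVA), this difference in means is unbiased for the average treatment effect; the paper's argument is that when both model variants keep learning from pooled experiment data, the variants become coupled and exactly that assumption can fail.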
Beyond Optimizing for Clicks: Incorporating Editorial Values in News Recommendation
With the uptake of algorithmic personalization in the news domain, news
organizations increasingly entrust automated systems with responsibilities
previously considered editorial, e.g., prioritizing news for readers. In this paper
we study an automated news recommender system in the context of a news
organization's editorial values. We conduct and present two online studies with
a news recommender system, which span one and a half months and involve over
1,200 users. In our first study we explore how our news recommender steers
reading behavior in the context of editorial values such as serendipity,
dynamism, diversity, and coverage. Next, we present an intervention study where
we extend our news recommender to steer our readers to more dynamic reading
behavior. We find that (i) our recommender system yields more diverse reading
behavior and yields a higher coverage of articles compared to non-personalized
editorial rankings, and (ii) we can successfully incorporate dynamism in our
recommender system as a re-ranking method, effectively steering our readers to
more dynamic articles without hurting our recommender system's accuracy.
Comment: To appear in UMAP 2020
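The abstract does not spell out the re-ranking formula; a common pattern consistent with its description is a linear blend of the base recommendation score with a per-article dynamism score, sketched here with hypothetical names:

```python
def rerank_for_dynamism(candidates, dynamism, alpha=0.3):
    """
    Re-rank candidates by a linear blend of the base recommender score
    and a per-article dynamism score. The blend and `alpha` are
    illustrative assumptions, not the paper's exact method.

    candidates: list of (article_id, base_score)
    dynamism:   {article_id: editorial dynamism score in [0, 1]}
    """
    return sorted(
        candidates,
        key=lambda c: (1 - alpha) * c[1] + alpha * dynamism.get(c[0], 0.0),
        reverse=True,
    )
```

A small `alpha` keeps the ranking close to the accuracy-optimized one while still steering exposure toward the editorial value, which matches the paper's finding that dynamism can be incorporated without hurting accuracy.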
Recommender Systems and Misinformation: The Problem or the Solution?
Recommender systems have been pointed to as one of the major culprits of misinformation spreading in the digital sphere. These systems have recently come under heavy criticism for promoting the creation of filter bubbles, lowering the diversity of the information users are exposed to and of the social contacts they create. This influences the dynamics of social news sharing, and particularly the ways misinformation initiates and propagates. However, while recommender systems have been accused of fuelling the spread of misinformation, it is still unclear which particular types of recommender algorithms are more prone to recommend misinforming news, and if, and how, existing recommendation algorithms and evaluation metrics can be modified or adapted to mitigate the misinformation-spreading effect. In this position paper, we describe some of the key challenges behind assessing and measuring the effect of existing recommendation algorithms on the recommendation of misinforming articles, and how such algorithms could be adapted, modified, and evaluated to counter this effect, based on existing social science and psychology research.
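One way to make the assessment problem the authors pose concrete is an exposure-style metric; the sketch below, with assumed inputs rather than anything from the paper, measures the rank-weighted share of flagged items a recommender surfaces:

```python
def misinformation_exposure(recommendations, flagged, k=10):
    """
    Rank-weighted share of flagged items in users' top-k lists, averaged
    over users. All inputs are illustrative assumptions: `flagged` is a
    set of item ids labelled as misinforming, e.g. by fact-checkers.
    """
    if not recommendations:
        return 0.0
    total = 0.0
    for items in recommendations.values():
        top = items[:k]
        if not top:
            continue
        weights = [1.0 / (rank + 1) for rank in range(len(top))]
        hits = sum(w for w, item in zip(weights, top) if item in flagged)
        total += hits / sum(weights)
    return total / len(recommendations)
```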
Overview of NewsREEL’16: Multi-dimensional evaluation of real-time stream-recommendation algorithms
Successful news recommendation requires facing the challenges of dynamic item sets, contextual item relevance, and fulfilling non-functional requirements such as response time. The CLEF NewsREEL challenge is a campaign-style evaluation lab that allows participants to tackle news recommendation and to optimize and evaluate their recommender algorithms both online and offline. In this paper, we summarize the objectives and challenges of NewsREEL 2016. We cover two contrasting perspectives on the challenge: that of the operator (the business providing recommendations) and that of the challenge participant (the researchers developing recommender algorithms). In the intersection of these perspectives, new insights can be gained on how to effectively evaluate real-time stream recommendation algorithms.
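A small sketch of the non-functional side of this evaluation: timing a recommender call against a latency budget. The budget value and names are assumptions, not official challenge rules:

```python
import time

def timed_recommend(recommend, request, budget_ms=100):
    """
    Call a recommender and check the response against a latency budget,
    the kind of non-functional requirement NewsREEL evaluates alongside
    relevance. The 100 ms default is an assumed figure, not an official
    challenge limit.
    """
    start = time.perf_counter()
    items = recommend(request)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return items, elapsed_ms, elapsed_ms <= budget_ms
```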