Search CORE

31 research outputs found

Offline and online evaluation of news recommender systems at swissinfo.ch

Author: Alazzawi Ayar
Bruttin Christophe
Donatsch Olivier
Faltings Boi
Garcin Florent
Huber Amr
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

We report on the live evaluation of various news recommender systems conducted on the website swissinfo.ch. We demonstrate that there is a major difference between offline and online accuracy evaluations. In an offline setting, recommending most popular stories is the best strategy, while in a live environment this strategy is the poorest. For online setting, context-tree recommender systems which profile the users in real-time improve the click-through rate by up to 35%. The visit length also increases by a factor of 2.5. Our experience holds important lessons for the evaluation of recommender systems with offline data as well as for the use of the click-through rate as a performance indicator. Copyright © 2014 ACM

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Bridging Offline-Online Evaluation with a Time-dependent and Popularity Bias-free Offline Metric for Recommenders

Author: Alves Rodrigo
Kasalický Petr
Kordík Pavel
Publication date: 13/08/2023

The evaluation of recommendation systems is a complex task. The offline and online evaluation metrics for recommender systems are ambiguous in their true objectives. The majority of recently published papers benchmark their methods using ill-posed offline evaluation methodology that often fails to predict true online performance. Because of this, the impact that academic research has on the industry is reduced. The aim of our research is to investigate and compare the online performance of offline evaluation metrics. We show that penalizing popular items and considering the time of transactions during the evaluation significantly improves our ability to choose the best recommendation model for a live recommender system. Our results, averaged over five large-size real-world live data procured from recommenders, aim to help the academic community to understand better offline evaluation and optimization criteria that are more relevant for real applications of recommender systems. Comment: Accepted to evalRS 2023@KD
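The abstract above names two ingredients, penalizing popular items and respecting transaction time, without giving a formula. The following is a minimal sketch of how such a metric could look, not the authors' metric: the function names, the time-based split, and the `1 / log2(2 + popularity)` weighting are all assumptions chosen for illustration.

```python
from collections import Counter
from math import log2

def temporal_split(interactions, split_time):
    """Split (user, item, timestamp) tuples: train strictly before split_time, test from it on."""
    train = [x for x in interactions if x[2] < split_time]
    test = [x for x in interactions if x[2] >= split_time]
    return train, test

def popularity_penalized_recall(recommendations, train, test, k=10):
    """Recall@k in which each test item counts with weight 1 / log2(2 + train popularity).

    The weighting is a hypothetical choice: hits on rarely seen items earn
    more credit than hits on items that were already popular in training.
    """
    popularity = Counter(item for _, item, _ in train)
    gained, possible = 0.0, 0.0
    for user, item, _ in test:
        weight = 1.0 / log2(2 + popularity[item])
        possible += weight
        if item in recommendations.get(user, [])[:k]:
            gained += weight
    return gained / possible if possible else 0.0

# Toy log: item "a" is popular, "b" is rare, "c" is unseen in training.
log = [("u1", "a", 1), ("u2", "a", 2), ("u3", "a", 3), ("u1", "b", 4),
       ("u2", "b", 6), ("u3", "c", 7)]
train, test = temporal_split(log, split_time=5)

most_popular = {"u2": ["a"], "u3": ["a"]}   # always recommends the popular item
niche_aware = {"u2": ["a"], "u3": ["c"]}    # hits the unseen item for u3

print(popularity_penalized_recall(most_popular, train, test, k=1))
print(popularity_penalized_recall(niche_aware, train, test, k=1))
```

Under this weighting, a recommender that only repeats popular items cannot score well on test interactions with less popular items, which is the general effect the abstract describes.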

arXiv.org e-Print Archive

ArXivDigest: A Living Lab for Personalized Scientific Literature Recommendation

Author: Balog Krisztian
Gingstad Kristian
Jekteberg Øyvind
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/09/2020

Providing personalized recommendations that are also accompanied by explanations as to why an item is recommended is a research area of growing importance. At the same time, progress is limited by the availability of open evaluation resources. In this work, we address the task of scientific literature recommendation. We present arXivDigest, which is an online service providing personalized arXiv recommendations to end users and operates as a living lab for researchers wishing to work on explainable scientific literature recommendations. Comment: Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM'20), Oct 202

arXiv.org e-Print Archive

Crossref

A Common Misassumption in Online Experiments with Machine Learning Models

Author: Jeunen Olivier
Publication date: 21/04/2023

Online experiments such as Randomised Controlled Trials (RCTs) or A/B-tests are the bread and butter of modern platforms on the web. They are conducted continuously to allow platforms to estimate the causal effect of replacing system variant "A" with variant "B", on some metric of interest. These variants can differ in many aspects. In this paper, we focus on the common use-case where they correspond to machine learning models. The online experiment then serves as the final arbiter to decide which model is superior, and should thus be shipped. The statistical literature on causal effect estimation from RCTs has a substantial history, which contributes deservedly to the level of trust researchers and practitioners have in this "gold standard" of evaluation practices. Nevertheless, in the particular case of machine learning experiments, we remark that certain critical issues remain. Specifically, the assumptions that are required to ascertain that A/B-tests yield unbiased estimates of the causal effect, are seldom met in practical applications. We argue that, because variants typically learn using pooled data, a lack of model interference cannot be guaranteed. This undermines the conclusions we can draw from online experiments with machine learning models. We discuss the implications this has for practitioners, and for the research literature
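For readers unfamiliar with the estimate under discussion, here is a minimal sketch of the textbook A/B comparison on click-through rate: a difference in proportions with a pooled two-proportion z-test. The function name and the example counts are hypothetical. The calculation assumes the two variants do not interfere with each other, which is exactly the assumption the abstract argues is seldom met when both models learn from pooled data.

```python
from math import erf, sqrt

def ab_test_ctr(clicks_a, views_a, clicks_b, views_b):
    """Return (CTR difference B - A, z statistic, two-sided p-value).

    Assumes independent impressions and no interference between variants,
    and that the pooled CTR is strictly between 0 and 1.
    """
    ctr_a = clicks_a / views_a
    ctr_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (ctr_b - ctr_a) / std_err
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return ctr_b - ctr_a, z, p_value

# Hypothetical counts: variant B has a higher observed CTR than variant A.
diff, z, p = ab_test_ctr(clicks_a=100, views_a=1000, clicks_b=135, views_b=1000)
print(f"difference={diff:.3f}, z={z:.2f}, p={p:.4f}")
```

If the variants share training data, the observed difference may no longer be an unbiased estimate of the effect of shipping B alone, so a small p-value from this test should be read with that caveat.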

arXiv.org e-Print Archive

Beyond Optimizing for Clicks: Incorporating Editorial Values in News Recommendation

Author: Bastian Mariella
Bradley Keith
Lu Feng
Oard Douglas
Ricci Francesco
Sappelli Maya
Shani Guy
van Drunen M Z
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/04/2020

With the uptake of algorithmic personalization in the news domain, news organizations increasingly trust automated systems with previously considered editorial responsibilities, e.g., prioritizing news to readers. In this paper we study an automated news recommender system in the context of a news organization's editorial values. We conduct and present two online studies with a news recommender system, which span one and a half months and involve over 1,200 users. In our first study we explore how our news recommender steers reading behavior in the context of editorial values such as serendipity, dynamism, diversity, and coverage. Next, we present an intervention study where we extend our news recommender to steer our readers to more dynamic reading behavior. We find that (i) our recommender system yields more diverse reading behavior and yields a higher coverage of articles compared to non-personalized editorial rankings, and (ii) we can successfully incorporate dynamism in our recommender system as a re-ranking method, effectively steering our readers to more dynamic articles without hurting our recommender system's accuracy. Comment: To appear in UMAP 202

arXiv.org e-Print Archive

Crossref

Recommender Systems and Misinformation: The Problem or the Solution?

Author: Bellogin Alejandro
Fernandez Miriam
Publication date: 01/01/2020

Recommender Systems have been pointed as one of the major culprits of misinformation spreading in the digital sphere. These systems have recently gone under heavy criticism for promoting the creation of filter bubbles, lowering the diversity of information users are exposed to and the social contacts they create. This influences the dynamics of social news sharing, and particularly the ways misinformation initiates and propagates. However, while Recommender Systems have been accused of fuelling the spread of misinformation, it is still unclear which particular types of recommender algorithms are more prone to recommend misinforming news, and if, and how, existing recommendation algorithms and evaluation metrics, can be modified or adapted to mitigate the misinformation spreading effect. In this position paper, we describe some of the key challenges behind assessing and measuring the effect of existing recommendation algorithms on the recommendation of misinforming articles and how such algorithms could be adapted, modified, and evaluated to counter this effect based on existing social science and psychology research

Open Research Online (The Open University)

Overview of NewsREEL’16: Multi-dimensional evaluation of real-time stream-recommendation algorithms

Author: Brodt T.
Fuhr N.
Gebremeskel G.G.
Gonçalves T.
Hopfgartner F.
Kille B.
Larson M.A.
Lommatzsch A.
Malagoli D.
Quaresma P.
Seiler J.
Serény A.
Vries A.P. de
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Successful news recommendation requires facing the challenges of dynamic item sets, contextual item relevance, and of fulfilling non-functional requirements, such as response time. The CLEF NewsREEL challenge is a campaign-style evaluation lab allowing participants to tackle news recommendation and to optimize and evaluate their recommender algorithms both online and offline. In this paper, we summarize the objectives and challenges of NewsREEL 2016. We cover two contrasting perspectives on the challenge: that of the operator (the business providing recommendations) and that of the challenge participant (the researchers developing recommender algorithms). In the intersection of these perspectives, new insights can be gained on how to effectively evaluate real-time stream recommendation algorithms

Crossref

CWI's Institutional Repository

Enlighten

Radboud Repository

White Rose Research Online