Bridging Offline-Online Evaluation with a Time-dependent and Popularity Bias-free Offline Metric for Recommenders
The evaluation of recommendation systems is a complex task. The offline and
online evaluation metrics for recommender systems are ambiguous in their true
objectives. The majority of recently published papers benchmark their methods
using ill-posed offline evaluation methodology that often fails to predict true
online performance. Because of this, the impact that academic research has on
the industry is reduced. The aim of our research is to investigate and
compare how well different offline evaluation metrics predict online
performance. We show that penalizing
popular items and considering the time of transactions during the evaluation
significantly improves our ability to choose the best recommendation model for
a live recommender system. Our results, averaged over five large-scale
real-world datasets procured from live recommender systems, aim to help the
academic community better understand offline evaluation and optimization
criteria that are more relevant for real applications of recommender systems.

Comment: Accepted to evalRS 2023@KDD
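The abstract's two key ingredients, penalizing popular items and weighting by
transaction time, can be made concrete in a few lines. Below is a minimal,
illustrative sketch, not the paper's actual metric: the function name, the
data layout, and the choice of exponential decay and log-popularity penalty
are all assumptions made for the example.

    import math
    from collections import Counter

    def debiased_time_aware_hitrate(interactions, k=10, half_life_days=30.0):
        """Hypothetical offline metric: hit rate@k where each hit is
        down-weighted by item popularity and up-weighted by recency.
        interactions: list of dicts with keys 'timestamp' (unix seconds),
        'clicked_item', and 'recommended' (ranked list of item ids)."""
        if not interactions:
            return 0.0
        # Popularity estimated from the evaluation log itself (an assumption;
        # counts from the training period could be used instead).
        pop = Counter(x["clicked_item"] for x in interactions)
        t_max = max(x["timestamp"] for x in interactions)
        score, weight_sum = 0.0, 0.0
        for x in interactions:
            # Exponential time decay: recent transactions count more.
            age_days = (t_max - x["timestamp"]) / 86400.0
            w_time = 0.5 ** (age_days / half_life_days)
            # Inverse log-popularity penalty: a hit on a long-tail item is
            # worth more than a hit on a blockbuster item.
            w_pop = 1.0 / math.log2(2.0 + pop[x["clicked_item"]])
            hit = 1.0 if x["clicked_item"] in x["recommended"][:k] else 0.0
            score += w_time * w_pop * hit
            weight_sum += w_time * w_pop
        return score / weight_sum

Normalizing by the summed weights keeps the score in [0, 1], so candidate
models can still be ranked against each other the way a plain hit rate would
rank them.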
Evaluating Conversational Recommender Systems: A Landscape of Research
Conversational recommender systems aim to interactively support online users
in their information search and decision-making processes in an intuitive way.
With the latest advances in voice-controlled devices, natural language
processing, and AI in general, such systems have received increased attention in
recent years. Technically, conversational recommenders are usually complex
multi-component applications and often consist of multiple machine learning
models and a natural language user interface. Evaluating such a complex system
in a holistic way can therefore be challenging, as it requires (i) the
assessment of the quality of the different learning components, and (ii)
users' perception of the quality of the system as a whole. Thus, a
mixed-methods approach is often required, which may combine objective
(computational) and
subjective (perception-oriented) evaluation techniques. In this paper, we
review common evaluation approaches for conversational recommender systems,
identify possible limitations, and outline future directions towards more
holistic evaluation practices.
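As a concrete, entirely hypothetical illustration of such a mixed-methods
setup, a single evaluation report might pair component-level computational
metrics with aggregated user-study scores rather than reporting either in
isolation. All names and values below are invented for the sketch.

    from dataclasses import dataclass

    @dataclass
    class MixedMethodsReport:
        # Objective (computational) measures, per learning component.
        intent_accuracy: float        # NLU component, e.g., intent classification
        recommendation_ndcg: float    # ranking quality of the recommender core
        # Subjective (perception-oriented) measures from a user study,
        # e.g., mean scores on 1-5 Likert scales.
        perceived_usefulness: float
        dialogue_naturalness: float

        def summary(self) -> str:
            return (f"objective: intent_acc={self.intent_accuracy:.2f}, "
                    f"ndcg={self.recommendation_ndcg:.2f} | "
                    f"subjective: usefulness={self.perceived_usefulness:.1f}/5, "
                    f"naturalness={self.dialogue_naturalness:.1f}/5")

    report = MixedMethodsReport(0.91, 0.42, 4.1, 3.6)
    print(report.summary())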
Report from Dagstuhl Seminar 23031: Frontiers of Information Access Experimentation for Research and Education
This report documents the program and the outcomes of Dagstuhl Seminar 23031
``Frontiers of Information Access Experimentation for Research and Education'',
which brought together 37 participants from 12 countries.
The seminar addressed technology-enhanced information access (information
retrieval, recommender systems, natural language processing) and specifically
focused on developing more responsible experimental practices leading to more
valid results, both for research and for scientific education.
The seminar brought together experts from various sub-fields of information
access, namely IR, RS, NLP, information science, and human-computer
interaction, to create a joint understanding of the problems and challenges
presented by next-generation information access systems, from both the
research and the experimentation points of view, to discuss existing
solutions and impediments, and to propose next steps to be pursued in the
area in order to improve not only our research methods and findings but also
the education of the new generation of researchers and developers.
The seminar featured a series of long and short talks delivered by
participants, who helped set common ground and surface the topics of interest
that were explored as the main output of the seminar. This led
to the definition of five groups which investigated challenges, opportunities,
and next steps in the following areas: reality check, i.e., conducting
real-world studies; human-machine-collaborative relevance judgment
frameworks; overcoming methodological challenges in information retrieval and
recommender systems through awareness and education; results-blind reviewing;
and guidance for authors.

Comment: Dagstuhl Seminar 23031, report