
    The LKPY Package for Recommender Systems Experiments

    Since 2010, we have built and maintained LensKit, an open-source toolkit for building, researching, and learning about recommender systems. We have successfully used the software in a wide range of recommender systems experiments, to support education in traditional classroom and online settings, and as the algorithmic backend for user-facing recommendation services in movies and books. This experience, along with community feedback, has surfaced a number of challenges with LensKit’s design and environmental choices. In response to these challenges, we are developing a new set of tools that leverage the PyData stack to enable the kinds of research experiments and educational experiences that we have been able to deliver with LensKit, along with new experimental structures that the existing code makes difficult. The result is a set of research tools that should significantly increase research velocity and provide much smoother integration with other software such as Keras while maintaining the same level of reproducibility as a LensKit experiment. In this paper, we reflect on the LensKit project, particularly on our experience using it for offline evaluation experiments, and describe the next-generation LKPY tools for enabling new offline evaluations and experiments with flexible, open-ended designs and well-tested evaluation primitives.
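
    To make the intended workflow concrete, the following is a minimal sketch of an LKPY-style offline experiment: cross-fold the ratings as pandas data frames, fit an algorithm per fold, produce top-N lists in batch, and score them with the toolkit's evaluation primitives. The module and function names follow the classic lenskit 0.x Python interface and are illustrative; they may differ in other versions of the library.

        import pandas as pd
        from lenskit import batch, topn, util
        from lenskit import crossfold as xf
        from lenskit.algorithms import Recommender, als

        # MovieLens-style ratings as a pandas data frame (user, item, rating)
        ratings = pd.read_csv('ml-100k/u.data', sep='\t',
                              names=['user', 'item', 'rating', 'timestamp'])

        algo = als.BiasedMF(50)  # biased matrix factorization, 50 features

        all_recs, test_data = [], []
        for train, test in xf.partition_users(ratings, 5, xf.SampleFrac(0.2)):
            fittable = Recommender.adapt(util.clone(algo))  # fresh, top-N-capable copy
            fittable.fit(train)
            users = test['user'].unique()
            all_recs.append(batch.recommend(fittable, users, 100))  # 100-item lists
            test_data.append(test)

        # Score the recommendation lists against the held-out test data
        rla = topn.RecListAnalysis()
        rla.add_metric(topn.ndcg)
        results = rla.compute(pd.concat(all_recs, ignore_index=True),
                              pd.concat(test_data, ignore_index=True))
        print(results['ndcg'].mean())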

    Candidate Set Sampling for Evaluating Top-N Recommendation

    The strategy for selecting candidate sets -- the set of items that the recommendation system is expected to rank for each user -- is an important decision in carrying out an offline top-N recommender system evaluation. The set of candidates is composed of the union of the user's test items and an arbitrary number of non-relevant items that we refer to as decoys. Previous studies have aimed to understand the effect of different candidate set sizes and selection strategies on evaluation. In this paper, we extend this knowledge by studying the specific interaction of candidate set selection strategies with popularity bias, and use simulation to assess whether sampled candidate sets result in metric estimates that are less biased with respect to the true metric values under complete data that is typically unavailable in ordinary experiments.
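
    As a concrete illustration of the setup studied here, the sketch below builds per-user candidate sets as the union of the user's held-out test items and a fixed number of uniformly sampled decoys. The function name, arguments, and uniform decoy sampling are illustrative assumptions, not the specific selection strategies compared in the paper.

        import numpy as np
        import pandas as pd

        def sample_candidates(test, all_items, n_decoys=100, seed=None):
            """Build candidate sets: each user's test items plus sampled decoys.

            `test` is a data frame of held-out (user, item) interactions and
            `all_items` is the full item catalog. Decoys are drawn uniformly
            without replacement from items the user has no test interaction with.
            """
            rng = np.random.default_rng(seed)
            all_items = np.asarray(list(all_items))
            candidates = {}
            for user, items in test.groupby('user')['item']:
                relevant = set(items)
                pool = all_items[~np.isin(all_items, list(relevant))]
                decoys = rng.choice(pool, size=min(n_decoys, len(pool)), replace=False)
                candidates[user] = relevant | set(decoys.tolist())
            return candidates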

    Measuring Fairness in Ranked Results: An Analytical and Empirical Comparison

    Information access systems, such as search and recommender systems, often use ranked lists to present results believed to be relevant to the user's information need. Evaluating these lists for their fairness along with other traditional metrics provides a more complete understanding of an information access system's behavior beyond accuracy or utility constructs. To measure the (un)fairness of rankings, particularly with respect to the protected group(s) of producers or providers, several metrics have been proposed in the last several years. However, an empirical and comparative analysis of these metrics, showing their applicability to specific scenarios or real data, their conceptual similarities, and their differences, is still lacking. We aim to bridge the gap between theoretical and practical application of these metrics. In this paper we describe several fair ranking metrics from the existing literature in a common notation, enabling direct comparison of their approaches and assumptions, and empirically compare them on the same experimental setup and data sets in the context of three information access tasks. We also provide a sensitivity analysis to assess the impact of the design choices and parameter settings that go into these metrics, and point to additional work needed to improve fairness measurement.
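
    Many metrics in this line of work build on a shared ingredient: position-weighted exposure, where items ranked higher receive more of the user's attention and group fairness is judged by how that exposure is distributed across provider groups. The sketch below uses a logarithmic position discount and simply tallies exposure per group; both choices are common illustrations rather than the definitions of any particular metric discussed in the paper.

        import numpy as np

        def group_exposure(ranking, item_group):
            """Sum position-discounted exposure per provider group.

            `ranking` is an ordered list of item ids and `item_group` maps an
            item id to its provider group. The 1 / log2(rank + 1) discount is
            one common attention model; specific fair ranking metrics differ in
            the discount used and in how group exposures are compared (e.g.,
            differences, ratios, or divergence from a target distribution).
            """
            exposure = {}
            for rank, item in enumerate(ranking, start=1):
                group = item_group[item]
                exposure[group] = exposure.get(group, 0.0) + 1.0 / np.log2(rank + 1)
            return exposure

        # Toy example: two provider groups in a four-item ranking
        print(group_exposure(['a', 'b', 'c', 'd'],
                             {'a': 'majority', 'b': 'protected',
                              'c': 'majority', 'd': 'protected'}))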

    Monte Carlo Estimates of Evaluation Metric Error and Bias: Work in Progress

    Traditional offline evaluations of recommender systems apply metrics from machine learning and information retrieval in settings where their underlying assumptions no longer hold. This results in significant error and bias in measures of top-N recommendation performance, such as precision, recall, and nDCG. Several of the specific causes of these errors, including popularity bias and misclassified decoy items, are well-explored in the existing literature. In this paper we survey a range of work on identifying and addressing these problems, and report on our work in progress to simulate the recommender data generation and evaluation processes to quantify the extent of evaluation metric errors and assess their sensitivity to various assumptions.
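
    The simulation approach can be sketched in a few lines: generate a user's "true" relevant items, hide some of them to mimic the incomplete feedback available to an ordinary offline experiment, rank items with a noisy scorer, and compare the metric computed from the observed test data with its value under complete knowledge. All of the distributions and parameters below are illustrative assumptions, not the paper's simulation design.

        import numpy as np

        rng = np.random.default_rng(0)
        n_items, k, n_trials = 1000, 10, 500

        def precision_at_k(ranked, relevant, k):
            return len(set(ranked[:k]) & set(relevant)) / k

        errors = []
        for _ in range(n_trials):
            # "True" relevant items for a simulated user
            relevant = rng.choice(n_items, size=20, replace=False)
            # Only some relevant items appear in the observed test set,
            # mimicking missing feedback in offline data
            observed = relevant[rng.random(len(relevant)) < 0.5]
            # A noisy recommender: relevance signal plus Gaussian noise
            scores = rng.normal(size=n_items)
            scores[relevant] += 1.0
            ranked = np.argsort(-scores)
            errors.append(precision_at_k(ranked, observed, k)
                          - precision_at_k(ranked, relevant, k))

        print('mean metric bias:', np.mean(errors))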

    Recommender Systems Notation

    As the field of recommender systems has developed, authors have used a myriad of notations for describing the mathematical workings of recommendation algorithms. These notations appear in research papers, books, lecture notes, blog posts, and software documentation. The disciplinary diversity of the field has not contributed to consistency in notation; scholars whose home base is in information retrieval have different habits and expectations than those in machine learning or human-computer interaction. In the course of years of teaching and research on recommender systems, we have seen the value in adopting a consistent notation across our work. This has been particularly highlighted in our development of the Recommender Systems MOOC on Coursera (Konstan et al. 2015), as we need to explain a wide variety of algorithms and our learners are not well-served by changing notation between algorithms. In this paper, we describe the notation we have adopted in our work, along with its justification and some discussion of considered alternatives. We present this in hope that it will be useful to others writing and teaching about recommender systems. This notation has served us well for some time now, in research, online education, and traditional classroom instruction. We feel it is ready for broad use.
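
    For readers who want a flavor of what such a shared notation covers, the LaTeX fragment below spells out a few of the usual conventions (users, items, ratings, and a scoring function). The specific symbols are common choices shown for illustration and are not guaranteed to match the notation the paper ultimately adopts.

        % Illustrative recommender-systems notation (not necessarily the
        % exact conventions adopted in the paper)
        \documentclass{article}
        \usepackage{amsmath}
        \begin{document}
        Let $u \in U$ denote a user, $i \in I$ an item, and $r_{ui}$ the rating
        user $u$ has given item $i$; $I_u \subseteq I$ is the set of items $u$
        has rated. A scoring function $s(i \mid u)$ estimates $u$'s preference
        for item $i$, and a top-$N$ recommender returns
        \begin{equation}
          \operatorname{rec}_N(u) \subseteq I \setminus I_u,
          \qquad |\operatorname{rec}_N(u)| = N,
        \end{equation}
        the $N$ unrated items with the highest scores $s(i \mid u)$.
        \end{document}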

    Fairness in Information Access Systems

    Recommendation, information retrieval, and other information access systems pose unique challenges for investigating and applying the fairness and non-discrimination concepts that have been developed for studying other machine learning systems. While fair information access shares many commonalities with fair classification, the multistakeholder nature of information access applications, the rank-based problem setting, the centrality of personalization in many cases, and the role of user response complicate the problem of identifying precisely what types and operationalizations of fairness may be relevant, let alone measuring or promoting them. In this monograph, we present a taxonomy of the various dimensions of fair information access and survey the literature to date on this new and rapidly-growing topic. We preface this with brief introductions to information access and algorithmic fairness, to facilitate use of this work by scholars with experience in one (or neither) of these fields who wish to learn about their intersection. We conclude with several open problems in fair information access, along with some suggestions for how to approach research in this space.