25 research outputs found

    Offline Evaluation via Human Preference Judgments: A Dueling Bandits Problem

    Get PDF
    The dramatic improvements in core information retrieval tasks engendered by neural rankers create a need for novel evaluation methods. If every ranker returns highly relevant items in the top ranks, it becomes difficult to recognize meaningful differences between them and to build reusable test collections. Several recent papers explore pairwise preference judgments as an alternative to traditional graded relevance assessments. Rather than viewing items one at a time, assessors view items side by side and indicate the one that provides the better response to a query, allowing fine-grained distinctions. If we employ preference judgments to identify the likely best items for each query, we can measure rankers by their ability to place these items as high as possible. I frame the problem of finding the best items as a dueling bandits problem. While many papers explore dueling bandits for online ranker evaluation via interleaving, they have not been considered as a framework for offline evaluation via human preference judgments. I review the literature for possible solutions. For human preference judgments, any usable algorithm must tolerate ties, since two items may appear nearly equal to assessors. It must also minimize the number of judgments required for any specific pair, since each such comparison requires an independent assessor. Since the theoretical guarantees provided by most algorithms depend on assumptions that are not satisfied by human preference judgments, I simulate selected algorithms on representative test cases to provide insight into their practical utility. Compared with the earlier paper presented at SIGIR 2022 [87], this work includes more theoretical analysis and experimental results. Based on the simulations, two algorithms stand out for their potential. I proceed with the method of Clarke et al. [20], and the simulations suggest modifications to further improve its performance. Using the modified algorithm, I collect over 10,000 preference judgments for pools derived from submissions to the TREC 2021 Deep Learning Track, confirming its suitability. I test the idea of best-item evaluation and suggest directions for further theoretical and practical progress.
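    The tie-tolerant, judgment-budgeted setting described above can be sketched as a small simulation. The assessor model (Bradley-Terry with a tie margin), the item names, and the per-pair budget below are illustrative assumptions, and the Copeland-style tournament is a generic stand-in, not the algorithm of Clarke et al. [20]:

```python
import random

def judge(i, j, strength, tie_margin=0.05):
    """Simulated assessor: prefers the item with higher latent strength,
    declaring a tie when the two items look nearly equal (assumed model)."""
    p_i = strength[i] / (strength[i] + strength[j])  # Bradley-Terry preference
    if abs(p_i - 0.5) < tie_margin:
        return 0                       # tie: items appear nearly equal
    return 1 if random.random() < p_i else -1

def find_best(items, strength, per_pair=5):
    """Copeland-style tournament: each pair is judged a small, fixed number
    of times, since each comparison stands in for an independent assessor."""
    wins = {i: 0 for i in items}
    for a in range(len(items)):
        for b in range(a + 1, len(items)):
            score = sum(judge(items[a], items[b], strength)
                        for _ in range(per_pair))
            if score > 0:
                wins[items[a]] += 1
            elif score < 0:
                wins[items[b]] += 1    # an exact tie awards neither item
    return max(wins, key=wins.get)

random.seed(0)
docs = ["d1", "d2", "d3", "d4"]
latent = {"d1": 1.0, "d2": 3.0, "d3": 0.5, "d4": 2.0}
print(find_best(docs, latent))  # the highest-strength item usually wins
```

    A fixed per-pair budget keeps the total number of human judgments predictable; the trade-off studied in the work above is how small that budget can be while still identifying the best items reliably.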

    On Connections Between Machine Learning And Information Elicitation, Choice Modeling, And Theoretical Computer Science

    Get PDF
    Machine learning, which has its origins at the intersection of computer science and statistics, is now a rapidly growing area of research that is being integrated into almost every discipline in science and business, such as economics, marketing, and information retrieval. As a consequence of this integration, it is necessary to understand how machine learning interacts with these disciplines and to understand the fundamental questions that arise at the resulting interfaces. The goal of my thesis research is to study these interdisciplinary questions at the interface of machine learning and other disciplines, including mechanism design/information elicitation, preference/choice modeling, and theoretical computer science.

    Bowdoin College Catalogue and Academic Handbook (2023-2024)

    Get PDF
    https://digitalcommons.bowdoin.edu/course-catalogues/1321/thumbnail.jp

    Bowdoin College Catalogue and Academic Handbook (2022-2023)

    Get PDF
    https://digitalcommons.bowdoin.edu/course-catalogues/1320/thumbnail.jp

    Bowdoin College Catalogue and Academic Handbook (2021-2022)

    Get PDF
    https://digitalcommons.bowdoin.edu/course-catalogues/1319/thumbnail.jp

    Condensing Information: From Supervised To Crowdsourced Learning

    Full text link
    The main focus of this dissertation is new and improved ways of bringing high-quality content to users by leveraging the power of machine learning. Starting with a large amount of data, we want to condense it into an easily digestible form by removing redundant and irrelevant parts and retaining only the important information that is of interest to the user. Learning how to perform this condensation from data allows us to use more complex models that better capture the notion of good content. Starting with supervised learning, this thesis proposes using structured prediction in conjunction with support vector machines to learn how to produce extractive summaries of textual documents. Representing summaries as multivariate objects allows for modeling the dependencies between summary components. Despite the complex output space, efficient learning and prediction of summaries remain possible by using a submodular objective/scoring function. The discussed approach can also be adapted to an unsupervised setting and used to condense information in novel ways while retaining the same efficient submodular framework. Incorporating a temporal dimension into the summarization objective leads to a new way of visualizing the flow of ideas and identifying novel contributions in a time-stamped corpus, which in turn helps users gain high-level insight into its evolution. Lastly, instead of trying to explicitly define an automated function to condense information, one can leverage crowdsourcing. In particular, this thesis considers user feedback on online user-generated content to construct and improve content rankings. An analysis of a real-world dataset is presented, and the results suggest more accurate models of actual user voting patterns. Based on this new knowledge, an improved content ranking algorithm is proposed that delivers good content to users in a shorter timeframe.
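    The submodular scoring idea above can be illustrated with a minimal coverage objective and greedy selection, the textbook (1 - 1/e)-approximation for monotone submodular maximization. The sentence identifiers and concept sets are hypothetical, and this generic greedy sketch is not the structured-SVM method of the dissertation:

```python
def coverage(summary_sents, concept_sets):
    """Submodular score: number of distinct concepts covered by the summary."""
    covered = set()
    for s in summary_sents:
        covered |= concept_sets[s]
    return len(covered)

def greedy_summary(sentences, concept_sets, budget=2):
    """Greedily add the sentence with the largest marginal coverage gain."""
    chosen = []
    while len(chosen) < budget:
        best = max((s for s in sentences if s not in chosen),
                   key=lambda s: coverage(chosen + [s], concept_sets))
        if coverage(chosen + [best], concept_sets) == coverage(chosen, concept_sets):
            break  # no remaining sentence adds new concepts
        chosen.append(best)
    return chosen

# Hypothetical sentences annotated with the concepts they mention.
concepts = {
    "s1": {"ml", "summarization"},
    "s2": {"ml"},
    "s3": {"crowdsourcing", "ranking"},
}
print(greedy_summary(["s1", "s2", "s3"], concepts))  # → ['s1', 's3']
```

    The diminishing-returns property of coverage (adding a sentence helps less as more concepts are already covered) is what makes the greedy procedure both efficient and provably near-optimal, which mirrors why a submodular objective keeps learning and prediction tractable in the approach described above.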

    Bowdoin Orient v.133, no.1-25 (2001-2002)

    Get PDF
    https://digitalcommons.bowdoin.edu/bowdoinorient-2000s/1002/thumbnail.jp

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Bowdoin Orient v.139, no.1-26 (2009-2010)

    Get PDF
    https://digitalcommons.bowdoin.edu/bowdoinorient-2010s/1000/thumbnail.jp