3 research outputs found

    On Design and Evaluation of High-Recall Retrieval Systems for Electronic Discovery

    High-recall retrieval is an information retrieval task model where the goal is to identify, for human consumption, all, or as many as practicable, documents relevant to a particular information need. This thesis investigates the ways in which one can evaluate high-recall retrieval systems and explores several design considerations that should be accounted for when designing such systems for electronic discovery. The primary contribution of this work is a framework for conducting high-recall retrieval experimentation in a controlled and repeatable way. This framework builds upon lessons learned from similar tasks to facilitate the use of retrieval systems on collections that cannot be distributed due to the sensitivity or privacy of the material contained within. Accordingly, a Web API is used to distribute document collections, information needs, and corresponding relevance assessments in a one-document-at-a-time manner. Validation is conducted through the successful deployment of this architecture in the 2015 TREC Total Recall track over the live Web and in controlled environments. Using the runs submitted to the Total Recall track and other test collections, we explore the efficacy of a variety of new and existing effectiveness measures for high-recall retrieval tasks. We find that summarizing the trade-off between recall and the effort required to attain that recall is a non-trivial task and that several measures are sensitive to properties of the test collections themselves. We conclude that the gain curve, a de facto standard, and variants of the gain curve are the most robust to variations in test collection properties and the evaluation of high-recall systems. This thesis also explores the effect that non-authoritative, surrogate assessors can have when training machine learning algorithms. 
Contrary to popular thought, we find that surrogate assessors appear to be inferior to authoritative assessors due to differences of opinion rather than innate inferiority in their ability to identify relevance. Furthermore, we show that several techniques for diversifying and liberalizing a surrogate assessor's conception of relevance can yield substantial improvement in the surrogate and, in some cases, rival the authority. Finally, we present the results of a user study conducted to investigate the effect that three archetypal high-recall retrieval systems have on judging behaviour. Compared to using random and uncertainty sampling, selecting documents for training using relevance sampling significantly decreases the probability that a user will identify that document as relevant. On the other hand, no substantial difference between the test conditions is observed in the time taken to render such assessments.
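
The gain curve mentioned in this abstract plots recall against review effort. The sketch below is illustrative only and is not taken from the thesis: it assumes effort is counted as the number of documents reviewed, and the function names `gain_curve` and `effort_for_recall` are hypothetical.

```python
from typing import List, Optional, Sequence


def gain_curve(review_order: Sequence[bool], total_relevant: int) -> List[float]:
    """Return recall after each reviewed document.

    review_order[i] is True if the i-th reviewed document was relevant.
    total_relevant is the number of relevant documents in the collection
    (assumed known here, as it would be in a test collection).
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    found = 0
    curve = []
    for is_relevant in review_order:
        found += int(is_relevant)
        curve.append(found / total_relevant)
    return curve


def effort_for_recall(curve: Sequence[float], target: float) -> Optional[int]:
    """Smallest number of reviewed documents reaching the target recall, or None."""
    for effort, recall in enumerate(curve, start=1):
        if recall >= target:
            return effort
    return None


if __name__ == "__main__":
    curve = gain_curve([True, False, True, True], total_relevant=4)
    print(curve)
    print(effort_for_recall(curve, 0.5))
```

Reading one point off the curve, as `effort_for_recall` does, gives a single-number summary, which is the kind of summarization the abstract describes as non-trivial.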

    Increasing the Efficiency of High-Recall Information Retrieval

    The goal of high-recall information retrieval (HRIR) is to find all, or nearly all, relevant documents while maintaining reasonable assessment effort. Achieving high recall is a key problem in the use of applications such as electronic discovery, systematic review, and construction of test collections for information retrieval tasks. State-of-the-art HRIR systems commonly rely on iterative relevance feedback in which human assessors continually assess machine learning-selected documents. The relevance of the assessed documents is then fed back to the machine learning model to improve its ability to select the next set of potentially relevant documents for assessment. In many instances, thousands of human assessments might be required to achieve high recall. These assessments represent the main cost of such HRIR applications. Therefore, their effectiveness in achieving high recall is limited by their reliance on human input when assessing the relevance of documents. In this thesis, we test different methods in order to improve the effectiveness and efficiency of finding relevant documents using a state-of-the-art HRIR system. With regard to effectiveness, we try to build a machine-learned model that retrieves relevant documents more accurately. For efficiency, we try to help human assessors make relevance assessments more easily and quickly via our HRIR system. Furthermore, we try to establish a stopping criterion for the assessment process so as to avoid excessive assessment. In particular, we hypothesize that total assessment effort to achieve high recall can be reduced by using shorter document excerpts (e.g., extractive summaries) in place of full documents for the assessment of relevance and using a high-recall retrieval system based on continuous active learning (CAL). In order to test this hypothesis, we implemented a high-recall retrieval system based on a state-of-the-art implementation of CAL. 
This high-recall retrieval system could display either full documents or short document excerpts for relevance assessment. A search engine was also integrated into our system to provide assessors the option of conducting interactive search and judging. We conducted a simulation study, and separately, a 50-person controlled user study to test our hypothesis. The results of the simulation study show that judging even a single extracted sentence for relevance feedback may be adequate for CAL to achieve high recall. The results of the controlled user study confirmed that human assessors were able to find a significantly larger number of relevant documents within limited time when they used the system with paragraph-length document excerpts as opposed to full documents. In addition, we found that allowing participants to compose and execute their own search queries did not improve their ability to find relevant documents and, by some measures, impaired performance. Moreover, integrating sampling methods with active learning can yield accurate estimates of the number of relevant documents, and thus avoid excessive assessments.
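
The iterative relevance feedback loop described in this abstract can be sketched as follows. This is a toy illustration, not the thesis's system: a simple term-weight scorer stands in for the machine-learned model, the `assess` callback stands in for the human assessor, and all names (`cal_loop`, `score`, `budget`) are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List


def score(text: str, weights: Counter) -> int:
    """Sum the learned weights of the distinct terms in a document."""
    return sum(weights[term] for term in set(text.lower().split()))


def cal_loop(
    docs: Dict[str, str],
    seed_query: str,
    assess: Callable[[str], bool],
    budget: int = 10,
) -> List[str]:
    """Toy continuous-active-learning loop; returns the review order.

    Assumption: the model is a bag of term weights, +1 per term seen in a
    relevant document and -1 per term seen in a non-relevant one.
    """
    weights = Counter(seed_query.lower().split())  # initial model from the seed query
    judged: Dict[str, bool] = {}
    order: List[str] = []
    while len(judged) < min(budget, len(docs)):
        unjudged = [d for d in docs if d not in judged]
        # Select the highest-scoring unjudged document (ties broken by id).
        doc_id = min(unjudged, key=lambda d: (-score(docs[d], weights), d))
        label = assess(doc_id)  # human (or simulated) relevance judgment
        judged[doc_id] = label
        order.append(doc_id)
        # Feed the judgment back into the model before the next selection.
        delta = 1 if label else -1
        for term in set(docs[doc_id].lower().split()):
            weights[term] += delta
    return order


if __name__ == "__main__":
    docs = {
        "a": "merger contract dispute",
        "b": "lunch menu friday",
        "c": "contract breach merger",
        "d": "football scores",
    }
    relevant = {"a", "c"}
    print(cal_loop(docs, "contract", lambda d: d in relevant, budget=4))
```

Here the loop stops at a fixed `budget`; the abstract's stopping criterion based on sampling estimates is not modelled in this sketch.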

    Impact of Surrogate Assessments on High-Recall Retrieval

    ABSTRACT We are concerned with the effect of using a surrogate assessor to train a passive (i.e., batch) supervised-learning method to rank documents for subsequent review, wher