
    PRES: A score metric for evaluating recall-oriented information retrieval applications

    Information retrieval (IR) evaluation scores are generally designed to measure the effectiveness with which relevant documents are identified and retrieved. Many scores have been proposed for this purpose over the years. These have primarily focused on aspects of precision and recall, and while the two are often discussed as equally important, in practice most attention has been given to precision-focused metrics. Even for recall-oriented IR tasks of growing importance, such as patent retrieval, these precision-based scores remain the primary evaluation measures. Our study examines different evaluation measures for a recall-oriented patent retrieval task and demonstrates the limitations of the current scores in comparing different IR systems for this task. We introduce PRES, a novel evaluation metric for this type of application that takes account of recall and the user's search effort. The behaviour of PRES is demonstrated on 48 runs from the CLEF-IP 2009 patent retrieval track. A full analysis of the performance of PRES shows its suitability for measuring the retrieval effectiveness of systems from a recall-focused perspective, taking into account the user's expected search effort.
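
    A minimal sketch of how PRES can be computed, following the formula published by Magdy and Jones (SIGIR 2010): PRES = 1 - (Σr_i - n(n+1)/2) / (n · N_max), where r_i are the ranks of the relevant documents, n is the total number of relevant documents, and N_max is the maximum number of results the user is willing to examine. The sketch assumes the paper's convention that relevant documents missing from the top N_max are placed at the worst-case ranks immediately after the cut-off; function and variable names are illustrative.

    ```python
    def pres(relevant_ranks, n_relevant, n_max):
        """PRES (Magdy & Jones, SIGIR 2010), sketched from the published formula.

        relevant_ranks: 1-based ranks at which relevant documents were retrieved.
        n_relevant:     total number of relevant documents for the topic.
        n_max:          maximum number of results the user will examine.
        """
        found = sorted(r for r in relevant_ranks if r <= n_max)
        n_missed = n_relevant - len(found)
        # Missed relevant documents are assumed to sit at the worst possible
        # ranks, immediately after the n_max cut-off.
        worst_case = [n_max + i for i in range(1, n_missed + 1)]
        rank_sum = sum(found) + sum(worst_case)
        best_sum = n_relevant * (n_relevant + 1) / 2  # all relevant at the top
        return 1 - (rank_sum - best_sum) / (n_relevant * n_max)

    # Perfect run: both relevant documents at ranks 1 and 2 -> 1.0
    print(pres([1, 2], n_relevant=2, n_max=100))
    # One relevant document missed entirely -> heavily penalised (0.505)
    print(pres([1], n_relevant=2, n_max=100))
    ```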

    Evaluating epistemic uncertainty under incomplete assessments

    This study proposes an extended methodology for laboratory-based Information Retrieval evaluation under incomplete relevance assessments. The new methodology aims to identify potential uncertainty during system comparison that may result from incompleteness. Its adoption is advantageous because detecting epistemic uncertainty - the amount of knowledge (or ignorance) we have about the estimate of a system's performance - during the evaluation process can guide researchers when evaluating new systems over existing and future test collections. Across a series of experiments we demonstrate how this methodology can lead to a finer-grained analysis of systems. In particular, we show through experimentation how the current practice in Information Retrieval evaluation of using a measurement depth larger than the pooling depth increases uncertainty during system comparison.
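
    The thesis's own methodology is not reproduced in the abstract, but the core idea of epistemic uncertainty under incomplete assessments can be illustrated with a simple bound computation: treat every unjudged document in a run as non-relevant for a worst-case score and as relevant for a best-case score, and read the gap between the bounds as the uncertainty about the true value. This is a hedged sketch with hypothetical names, not the method proposed in the thesis.

    ```python
    def precision_at_k_bounds(ranked_doc_ids, qrels, k):
        """Best/worst-case P@k when some documents are unjudged.

        ranked_doc_ids: system ranking (doc ids, best first).
        qrels:          dict doc_id -> 1 (relevant) or 0 (non-relevant);
                        documents absent from qrels are unjudged.
        """
        top_k = ranked_doc_ids[:k]
        judged_relevant = sum(1 for d in top_k if qrels.get(d) == 1)
        unjudged = sum(1 for d in top_k if d not in qrels)
        worst = judged_relevant / k              # unjudged assumed non-relevant
        best = (judged_relevant + unjudged) / k  # unjudged assumed relevant
        return worst, best

    qrels = {"d1": 1, "d2": 0, "d4": 1}
    run = ["d1", "d3", "d4", "d5", "d2"]  # d3 and d5 were never judged
    print(precision_at_k_bounds(run, qrels, k=5))  # (0.4, 0.8)
    ```

    A wide gap between the bounds signals that a comparison between two systems at this measurement depth may hinge on documents that were never assessed.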

    A laboratory-based method for the evaluation of personalised search

    Comparative evaluation of Information Retrieval Systems (IRSs) using publicly available test collections has become an established practice in Information Retrieval (IR). By means of the popular Cranfield evaluation paradigm, IR test collections enable researchers to compare new methods to existing approaches. An important area of IR research where this strategy has not been applied to date is Personalised Information Retrieval (PIR), which has generally relied on user-based evaluations. This paper describes a method that enables the creation of publicly available extended test collections to allow repeatable laboratory-based evaluation of personalised search.

    Highly focused document retrieval in aerospace engineering : user interaction design and evaluation

    Purpose – This paper seeks to describe the preliminary studies (on both users and data), the design and evaluation of the K-Search system for searching legacy documents in aerospace engineering. Real-world reports of jet engine maintenance challenge current indexing practice, while real users' tasks require retrieving the information in its proper context. K-Search is currently in use at Rolls-Royce plc and has evolved to include other tools for knowledge capture and management.
    Design/methodology/approach – Semantic Web techniques have been used to automatically extract information from the reports while maintaining the original context, allowing more focused retrieval than with more traditional techniques. The paper combines semantic search with classical information retrieval to increase search effectiveness. An innovative user interface has been designed to take advantage of this hybrid search technique; the interface allows a flexible and personal approach to searching legacy data.
    Findings – The user evaluation showed that the system is effective and well received by users. It also showed that different people look at the same data in different ways and make different use of the same system depending on their individual needs, influenced by their job profile and personal attitude.
    Research limitations/implications – This study focuses on the specific case of an enterprise working in aerospace engineering. Although the findings are likely to be shared with other engineering domains (e.g. mechanical, electronic), the study does not extend the evaluation to different settings.
    Originality/value – The study shows how a real context of use can present new and unexpected challenges to researchers, and how effective solutions can then be adopted and used in organizations.
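
    The abstract does not give K-Search's actual scoring function, but a generic late-fusion hybrid of the kind it describes (classical keyword relevance blended with a semantic, concept-level match against information extracted from the reports) can be sketched as follows; the names, the overlap measure, and the weighting scheme are all assumptions for illustration.

    ```python
    def hybrid_score(keyword_score, query_concepts, doc_concepts, alpha=0.5):
        """Blend a classical keyword score (e.g. from BM25) with a semantic
        score based on overlap between query concepts and the concepts
        extracted from a document. Purely illustrative, not K-Search's code."""
        if query_concepts:
            semantic = len(query_concepts & doc_concepts) / len(query_concepts)
        else:
            semantic = 0.0
        return alpha * keyword_score + (1 - alpha) * semantic

    # Example: a maintenance report annotated with extracted concepts.
    doc_concepts = {"jet_engine", "turbine_blade", "crack"}
    print(hybrid_score(0.72, {"turbine_blade", "crack"}, doc_concepts))  # 0.86
    ```

    Exposing a weight like alpha to the user is one plausible way such an interface could support the flexible, personal search styles the evaluation observed.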

    APPLICATION OF COGNITIVE PRINCIPLES WITHIN AN ONLINE STATISTICAL LEARNING ENVIRONMENT

    Three experiments were conducted to further investigate optimal learning procedures within an online statistical learning environment. Experiment 1 exposed learners to retrieval practice learning conditions with or without segmentation. Retrieval practice formats included multiple choice, open ended, multiple evaluation, or instruction-only manipulations. Experiment 2 explored the impact of adding immediate feedback in conjunction with retrieval practice and segmentation. Experiment 3 further investigated how the benefits of optimal learning procedures transfer to novel situations/examinations. Within all experiments, a series of metacognitive questions was administered to learners to measure their metamemory for the statistical knowledge that was taught. In line with our hypotheses and previous research, retrieval practice (Experiment 1) and retrieval practice with immediate feedback (Experiment 2) tended to boost memory retention. However, the data across all experiments suggested that segmentation has little or no impact on statistical learning, a finding that ran counter to our hypotheses as well as to previous studies. Although the results are somewhat mixed, the benefits associated with retrieval practice, with or without feedback, did not seem to transfer to novel instances. Individuals who learned within an open-ended or multiple-evaluation format tended to have greater insight into their own metacognitive knowledge.

    Open Domain Multi-document Summarization: A Comprehensive Study of Model Brittleness under Retrieval

    Multi-document summarization (MDS) assumes a set of topic-related documents are provided as input. In practice, this document set is not always available; it would need to be retrieved given an information need, i.e. a question or topic statement, a setting we dub "open-domain" MDS. We study this more challenging setting by formalizing the task and bootstrapping it using existing datasets, retrievers and summarizers. Via extensive automatic and human evaluation, we determine: (1) state-of-the-art summarizers suffer large reductions in performance when applied to open-domain MDS, (2) additional training in the open-domain setting can reduce this sensitivity to imperfect retrieval, and (3) summarizers are insensitive to the retrieval of duplicate documents and the order of retrieved documents, but highly sensitive to other errors, like the retrieval of irrelevant documents. Based on our results, we provide practical guidelines to enable future work on open-domain MDS, e.g. how to choose the number of retrieved documents to summarize. Our results suggest that new retrieval and summarization methods and annotated resources for training and evaluation are necessary for further progress in the open-domain setting.
    Comment: Accepted to EMNLP Findings 202
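
    As a rough illustration of the pipeline the paper formalizes (retrieve the top-k documents for an information need, then summarize them), here is a hedged sketch using a TF-IDF retriever and a naive lead-sentence summarizer as stand-ins for the retrievers and summarizers actually studied; k is the retrieval depth whose choice the paper's guidelines address.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def retrieve(query, corpus, k):
        """Return the k documents most similar to the query (TF-IDF stand-in)."""
        vectorizer = TfidfVectorizer()
        doc_vecs = vectorizer.fit_transform(corpus)
        query_vec = vectorizer.transform([query])
        scores = cosine_similarity(query_vec, doc_vecs)[0]
        top = scores.argsort()[::-1][:k]
        return [corpus[i] for i in top]

    def summarize(documents):
        """Placeholder summarizer: take the first sentence of each document."""
        return " ".join(doc.split(". ")[0] + "." for doc in documents)

    corpus = [
        "The dam project was approved in 2001. Construction began a year later.",
        "River ecosystems respond slowly to flow changes. Studies continue.",
        "A new stadium opened downtown. Attendance has doubled.",
    ]
    print(summarize(retrieve("dam construction on the river", corpus, k=2)))
    ```

    Under the paper's findings, the choice of k in such a pipeline trades off recall of relevant content against the summarizer's sensitivity to retrieved irrelevant documents.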