PRES: A score metric for evaluating recall-oriented information retrieval applications
Information retrieval (IR) evaluation scores are generally
designed to measure the effectiveness with which relevant
documents are identified and retrieved. Many scores have been proposed for this purpose over the years. These have primarily focused on aspects of precision and recall, and while these are often discussed with equal importance, in practice most attention has been given to precision-focused metrics. Even for recall-oriented IR tasks of growing importance, such as patent retrieval, these precision-based scores remain the primary evaluation measures. Our study examines different evaluation measures for a recall-oriented patent retrieval task and demonstrates the limitations of the current scores in comparing different IR systems for this task. We introduce PRES, a novel evaluation metric for this type of application that takes account of recall and the user's search effort. The behaviour of PRES is demonstrated on 48 runs from the CLEF-IP 2009 patent retrieval track. A full analysis of the performance of PRES shows its suitability for measuring the retrieval effectiveness of systems from a recall-focused perspective, taking into account the user's expected search effort.
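The abstract does not reproduce the metric itself; a minimal Python sketch of the published PRES definition could look as follows. It assumes the standard formulation in which relevant documents not retrieved within N_max (the user's maximum search depth) are placed at the ranks immediately after N_max; the function name and example values are illustrative, not from the paper.

```python
def pres(relevant_ranks, n_relevant, n_max):
    """Patent Retrieval Evaluation Score for a single topic.

    relevant_ranks -- 1-based ranks of the relevant documents that were
                      retrieved within the top n_max results.
    n_relevant     -- total number of relevant documents for the topic.
    n_max          -- maximum number of results the user will examine.
    """
    found = sorted(r for r in relevant_ranks if r <= n_max)
    # Assumption from the published definition: relevant documents not
    # retrieved within n_max are placed at ranks n_max+1, n_max+2, ...
    missed = [n_max + i for i in range(1, n_relevant - len(found) + 1)]
    mean_rank = sum(found + missed) / n_relevant
    # 1.0 when all relevant documents occupy the top n_relevant ranks;
    # 0.0 when they all fall just beyond the user's search depth n_max.
    return 1.0 - (mean_rank - (n_relevant + 1) / 2.0) / n_max

print(pres([1, 2, 3], 3, 100))  # perfect ranking -> 1.0
print(pres([40, 90], 3, 100))   # late hits, one miss -> 0.25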
Evaluating epistemic uncertainty under incomplete assessments
This study proposes an extended methodology for laboratory-based Information Retrieval evaluation under incomplete relevance assessments. This new methodology aims to identify potential uncertainty during system comparison that may result from incompleteness. The adoption of this methodology is advantageous because the detection of epistemic uncertainty - the amount of knowledge (or ignorance) we have about the estimate of a system's performance - during the evaluation process can guide researchers when evaluating new systems over existing and future test collections. Across a series of experiments we demonstrate how this methodology can lead towards a finer-grained analysis of systems. In particular, we show through experimentation how the current practice in Information Retrieval evaluation of using a measurement depth larger than the pooling depth increases uncertainty during system comparison.
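The abstract leaves the mechanics of the methodology to the thesis itself. One standard way to make such epistemic uncertainty visible, sketched below as an assumption rather than the author's actual procedure, is to bound a metric like P@k by scoring unjudged documents once as non-relevant and once as relevant; all names and data are illustrative.

```python
def precision_at_k_bounds(ranking, qrels, k):
    """Bound P@k when some of the top-k documents are unjudged.

    ranking -- ordered list of document ids returned by the system.
    qrels   -- dict doc_id -> 0/1 relevance for the judged pool only;
               documents absent from qrels are unjudged.
    """
    top_k = ranking[:k]
    judged_relevant = sum(1 for d in top_k if qrels.get(d) == 1)
    unjudged = sum(1 for d in top_k if d not in qrels)
    lower = judged_relevant / k               # unjudged assumed non-relevant
    upper = (judged_relevant + unjudged) / k  # unjudged assumed relevant
    return lower, upper

# Measuring at depth 10 over a pool judged only to depth 3 leaves a gap:
qrels = {"d1": 1, "d2": 0, "d3": 1}
run = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
print(precision_at_k_bounds(run, qrels, 10))  # (0.2, 0.9)
```

The width of the interval is exactly the ignorance that incompleteness introduces; when two systems' intervals overlap, the comparison is left undecided.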
A laboratory-based method for the evaluation of personalised search
Comparative evaluation of Information Retrieval Systems (IRSs) using publicly available test collections has become an established practice in Information Retrieval (IR). By means of the popular Cranfield evaluation paradigm, IR test collections enable researchers to compare new methods to existing approaches. An important area of IR research where this strategy has not been applied to date is Personalised Information Retrieval (PIR), which has generally relied on user-based evaluations. This paper describes a method that enables the creation of publicly available extended test collections to allow repeatable laboratory-based evaluation of personalised search.
Highly focused document retrieval in aerospace engineering: user interaction design and evaluation
Purpose – This paper seeks to describe the preliminary studies (on both users and data), the design, and the evaluation of the K-Search system for searching legacy documents in aerospace engineering. Real-world reports of jet engine maintenance challenge current indexing practice, while real users' tasks require retrieving the information in the proper context. K-Search is currently in use in Rolls-Royce plc and has evolved to include other tools for knowledge capture and management.
Design/methodology/approach – Semantic Web techniques have been used to automatically extract information from the reports while maintaining the original context, allowing a more focused retrieval than with more traditional techniques. The paper combines semantic search with classical information retrieval to increase search effectiveness. An innovative user interface has been designed to take advantage of this hybrid search technique. The interface is designed to allow a flexible and personal approach to searching legacy data.
Findings – The user evaluation showed that the system is effective and well received by users. It also shows that different people look at the same data in different ways and make different use of the same system depending on their individual needs, influenced by their job profile and personal attitude.
Research limitations/implications – This study focuses on a specific case of an enterprise working in aerospace engineering. Although the findings are likely to be shared with other engineering domains (e.g. mechanical, electronic), the study does not expand the evaluation to different settings.
Originality/value – The study shows how a real context of use can provide new and unexpected challenges to researchers and how effective solutions can then be adopted and used in organizations.
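The abstract describes the hybrid technique only at a high level. The sketch below illustrates one plausible way to blend a classical keyword score with a match over extracted semantic annotations; it is an assumption-laden illustration, not the actual K-Search implementation, and all document data, field names, and the alpha weighting are invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented example records: free text plus extracted semantic annotations.
docs = [
    {"text": "oil leak observed near the combustor casing",
     "annotations": {"component": "combustor", "event": "oil leak"}},
    {"text": "vibration reported during ground test of the fan",
     "annotations": {"component": "fan", "event": "vibration"}},
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(d["text"] for d in docs)

def hybrid_search(query_text, query_annotations, alpha=0.5):
    """Blend a classical keyword score with a semantic-annotation match."""
    keyword = cosine_similarity(vectorizer.transform([query_text]), doc_matrix)[0]
    ranked = []
    for i, d in enumerate(docs):
        # Fraction of requested annotation constraints this document meets.
        hits = sum(1 for key, val in query_annotations.items()
                   if d["annotations"].get(key) == val)
        semantic = hits / len(query_annotations) if query_annotations else 0.0
        ranked.append((alpha * keyword[i] + (1 - alpha) * semantic, i))
    return sorted(ranked, reverse=True)

print(hybrid_search("oil leak", {"component": "combustor"}))
```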
APPLICATION OF COGNITIVE PRINCIPLES WITHIN AN ONLINE STATISTICAL LEARNING ENVIRONMENT
Three experiments were conducted in order to further investigate optimal learning procedures within an online statistical learning environment. Experiment 1 exposed learners to retrieval practice learning conditions with or without segmentation. Retrieval practice formats included multiple-choice, open-ended, multiple-evaluation, or instruction-only manipulations. Experiment 2 explored the impact of added immediate feedback in conjunction with retrieval practice and segmentation. Experiment 3 further investigated how the benefits of optimal learning procedures transfer to novel situations/examinations. In all experiments a series of metacognitive questions was administered to learners in order to measure their metamemory over the statistical knowledge that was taught. In alignment with our hypotheses and previous research, it was found that retrieval practice (Experiment 1) and retrieval practice with immediate feedback (Experiment 2) tended to boost memory retention. However, the data trend across all experiments suggested that segmentation has little or no impact on statistical learning, a finding contrary to our hypotheses as well as to previous studies. Though the results are somewhat mixed, the benefits associated with retrieval practice and retrieval practice with feedback did not seem to transfer to novel instances. Individuals who learned within an open-ended or multiple-evaluation format tended to have greater insight into their own metacognitive knowledge.
Open Domain Multi-document Summarization: A Comprehensive Study of Model Brittleness under Retrieval
Multi-document summarization (MDS) assumes a set of topic-related documents is provided as input. In practice, this document set is not always available; it would need to be retrieved given an information need, i.e. a question or topic statement, a setting we dub "open-domain" MDS. We study this more challenging setting by formalizing the task and bootstrapping it using existing datasets, retrievers, and summarizers. Via extensive automatic and human evaluation, we determine: (1) state-of-the-art summarizers suffer large reductions in performance when applied to open-domain MDS, (2) additional training in the open-domain setting can reduce this sensitivity to imperfect retrieval, and (3) summarizers are insensitive to the retrieval of duplicate documents and the order of retrieved documents, but highly sensitive to other errors, like the retrieval of irrelevant documents. Based on our results, we provide practical guidelines to enable future work on open-domain MDS, e.g. how to choose the number of retrieved documents to summarize. Our results suggest that new retrieval and summarization methods and annotated resources for training and evaluation are necessary for further progress in the open-domain setting.
Comment: Accepted to EMNLP Findings 202
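As a concrete illustration of the open-domain setting, the sketch below wires a simple retriever to a stand-in summarizer: the document set is retrieved for a topic statement rather than given up front. The mini-corpus, the TF-IDF retriever, and the placeholder summarizer are illustrative assumptions, not the models evaluated in the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented mini-corpus standing in for a large open document collection.
corpus = [
    "Storm damage was reported along the coast after the hurricane.",
    "Rebuilding efforts continue in coastal towns hit by the storm.",
    "The championship game drew a record television audience.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(corpus)

def retrieve(query, k=2):
    """Return the top-k documents for an information need (topic statement)."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    return [corpus[i] for i in scores.argsort()[::-1][:k]]

def summarize(documents):
    """Placeholder summarizer; a real system would run a trained MDS model."""
    return " ".join(documents)

# Open-domain MDS: the input set is retrieved, not given, and k is the knob
# the paper's guidelines address (how many documents to summarize).
print(summarize(retrieve("hurricane damage and recovery", k=2)))
```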