106,243 research outputs found
A new metric for patent retrieval evaluation
Patent retrieval is generally considered to be a recall-oriented information retrieval task that is growing in importance. Despite this, precision-based scores such as mean average precision (MAP) remain the primary evaluation measures for patent retrieval. Our study examines different evaluation measures for the recall-oriented patent retrieval task and shows the limitations of the current scores in comparing different IR systems for this task. We introduce PRES, a novel evaluation metric for this type of application that takes account of recall and user search effort. The behaviour of PRES is demonstrated on 48 runs from the CLEF-IP 2009 patent retrieval track. A full analysis of the performance of PRES shows its suitability for measuring the retrieval effectiveness of systems from a recall-focused perspective, taking into account the expected search effort of patent searchers.
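For context on the precision-based baseline the abstract criticises, average precision (the per-query quantity behind MAP) rewards relevant documents that appear early in the ranking. A minimal sketch (function and variable names are illustrative, not from the paper):

```python
def average_precision(ranking, relevant):
    """Average precision for one query: mean of precision@k taken at each
    rank k holding a relevant document, averaged over all relevant docs."""
    hits = 0
    precisions = []
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP: mean of per-query average precision over (ranking, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Because unretrieved relevant documents only lower the average, MAP is dominated by early precision, which is exactly the property the paper argues is a poor fit for recall-oriented patent search.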
A Sequential Latent Topic-based Readability Model for Domain-Specific Information Retrieval.
In domain-specific information retrieval (IR), an emerging problem is how to provide different users with documents that are both relevant and readable, especially for lay users. In this paper, we propose a novel document readability model to enhance domain-specific IR. Our model incorporates the coverage and sequential dependency of latent topics in a document. Accordingly, two topical readability indicators, namely Topic Scope and Topic Trace, are developed. These indicators, combined with the classical Surface-level indicator, can be used to rerank the initial list of documents returned by a conventional search engine. In order to extract the structured latent topics without supervision, hierarchical Latent Dirichlet Allocation (hLDA) is used. We have evaluated our model from the user-oriented and system-oriented perspectives, in the medical domain. The user-oriented evaluation shows a good correlation between the readability scores given by our model and human judgments. Furthermore, our model also achieves a significant improvement in the system-oriented evaluation in comparison with one of the state-of-the-art readability methods.
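The reranking step described above can be shown schematically. The weighting scheme and names below are hypothetical and stand in for the paper's Topic Scope / Topic Trace / Surface-level combination, which the abstract does not specify:

```python
def rerank_by_readability(results, alpha=0.7):
    """Rerank an initial result list by a weighted combination of the
    engine's relevance score and a document readability score.
    `results` is a list of (doc_id, relevance, readability) triples,
    with both scores assumed normalised to [0, 1]; `alpha` trades
    relevance against readability (hypothetical scheme)."""
    combined = [(doc, alpha * rel + (1 - alpha) * read)
                for doc, rel, read in results]
    return [doc for doc, _ in sorted(combined, key=lambda t: t[1], reverse=True)]
```

Any real instantiation would replace the single readability number with the model's topical indicators, but the rerank-after-retrieval pipeline is the same shape.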
Context-Driven Interactive Query Simulations Based on Generative Large Language Models
Simulating user interactions enables a more user-oriented evaluation of
information retrieval (IR) systems. While user simulations are cost-efficient
and reproducible, many approaches lack fidelity regarding real user
behavior. Most notably, current user models neglect the user's context, which
is the primary driver of perceived relevance and the interactions with the
search results. To this end, this work introduces the simulation of
context-driven query reformulations. The proposed query generation methods
build upon recent Large Language Model (LLM) approaches and consider the user's
context throughout the simulation of a search session. Compared to simple
context-free query generation approaches, these methods show better
effectiveness and allow the simulation of more efficient IR sessions.
Similarly, our evaluations consider more interaction context than current
session-based measures and reveal interesting complementary insights in
addition to the established evaluation protocols. We conclude with directions
for future work and provide an entirely open experimental setup.
Comment: Accepted at ECIR 2024 (Full Paper).
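The core idea above, conditioning each simulated reformulation on the session context rather than generating queries context-free, can be illustrated with a prompt-assembly sketch. Everything here is hypothetical; the paper's actual prompts and LLM setup are not given in the abstract:

```python
def build_reformulation_prompt(info_need, issued_queries, clicked_titles):
    """Assemble an LLM prompt that conditions the next simulated query on
    session context: the underlying information need, the queries issued
    so far, and titles of results the simulated user judged relevant.
    A context-free simulator would use only the first line."""
    lines = [
        "You simulate a search-engine user with this information need:",
        info_need,
        "Queries issued so far: " + "; ".join(issued_queries),
        "Titles of results found relevant so far: " + "; ".join(clicked_titles),
        "Write the next query this user would issue, avoiding repeats.",
    ]
    return "\n".join(lines)
```

The prompt string would then be sent to whatever LLM drives the simulation; the contrast with context-free generation is simply that the issued queries and clicked titles feed back into each turn.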
An Intrinsic Framework of Information Retrieval Evaluation Measures
Information retrieval (IR) evaluation measures are cornerstones for
determining the suitability and task performance efficiency of retrieval
systems. Their metric and scale properties make it possible to compare one system against
another to establish differences or similarities. Based on the representational
theory of measurement, this paper determines these properties by exploiting the
information contained in a retrieval measure itself. It establishes the
intrinsic framework of a retrieval measure, which is the common scenario when
the domain set is not explicitly specified. A method to determine the metric
and scale properties of any retrieval measure is provided, requiring knowledge
of only some of its attained values. The method establishes three main
categories of retrieval measures according to their intrinsic properties. Some
common user-oriented and system-oriented evaluation measures are classified
according to the presented taxonomy.
Comment: 23 pages.
Visual exploration and retrieval of XML document collections with the generic system X2
This article reports on the XML retrieval system X2 which has been developed at the University of Munich over the last five years. In a typical session with X2, the user
first browses a structural summary of the XML database in order to select interesting elements and keywords occurring in documents. Using this intermediate result, queries combining structure and textual references are composed semiautomatically.
After query evaluation, the full set of answers is presented in a visual and structured way. X2 largely exploits the structure found in documents, queries and answers to enable new interactive visualization and exploration techniques that support mixed IR and database-oriented querying, thus bridging the gap between these three views on the data to be retrieved. Another salient characteristic of X2 which distinguishes it from other visual query systems for XML is that it supports various degrees of detailedness in the presentation of answers, as well as techniques for dynamically reordering and grouping retrieved elements once the complete answer set has been computed.
ConvAI3: Generating Clarifying Questions for Open-Domain Dialogue Systems (ClariQ)
This document presents a detailed description of the challenge on clarifying
questions for dialogue systems (ClariQ). The challenge is organized as part of
the Conversational AI challenge series (ConvAI3) at the Search Oriented
Conversational AI (SCAI) EMNLP workshop in 2020. The main aim of
conversational systems is to return an appropriate answer in response to
user requests. However, some user requests might be ambiguous. In IR
settings such a situation is handled mainly through the diversification of
the search result page. It is, however, much more challenging in dialogue
settings with limited bandwidth. Therefore, in this challenge, we provide a
common evaluation framework to evaluate mixed-initiative conversations.
Participants are asked to rank clarifying questions in an
information-seeking conversation. The challenge is organized in two stages:
in Stage 1 we evaluate the submissions in an offline setting on single-turn
conversations. Top participants of Stage 1 get the chance to have their
models tested by human annotators.
PRES: A score metric for evaluating recall-oriented information retrieval applications
Information retrieval (IR) evaluation scores are generally designed to measure the effectiveness with which relevant documents are identified and retrieved. Many scores have been proposed for this purpose over the years. These have primarily focused on aspects of precision and recall, and while these are often discussed with equal importance, in practice most attention has been given to precision-focused metrics. Even for recall-oriented IR tasks of growing importance, such as patent retrieval, these precision-based scores remain the primary evaluation measures. Our study examines different evaluation measures for a recall-oriented patent retrieval task and demonstrates the limitations of the current scores in comparing different IR systems for this task. We introduce PRES, a novel evaluation metric for this type of application taking account of recall and the user's search effort. The behaviour of PRES is demonstrated on 48 runs from the CLEF-IP 2009 patent retrieval track. A full analysis of the performance of PRES shows its suitability for measuring the retrieval effectiveness of systems from a recall-focused perspective, taking into account the user's expected search effort.
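The metric the abstract introduces has a compact closed form. The sketch below follows the published PRES definition (Magdy & Jones, SIGIR 2010), under that paper's assumption that relevant documents not retrieved within the user's cut-off Nmax are placed just after it; treat it as a reading aid rather than a reference implementation:

```python
def pres(ranks_of_found, n_relevant, n_max):
    """PRES: recall-oriented score in [0, 1].
    `ranks_of_found`: 1-based ranks (each <= n_max) of the relevant
    documents actually retrieved within the cut-off. Relevant documents
    missing from the top n_max are assumed to sit at ranks
    n_max+1, n_max+2, ..., modelling the searcher's wasted effort."""
    missing = n_relevant - len(ranks_of_found)
    total = sum(ranks_of_found) + sum(n_max + i for i in range(1, missing + 1))
    avg_rank = total / n_relevant
    return 1.0 - (avg_rank - (n_relevant + 1) / 2.0) / n_max
```

PRES reaches 1 when all relevant documents occupy the top ranks and 0 when none appear within the cut-off, so it rewards both recall and low search effort, which is the behaviour the abstract analyses on the CLEF-IP runs.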
The Simplest Evaluation Measures for XML Information Retrieval that Could Possibly Work
This paper reviews several evaluation measures developed for evaluating XML information retrieval (IR) systems. We argue that these measures, some of which are currently in use by the INitiative for the Evaluation of XML Retrieval (INEX), are complicated, hard to understand, and hard to explain to users of XML IR systems. To show the value of keeping things simple, we report alternative evaluation results of official evaluation runs submitted to INEX 2004 using simple metrics, and show their value for INEX.
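The "simple metrics" the paper contrasts with INEX's specialised measures are of roughly this form (a generic precision-at-cut-off sketch, not the paper's exact definitions):

```python
def precision_at_n(ranking, relevant, n=10):
    """Precision at cut-off n: the fraction of the top-n results that are
    relevant. Trivial to compute and to explain to end users, which is
    the property the paper argues for."""
    top = ranking[:n]
    return sum(1 for doc in top if doc in relevant) / n
```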
Personalization via collaboration in web retrieval systems: a context based approach
The World Wide Web is a source of information, and searches on the Web can be analyzed to detect patterns in Web users' search behaviors and information needs so as to effectively handle the users' subsequent needs. The rationale is that the information need of a user at a particular time point occurs in a particular context, and queries are derived from that need. In this paper, we discuss an extension of our personalization approach that was originally developed for a traditional bibliographic retrieval system but has been adapted and extended with a collaborative model for the Web retrieval environment. We start with a brief introduction of our personalization approach in a traditional information retrieval system. Then, based on the differences in the nature of documents, users and search tasks between traditional and Web retrieval environments, we describe our extensions integrating collaboration into personalization in the Web retrieval environment. The architecture for the extension integrates machine learning techniques for the purpose of better modeling users' search tasks. Finally, a user-oriented evaluation of Web-based adaptive retrieval systems is presented as an important aspect of the overall strategy for personalization.