32 research outputs found

    An analysis of query difficulty for information retrieval in the medical domain

    We present a post-hoc analysis of a benchmarking activity for information retrieval (IR) in the medical domain to determine whether performance on queries with different levels of complexity can be associated with different IR methods or techniques. Our analysis is based on data and runs from Task 3 of the CLEF 2013 eHealth lab, which provided patient queries and a large medical document collection for the development of patient-centred medical information retrieval techniques. We categorise the queries by their complexity, defined as the number of medical concepts they contain. We then show how query complexity affects the performance of runs submitted to the lab, and provide suggestions for improving retrieval quality for this complex retrieval task and similar IR evaluation tasks.
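    A minimal sketch of the complexity bucketing described above, assuming a hypothetical extract_medical_concepts() helper (in practice a wrapper around a concept annotator such as MetaMap) and per-query effectiveness scores taken from the submitted runs; only the grouping-by-concept-count idea comes from the abstract.

        from collections import defaultdict
        from statistics import mean

        def extract_medical_concepts(query_text):
            # Hypothetical stand-in for a real concept annotator (e.g. a MetaMap
            # wrapper); here every capitalised token is treated as a "concept".
            return [tok for tok in query_text.split() if tok[0].isupper()]

        def mean_score_by_complexity(queries, scores):
            # Group queries by complexity (number of medical concepts) and report
            # the mean effectiveness per bucket, as in the post-hoc analysis.
            buckets = defaultdict(list)
            for qid, text in queries.items():
                buckets[len(extract_medical_concepts(text))].append(scores[qid])
            return {c: mean(vals) for c, vals in sorted(buckets.items())}

        # Toy usage with made-up data (scores would be per-query P@10 of a run).
        queries = {"qtest1": "Facial Cuts and Scar Tissue", "qtest2": "persistent cough"}
        scores = {"qtest1": 0.31, "qtest2": 0.52}
        print(mean_score_by_complexity(queries, scores))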

    Performance Analysis of Information Retrieval Systems

    It has been shown that there is no single best information retrieval system configuration that works for every query; rather, performance can vary from one query to another. It would therefore be useful if a meta-system could decide which system should process a new query by learning from the context of previously submitted queries. This paper reports a deep analysis covering more than 80,000 search engine configurations applied to 100 queries and the corresponding performance. The goal of the analysis is to identify which search engine configuration responds best to a certain type of query. We considered two approaches to defining query types: one clusters queries according to their performance (their difficulty), while the other clusters queries using various query features (including query difficulty predictors). We identified two parameters that should be optimized first. An important outcome is that we could not obtain strongly conclusive results; given the large number of systems and methods we used, this suggests that current query features do not fit the optimization problem.
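    A hedged sketch of the first query-typing approach mentioned above: clustering queries by their performance profile across system configurations, then looking for the configuration that serves each query type best. The matrix layout, the choice of k-means, and all names are illustrative assumptions, not details taken from the paper.

        import numpy as np
        from sklearn.cluster import KMeans

        # Rows = queries, columns = system configurations; each cell holds an
        # effectiveness score (e.g. average precision) for that query under that
        # configuration. In the paper's setting this would be 100 x 80,000+.
        rng = np.random.default_rng(0)
        perf = rng.random((100, 500))          # toy stand-in for the real run data

        # Cluster queries by their performance profile (difficulty-based typing).
        labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(perf)

        # For each query type, pick the configuration with the best mean score --
        # the decision a meta-system would have to learn to make per query.
        for cluster_id in range(4):
            best_config = perf[labels == cluster_id].mean(axis=0).argmax()
            print(f"query type {cluster_id}: best configuration index {best_config}")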

    Language Modeling Approaches to Information Retrieval

    This article surveys recent research on language modeling (sometimes called statistical language modeling) approaches to information retrieval. Language modeling is a formal probabilistic retrieval framework with roots in speech recognition and natural language processing. The underlying assumption of language modeling is that human language generation is a random process; the goal is to model that process via a generative statistical model. In this article, we discuss current research on the application of language modeling to information retrieval, the role of semantics in the language modeling framework, cluster-based language models, the use of language modeling for XML retrieval, and future trends.
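    At the core of this framework is the query-likelihood model: each document is scored by the probability that a language model estimated from it generates the query. The sketch below uses Dirichlet smoothing, a standard choice in this framework; the toy corpus and the value of mu are illustrative.

        import math
        from collections import Counter

        def dirichlet_query_likelihood(query, doc_terms, coll_tf, coll_len, mu=2000.0):
            # log P(Q|D) under a Dirichlet-smoothed unigram document model:
            #   P(w|D) = (tf(w, D) + mu * P(w|C)) / (|D| + mu)
            doc_tf, doc_len = Counter(doc_terms), len(doc_terms)
            score = 0.0
            for w in query:
                p_wc = coll_tf.get(w, 0) / coll_len            # collection model P(w|C)
                p_wd = (doc_tf.get(w, 0) + mu * p_wc) / (doc_len + mu)
                if p_wd > 0:
                    score += math.log(p_wd)
            return score

        # Toy usage: rank two short "documents" for a two-term query.
        collection = ["speech", "recognition", "retrieval", "language", "model", "retrieval"]
        coll_tf, coll_len = Counter(collection), len(collection)
        docs = {"d1": ["language", "model", "retrieval"], "d2": ["speech", "recognition"]}
        query = ["language", "retrieval"]
        print(sorted(docs, key=lambda d: dirichlet_query_likelihood(query, docs[d], coll_tf, coll_len), reverse=True))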

    Query performance prediction for information retrieval based on covering topic score

    We present a statistical method called Covering Topic Score (CTS) to predict query performance for information retrieval. The estimate is based on how well the topic of a user's query is covered by the documents retrieved from a given retrieval system. Our approach is conceptually simple and intuitive, and can easily be extended to incorporate features beyond bag-of-words, such as phrases and term proximity. Experiments demonstrate that CTS correlates significantly with query performance on a variety of TREC test collections, and that CTS gains additional prediction power from the phrase and term-proximity features. We compare CTS with previous state-of-the-art methods for query performance prediction, including the clarity score and the robustness score. Our experimental results show that CTS consistently performs better than, or at least as well as, these other methods. In addition to its high effectiveness, CTS also has very low computational complexity, making it practical for real applications.
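    The abstract does not give the CTS estimator itself, so the following is only a loose illustration of the underlying intuition (how well the query's topic is covered by the top retrieved documents) under assumed names and a deliberately simplified coverage measure; it is not the method from the paper.

        def naive_coverage_score(query_terms, top_docs):
            # Simplified stand-in for a coverage-based predictor: the average,
            # over the top-k retrieved documents, of the fraction of query terms
            # each document contains. NOT the CTS estimator, only the intuition.
            query_terms = set(query_terms)
            if not query_terms or not top_docs:
                return 0.0
            per_doc = [len(query_terms & set(doc)) / len(query_terms) for doc in top_docs]
            return sum(per_doc) / len(per_doc)

        # Higher coverage would be read as a signal of an "easier" query.
        top_docs = [["diabetes", "insulin", "treatment"], ["insulin", "dosage"]]
        print(naive_coverage_score(["diabetes", "insulin"], top_docs))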

    Mining document, concept, and term associations for effective biomedical retrieval - Introducing MeSH-enhanced retrieval models

    Manually assigned subject terms, such as Medical Subject Headings (MeSH) in the health domain, describe the concepts or topics of a document. Existing information retrieval models do not take full advantage of such information. In this paper, we propose two MeSH-enhanced (ME) retrieval models that integrate the concept layer (i.e. MeSH) into the language modeling framework to improve retrieval performance. The new models quantify associations between documents and their assigned concepts to construct conceptual representations of the documents, and mine associations between concepts and terms to construct generative concept models. The two ME models reconstruct two essential estimation processes of the relevance model (Lavrenko and Croft 2001) by incorporating the document-concept and concept-term associations. More specifically, in Model 1, the language models of the pseudo-feedback documents are enriched by their assigned concepts. In Model 2, concepts related to the user's query are first identified and then used to reweight the pseudo-feedback documents according to the document-concept associations. Experiments carried out on two standard test collections show that the ME models outperformed the query likelihood model, the relevance model (RM3), and an earlier ME model. A detailed case analysis provides insight into how and why the new models improve or worsen retrieval performance. Implications and limitations of the study are discussed. This study provides new ways to formally incorporate semantic annotations, such as subject terms, into retrieval models. The findings suggest that integrating the concept layer into retrieval models can further improve performance over current state-of-the-art models.
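    For context, both ME models build on the relevance model cited above (Lavrenko and Croft 2001) and its RM3 interpolation. The sketch below shows a plain RM3 expansion plus a document-weighting hook in the spirit of Model 2's concept-based reweighting of pseudo-feedback documents; the hook and all names are illustrative assumptions, not the paper's estimators.

        from collections import Counter, defaultdict

        def rm3_expand(query, feedback_docs, doc_weights=None, lam=0.5, n_terms=10):
            # Baseline RM3: estimate P(w|R) from pseudo-feedback documents weighted
            # by their query likelihood P(Q|D), then interpolate with the original
            # query model. `doc_weights` is where a concept-based reweighting of the
            # feedback documents (Model 2's idea) could be plugged in.
            doc_weights = doc_weights or {d: 1.0 for d in feedback_docs}
            rel_model = defaultdict(float)
            for doc_id, terms in feedback_docs.items():
                tf, doc_len = Counter(terms), len(terms)
                p_q_d = 1.0                # maximum-likelihood P(Q|D), no smoothing (toy)
                for q in query:
                    p_q_d *= tf.get(q, 0) / doc_len
                for w, count in tf.items():
                    rel_model[w] += doc_weights[doc_id] * p_q_d * count / doc_len
            total = sum(rel_model.values()) or 1.0
            rel_model = {w: v / total for w, v in rel_model.items()}
            # Interpolate with the maximum-likelihood query model and keep top terms.
            q_tf, q_len = Counter(query), len(query)
            mixed = {w: lam * q_tf.get(w, 0) / q_len + (1 - lam) * rel_model.get(w, 0.0)
                     for w in set(rel_model) | set(q_tf)}
            return dict(sorted(mixed.items(), key=lambda kv: kv[1], reverse=True)[:n_terms])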

    Cheap IR Evaluation: Fewer Topics, No Relevance Judgements, and Crowdsourced Assessments

    One possible approach to evaluating Information Retrieval (IR) effectiveness is to use test collections, which are composed of a collection of documents, a set of descriptions of information needs (called topics), and a set of documents judged relevant to each topic. Test collections are typically built in a competition scenario: in the well-known TREC initiative, for example, participants run their own retrieval systems over a set of topics and provide a ranked list of retrieved documents for each; some of the retrieved documents (usually the highest ranked) constitute the so-called pool, and their relevance is judged by human assessors; the ranked lists and the resulting relevance judgements are then used to compute effectiveness metrics and rank the participating systems. Private web search companies also run their own in-house evaluation exercises; although the details are mostly unknown and the aims are somewhat different, the overall approach shares several issues with the test collection approach. The aim of this work is to: (i) develop and improve state-of-the-art methods for evaluating IR effectiveness while saving resources, and (ii) propose a novel, more principled and engineered, overall approach to test-collection-based effectiveness evaluation. [...]
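    A minimal sketch of the pooling step described above, on toy data: each run contributes a ranked list per topic, the pool is the union of the top-k documents over all runs, and an effectiveness metric (here precision@k) is computed against the judgements collected on that pool.

        def build_pool(runs, topic, depth=10):
            # Union of the top-`depth` documents retrieved by every run for a topic;
            # only these documents are shown to the human assessors.
            pool = set()
            for ranked in runs.values():
                pool.update(ranked[topic][:depth])
            return pool

        def precision_at_k(ranked, relevant, k=10):
            # Fraction of the top-k retrieved documents that were judged relevant.
            return sum(1 for d in ranked[:k] if d in relevant) / k

        # Toy usage: two runs, one topic, judgements made only over the pool.
        runs = {"runA": {"t1": ["d1", "d2", "d3"]}, "runB": {"t1": ["d3", "d4", "d5"]}}
        pool = build_pool(runs, "t1", depth=2)        # {"d1", "d2", "d3", "d4"}
        judged_relevant = {"d1", "d3"}                # assessor decisions on the pool
        for name, ranked in runs.items():
            print(name, precision_at_k(ranked["t1"], judged_relevant, k=2))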