Two selfless contributions to web search evaluation

Author: Aly Robin
Hiemstra Djoerd
Publication venue: National Institute of Standards and Technology
Publication date: 01/01/2014

We present our results for the Web Search track and the Federated Web Search track of the 23rd Text REtrieval Conference (TREC).

Radboud Repository

University of Twente Research Information

Overview of the TREC 2022 NeuCLIR Track

Author: Lawrie Dawn
MacAvaney Sean
Mayfield James
McNamee Paul
Oard Douglas W.
Soldaini Luca
Yang Eugene
Publication date: 24/09/2023

This is the first year of the TREC Neural CLIR (NeuCLIR) track, which aims to study the impact of neural approaches to cross-language information retrieval. The main task in this year's track was ad hoc ranked retrieval of Chinese, Persian, or Russian newswire documents using queries expressed in English. Topics were developed using standard TREC processes, except that topics developed by an annotator for one language were assessed by a different annotator when evaluating that topic on a different language. There were 172 total runs submitted by twelve teams. Comment: 22 pages, 13 figures, 10 tables. Part of the Thirty-First Text REtrieval Conference (TREC 2022) Proceedings. Replace the misplaced Russian result table

arXiv.org e-Print Archive

Finding Related Publications: Extending the Set of Terms Used to Assess Article Similarity.

Author: Demner-Fushman Dina
Hsu Chun-Nan
Kuo Tsung-Ting
Marmor Rebecca
Ohno-Machado Lucila
Singh Siddharth
Wang Shuang
Wei Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2016

Recommendation of related articles is an important feature of PubMed. The PubMed Related Citations (PRC) algorithm is the engine that enables this feature, and it leverages information on 22 million citations. We analyzed the performance of the PRC algorithm on 4584 annotated articles from the 2005 Text REtrieval Conference (TREC) Genomics Track data. Our analysis indicated that the PRC highest weighted term was not always consistent with the critical term that was most directly related to the topic of the article. We implemented term expansion and found that it was a promising and easy-to-implement approach to improve the performance of the PRC algorithm for the TREC 2005 Genomics data and for the TREC 2014 Clinical Decision Support Track data. For term expansion, we trained a Skip-gram model using the Word2Vec package. This extended PRC algorithm resulted in higher average precision for a large subset of articles. A combination of both algorithms may lead to improved performance in related article recommendations.

PubMed Central

eScholarship - University of California

Development of Arabic Information Retrieval Systems in the 21st Century

Author: Elmekawi Awatif
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 01/03/2018

The present study examines the development of Arabic information retrieval systems since 2000 and their vital role in the Text REtrieval Conference (TREC) and in the cross-language information retrieval track. It reviews developments concerning the Holy Qur'an, the Arabic language, terms relevant to Arabic information retrieval systems, and the characteristics of the Arabic language compared with other languages since the early 21st century. These developments include rich resources of up-to-date information to support research in this area, modern approaches to assessing and measuring Arabic information retrieval systems, relevant theses, research studies from contemporary universities on the use of TREC in Arabic information retrieval, and work by researchers with no prior knowledge of the Arabic language. The study ends with some studies from Arab universities. Keywords: Retrieval Systems, Arabic Information, Twenty-first century

International Institute for Science, Technology and Education (IISTE): E-Journals

An Exploration Study of Mixed-initiative Query Reformulation in Conversational Passage Retrieval

Author: Fang Hui
Yang Dayu
Zhang Yue
Publication date: 17/07/2023

In this paper, we report our methods and experiments for the TREC Conversational Assistance Track (CAsT) 2022. In this work, we aim to reproduce multi-stage retrieval pipelines and explore one of the potential benefits of involving mixed-initiative interaction in conversational passage retrieval scenarios: reformulating raw queries. Before the first ranking stage of a multi-stage retrieval pipeline, we propose a mixed-initiative query reformulation module, which achieves query reformulation based on the mixed-initiative interaction between the users and the system, as the replacement for the neural reformulation method. Specifically, we design an algorithm to generate appropriate questions related to the ambiguities in raw queries, and another algorithm to reformulate raw queries by parsing users' feedback and incorporating it into the raw query. For the first ranking stage of our multi-stage pipelines, we adopt a sparse ranking function (BM25) and a dense retrieval method (TCT-ColBERT). For the second ranking stage, we adopt a pointwise reranker (MonoT5) and a pairwise reranker (DuoT5). Experiments on both TREC CAsT 2021 and TREC CAsT 2022 datasets show the effectiveness of our mixed-initiative-based query reformulation method on improving retrieval performance compared with two popular reformulators: a neural reformulator (CANARD-T5) and a rule-based reformulator, historical query reformulator (HQE). Comment: The Thirty-First Text REtrieval Conference (TREC 2022) Proceedings
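The sparse first-stage ranker named in the abstract above, BM25, can be illustrated with a minimal sketch. This is one common formulation of BM25 (with a non-negative IDF), not the authors' implementation; the function name `bm25_scores`, the pre-tokenized input format, and the parameter values `k1=0.9` and `b=0.4` are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query_terms: List[str], docs: List[List[str]],
                k1: float = 0.9, b: float = 0.4) -> List[float]:
    """Score each tokenized document against the query with a common BM25 variant.

    k1 and b are illustrative defaults (assumptions), not values from the paper.
    """
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # term absent from this document contributes nothing
            # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

In a multi-stage pipeline of the kind described, the top-scoring documents from a function like this would be passed on to a reranker; the reranking stage is not sketched here.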

arXiv.org e-Print Archive

Enhancing access to the Bibliome: the TREC 2004 Genomics Track

Author: Aaron M Cohen
Dale F Kraemer
Laura Ross
Phoebe Roberts
Ravi Teja Bhupatiraju
William R Hersh
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The goal of the TREC Genomics Track is to improve information retrieval in the area of genomics by creating test collections that will allow researchers to improve and better understand failures of their systems. The 2004 track included an ad hoc retrieval task, simulating use of a search engine to obtain documents about biomedical topics. This paper describes the Genomics Track of the Text Retrieval Conference (TREC) 2004, a forum for evaluation of IR research systems, where retrieval in the genomics domain has recently begun to be assessed. RESULTS: A total of 27 research groups submitted 47 different runs. The most effective runs, as measured by the primary evaluation measure of mean average precision (MAP), used a combination of domain-specific and general techniques. The best MAP obtained by any run was 0.4075. Techniques that expanded queries with gene name lists as well as words from related articles had the best efficacy. However, many runs performed more poorly than a simple baseline run, indicating that careful selection of system features is essential. CONCLUSION: Various approaches to ad hoc retrieval provide a diversity of efficacy. The TREC Genomics Track and its test collection resources provide tools that allow improvement in information retrieval systems
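The primary measure mentioned in the abstract above, mean average precision (MAP), can be sketched as follows. This is a minimal illustration under stated assumptions, not the track's official scoring tool: the function names, the dictionary-based run and judgment formats, and the choice to score a topic with no relevant documents as 0.0 are all assumptions, and each ranked list is assumed to contain no duplicate document IDs.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0  # assumption: a topic with no relevant documents scores 0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Dividing by the number of relevant documents penalizes unretrieved ones.
    return precision_sum / len(relevant)


def mean_average_precision(run: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """Mean of per-topic average precision over all judged topics."""
    if not qrels:
        return 0.0
    total = sum(average_precision(run.get(topic, []), relevant)
                for topic, relevant in qrels.items())
    return total / len(qrels)
```

For example, a ranking `["d1", "d2", "d3", "d4"]` with relevant documents `{"d1", "d3"}` has precision 1/1 at rank 1 and 2/3 at rank 3, giving an average precision of 5/6.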

CiteSeerX

Springer - Publisher Connector

PubMed Central

Objective and automated protocols for the evaluation of biomedical search engines using No Title Evaluation protocols

Author: AM Cohen
D Demner-Fushman
E Amitay
EM Voorhees
Fabien Campagne
I Soboroff
JA Aslam
K Sparck Jones
KC Dorff
M Fuller
P Boldi
P Dong
R Nuray
S Buttcher
SE Robertson
SF Kim
Y Yue
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: The evaluation of information retrieval techniques has traditionally relied on human judges to determine which documents are relevant to a query and which are not. This protocol is used in the Text REtrieval Conference (TREC), organized annually for the past 15 years, to support the unbiased evaluation of novel information retrieval approaches. The TREC Genomics Track has recently been introduced to measure the performance of information retrieval for biomedical applications. RESULTS: We describe two protocols for evaluating biomedical information retrieval techniques without human relevance judgments. We call these protocols No Title Evaluation (NT Evaluation). The first protocol measures performance for focused searches, where only one relevant document exists for each query. The second protocol measures performance for queries expected to have potentially many relevant documents per query (high-recall searches). Both protocols take advantage of the clear separation of titles and abstracts found in Medline. We compare the performance obtained with these evaluation protocols to results obtained by reusing the relevance judgments produced in the 2004 and 2005 TREC Genomics Track and observe significant correlations between performance rankings generated by our approach and TREC. Spearman's correlation coefficients in the range of 0.79–0.92 are observed comparing bpref measured with NT Evaluation or with TREC evaluations. For comparison, coefficients in the range 0.86–0.94 can be observed when evaluating the same set of methods with data from two independent TREC Genomics Track evaluations. We discuss the advantages of NT Evaluation over the TRels and the data fusion evaluation protocols introduced recently. CONCLUSION: Our results suggest that the NT Evaluation protocols described here could be used to optimize some search engine parameters before human evaluation. Further research is needed to determine if NT Evaluation or variants of these protocols can fully substitute for human evaluations.
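The comparison described in the abstract above rests on Spearman's rank correlation between two sets of system scores. A minimal sketch of that statistic follows, computed as the Pearson correlation of average ranks so that ties are handled; the function names are hypothetical and this is not the authors' code.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)
```

Here `x` and `y` would hold one effectiveness score per retrieval method under each of the two evaluation protocols, so a value near 1 means the two protocols order the methods similarly.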

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Ontology-Based MEDLINE Document Classification

Author: J.T. Eppig
M.E. Funk
R. Rada
T. Joachims
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2007

An increasing and overwhelming amount of biomedical information is available in the research literature mainly in the form of free-text. Biologists need tools that automate their information search and deal with the high volume and ambiguity of free-text. Ontologies can help automatic information processing by providing standard concepts and information about the relationships between concepts. The Medical Subject Headings (MeSH) ontology is already available and used by MEDLINE indexers to annotate the conceptual content of biomedical articles. This paper presents a domain-independent method that uses the MeSH ontology inter-concept relationships to extend the existing MeSH-based representation of MEDLINE documents. The extension method is evaluated within a document triage task organized by the Genomics track of the 2005 Text REtrieval Conference (TREC). Our method for extending the representation of documents leads to an improvement of 17% over a non-extended baseline in terms of normalized utility, the metric defined for the task. The SVMlight software is used to classify documents

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service

Improving relevance feedback-based query expansion by the use of a weighted word pairs approach

Author: Colace Francesco
De Santo Massimo
Greco Luca
Napoletano Paolo
Publication venue: Wiley
Publication date: 01/01/2015

In this article, the use of a new term extraction method for query expansion (QE) in text retrieval is investigated. The new method expands the initial query with a structured representation made of weighted word pairs (WWP) extracted from a set of training documents (relevance feedback). Standard text retrieval systems can handle a WWP structure through custom Boolean weighted models. We experimented with both the explicit and pseudorelevance feedback schemas and compared the proposed term extraction method with others in the literature, such as KLD and RM3. Evaluations have been conducted on a number of test collections (Text REtrieval Conference [TREC]-6, -7, -8, -9, and -10). Results demonstrated that the QE method based on this new structure outperforms the baseline.

CiteSeerX

Archivio della Ricerca - Università di Salerno