
504 research outputs found

A Vertical PRF Architecture for Microblog Search

Author: Arguello J.
Demeester Thomas
Lin Jimmy
Massoudi Kamran
Milad Shokouhi
Rosa Kevin Dela
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 08/10/2018

In microblog retrieval, query expansion can be essential to obtain good search results because queries and posts are short. Since information in microblogs is highly dynamic, an up-to-date index coupled with pseudo-relevance feedback (PRF) with an external corpus has a higher chance of retrieving more relevant documents and improving ranking. In this paper, we focus on the research question: how can we reduce the computational cost of query expansion while maintaining the same retrieval precision as standard PRF? To that end, we propose to accelerate the query expansion step of pseudo-relevance feedback. The hypothesis is that expanding the query with an expansion corpus organized into verticals will lead to a more efficient query expansion process and improved retrieval effectiveness. The proposed query expansion method therefore uses a distributed search architecture and resource selection algorithms to provide an efficient query expansion process. Experiments on the TREC Microblog datasets show that the proposed approach can match or outperform standard PRF in MAP and NDCG@30, with a computational cost that is three orders of magnitude lower.
Comment: To appear in ICTIR 201
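As a rough illustration of the standard PRF step that the abstract takes as its baseline, the sketch below ranks a toy corpus, treats the top-k documents as pseudo-relevant, and appends their most frequent new terms to the query. It is not the paper's vertical architecture: the scoring function, the corpus, and the names `prf_expand`, `k` and `m` are all hypothetical and chosen only for this example.

```python
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real systems do more).
    return text.lower().split()


def score(query_terms, doc_terms):
    # Toy relevance score: total occurrences of query terms in the document.
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)


def prf_expand(query, corpus, k=2, m=1):
    """Expand `query` with the m most frequent new terms from the top-k documents."""
    q = tokenize(query)
    # First-pass retrieval: rank documents by the toy score.
    ranked = sorted(corpus, key=lambda d: score(q, tokenize(d)), reverse=True)
    # Assume the top-k documents are relevant and count their non-query terms.
    feedback = Counter()
    for doc in ranked[:k]:
        feedback.update(t for t in tokenize(doc) if t not in q)
    expansion = [term for term, _ in feedback.most_common(m)]
    return q + expansion


# Hypothetical toy corpus standing in for an expansion collection.
corpus = [
    "earthquake japan tsunami warning",
    "earthquake tsunami coast evacuation",
    "football match result",
]
expanded = prf_expand("earthquake", corpus, k=2, m=1)
```

The expensive part in practice is the first-pass retrieval over the expansion corpus, which is the step the abstract proposes to restrict to selected verticals.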

arXiv.org e-Print Archive

Crossref

What Makes a Top-Performing Precision Medicine Search Engine? Tracing Main System Features in a Systematic Way

Author: Bergstra James S.
David
Eggensperger Katharina
Faessler Erik
Falkner Stefan
Golovin Daniel
Hersh William R.
Hutter Frank
Kelly Liadh
Li Lisha
López-García Pablo
Oleynik Michel
Roberts Kirk
Sievert Scott
Simpson Matthew S.
Snoek Jasper
Stephen
Stokes Nicola
Taylor Michael
Yilmaz Emine
Zhou Xuesi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/06/2020

From 2017 to 2019, the Text REtrieval Conference (TREC) held a challenge task on precision medicine using documents from medical publications (PubMed) and clinical trials. Despite the many performance measurements carried out in these evaluation campaigns, the scientific community remains uncertain about the impact that individual system features and their weights have on overall system performance. To close this explanatory gap, we first determined optimal feature configurations using the Sequential Model-based Algorithm Configuration (SMAC) program and applied its output to a BM25-based search engine. We then ran an ablation study to systematically assess the individual contributions of relevant system features: BM25 parameters, query type and weighting schema, query expansion, stop word filtering, and keyword boosting. For evaluation, we used the gold standard data from the three TREC-PM installments and the commonly shared infNDCG metric to measure the effectiveness of the different features.
Comment: Accepted for SIGIR2020, 10 page
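To make the "BM25 parameters" mentioned in the abstract concrete, here is a minimal sketch of a common BM25 variant, showing where `k1` (term-frequency saturation) and `b` (length normalization) enter the score. The default values, the idf form, and the toy documents are assumptions for illustration; they are not the configurations found by SMAC in the paper.

```python
import math


def bm25_score(query_terms, doc, docs, k1=1.2, b=0.75):
    """Score one tokenized document against a query with a common BM25 variant."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length
    total = 0.0
    for term in query_terms:
        df = sum(1 for d in docs if term in d)  # document frequency
        if df == 0:
            continue
        # Non-negative idf variant (assumed here; other variants exist).
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        tf = doc.count(term)
        # b controls how strongly long documents are penalized.
        norm = tf + k1 * (1 - b + b * len(doc) / avgdl)
        total += idf * tf * (k1 + 1) / norm
    return total


# Hypothetical tokenized toy collection.
docs = [
    ["brca1", "mutation", "breast", "cancer"],
    ["lung", "cancer", "trial"],
    ["diabetes", "insulin"],
]
hit = bm25_score(["brca1"], docs[0], docs)
miss = bm25_score(["brca1"], docs[1], docs)
```

An ablation in this spirit would vary `k1` and `b` (or switch off a feature such as query expansion) and compare the resulting rankings under a fixed metric.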

arXiv.org e-Print Archive

Crossref

The University of Padua IMS Research Group at TREC 2018 Precision Medicine Track

Author: AGOSTI MARISTELLA
DI NUNZIO GIORGIO MARIA
MARCHESIN STEFANO
Publication date: 01/01/2018

Archivio istituzionale della ricerca - Università di Padova

Clustering-based fusion for medical information retrieval

Author: Huang Yidong
Nugent Chris
Wu Shengli
Xu Qiuyu
Publication date: 30/11/2022

Ulster University's Research Portal

Streamlined Data Fusion: Unleashing the Power of Linear Combination with Minimal Relevance Judgments

Author: Huang Yidong
Moore Adrian
Wu Shengli
Xu Qiuyu
Publication date: 21/09/2023

Linear combination is a potent data fusion method in information retrieval tasks, thanks to its ability to adjust weights for diverse scenarios. However, achieving optimal weight training has traditionally required manual relevance judgments on a large percentage of documents, a labor-intensive and expensive process. In this study, we investigate the feasibility of obtaining near-optimal weights using a mere 20%-50% of relevant documents. Through experiments on four TREC datasets, we find that weights trained with multiple linear regression using this reduced set closely rival those obtained with TREC's official "qrels." Our findings unlock the potential for more efficient and affordable data fusion, empowering researchers and practitioners to reap its full benefits with significantly less effort.
Comment: 12 pages, 8 figure
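The idea of training linear-combination weights by multiple linear regression can be sketched as follows. Each row holds the scores that several component systems gave to one judged document, and the weights are fit by ordinary least squares. This is only an illustration under stated assumptions: the regression has no intercept, the data are made up, and `fit_weights`, `fuse` and `solve` are hypothetical helper names, not the authors' code.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_weights(scores, labels):
    """Least-squares weights via the normal equations (X^T X) w = X^T y."""
    k = len(scores[0])
    xtx = [[sum(r[i] * r[j] for r in scores) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(scores, labels)) for i in range(k)]
    return solve(xtx, xty)


def fuse(row, weights):
    # Linear combination: weighted sum of the component systems' scores.
    return sum(w * s for w, s in zip(weights, row))


# Made-up training data: two systems, four judged documents.
scores = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]]
labels = [0.5, 0.25, 0.75, 0.3]  # toy targets; real qrels are relevance grades
weights = fit_weights(scores, labels)
fused = fuse([1.0, 1.0], weights)
```

Using fewer judged documents, as the abstract proposes, corresponds to fitting on fewer rows of `scores` and `labels`.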

arXiv.org e-Print Archive

Effective collection construction for information retrieval evaluation and optimization

Author: Li D.
Publication date: 01/01/2020

International Migration, Integration and Social Cohesion online publications