A Vertical PRF Architecture for Microblog Search
In microblog retrieval, query expansion can be essential to obtain good
search results due to the short size of queries and posts. Since information in
microblogs is highly dynamic, an up-to-date index coupled with pseudo-relevance
feedback (PRF) with an external corpus has a higher chance of retrieving more
relevant documents and improving ranking. In this paper, we focus on the
research question: how can we reduce the query expansion computational cost
while maintaining the same retrieval precision as standard PRF? Therefore, we
propose to accelerate the query expansion step of pseudo-relevance feedback.
The hypothesis is that using an expansion corpus organized into verticals for
expanding the query will lead to a more efficient query expansion process and
improved retrieval effectiveness. Thus, the proposed query expansion method
uses a distributed search architecture and resource selection algorithms to
provide an efficient query expansion process. Experiments on the TREC Microblog
datasets show that the proposed approach can match or outperform standard PRF
in MAP and NDCG@30, with a computational cost that is three orders of magnitude
lower.
Comment: To appear in ICTIR 201
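The idea in the abstract above, selecting a vertical first and running pseudo-relevance feedback only inside it, can be illustrated with a minimal sketch. This is not the authors' architecture: the overlap-based scoring, the function names (`select_vertical`, `expand_query`), and the toy `VERTICALS` corpus are all assumptions made for illustration.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def overlap(query_terms, doc):
    """Toy relevance score: number of document tokens that are query terms."""
    return sum(1 for tok in tokenize(doc) if tok in query_terms)


def select_vertical(query_terms, verticals):
    """Toy resource selection: the vertical with the highest total overlap."""
    return max(
        verticals,
        key=lambda name: sum(overlap(query_terms, d) for d in verticals[name]),
    )


def expand_query(query, verticals, k_docs=2, n_terms=1):
    """Pseudo-relevance feedback restricted to one selected vertical."""
    query_terms = set(tokenize(query))
    vertical = select_vertical(query_terms, verticals)
    # Rank only the documents of the selected vertical, not the whole corpus.
    ranked = sorted(
        verticals[vertical],
        key=lambda d: overlap(query_terms, d),
        reverse=True,
    )
    feedback = ranked[:k_docs]
    # Expansion terms: most frequent non-query terms in the feedback documents.
    counts = Counter(
        tok for d in feedback for tok in tokenize(d) if tok not in query_terms
    )
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return vertical, tokenize(query) + expansion


# Hypothetical expansion corpus, organized into two verticals.
VERTICALS = {
    "sports": ["world cup final goal", "cup final penalty goal", "tennis open final"],
    "tech": ["phone launch event", "new phone chip launch"],
}
```

Calling `expand_query("cup final", VERTICALS)` scores only the documents of the selected vertical, which is where the cost saving in this sketch comes from. A real system would replace the overlap count with a proper resource selection algorithm and retrieval model.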
What Makes a Top-Performing Precision Medicine Search Engine? Tracing Main System Features in a Systematic Way
From 2017 to 2019 the Text REtrieval Conference (TREC) held a challenge task
on precision medicine using documents from medical publications (PubMed) and
clinical trials. Despite the many performance measurements carried out in these
evaluation campaigns, the scientific community is still unsure about the
impact individual system features and their weights have on the overall system
performance. In order to overcome this explanatory gap, we first determined
optimal feature configurations using the Sequential Model-based Algorithm
Configuration (SMAC) program and applied its output to a BM25-based search
engine. We then ran an ablation study to systematically assess the individual
contributions of relevant system features: BM25 parameters, query type and
weighting schema, query expansion, stop word filtering, and keyword boosting.
For evaluation, we used the gold standard data from the three TREC-PM
installments and measured the effectiveness of the different features with the
commonly shared infNDCG metric.
Comment: Accepted for SIGIR2020, 10 page
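To make the tuned features in the abstract above more concrete, here is a minimal BM25 scoring sketch that exposes the `k1` and `b` parameters and an optional per-term boost. It assumes the common Lucene-style IDF variant; the `boosts` argument and the toy corpus used with it are hypothetical and do not reflect the configuration reported by the authors.

```python
import math


def bm25_score(query_terms, doc_tokens, corpus, k1=1.2, b=0.75, boosts=None):
    """Score one tokenized document against a query with BM25.

    corpus is a list of tokenized documents, used for document frequencies
    and the average document length. boosts maps a term to a multiplier
    (an assumed, simplified stand-in for keyword boosting).
    """
    boosts = boosts or {}
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        tf = doc_tokens.count(term)
        if tf == 0:
            continue
        df = sum(1 for d in corpus if term in d)
        # Lucene-style IDF, always non-negative.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # b controls length normalization, k1 controls term-frequency saturation.
        norm = tf + k1 * (1 - b + b * len(doc_tokens) / avgdl)
        score += boosts.get(term, 1.0) * idf * tf * (k1 + 1) / norm
    return score
```

In this sketch, an ablation in the spirit of the abstract would amount to re-running the ranking with one setting changed at a time, for example `b=0` to switch off length normalization or an empty `boosts` to switch off keyword boosting.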
Streamlined Data Fusion: Unleashing the Power of Linear Combination with Minimal Relevance Judgments
Linear combination is a potent data fusion method in information retrieval
tasks, thanks to its ability to adjust weights for diverse scenarios. However,
achieving optimal weight training has traditionally required manual relevance
judgments on a large percentage of documents, a labor-intensive and expensive
process. In this study, we investigate the feasibility of obtaining
near-optimal weights using a mere 20%-50% of relevant documents. Through
experiments on four TREC datasets, we find that weights trained with multiple
linear regression using this reduced set closely rival those obtained with
TREC's official "qrels." Our findings unlock the potential for more efficient
and affordable data fusion, empowering researchers and practitioners to reap
its full benefits with significantly less effort.
Comment: 12 pages, 8 figure
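The weight training described in the abstract above can be sketched as ordinary least squares over the judged documents only. The following is an illustrative sketch, not the authors' procedure: it assumes no intercept term, assumes component scores are already normalized, and the names `fit_weights` and `fuse` are hypothetical.

```python
def fit_weights(scores, labels):
    """Least-squares weights for a linear combination of component systems.

    scores: one row per judged (query, document) pair, one column per system.
    labels: the relevance judgment for each row (e.g. 0 or 1).
    Solves the normal equations (X^T X) w = X^T y by Gaussian elimination;
    assumes X^T X is non-singular (systems are not perfectly collinear).
    """
    m = len(scores[0])
    a = [[sum(r[i] * r[j] for r in scores) for j in range(m)] for i in range(m)]
    c = [sum(r[i] * y for r, y in zip(scores, labels)) for i in range(m)]
    aug = [a[i] + [c[i]] for i in range(m)]
    for col in range(m):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, m + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution.
    w = [0.0] * m
    for i in reversed(range(m)):
        rest = sum(aug[i][j] * w[j] for j in range(i + 1, m))
        w[i] = (aug[i][m] - rest) / aug[i][i]
    return w


def fuse(doc_scores, weights):
    """Fused score of one document: weighted sum of its component scores."""
    return sum(w * s for w, s in zip(weights, doc_scores))
```

Training on a reduced set of judgments then simply means passing fewer rows to `fit_weights`; the fused ranking is obtained by sorting documents by `fuse` in descending order.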