NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval
Pseudo-relevance feedback (PRF) is commonly used to boost the performance of
traditional information retrieval (IR) models by using top-ranked documents to
identify and weight new query terms, thereby reducing the effect of
query-document vocabulary mismatches. While neural retrieval models have
recently demonstrated strong results for ad-hoc retrieval, combining them with
PRF is not straightforward due to incompatibilities between existing PRF
approaches and neural architectures. To bridge this gap, we propose an
end-to-end neural PRF framework that can be used with existing neural IR models
by embedding different neural models as building blocks. Extensive experiments
on two standard test collections confirm the effectiveness of the proposed NPRF
framework in improving the performance of two state-of-the-art neural IR
models. Comment: Full paper in EMNLP 2018.
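The abstract contrasts neural ranking with classic pseudo-relevance feedback, in which the top-ranked documents are assumed relevant and mined for new query terms. A minimal (non-neural) sketch of that baseline idea, with the term pool, document count `k`, and interpolation weight `alpha` chosen purely for illustration:

```python
from collections import Counter

def prf_expand(query_terms, ranked_docs, k=2, n_terms=2, alpha=0.7):
    """Classic PRF: treat the top-k ranked documents as relevant,
    take their most frequent terms as expansion terms, and interpolate
    them with the original query terms."""
    pool = Counter()
    for doc in ranked_docs[:k]:          # top-k documents assumed relevant
        pool.update(doc.split())
    for t in query_terms:                # never re-add original query terms
        pool.pop(t, None)
    expansion = [t for t, _ in pool.most_common(n_terms)]
    # interpolate: alpha mass on the original query, 1-alpha on expansion
    weights = {t: alpha / len(query_terms) for t in query_terms}
    for t in expansion:
        weights[t] = weights.get(t, 0.0) + (1 - alpha) / len(expansion)
    return weights

ranked = ["neural ranking model for retrieval",
          "neural retrieval with feedback model",
          "cooking pasta recipes"]
w = prf_expand(["retrieval"], ranked)    # expands with "neural" and "model"
```

The NPRF framework replaces the frequency-based term weighting above with learned neural relevance scores; this sketch only shows the feedback loop the paper builds on.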
Finding Related Publications: Extending the Set of Terms Used to Assess Article Similarity.
Recommendation of related articles is an important feature of PubMed. The PubMed Related Citations (PRC) algorithm is the engine that enables this feature, leveraging information on 22 million citations. We analyzed the performance of the PRC algorithm on 4584 annotated articles from the 2005 Text REtrieval Conference (TREC) Genomics Track data. Our analysis indicated that the PRC's highest-weighted term was not always consistent with the critical term most directly related to the topic of the article. We implemented term expansion and found it to be a promising and easy-to-implement approach to improving the performance of the PRC algorithm on the TREC 2005 Genomics data and the TREC 2014 Clinical Decision Support Track data. For term expansion, we trained a Skip-gram model using the Word2Vec package. The extended PRC algorithm resulted in higher average precision for a large subset of articles. A combination of both algorithms may lead to improved performance in related-article recommendation.
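Once a Skip-gram model is trained, term expansion amounts to looking up nearest neighbours in the embedding space. A self-contained sketch of that lookup step, where the toy three-dimensional vectors merely stand in for trained Word2Vec output (the paper's actual vectors are learned from the citation corpus):

```python
import math

# Toy vectors standing in for trained Skip-gram (Word2Vec) embeddings.
VECS = {
    "cancer":  [0.90, 0.10, 0.00],
    "tumor":   [0.85, 0.20, 0.05],
    "gene":    [0.10, 0.90, 0.10],
    "protein": [0.15, 0.85, 0.20],
    "recipe":  [0.00, 0.10, 0.90],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def expand_term(term, n=1):
    """Return the n nearest neighbours of `term` in embedding space,
    to be added to the PRC term set."""
    sims = [(other, cosine(VECS[term], VECS[other]))
            for other in VECS if other != term]
    sims.sort(key=lambda x: -x[1])
    return [w for w, _ in sims[:n]]
```

With these toy vectors, `expand_term("cancer")` returns `["tumor"]`: semantically close terms join the similarity computation even when they never co-occur in the two articles being compared.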
Wikipedia-Based Automatic Diagnosis Prediction in Clinical Decision Support Systems
When making clinical decisions, physicians often consult the biomedical literature for reference. In this case, an effective clinical decision support system, provided with a patient's health information, should be able to generate accurate queries and return useful articles to the physicians. Related work in the Clinical Decision Support (CDS) track of TREC 2015 demonstrated the usefulness of knowing patients' diagnosis information for supporting more effective retrieval, but this diagnosis information is missing in most cases. Furthermore, it remains a great challenge to perform large-scale automatic diagnosis prediction. This motivates us to propose an automatic diagnosis prediction method to enhance the retrieval in a clinical decision support system, where the evidence for the prediction is extracted from Wikipedia. In the evaluation conducted on the 2014 CDS task, our method achieves the best performance among all submitted runs. As a next step, graph-structured evidence will be integrated to make the prediction more accurate.
Modeling Temporal Evidence from External Collections
Newsworthy events are broadcast through multiple mediums and prompt the
crowds to produce comments on social media. In this paper, we propose to
leverage these behavioral dynamics to estimate the most relevant time periods
for an event (i.e., a query). Recent advances have shown how to improve the
estimation of the temporal relevance of such topics. Our approach builds
on two major novelties. First, we mine temporal evidence from hundreds of
external sources into topic-based external collections to improve the
robustness of the detection of relevant time periods. Second, we propose a
formal retrieval model that generalizes the use of the temporal dimension
across different aspects of the retrieval process. In particular, we show that
temporal evidence of external collections can be used to (i) infer a topic's
temporal relevance, (ii) select the query expansion terms, and (iii) re-rank
the final results for improved precision. Experiments with TREC Microblog
collections show that the proposed time-aware retrieval model makes an
effective and extensive use of the temporal dimension to improve search results
over the most recent temporal models. Interestingly, we observe a strong
correlation between precision and the temporal distribution of retrieved and
relevant documents. Comment: To appear in WSDM 2017.
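The three uses of temporal evidence listed above all start from an estimate of a topic's temporal distribution, which can then be interpolated with the text retrieval score. A minimal sketch of that pipeline, assuming documents carry day-level timestamps; the smoothing and interpolation weight `beta` are illustrative choices, not the paper's model:

```python
from collections import Counter

def temporal_prior(timestamps, smoothing=1.0):
    """Estimate P(day | query) from the timestamps of pseudo-relevant
    documents pooled over external collections (add-one smoothed)."""
    counts = Counter(timestamps)
    total = len(timestamps) + smoothing * len(counts)
    return {day: (c + smoothing) / total for day, c in counts.items()}

def rerank(results, prior, beta=0.5):
    """results: list of (doc_id, text_score, day). Interpolate the text
    retrieval score with the temporal prior of the document's day."""
    scored = [(doc, (1 - beta) * score + beta * prior.get(day, 0.0), day)
              for doc, score, day in results]
    return sorted(scored, key=lambda x: -x[1])
```

Two documents with equal text scores are then separated by the crowd's temporal signal: the one published on a day with more pseudo-relevant activity rises to the top.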
What Makes a Top-Performing Precision Medicine Search Engine? Tracing Main System Features in a Systematic Way
From 2017 to 2019 the Text REtrieval Conference (TREC) held a challenge task
on precision medicine using documents from medical publications (PubMed) and
clinical trials. Despite the many performance measurements carried out in these
evaluation campaigns, the scientific community is still largely unsure about the
impact individual system features and their weights have on overall system
performance. To close this explanatory gap, we first determined
optimal feature configurations using the Sequential Model-based Algorithm
Configuration (SMAC) program and applied its output to a BM25-based search
engine. We then ran an ablation study to systematically assess the individual
contributions of relevant system features: BM25 parameters, query type and
weighting schema, query expansion, stop word filtering, and keyword boosting.
For evaluation, we employed the gold standard data from the three TREC-PM
installments to evaluate the effectiveness of different features using the
commonly shared infNDCG metric. Comment: Accepted for SIGIR 2020, 10 pages.
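Among the ablated features are the BM25 parameters themselves. As a reference point for what SMAC is tuning, here is the standard Okapi BM25 term weight; `k1` and `b` are the two free parameters, and the defaults shown are the conventional ones, not the configurations found in the paper:

```python
import math

def bm25(tf, df, doclen, avg_doclen, n_docs, k1=1.2, b=0.75):
    """Okapi BM25 weight of one query term in one document.
    k1 controls term-frequency saturation; b controls document-length
    normalization (b=0 disables it entirely)."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doclen / avg_doclen))
    return idf * norm
```

Because the score is a product of an IDF factor and a saturating TF factor, tuning `k1` and `b` reshapes how quickly repeated terms stop helping and how strongly long documents are penalized, which is exactly the kind of feature-level effect the ablation study isolates.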
Enhancing Clinical Decision Support Systems with Public Knowledge Bases
With a vast amount of biomedical literature available online, doctors have the benefit of consulting the literature before making clinical decisions, but they face the daunting task of finding needles in haystacks. In this situation, it would help doctors if an effective clinical decision support system could generate accurate queries and return a manageable number of highly useful articles. Existing studies showed the usefulness of patients' diagnosis information in such scenarios, but the diagnosis is often missing. Furthermore, existing diagnosis prediction systems mainly focus on predicting a small range of diseases from well-formatted features, and it is still a great challenge to perform large-scale automatic diagnosis prediction on noisy patient medical records. In this paper, we propose automatic diagnosis prediction methods for enhancing retrieval in a clinical decision support system, where the prediction is based on evidence automatically collected from publicly accessible online knowledge bases such as Wikipedia and the Semantic MEDLINE Database (SemMedDB). The assumption is that relevant diseases and their corresponding symptoms co-occur more frequently in these knowledge bases. Our methods' performance was evaluated using test collections from the Clinical Decision Support (CDS) track in TREC 2014, 2015 and 2016. The results show that our best method can automatically predict a diagnosis with about 65.56% usefulness, and such predictions can significantly improve biomedical literature retrieval. Our methods generate retrieval results comparable to those of state-of-the-art methods, which use much more complicated techniques and manually crafted medical knowledge. One possible future direction is to apply these methods in collaboration with real doctors.
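The stated assumption, that relevant diseases co-occur with their symptoms in the knowledge bases, can be turned into a simple scoring rule. A toy sketch of that idea; the sentence-level co-occurrence counting and the example data are illustrative assumptions, not the paper's actual SemMedDB/Wikipedia pipeline:

```python
def predict_diagnosis(symptoms, kb_sentences, candidates):
    """Score each candidate disease by how often it co-occurs with the
    patient's symptoms in knowledge-base sentences; return the best."""
    scores = {d: 0 for d in candidates}
    for sent in kb_sentences:
        words = set(sent.lower().split())
        hits = sum(1 for s in symptoms if s in words)   # symptoms seen here
        for d in candidates:
            if d in words:                              # disease mentioned too
                scores[d] += hits
    return max(scores, key=scores.get)

kb = ["fever and cough are common in influenza",
      "influenza causes fever",
      "a fracture follows trauma"]
best = predict_diagnosis(["fever", "cough"], kb, ["influenza", "fracture"])
```

Here "influenza" wins because it shares sentences with both symptoms; the predicted disease can then be appended to the retrieval query, mirroring the enhancement step described above.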