Entity Query Feature Expansion Using Knowledge Base Links

Author: Allan James
Dalton Jeffrey
Dietz Laura
Publication venue: Association for Computing Machinery (ACM)
Publication date: 03/07/2014

Recent advances in automatic entity linking and knowledge base construction have resulted in entity annotations for document and query collections, for example annotations of entities from large general-purpose knowledge bases such as Freebase and the Google Knowledge Graph. Understanding how to leverage these entity annotations of text to improve ad hoc document retrieval is an open research area. Query expansion is a commonly used technique to improve retrieval effectiveness. Most previous query expansion approaches focus on text, mainly using unigram concepts. In this paper, we propose a new technique, called entity query feature expansion (EQFE), which enriches the query with features from entities and their links to knowledge bases, including structured attributes and text. We experiment using both explicit query entity annotations and latent entities. We evaluate our technique on TREC text collections automatically annotated with knowledge base entity links, including the Google Freebase Annotations (FACC1) data. We find that entity-based feature expansion results in significant improvements in retrieval effectiveness over state-of-the-art text expansion approaches.

CiteSeerX

Enlighten

Knowledge-based Query Expansion in Real-Time Microblog Search

Author: Fan Feifan
Lv Chao
Qiang Runwei
Yang Jianwu
Publication date: 13/03/2015

Since the length of microblog texts, such as tweets, is strictly limited to 140 characters, traditional Information Retrieval techniques suffer severely from the vocabulary mismatch problem and cannot yield good performance in the microblogosphere. To address this critical challenge, we propose a new language modeling approach for microblog retrieval that infers various types of context information. In particular, we expand the query using knowledge terms derived from Freebase so that the expanded query better reflects users' search intent. In addition, to further satisfy users' real-time information needs, we incorporate temporal evidence into the expansion method, which can boost recent tweets in the retrieval results for a given topic. Experimental results on two official TREC Twitter corpora demonstrate the significant superiority of our approach over baseline methods. Comment: 9 pages, 9 figures
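
The combination described in this abstract (an expanded query scored with a language model, plus a preference for recent tweets) can be illustrated with a minimal sketch. It is not the authors' method: the uniform interpolation weight `alpha`, the Dirichlet smoothing parameter `mu`, the exponential recency penalty `decay`, and both function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def expanded_query_model(query_terms, expansion_terms, alpha=0.7):
    """Mix a uniform model over the original terms with a uniform model
    over the expansion terms. alpha is a hypothetical interpolation weight."""
    model = Counter()
    for term in query_terms:
        model[term] += alpha / len(query_terms)
    for term in expansion_terms:
        model[term] += (1 - alpha) / len(expansion_terms)
    return dict(model)


def score(query_model, doc_terms, collection_prob, age_days, mu=100.0, decay=0.1):
    """Weighted query log-likelihood with Dirichlet smoothing, minus a
    penalty that grows linearly with document age. The penalty corresponds
    to the log of an (assumed) exponential recency prior."""
    counts = Counter(doc_terms)
    length = len(doc_terms)
    log_likelihood = 0.0
    for term, weight in query_model.items():
        # Unseen terms fall back to a small assumed collection probability.
        background = collection_prob.get(term, 1e-6)
        p = (counts[term] + mu * background) / (length + mu)
        log_likelihood += weight * math.log(p)
    return log_likelihood - decay * age_days


# Toy data: the expansion term "seismic" stands in for a knowledge-base term.
query_model = expanded_query_model(["earthquake"], ["seismic"])
collection_prob = {"earthquake": 0.001, "seismic": 0.001}
recent = score(query_model, ["seismic", "activity"], collection_prob, age_days=1)
older = score(query_model, ["seismic", "activity"], collection_prob, age_days=5)
unrelated = score(query_model, ["volcano", "activity"], collection_prob, age_days=1)
```

In this toy setup a tweet that matches only the expansion term still outranks an unrelated tweet of the same age, and the same tweet ranks lower as it gets older.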

arXiv.org e-Print Archive

Crossref

University of Twente @ TREC 2009: Indexing half a billion web pages

Author: Hauff Claudia
Hiemstra Djoerd
Publication venue: National Institute of Standards and Technology (NIST)
Publication date: 01/01/2009

This report presents results for the TREC 2009 adhoc task, the diversity task, and the relevance feedback task. We present ideas for unsupervised tuning of a search system, an approach for spam removal, and the use of categories and query log information for diversifying search results.

CiteSeerX

Radboud Repository

University of Twente Research Information

Cross-concordances: terminology mapping and its effectiveness for information retrieval

Author: Mayr Philipp
Petras Vivien
Publication date: 01/01/2008

The German Federal Ministry for Education and Research funded a major terminology mapping initiative, which concluded in 2007. The task of this initiative was to organize, create and manage 'cross-concordances' between controlled vocabularies (thesauri, classification systems, subject heading lists), centred on the social sciences but quickly extending to other subject areas. In total, 64 crosswalks with more than 500,000 relations were established. In the final phase of the project, a major evaluation effort was conducted to test and measure the effectiveness of the vocabulary mappings in an information system environment. The paper reports on the cross-concordance work and evaluation results. Comment: 19 pages, 4 figures, 11 tables, IFLA conference 200

arXiv.org e-Print Archive

E-LIS

Finding Related Publications: Extending the Set of Terms Used to Assess Article Similarity.

Author: Demner-Fushman Dina
Hsu Chun-Nan
Kuo Tsung-Ting
Marmor Rebecca
Ohno-Machado Lucila
Singh Siddharth
Wang Shuang
Wei Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2016

Recommendation of related articles is an important feature of PubMed. The PubMed Related Citations (PRC) algorithm is the engine that enables this feature, and it leverages information on 22 million citations. We analyzed the performance of the PRC algorithm on 4584 annotated articles from the 2005 Text REtrieval Conference (TREC) Genomics Track data. Our analysis indicated that the PRC's highest-weighted term was not always consistent with the critical term most directly related to the topic of the article. We implemented term expansion and found that it was a promising and easy-to-implement approach to improve the performance of the PRC algorithm for the TREC 2005 Genomics data and for the TREC 2014 Clinical Decision Support Track data. For term expansion, we trained a Skip-gram model using the Word2Vec package. This extended PRC algorithm resulted in higher average precision for a large subset of articles. A combination of both algorithms may lead to improved performance in related article recommendations.
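
Embedding-based term expansion of the kind this abstract mentions can be sketched as a nearest-neighbour lookup by cosine similarity. The sketch below is illustrative only and does not reproduce the PRC algorithm or the authors' trained model: the three-dimensional vectors in `TOY_VECTORS` are made up, and `expand_terms`, `k` and `threshold` are hypothetical names and parameters. In practice the vectors would come from a trained Skip-gram model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_terms(terms, vectors, k=1, threshold=0.8):
    """Append up to k nearest neighbours of each term, keeping only those
    whose similarity reaches the (assumed) threshold."""
    expanded = list(terms)
    for term in terms:
        if term not in vectors:
            continue  # no embedding available, nothing to expand
        scored = [
            (cosine(vectors[term], vec), other)
            for other, vec in vectors.items()
            if other != term and other not in expanded
        ]
        scored.sort(reverse=True)
        for similarity, other in scored[:k]:
            if similarity >= threshold:
                expanded.append(other)
    return expanded


# Made-up toy embeddings; real ones would be learned from a corpus.
TOY_VECTORS = {
    "tumor": [0.9, 0.1, 0.0],
    "neoplasm": [0.85, 0.15, 0.05],
    "gene": [0.1, 0.9, 0.1],
    "protein": [0.0, 0.2, 0.9],
}

print(expand_terms(["tumor"], TOY_VECTORS))  # ['tumor', 'neoplasm']
print(expand_terms(["gene"], TOY_VECTORS))   # ['gene']
```

The threshold keeps a term such as "gene" from being expanded with a weakly related neighbour, which matters when the expanded term set feeds a similarity score.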

PubMed Central

eScholarship - University of California