Search CORE

6 research outputs found

Improving biomedical information retrieval by linear combinations of different query expansion techniques

Author: Ahmed AbdoAziz Ahmed Abdulla
Bo Xu
Hongfei Lin
Santosh Kumar Banbhrani
Publication venue: Springer Nature
Publication date: 01/01/2016
Field of study

Springer - Publisher Connector

A robust approach to optimizing multi-source information for enhancing genomics retrieval performance

Author: Hu Qinmin
Huang Jimmy Xiangji
Miao Jun
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Crossref

Springer - Publisher Connector

PubMed Central

Improving biomedical information retrieval by linear combinations of different query expansion techniques

Author: Ahmed AbdoAziz Ahmed Abdulla
AR Aronson
AR Rivas
Bo Xu
G Salton
G Salton
Hongfei Lin
JR Perez-Aguera
K Sparck Jones
MA Otair
RN Jerome
Santosh Kumar Banbhrani
Y Lv
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

Search beyond traditional probabilistic information retrieval

Author: Hu Qinmin
Publication venue
Publication date: 01/07/2009
Field of study

"This thesis focuses on search beyond probabilistic information retrieval. Three ap- proached are proposed beyond the traditional probabilistic modelling. First, term associ- ation is deeply examined. Term association considers the term dependency using a factor analysis based model, instead of treating each term independently. Latent factors, con- sidered the same as the hidden variables of ""eliteness"" introduced by Robertson et al. to gain understanding of the relation among term occurrences and relevance, are measured by the dependencies and occurrences of term sequences and subsequences. Second, an entity-based ranking approach is proposed in an entity system named ""EntityCube"" which has been released by Microsoft for public use. A summarization page is given to summarize the entity information over multiple documents such that the truly relevant entities can be highly possibly searched from multiple documents through integrating the local relevance contributed by proximity and the global enhancer by topic model. Third, multi-source fusion sets up a meta-search engine to combine the ""knowledge"" from different sources. Meta-features, distilled as high-level categories, are deployed to diversify the baselines. Three modified fusion methods are employed, which are re- ciprocal, CombMNZ and CombSUM with three expanded versions. Through extensive experiments on the standard large-scale TREC Genomics data sets, the TREC HARD data sets and the Microsoft EntityCube Web collections, the proposed extended models beyond probabilistic information retrieval show their effectiveness and superiority.

YorkSpace

Making a Better Query: Find Good Feedback Documents and Terms via Semantic Associations

Author: Miao Jun
Publication venue
Publication date: 27/07/2017
Field of study

When people search, they always input several keywords as an input query. While current information retrieval (IR) systems are based on term matching, documents will not be considered as relevant if they do not have the exact terms as in the query. However, it is common that these documents are relevant if they contain terms semantically similar to the query. To retrieve these documents, a classic way is to expand the original query with more related terms. Pseudo relevance feedback (PRF) has proven to be effective to expand origin queries and improve the performance of IR. It assumes the top k ranked documents obtained through the first round retrieval are relevant as feedback documents, and expand the original queries with feedback terms selected from these feedback documents. However, applying PRF for query expansion must be very carefully. Wrongly added terms can bring noisy information and hurt the overall search experiences extensively. The assumption of feedback documents is too strong to be completely true. To avoid noise import and make significant improvements simultaneously, we solve the significant problem through four ways in this dissertation. Firstly, we assume the proximity information among terms as term semantic associations and utilize them to seek new relevant terms. Next, to obtain good and robust performance for PRF via adapting topic information, we propose a new concept named topic space and present three models based on it. Topics obtained through topic modeling do help identify how relevant a feedback document is. Weights of candidate terms in these more relevant feedback documents will be boosted and have higher probabilities to be chosen. Furthermore, we apply machine learning methods to classify which feedback documents are effective for PRF. To solve the problem of lack-of-training-data for the application of machine learning methods in PRF, we improve a traditional co-training method and take the quality of classifiers into account. Finally, we present a new probabilistic framework to integrate existing effective methods like semantic associations as components for further research. All the work has been tested on public datasets and proven to be effective and efficient

YorkSpace

Exploring a multi-source fusion approach for genomics information retrieval

Author
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref