Search CORE

1,714 research outputs found

Modeling and mining term association for improving biomedical information retrieval performance

Authors: Hu Qinmin; Hu Xiaohua; Huang Jimmy Xiangji
Publication venue: BioMed Central
Publication date: 01/01/2012

Crossref

Springer - Publisher Connector

PubMed Central

Search beyond traditional probabilistic information retrieval

Author: Hu Qinmin
Publication date: 01/07/2009

This thesis focuses on search beyond probabilistic information retrieval. Three approaches are proposed beyond traditional probabilistic modelling. First, term association is deeply examined. Term association considers term dependency using a factor-analysis-based model, instead of treating each term independently. Latent factors, considered the same as the hidden variables of "eliteness" introduced by Robertson et al. to gain understanding of the relation among term occurrences and relevance, are measured by the dependencies and occurrences of term sequences and subsequences. Second, an entity-based ranking approach is proposed in an entity system named "EntityCube", which has been released by Microsoft for public use. A summarization page is given to summarize the entity information over multiple documents, so that the truly relevant entities are highly likely to be found across multiple documents by integrating the local relevance contributed by proximity and the global enhancer from a topic model. Third, multi-source fusion sets up a meta-search engine to combine the "knowledge" from different sources. Meta-features, distilled as high-level categories, are deployed to diversify the baselines. Three modified fusion methods are employed: reciprocal, CombMNZ and CombSUM, with three expanded versions. Through extensive experiments on the standard large-scale TREC Genomics data sets, the TREC HARD data sets and the Microsoft EntityCube Web collections, the proposed extended models beyond probabilistic information retrieval show their effectiveness and superiority.

YorkSpace
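
The fusion methods named in the thesis abstract above (reciprocal, CombMNZ and CombSUM) can be illustrated with a minimal sketch. This shows only the standard textbook forms, not the thesis's modified or expanded versions, which the abstract does not specify. The function names, the dictionary-based run format, the assumption that scores are already normalized to a comparable range, and the smoothing constant k = 60 for reciprocal rank fusion are all illustrative assumptions.

```python
from collections import defaultdict


def comb_sum(runs):
    """CombSUM: add up each document's scores across all runs."""
    fused = defaultdict(float)
    for run in runs:
        for doc, score in run.items():
            fused[doc] += score
    return dict(fused)


def comb_mnz(runs):
    """CombMNZ: CombSUM score times the number of runs that returned the document."""
    sums = comb_sum(runs)
    hits = defaultdict(int)
    for run in runs:
        for doc in run:
            hits[doc] += 1
    return {doc: sums[doc] * hits[doc] for doc in sums}


def reciprocal_rank(runs, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over the runs.

    k = 60 is a commonly used default, assumed here for illustration.
    """
    fused = defaultdict(float)
    for run in runs:
        ranked = sorted(run, key=run.get, reverse=True)
        for rank, doc in enumerate(ranked, start=1):
            fused[doc] += 1.0 / (k + rank)
    return dict(fused)


# Hypothetical runs: document id -> score normalized to [0, 1].
run_a = {"d1": 0.9, "d2": 0.5, "d3": 0.1}
run_b = {"d2": 0.8, "d4": 0.4}

for name, fuse in [("CombSUM", comb_sum), ("CombMNZ", comb_mnz), ("RRF", reciprocal_rank)]:
    scores = fuse([run_a, run_b])
    print(name, sorted(scores, key=scores.get, reverse=True))
```

CombMNZ rewards documents that several sources agree on, while the reciprocal variant ignores raw scores entirely and is therefore less sensitive to how each source scales its output.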

Using Learning to Rank Approach to Promoting Diversity for Biomedical Information Retrieval with Wikipedia

Author: Wu Jiajin
Publication date: 28/07/2014

In most traditional information retrieval (IR) models, the independent relevance assumption is made, which assumes that the relevance of a document is independent of other documents. However, the pitfall of this is the high redundancy and low diversity of retrieval results. This has been seen in many scenarios, especially in biomedical IR, where the information need of one query may refer to different aspects. Promoting diversity in IR takes the relationship between documents into account. Unlike previous studies, we tackle this problem from the learning-to-rank perspective. The main challenges are how to find salient features for biomedical data and how to integrate dynamic features into the ranking model. To address these challenges, Wikipedia is used to detect topics of documents for generating diversity-biased features. A combined model is proposed and studied to learn a diversified ranking result. Experimental results show that the proposed method outperforms baseline models.

YorkSpace

A robust approach to optimizing multi-source information for enhancing genomics retrieval performance

Authors: Hu Qinmin; Huang Jimmy Xiangji; Miao Jun
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

Adapting a relation extraction pipeline for the BioCreAtIvE II task

Authors: Grover Claire; Haddow Barry; Klein Ewan; Matthews Michael; Nielsen Leif Arda; Tobin Richard; Wang Xinglong
Publication date: 01/01/2007

Edinburgh Research Explorer

Artificial Sequences and Complexity Measures

In this paper we exploit concepts of information theory to address the fundamental problem of identifying and defining the most suitable tools to extract, in an automatic and agnostic way, information from a generic string of characters. We introduce in particular a class of methods which use in a crucial way data compression techniques in order to define a measure of remoteness and distance between pairs of sequences of characters (e.g. texts) based on their relative information content. We also discuss in detail how specific features of data compression techniques could be used to introduce the notion of dictionary of a given sequence and of Artificial Text, and we show how these new tools can be used for information extraction purposes. We point out the versatility and generality of our method, which applies to any kind of corpora of character strings independently of the type of coding behind them. We consider as a case study linguistically motivated problems and we present results for automatic language recognition, authorship attribution and self-consistent classification. Comment: Revised version, with major changes, of previous "Data Compression approach to Information Extraction and Classification" by A. Baronchelli and V. Loreto. 15 pages; 5 figures.
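
The idea of measuring distance between two character sequences through data compression can be sketched as follows. This is not the paper's own estimator, which the abstract does not spell out; it swaps in the widely known normalized compression distance (NCD) and uses Python's `zlib` as the compressor. The function names and sample strings are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the sequences share most of their information;
    values near 1 suggest they share very little.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical inputs: a repetitive English text and an unrelated byte pattern.
text = b"the quick brown fox jumps over the lazy dog " * 20
other = bytes(range(256)) * 4

print("text vs. itself:", ncd(text, text))
print("text vs. other: ", ncd(text, other))
```

For tasks such as language recognition or authorship attribution, an unknown text would be assigned to the reference corpus with the smallest distance. Real compressors only approximate information content, so distances on very short strings are unreliable.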

arXiv.org e-Print Archive

City Research Online

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Text Mining for Systems Biology and MetNet

Author: Zhang Lifeng
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2010

The rapidly expanding volume of biological and biomedical literature motivates demand for more friendly access. Better automated mining of this literature can help find useful and desired citations and can extract new knowledge from the massive biological literaturome. The research objectives presented here, when met, will provide comprehensive text mining utilities within the MetNet (Metabolic Network Exchange) (Wurtele et al., 2007) platform to help biologists visualize, explore, and analyze the biological literaturome. The overarching research question to be addressed is how to automatically extract biomolecular interactions from numerous biomedical texts. Here are the specific aims of this work. 1. Research the text empirics of interaction-indicating terms to find more clues for improving the current algorithm applied in PathBinder, so that it judges more precisely whether biomolecular interaction descriptions are present in sentences from the biological literature. 2. Based on these research results, extract interacting biomolecule pairs from the literature and use those pairs to construct a biomolecule interaction database and network. 3. Integrate biomolecular interaction-indicating term extraction into MetNet's existing metabolomic network database. 4. Apply all of the above results in the PathBinder software. 5. Quantitatively evaluate the success of algorithms developed based on the text empirics results. This work is expected to advance systems biology by answering scientific questions about biological text empirics, by contributing to the engineering task of building MetNet and key constituent subsystems of MetNet, and by supporting the MetNet project through selected maintenance tasks.

Digital Repository @ Iowa State University (ISU)

Proceedings of the 9th Dutch-Belgian Information Retrieval Workshop

Author: den Hamer Ida
Publication venue: Centre for Telematics and Information Technology (CTIT)
Publication date: 01/02/2009

University of Twente Research Information

Semantic Approaches for Knowledge Discovery and Retrieval in Biomedicine

Author: Wilkowski Bartlomiej
Publication venue: Technical University of Denmark
Publication date: 01/01/2011

Online Research Database In Technology