21,471 research outputs found

    Merging Multiple Search Results Approach for Meta-Search Engines

    Get PDF
    Meta Search Engines are finding tools developed for enhancing the search performance by submitting user queries to multiple searchengines and combining the search results in a unified ranked list. They utilized data fusion technique, which requires three major steps: databases selection, the results combination, and the results merging. This study tries to build a framework that can be used for merging the search results retrieved from any set of search engines. This framework based on answering three major questions:1.How meta-search developers could define the optimal rank order for the selected engines.2. How meta-search developers could choose the best search engines combination.3.What is the optimal heuristic merging function that could be used for aggregating the rank order of the retrieved documents form incomparable search engines.The main data collection process depends onrunning 40 general queries on three major search engines (Google, AltaVista, and Alltheweb). Real users have involved in the relevance judgment process for a five point relevancy scale. Theperformance of the three search engines, their different combinations and different merging algorithm have been compared to rank the database, choose the best combination and define the optimal merging function.The major findings of this study are (1) Ranking the databases in merging process should depends on their overall performance not their popularity or size; (2)Larger databases tend to perform better than smaller databases; (3)The combination of the search engines should depend on ranking the database and choosing theappropriate combination function; (4)Search Engines tend to retrieve more overlap relevant document than overlap irrelevant documents; and (5) The merging function which take theoverlapped documents into accounts tend to perform better than the interleave and the rank similarity function.In addition to these findings the study has developed a set of requirements for the merging process to be successful. This procedure include the databases selection, the combination, and merging upon heuristic solutions

    The impact of sequence database choice on metaproteomic results in gut microbiota studies

    Get PDF
    Background: Elucidating the role of gut microbiota in physiological and pathological processes has recently emerged as a key research aim in life sciences. In this respect, metaproteomics, the study of the whole protein complement of a microbial community, can provide a unique contribution by revealing which functions are actually being expressed by specific microbial taxa. However, its wide application to gut microbiota research has been hindered by challenges in data analysis, especially related to the choice of the proper sequence databases for protein identification. Results: Here, we present a systematic investigation of variables concerning database construction and annotation and evaluate their impact on human and mouse gut metaproteomic results. We found that both publicly available and experimental metagenomic databases lead to the identification of unique peptide assortments, suggesting parallel database searches as a mean to gain more complete information. In particular, the contribution of experimental metagenomic databases was revealed to be mandatory when dealing with mouse samples. Moreover, the use of a "merged" database, containing all metagenomic sequences from the population under study, was found to be generally preferable over the use of sample-matched databases. We also observed that taxonomic and functional results are strongly database-dependent, in particular when analyzing the mouse gut microbiota. As a striking example, the Firmicutes/Bacteroidetes ratio varied up to tenfold depending on the database used. Finally, assembling reads into longer contigs provided significant advantages in terms of functional annotation yields. Conclusions: This study contributes to identify host- and database-specific biases which need to be taken into account in a metaproteomic experiment, providing meaningful insights on how to design gut microbiota studies and to perform metaproteomic data analysis. In particular, the use of multiple databases and annotation tools has to be encouraged, even though this requires appropriate bioinformatic resources

    Distributed Information Retrieval using Keyword Auctions

    Get PDF
    This report motivates the need for large-scale distributed approaches to information retrieval, and proposes solutions based on keyword auctions
    • …
    corecore