47 research outputs found

    Quantifying and filtering knowledge generated by literature based discovery

    Get PDF
    Background Literature based discovery (LBD) automatically infers missed connections between concepts in literature. It is often assumed that LBD generates more information than can be reasonably examined. Methods We present a detailed analysis of the quantity of hidden knowledge produced by an LBD system and the effect of various filtering approaches upon this. The investigation of filtering combined with single or multi-step linking term chains is carried out on all articles in PubMed. Results The evaluation is carried out using both replication of existing discoveries, which provides justification for multi-step linking chain knowledge in specific cases, and using timeslicing, which gives a large scale measure of performance. Conclusions While the quantity of hidden knowledge generated by LBD can be vast, we demonstrate that (a) intelligent filtering can greatly reduce the number of hidden knowledge pairs generated, (b) for a specific term, the number of single step connections can be manageable, and (c) in the absence of single step hidden links, considering multiple steps can provide valid links

    Discovering context-specific relationships from biological literature by using multi-level context terms

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The Swanson's ABC model is powerful to infer hidden relationships buried in biological literature. However, the model is inadequate to infer relations with context information. In addition, the model generates a very large amount of candidates from biological text, and it is a semi-automatic, labor-intensive technique requiring human expert's manual input. To tackle these problems, we incorporate context terms to infer relations between AB interactions and BC interactions.</p> <p>Methods</p> <p>We propose 3 steps to discover meaningful hidden relationships between drugs and diseases: 1) multi-level (gene, drug, disease, symptom) entity recognition, 2) interaction extraction (drug-gene, gene-disease) from literature, 3) context vector based similarity score calculation. Subsequently, we evaluate our hypothesis with the datasets of the "Alzheimer's disease" related 77,711 PubMed abstracts. As golden standards, PharmGKB and CTD databases are used. Evaluation is conducted in 2 ways: first, comparing precision of the proposed method and the previous method and second, analysing top 10 ranked results to examine whether highly ranked interactions are truly meaningful or not.</p> <p>Results</p> <p>The results indicate that context-based relation inference achieved better precision than the previous ABC model approach. The literature analysis also shows that interactions inferred by the context-based approach are more meaningful than interactions by the previous ABC model.</p> <p>Conclusions</p> <p>We propose a novel interaction inference technique that incorporates context term vectors into the ABC model to discover meaningful hidden relationships. By utilizing multi-level context terms, our model shows better performance than the previous ABC model.</p

    Using Noun Phrases for Navigating Biomedical Literature on Pubmed: How Many Updates Are We Losing Track of?

    Get PDF
    Author-supplied citations are a fraction of the related literature for a paper. The “related citations” on PubMed is typically dozens or hundreds of results long, and does not offer hints why these results are related. Using noun phrases derived from the sentences of the paper, we show it is possible to more transparently navigate to PubMed updates through search terms that can associate a paper with its citations. The algorithm to generate these search terms involved automatically extracting noun phrases from the paper using natural language processing tools, and ranking them by the number of occurrences in the paper compared to the number of occurrences on the web. We define search queries having at least one instance of overlap between the author-supplied citations of the paper and the top 20 search results as citation validated (CV). When the overlapping citations were written by same authors as the paper itself, we define it as CV-S and different authors is defined as CV-D. For a systematic sample of 883 papers on PubMed Central, at least one of the search terms for 86% of the papers is CV-D versus 65% for the top 20 PubMed “related citations.” We hypothesize these quantities computed for the 20 million papers on PubMed to differ within 5% of these percentages. Averaged across all 883 papers, 5 search terms are CV-D, and 10 search terms are CV-S, and 6 unique citations validate these searches. Potentially related literature uncovered by citation-validated searches (either CV-S or CV-D) are on the order of ten per paper – many more if the remaining searches that are not citation-validated are taken into account. The significance and relationship of each search result to the paper can only be vetted and explained by a researcher with knowledge of or interest in that paper

    Prevalence of Influenza A viruses in wild migratory birds in Alaska: Patterns of variation in detection at a crossroads of intercontinental flyways

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The global spread of the highly pathogenic avian influenza H5N1 virus has stimulated interest in a better understanding of the mechanisms of H5N1 dispersal, including the potential role of migratory birds as carriers. Although wild birds have been found dead during H5N1 outbreaks, evidence suggests that others have survived natural infections, and recent studies have shown several species of ducks capable of surviving experimental inoculations of H5N1 and shedding virus. To investigate the possibility of migratory birds as a means of H5N1 dispersal into North America, we monitored for the virus in a surveillance program based on the risk that wild birds may carry the virus from Asia.</p> <p>Results</p> <p>Of 16,797 birds sampled in Alaska between May 2006 and March 2007, low pathogenic avian influenza viruses were detected in 1.7% by rRT-PCR but no highly pathogenic viruses were found. Our data suggest that prevalence varied among sampling locations, species (highest in waterfowl, lowest in passerines), ages (juveniles higher than adults), sexes (males higher than females), date (highest in autumn), and analytical technique (rRT-PCR prevalence = 1.7%; virus isolation prevalence = 1.5%).</p> <p>Conclusion</p> <p>The prevalence of low pathogenic avian influenza viruses isolated from wild birds depends on biological, temporal, and geographical factors, as well as testing methods. Future studies should control for, or sample across, these sources of variation to allow direct comparison of prevalence rates.</p

    Literature Mining for the Discovery of Hidden Connections between Drugs, Genes and Diseases

    Get PDF
    The scientific literature represents a rich source for retrieval of knowledge on associations between biomedical concepts such as genes, diseases and cellular processes. A commonly used method to establish relationships between biomedical concepts from literature is co-occurrence. Apart from its use in knowledge retrieval, the co-occurrence method is also well-suited to discover new, hidden relationships between biomedical concepts following a simple ABC-principle, in which A and C have no direct relationship, but are connected via shared B-intermediates. In this paper we describe CoPub Discovery, a tool that mines the literature for new relationships between biomedical concepts. Statistical analysis using ROC curves showed that CoPub Discovery performed well over a wide range of settings and keyword thesauri. We subsequently used CoPub Discovery to search for new relationships between genes, drugs, pathways and diseases. Several of the newly found relationships were validated using independent literature sources. In addition, new predicted relationships between compounds and cell proliferation were validated and confirmed experimentally in an in vitro cell proliferation assay. The results show that CoPub Discovery is able to identify novel associations between genes, drugs, pathways and diseases that have a high probability of being biologically valid. This makes CoPub Discovery a useful tool to unravel the mechanisms behind disease, to find novel drug targets, or to find novel applications for existing drugs
    corecore