
    All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning

    Background: Automated extraction of protein-protein interactions (PPI) is an important and widely studied task in biomedical text mining. We propose a graph kernel based approach for this task. In contrast to earlier approaches to PPI extraction, the introduced all-paths graph kernel can make use of full, general dependency graphs representing the sentence structure. Results: We evaluate the proposed method on five publicly available PPI corpora, providing the most comprehensive evaluation performed for a machine-learning-based PPI-extraction system. We additionally perform a detailed evaluation of the effects of training and testing on different resources, providing insight into the challenges involved in applying a system beyond the data it was trained on. Our method is shown to achieve state-of-the-art performance with respect to comparable evaluations, with an F-score of 56.4 and an AUC of 84.8 on the AImed corpus. Conclusion: We show that the graph kernel approach performs at a state-of-the-art level in PPI extraction, and note its possible extension to the task of extracting complex interactions. Cross-corpus results provide further insight into how the learning generalizes beyond individual corpora. Further, we identify several pitfalls that can make evaluations of PPI-extraction systems incomparable, or even invalid. These include incorrect cross-validation strategies and problems related to comparing F-scores achieved on different evaluation resources. Recommendations for avoiding these pitfalls are provided.
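One pitfall named above is an incorrect cross-validation strategy. A commonly recommended safeguard, assumed here rather than taken from this abstract, is to split folds by source document so that candidate protein pairs from the same sentence or abstract never appear in both a training fold and a test fold. The sketch below uses only the Python standard library; the `Pair` record, its field names, and the round-robin assignment of documents to folds are hypothetical illustrations, not the authors' code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """Hypothetical candidate protein pair extracted from one sentence."""
    doc_id: str       # identifier of the source abstract/document
    sentence_id: str  # identifier of the sentence within the document
    label: bool       # True if the pair is annotated as interacting


def document_level_folds(pairs, k):
    """Split pairs into k folds so that each document lands in exactly one fold."""
    by_doc = defaultdict(list)
    for p in pairs:
        by_doc[p.doc_id].append(p)
    folds = [[] for _ in range(k)]
    # Assign whole documents round-robin in a deterministic (sorted) order;
    # a real setup might also balance fold sizes or shuffle with a fixed seed.
    for i, doc_id in enumerate(sorted(by_doc)):
        folds[i % k].extend(by_doc[doc_id])
    return folds


def train_test_splits(pairs, k):
    """Yield (train, test) lists for each of the k document-level folds."""
    folds = document_level_folds(pairs, k)
    for i in range(k):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, folds[i]
```

Splitting at the pair level instead would let near-identical instances from one sentence leak across the train/test boundary, which is the kind of inflated result the abstract warns about.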

    Comparative analysis of five protein-protein interaction corpora

    Conclusions: Our comparative analysis uncovers key similarities and differences between the diverse PPI corpora, thus taking an important step towards standardization. In the course of this study we have made a major practical contribution by converting the corpora into a shared format. The conversion software is freely available at http://mars.cs.utu.fi/PPICorpora.

    PESCADOR, a web-based tool to assist text-mining of biointeractions extracted from PubMed queries

    BACKGROUND: Biological function is greatly dependent on the interactions of proteins with other proteins and genes. Abstracts from the biomedical literature stored in the NCBI's PubMed database can be used to derive interactions between genes and proteins by identifying the co-occurrences of their terms. Often, the number of interactions obtained through such an approach is large, and the interactions may mix processes occurring in different contexts. Current tools do not allow studying these data with a focus on concepts of relevance to a user, for example, interactions related to a disease or to a biological mechanism such as protein aggregation. RESULTS: To help the concept-oriented exploration of such data we developed PESCADOR, a web tool that extracts a network of interactions from a set of PubMed abstracts given by a user, and allows filtering the interaction network according to user-defined concepts. We illustrate its use in exploring protein aggregation in neurodegenerative disease and in the expansion of pathways associated with colon cancer. CONCLUSIONS: PESCADOR is a platform-independent web resource available at: http://cbdm.mdc-berlin.de/tools/pescador
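The co-occurrence approach and concept filtering described above can be sketched as follows: any two gene or protein terms mentioned in the same abstract are linked, each link keeps the set of abstracts that support it, and a concept filter keeps only links supported by abstracts that also mention the user's concept. This is a minimal illustration, not PESCADOR's implementation; the function names, the fixed term list, and the naive case-insensitive substring matching are assumptions (real systems rely on named-entity recognition and normalization, since substring matching can produce false hits).

```python
from collections import defaultdict
from itertools import combinations


def mentioned_terms(text, terms):
    """Return the set of terms found in text (naive case-insensitive substring match)."""
    lowered = text.lower()
    return {t for t in terms if t.lower() in lowered}


def cooccurrence_network(abstracts, terms):
    """Map each unordered term pair (sorted tuple) to the ids of abstracts mentioning both."""
    support = defaultdict(set)
    for abstract_id, text in abstracts.items():
        found = sorted(mentioned_terms(text, terms))
        for a, b in combinations(found, 2):
            support[(a, b)].add(abstract_id)
    return support


def filter_by_concept(support, abstracts, concept):
    """Keep only pairs supported by at least one abstract that mentions the concept."""
    concept = concept.lower()
    filtered = {}
    for pair, ids in support.items():
        kept = {i for i in ids if concept in abstracts[i].lower()}
        if kept:
            filtered[pair] = kept
    return filtered
```

For example, filtering a network built from a handful of abstracts with the concept "aggregation" would keep only the term pairs whose supporting abstracts discuss aggregation, which mirrors the concept-oriented exploration the abstract describes.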

    Integrated Bio-Entity Network: A System for Biological Knowledge Discovery

    A significant part of our biological knowledge is centered on relationships between biological entities (bio-entities) such as proteins, genes, small molecules, pathways, gene ontology (GO) terms and diseases. Accumulated at an increasing speed, the information on bio-entity relationships is archived in different forms at scattered places. Much of this information is buried in scientific literature as unstructured text. Organizing heterogeneous information in a structured form not only facilitates the study of biological systems using integrative approaches, but also allows the discovery of new knowledge in an automatic and systematic way. In this study, we performed a large-scale integration of bio-entity relationship information, drawn both from databases containing manually annotated, structured information and from automatic information extraction of unstructured text in scientific literature. The relationship information we integrated in this study includes protein–protein interactions, protein/gene regulations, protein–small molecule interactions, protein–GO relationships, protein–pathway relationships, and pathway–disease relationships. The relationship information is organized in a graph data structure, named the integrated bio-entity network (IBN), where the vertices are the bio-entities and the edges represent their relationships. Under this framework, graph-theoretic algorithms can be designed to perform various knowledge discovery tasks. We designed breadth-first search with pruning (BFSP) and most probable path (MPP) algorithms to automatically generate hypotheses: indirect relationships with high probabilities in the network. We show that IBN can be used to generate plausible hypotheses, which not only help to better understand the complex interactions in biological systems, but also provide guidance for experimental design.
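The most probable path idea can be illustrated under two assumptions that the abstract does not state: each edge carries an independent probability in (0, 1], and a path's probability is the product of its edge probabilities. Maximizing that product is equivalent to minimizing the sum of -log p over the path, so Dijkstra's algorithm applies. The function name, graph layout, entity names, and probabilities are hypothetical, and this is not necessarily how the IBN MPP algorithm is implemented.

```python
import heapq
import math


def most_probable_path(graph, source, target):
    """Return (probability, path) of the most probable path from source to target.

    graph maps each vertex to a dict {neighbor: edge probability in (0, 1]}.
    Path probability is assumed to be the product of edge probabilities.
    Returns (0.0, []) if target is unreachable.
    """
    # Dijkstra on non-negative edge costs -log(p): smallest total cost = largest product.
    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    while heap:
        cost, vertex = heapq.heappop(heap)
        if vertex == target:
            # Walk predecessor links back to the source to rebuild the path.
            path = [target]
            while path[-1] != source:
                path.append(previous[path[-1]])
            return math.exp(-cost), path[::-1]
        if cost > best.get(vertex, math.inf):
            continue  # stale heap entry superseded by a cheaper one
        for neighbor, p in graph.get(vertex, {}).items():
            new_cost = cost - math.log(p)
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                previous[neighbor] = vertex
                heapq.heappush(heap, (new_cost, neighbor))
    return 0.0, []
```

With edges ProteinA→ProteinB (0.9), ProteinB→PathwayX (0.8), ProteinA→PathwayX (0.2) and PathwayX→DiseaseY (0.5), the indirect route through ProteinB (0.9 × 0.8 × 0.5 = 0.36) beats the direct link to PathwayX (0.2 × 0.5 = 0.1), which is the kind of high-probability indirect relationship the abstract calls a hypothesis.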

    The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

    Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in terms of the volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in the Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster that we suspected were involved in long-term memory. Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.

    Nitric Oxide Antagonizes the Acid Tolerance Response that Protects Salmonella against Innate Gastric Defenses

    Reactive nitrogen species (RNS) derived from dietary and salivary inorganic nitrogen oxides foment innate host defenses associated with the acidity of the stomach. The mechanisms by which these reactive species exert antimicrobial activity in the gastric lumen are, however, poorly understood. The genetically tractable acid tolerance response (ATR), which enables enteropathogens to survive harsh acidity, was screened for signaling pathways responsive to RNS. The nitric oxide (NO) donor spermine NONOate derepressed the Fur regulon that controls secondary lines of resistance against organic acids. Despite inducing a Fur-mediated adaptive response, acidified RNS largely repressed oral virulence, as demonstrated by the fact that Salmonella exposed to NO donors under mildly acidic conditions were shed in low numbers in feces and exhibited attenuated oral virulence. NO prevented Salmonella from mounting a de novo ATR, but was unable to suppress an already functional protective response, suggesting that RNS target regulatory cascades but not their effectors. Transcriptional and translational analyses revealed that the PhoPQ signaling cascade is a critical ATR target of NO in rapidly growing Salmonella. Inhibition of PhoPQ signaling appears to account for most of the NO-mediated abrogation of the ATR in log-phase bacteria, because the augmented acid sensitivity of phoQ-deficient Salmonella was not further enhanced after RNS treatment. Since PhoPQ-regulated acid resistance is widespread in enteric pathogens, the RNS-mediated inhibition of the Salmonella ATR described herein may represent a common component of innate host defenses.