83 research outputs found

    Stabilizing Regions in Membrane Proteins

    PHI-base update: additions to the pathogen–host interaction database

    The pathogen–host interaction database (PHI-base) is a web-accessible database that catalogues experimentally verified pathogenicity, virulence and effector genes from bacterial, fungal and oomycete pathogens, which infect human, animal, plant, insect, fish and fungal hosts. Plant endophytes are also included. PHI-base is therefore an invaluable resource for the discovery of genes in medically and agronomically important pathogens, which may be potential targets for chemical intervention. The database is freely accessible to both academic and non-academic users. This publication describes recent additions to the database and both current and future applications. The number of fields that characterize PHI-base entries has almost doubled. Important additional fields deal with new experimental methods, strain information, pathogenicity islands and external references that link the database to external resources, for example, gene ontology terms and Locus IDs. Another important addition is the inclusion of anti-infectives and their target genes, which makes it possible to predict compounds that may interact with newly identified virulence factors. In parallel, the curation process has been improved and now involves several external experts. On the technical side, several new search tools have been provided and the database is also now distributed in XML format. PHI-base is available at: http://www.phi-base.org/

    GoGene: gene annotation in the fast lane

    High-throughput screens such as microarrays and RNAi screens produce huge amounts of data. They typically result in hundreds of genes, which are often further explored and clustered via enriched GeneOntology terms. The strength of such analyses is that they build on high-quality manual annotations provided with the GeneOntology. However, the weakness is that annotations are restricted to process, function and location and that they do not cover all known genes in model organisms. GoGene addresses this weakness by complementing high-quality manual annotation with high-throughput text mining that extracts co-occurrences of genes and ontology terms from the literature. GoGene contains over 4 000 000 associations between genes and gene-related terms for 10 model organisms extracted from more than 18 000 000 PubMed entries. It covers not only the process, function and location of genes, but also biomedical categories such as diseases, compounds, techniques and mutations. By bringing it all together, GoGene provides the most recent and most complete facts about genes and can rank them according to novelty and importance. GoGene accepts keywords, gene lists, gene sequences and protein sequences as input and supports search for genes in PubMed, EntrezGene and via BLAST. Since all associations of genes to terms are supported by evidence in the literature, the results are transparent and can be verified by the user. GoGene is available at http://gopubmed.org/gogene
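
    The co-occurrence idea in the abstract above can be illustrated with a minimal sketch. This is not GoGene's actual pipeline: the function name `cooccurrences`, the plain substring matching, and the three toy sentences are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product


def cooccurrences(abstracts, genes, terms):
    """Count the abstracts in which a gene name and a term both appear.

    Hypothetical helper: matching is naive, case-insensitive substring
    search, far simpler than a real text-mining system.
    """
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        found_genes = [g for g in genes if g.lower() in lowered]
        found_terms = [t for t in terms if t.lower() in lowered]
        # Every gene/term pair found in the same abstract counts once.
        for pair in product(found_genes, found_terms):
            counts[pair] += 1
    return counts


# Toy input written for this example, not taken from PubMed.
abstracts = [
    "TP53 regulates apoptosis after DNA damage.",
    "Loss of TP53 impairs apoptosis in tumour cells.",
    "BRCA1 participates in DNA repair.",
]
genes = ["TP53", "BRCA1"]
terms = ["apoptosis", "DNA repair"]

pair_counts = cooccurrences(abstracts, genes, terms)
for (gene, term), n in pair_counts.most_common():
    print(gene, term, n)
```

    Because each count traces back to specific abstracts, a user could in principle inspect the supporting sentences, which is the kind of verifiability the abstract describes.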

    Mining the Gene Wiki for functional genomic knowledge

    Background: Ontology-based gene annotations are important tools for organizing and analyzing genome-scale biological data. Collecting these annotations is a valuable but costly endeavor. The Gene Wiki makes use of Wikipedia as a low-cost, mass-collaborative platform for assembling text-based gene annotations. The Gene Wiki is comprised of more than 10,000 review articles, each describing one human gene. The goal of this study is to define and assess a computational strategy for translating the text of Gene Wiki articles into ontology-based gene annotations. We specifically explore the generation of structured annotations using the Gene Ontology and the Human Disease Ontology.
    Results: Our system produced 2,983 candidate gene annotations using the Disease Ontology and 11,022 candidate annotations using the Gene Ontology from the text of the Gene Wiki. Based on manual evaluations and comparisons to reference annotation sets, we estimate a precision of 90-93% for the Disease Ontology annotations and 48-64% for the Gene Ontology annotations. We further demonstrate that this data set can systematically improve the results from gene set enrichment analyses.
    Conclusions: The Gene Wiki is a rapidly growing corpus of text focused on human gene function. Here, we demonstrate that the Gene Wiki can be a powerful resource for generating ontology-based gene annotations. These annotations can be used immediately to improve workflows for building curated gene annotation databases and knowledge-based statistical analyses.

    Combining ChIP-chip and Expression Profiling to Model the MoCRZ1 Mediated Circuit for Ca2+/Calcineurin Signaling in the Rice Blast Fungus

    Significant progress has been made in defining the central signaling networks in many organisms, but collectively we know little about the downstream targets of these networks and the genes they regulate. To reconstruct the regulatory circuit of calcineurin signal transduction via MoCRZ1, a Magnaporthe oryzae C2H2 transcription factor activated by calcineurin dephosphorylation, we combined chromatin immunoprecipitation-chip (ChIP-chip) with microarray expression studies. One hundred forty genes were identified as both direct targets of MoCRZ1 and differentially expressed in a calcium/calcineurin/MoCRZ1-dependent manner. Highly represented were genes involved in calcium signaling, small molecule transport, ion homeostasis, cell wall synthesis/maintenance, and fungal virulence. Of particular note, genes involved in vesicle-mediated secretion necessary for establishing host associations were also found. MoCRZ1 itself was a target, suggesting a previously unreported autoregulation control point. The data also implicated a previously unreported feedback regulation mechanism of calcineurin activity. We propose that calcium/calcineurin-regulated signal transduction circuits controlling development and pathogenicity manifest through multiple layers of regulation. We present results from the ChIP-chip and expression analysis along with a refined model of calcium/calcineurin signaling in this important plant pathogen.

    Combined SVM-CRFs for Biological Named Entity Recognition with Maximal Bidirectional Squeezing

    Biological named entity recognition, the identification of biological terms in text, is essential for biomedical information extraction. Machine learning-based approaches have been widely applied in this area. However, the recognition performance of current approaches could still be improved. Our novel approach is to combine support vector machines (SVMs) and conditional random fields (CRFs), which can complement and facilitate each other. During the hybrid process, we use SVM to separate biological terms from non-biological terms, before we use CRFs to determine the types of biological terms, which makes full use of the power of SVM as a binary-class classifier and the data-labeling capacity of CRFs. We then merge the results of SVM and CRFs. To remove any inconsistencies that might result from the merging, we develop a useful algorithm and apply two rules. To ensure biological terms with a maximum length are identified, we propose a maximal bidirectional squeezing approach that finds the longest term. We also add a positive gain to rare events to reinforce their probability and avoid bias. Our approach will also gradually extend the context so more contextual information can be included. We examined the performance of four approaches with the GENIA corpus and JNLPBA04 data. The combination of SVM and CRFs improved performance. The macro-precision, macro-recall, and macro-F1 of the SVM-CRFs hybrid approach surpassed those of conventional SVM and CRFs. After applying the new algorithms, the macro-F1 reached 91.67% with the GENIA corpus and 84.04% with the JNLPBA04 data.
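
    For readers unfamiliar with the macro-averaged metrics reported above, the following is a minimal sketch of how they are commonly computed. It assumes macro-F1 is the unweighted mean of per-type F1 scores; the abstract does not say which definition the authors used. The function name `macro_scores` and the per-type counts are invented for illustration and are not results from the paper.

```python
def macro_scores(counts):
    """Return (macro-precision, macro-recall, macro-F1).

    `counts` maps each entity type to a tuple
    (true_positives, false_positives, false_negatives).
    Each type contributes equally, regardless of how frequent it is.
    """
    precisions, recalls, f1s = [], [], []
    for tp, fp, fn in counts.values():
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    n = len(counts)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical counts for two entity types.
toy_counts = {
    "protein": (8, 2, 2),  # precision 0.80, recall 0.80
    "DNA": (3, 1, 3),      # precision 0.75, recall 0.50
}
macro_p, macro_r, macro_f1 = macro_scores(toy_counts)
print(f"{macro_p:.3f} {macro_r:.3f} {macro_f1:.3f}")
```

    Macro-averaging weights rare entity types as heavily as common ones, so a gain on rare types shows up more strongly than it would under micro-averaging.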

    A Comprehensive Benchmark of Kernel Methods to Extract Protein–Protein Interactions from Literature

    The most important way of conveying new findings in biomedical research is scientific publication. Extraction of protein–protein interactions (PPIs) reported in scientific publications is one of the core topics of text mining in the life sciences. Recently, a new class of such methods has been proposed - convolution kernels that identify PPIs using deep parses of sentences. However, comparing published results of different PPI extraction methods is impossible due to the use of different evaluation corpora, different evaluation metrics, different tuning procedures, etc. In this paper, we study whether the reported performance metrics are robust across different corpora and learning settings and whether the use of deep parsing actually leads to an increase in extraction quality. Our ultimate goal is to identify the one method that performs best in real-life scenarios, where information extraction is performed on unseen text and not on specifically prepared evaluation data. We performed a comprehensive benchmarking of nine different methods for PPI extraction that use convolution kernels on rich linguistic information. Methods were evaluated on five different public corpora using cross-validation, cross-learning, and cross-corpus evaluation. Our study confirms that kernels using dependency trees generally outperform kernels based on syntax trees. However, our study also shows that only the best kernel methods can compete with a simple rule-based approach when the evaluation prevents information leakage between training and test corpora. Our results further reveal that the F-score of many approaches drops significantly if no corpus-specific parameter optimization is applied and that methods reaching a good AUC score often perform much worse in terms of F-score. We conclude that for most kernels no sensible estimation of PPI extraction performance on new text is possible, given the current heterogeneity in evaluation data. Nevertheless, our study shows that three kernels are clearly superior to the other methods.

    The Potential for pathogenicity was present in the ancestor of the Ascomycete subphylum Pezizomycotina

    Background: Previous studies in Ascomycetes have shown that the function of gene families whose size is considerably larger in extant pathogens than in non-pathogens could be related to pathogenicity traits. However, by only comparing gene inventories in extant species, no insights can be gained into the evolutionary process that gave rise to these larger family sizes in pathogens. Moreover, most studies that consider gene families in extant species only tend to explain observed differences in gene family sizes by gains rather than by losses, thereby largely underestimating the impact of gene loss during genome evolution.
    Results: In our study we used a selection of recently published genomes of Ascomycetes to analyze how gene family gains, duplications and losses have affected the origin of pathogenic traits. By analyzing the evolutionary history of gene families we found that most gene families with an enlarged size in pathogens were present in an ancestor common to both pathogens and non-pathogens. The majority of these families were selectively maintained in pathogenic lineages, but disappeared in non-pathogens. Non-pathogen-specific losses largely outnumbered pathogen-specific losses.
    Conclusions: We conclude that most of the proteins for pathogenicity were already present in the ancestor of the Ascomycete lineages we used in our study. Species that did not develop pathogenicity seemed to have reduced their genetic complexity compared to their ancestors. We further show that expansion of gained or already existing families in a species-specific way is important to fine-tune the specificities of the pathogenic host-fungus interaction.