17 research outputs found

    Computational Approaches to Predict Protein Interaction


    Phydbac "Gene Function Predictor" : a gene annotation tool based on genomic context analysis

    BACKGROUND: The large number of completely sequenced genomes allows genomic context analysis to predict reliable functional associations between prokaryotic proteins. Major methods rely on the fact that genes encoding physically interacting partners or members of shared metabolic pathways tend to be proximate on the genome, to evolve in a correlated manner, and to be fused as a single sequence in another organism. RESULTS: The new "Gene Function Predictor", linked to the web server Phydbac, proposes putative associations between Escherichia coli K-12 proteins derived from a combination of these methods. We show that associations made by this tool are more accurate than linkages found in the other established databases. Predicted assignments to GO categories, based on pre-existing functional annotations of associated proteins, are also available. This new database currently holds 9,379 pairwise links at an expected success rate of at least 80%; the 6,466 functional predictions to GO terms derived from these links have a level of accuracy higher than 70%. CONCLUSION: The "Gene Function Predictor" is an automatic tool that aims to help biologists by providing them with hypothetical functional predictions derived from genomic context characteristics. The "Gene Function Predictor" is available at

    Prediction of evolutionarily conserved interologs in Mus musculus

    BACKGROUND: Identification of protein-protein interactions is an important first step toward understanding living systems. High-throughput experimental approaches have accumulated a large amount of information on protein-protein interactions in human and other model organisms. Such interaction information has been successfully transferred to other species in which the experimental data are limited. However, the annotation transfer method can yield false-positive interologs, because interactions may not be conserved when the method is applied to phylogenetically distant organisms. RESULTS: To address this issue, we used the phylogenetic profile method to filter false positives in interologs, based on the notion that evolutionarily conserved interactions show similar patterns of occurrence across genomes. The approach was applied to Mus musculus, in which the experimentally identified interactions are limited. We first inferred the protein-protein interactions in Mus musculus by using two approaches: i) identifying mouse orthologs of interacting proteins (interologs) based on the experimental protein-protein interaction data from other organisms; and ii) analyzing the frequency of mouse ortholog co-occurrence in predicted operons of bacteria. We then filtered possible false positives in the predicted interactions using the phylogenetic profiles. We found that this filtering method significantly increased the frequency of interacting protein pairs coexpressed in the same cells/tissues in the Gene Expression Omnibus (GEO) database, as well as the frequency of interacting protein pairs sharing similar Gene Ontology (GO) terms for biological processes and cellular localizations. The data support the notion that phylogenetic profiles help to reduce the number of false positives in interologs. CONCLUSION: We have developed a protein-protein interaction database for mouse, which contains 41109 interologs. We have also developed a web interface to facilitate the use of the database: http://lgsun.grc.nia.nih.gov/mppi/.
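
The two-stage idea in this abstract (transfer interactions through orthologs, then filter by phylogenetic profile similarity) can be illustrated with a minimal sketch. The protein names, the toy profiles, the Jaccard measure, and the 0.5 threshold are all assumptions made for illustration; the abstract does not state which similarity score or cutoff the authors used.

```python
from itertools import product


def predict_interologs(source_interactions, orthologs):
    """Map each interacting pair from a source organism onto target orthologs."""
    predicted = set()
    for a, b in source_interactions:
        for ta, tb in product(orthologs.get(a, ()), orthologs.get(b, ())):
            if ta != tb:  # ignore self-pairs
                predicted.add(tuple(sorted((ta, tb))))
    return predicted


def jaccard(p, q):
    """Jaccard similarity of two binary presence/absence profiles."""
    both = sum(1 for x, y in zip(p, q) if x and y)
    either = sum(1 for x, y in zip(p, q) if x or y)
    return both / either if either else 0.0


def filter_by_profile(pairs, profiles, threshold=0.5):
    """Keep only pairs whose profiles are at least `threshold` similar."""
    return {(a, b) for a, b in pairs
            if jaccard(profiles[a], profiles[b]) >= threshold}


# Hypothetical toy data: yeast-like source pairs and mouse-like orthologs.
source_interactions = [("yA", "yB"), ("yC", "yD")]
orthologs = {"yA": ["mA"], "yB": ["mB"], "yC": ["mC"], "yD": ["mD"]}
profiles = {
    "mA": [1, 1, 0, 1], "mB": [1, 1, 0, 1],  # co-occurring across genomes
    "mC": [1, 0, 0, 0], "mD": [0, 1, 1, 1],  # never present together
}

candidates = predict_interologs(source_interactions, orthologs)
kept = filter_by_profile(candidates, profiles)
print(sorted(candidates), sorted(kept))
```

In this toy case the pair with matching profiles survives the filter and the pair with disjoint profiles is discarded, which is the behavior the abstract describes in words.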

    A global gene evolution analysis on Vibrionaceae family using phylogenetic profile

    BACKGROUND: Vibrionaceae represent a significant portion of the cultivable heterotrophic sea bacteria; they strongly affect nutrient cycling, and some species are devastating pathogens. In this work we propose an improved phylogenetic profile analysis on 14 Vibrionaceae genomes, to study the evolution of this family on the basis of gene content. The phylogenetic profile is based on the observation that genes involved in the same process (e.g. a metabolic pathway or structural complex) tend to be concurrently present or absent within different genomes. This allows the prediction of hypothetical functions on the basis of shared phylogenetic profiles. Moreover, this approach is useful for identifying putative laterally transferred elements on the basis of their presence in phylogenetically distant bacteria. RESULTS: Vibrionaceae ORFs were aligned against all the available bacterial proteomes. A phylogenetic profile is defined as an array of distances, based on amino acid substitution matrices, from a single gene to all its orthologues. The final phylogenetic profile of each cluster, derived from a non-redundant list of all ORFs, was defined as the median of all the profiles belonging to that cluster. The resulting phylogenetic profile matrix contains gene clusters on the rows and organisms on the columns. Cluster analysis identified groups of "core genes" with a widespread high similarity across all the organisms, and several clusters that contain genes homologous only to a limited set of organisms. For each of these clusters, COG class enrichment was calculated. The analysis reveals that clusters of core genes have the highest number of enriched classes, while the others are enriched for only a few of them, such as DNA replication, recombination and repair. CONCLUSION: We found that mobile elements have heterogeneous profiles not only across the entire set of organisms, but also within Vibrionaceae; this confirms their great influence on bacterial evolution even inside the same family. Furthermore, several hypothetical proteins correlate highly with mobile element profiles, suggesting a possible horizontal transfer mechanism for the evolution of these genes. Finally, we suggested the putative role of some ORFs of unknown function on the basis of their phylogenetic profile similarity to well-characterized genes.
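
The construction described in the Results (one distance profile per gene, and the median of those profiles per cluster) might look like the following sketch. The cluster name and the distance values are invented for illustration, and the column-wise median is one reading of "the median of all the profiles belonging to the cluster".

```python
from statistics import median


def cluster_profile(gene_profiles):
    """Column-wise median of the profiles of all genes in one cluster.

    Each profile is a list with one distance per reference organism.
    """
    return [median(column) for column in zip(*gene_profiles)]


def profile_matrix(clusters):
    """Build a {cluster name: median profile} mapping (rows = clusters)."""
    return {name: cluster_profile(profiles) for name, profiles in clusters.items()}


# Hypothetical cluster with three genes scored against three organisms.
clusters = {
    "cluster_1": [
        [0.1, 0.9, 0.5],
        [0.3, 0.7, 0.5],
        [0.2, 0.8, 1.0],
    ],
}

matrix = profile_matrix(clusters)
print(matrix["cluster_1"])
```

Using the median rather than the mean keeps a single divergent gene (the 1.0 in the last column here) from dominating the cluster's profile.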

    Comparative assessment of performance and genome dependence among phylogenetic profiling methods

    BACKGROUND: The rapidly increasing speed with which genome sequence data can be generated will be accompanied by an exponential increase in the number of sequenced eukaryotes. With the increasing number of sequenced eukaryotic genomes comes a need for bioinformatic techniques to aid in functional annotation. Ideally, genome context based techniques such as proximity, fusion, and phylogenetic profiling, which have been so successful in prokaryotes, could be utilized in eukaryotes. Here we explore the application of phylogenetic profiling, a method that exploits the evolutionary co-occurrence of genes in the assignment of functional linkages, to eukaryotic genomes. RESULTS: In order to evaluate the performance of phylogenetic profiling in eukaryotes, we assessed the relative performance of commonly used profile construction techniques and genome compositions in predicting functional linkages in both prokaryotic and eukaryotic organisms. When predicting linkages in E. coli with a prokaryotic profile, the use of continuous values constructed from transformed BLAST bit-scores performed better than profiles composed of discretized E-values; the use of discretized E-values resulted in more accurate linkages when using S. cerevisiae as the query organism. Extending this analysis by incorporating several eukaryotic genomes in profiles containing a majority of prokaryotes resulted in similar overall accuracy, but with a surprising reduction in pathway diversity among the most significant linkages. Furthermore, the application of phylogenetic profiling using profiles composed of only eukaryotes resulted in the loss of the strong correlation between common KEGG pathway membership and profile similarity score. Profile construction methods, orthology definitions, ontology and domain complexity were explored as possible sources of the poor performance of eukaryotic profiles, but with no improvement in results. 
CONCLUSION: Given the current set of completely sequenced eukaryotic organisms, phylogenetic profiling using profiles generated from any of the commonly used techniques was found to yield extremely poor results. These findings imply genome-specific requirements for constructing functionally relevant phylogenetic profiles, and suggest that differences in the evolutionary history between different kingdoms might generally limit the usefulness of phylogenetic profiling in eukaryotes.
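
The two profile construction styles compared in this abstract can be sketched as follows. The abstract does not give the bit-score transformation or the E-value cutoff, so the self-score normalization, the 1e-5 cutoff, and the use of Pearson correlation as the profile similarity score are all assumptions chosen for illustration.

```python
import math


def continuous_profile(bit_scores, self_score):
    """Scale each best-hit bit score by the query's self-hit score, capped at 1."""
    return [min(score / self_score, 1.0) for score in bit_scores]


def discretized_profile(e_values, cutoff=1e-5):
    """Binary profile: 1 where the best hit passes the E-value cutoff, else 0."""
    return [1 if e <= cutoff else 0 for e in e_values]


def pearson(x, y):
    """Pearson correlation between two equal-length profiles (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


# Hypothetical best-hit statistics for one query gene against three genomes.
print(continuous_profile([200, 50, 0], self_score=200))
print(discretized_profile([1e-30, 1e-3, 10.0]))
print(pearson([1, 0, 1, 0], [1, 0, 1, 0]))
```

The continuous form preserves how strong each hit is, while the discretized form records only whether a homolog is counted as present; the abstract reports that which of the two works better depends on the query organism.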

    Investigation of factors affecting prediction of protein-protein interaction networks by phylogenetic profiling

    BACKGROUND: The use of computational methods for predicting protein interaction networks will continue to grow with the number of fully sequenced genomes available. The Co-Conservation method, also known as the Phylogenetic profiles method, is a well-established computational tool for predicting functional relationships between proteins. RESULTS: Here, we examined how various aspects of this method affect the accuracy and topology of protein interaction networks. We have shown that the choice of reference genome influences the number of predictions involving proteins of previously unknown function, the accuracy of predicted interactions, and the topology of predicted interaction networks. We show that while such results are relatively insensitive to the E-value threshold used in defining homologs, predicted interactions are influenced by the similarity metric that is employed. We show that differences in predicted protein interactions are biologically meaningful: judicious selection of reference genomes, or use of a new scoring scheme that explicitly considers reference genome relatedness, produces known protein interactions as well as predicted protein interactions involving coordinated biological processes that are not accessible using currently available databases. CONCLUSION: These studies should prove valuable for future studies seeking to further improve phylogenetic profiling methodologies, as well as for efforts to efficiently employ such methods to develop new biological insights.
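
To see why the choice of similarity metric can change the predicted interactions, the sketch below scores the same binary profiles with two common measures. Neither is claimed to be the metric or the new scoring scheme used in the paper; mutual information and Hamming similarity are simply familiar examples, and the profiles are invented.

```python
import math
from collections import Counter


def mutual_information(p, q):
    """Mutual information (in bits) between two equal-length discrete profiles."""
    n = len(p)
    joint = Counter(zip(p, q))
    count_p, count_q = Counter(p), Counter(q)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((count_p[x] / n) * (count_q[y] / n)))
    return mi


def hamming_similarity(p, q):
    """Fraction of genomes in which the two profiles agree."""
    return sum(x == y for x, y in zip(p, q)) / len(p)


gene = [1, 1, 0, 0]
same = [1, 1, 0, 0]       # identical occurrence pattern
opposite = [0, 0, 1, 1]   # perfectly complementary pattern
unrelated = [1, 0, 1, 0]  # statistically independent pattern

for other in (same, opposite, unrelated):
    print(mutual_information(gene, other), hamming_similarity(gene, other))
```

Mutual information scores the complementary pattern as highly as the identical one, while Hamming similarity scores it as zero, so the two metrics would rank candidate partners differently.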

    Understanding Communication Signals during Mycobacterial Latency through Predicted Genome-Wide Protein Interactions and Boolean Modeling

    About 90% of the people infected with Mycobacterium tuberculosis carry latent bacteria that are believed to become activated upon immune suppression. One of the fundamental challenges in the control of tuberculosis is therefore to understand the molecular mechanisms involved in the onset of latency and/or reactivation. We have attempted to address this problem at the systems level by combining predicted functional protein:protein interactions, integration of functional interactions with large-scale gene expression studies, a predicted transcriptional regulatory network, and finally simulations with a Boolean model of the network. Initially, a prediction of genome-wide protein functional linkages was obtained based on genome-context methods using a Support Vector Machine. This set of protein functional linkages, along with gene expression data from the available models of latency, was employed to identify proteins involved in mediating switch signals during dormancy. We show that genes that are up- and down-regulated during dormancy are coordinately regulated not only under dormancy-like conditions but also under a variety of other experimental conditions. Their synchronized regulation indicates that they form a tightly regulated gene cluster and might form a latency regulon. Conservation of these genes across bacterial species suggests a unique evolutionary history that might be associated with M. tuberculosis dormancy. Finally, simulations with a Boolean model based on the regulatory network, with logical relationships derived from gene expression data, reveal a bistable switch suggesting alternating latent and actively growing states. Our analysis based on the interaction network therefore reveals a potential model of M. tuberculosis latency.
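
What a bistable switch means in a Boolean model can be shown with a toy network. This is not the network from the paper: the two node names and the mutual-inhibition rules are hypothetical, chosen only because mutual inhibition is the simplest Boolean motif with two stable states.

```python
from itertools import product

# Hypothetical two-node network: each program represses the other.
RULES = {
    "dormancy_program": lambda s: not s["growth_program"],
    "growth_program": lambda s: not s["dormancy_program"],
}


def step(state):
    """Synchronous update: every node is recomputed from the current state."""
    return {node: bool(rule(state)) for node, rule in RULES.items()}


def fixed_points():
    """Enumerate all states and return those that map onto themselves."""
    nodes = list(RULES)
    stable = []
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        if step(state) == state:
            stable.append(state)
    return stable


for state in fixed_points():
    print(state)
```

The enumeration finds exactly two fixed points, one with only the dormancy node on and one with only the growth node on, which is the Boolean signature of a switch between two alternative states.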

    Estimating the extent of horizontal gene transfer in metagenomic sequences

    BACKGROUND: Although the extent of horizontal gene transfer (HGT) in complete genomes has been widely studied, its influence on the evolution of natural communities of prokaryotes remains unknown. The availability of metagenomic sequences allows us to study global patterns of prokaryotic evolution in samples from natural communities. However, the methods that have been commonly used for the study of HGT are not suitable for metagenomic samples. It is therefore important to develop new methods, or to adapt existing ones, for use with metagenomic sequences. RESULTS: We have created two different methods that are suitable for the study of HGT in metagenomic samples. The methods are based on phylogenetic and DNA compositional approaches, and have allowed us to assess the extent of possible HGT events in metagenomes for the first time. The methods are shown to be compatible and quite precise, although they probably underestimate the number of possible events. Our results show that the phylogenetic method detects HGT in between 0.8% and 1.5% of the sequences, while DNA compositional methods identify putative HGT in between 2% and 8% of the sequences. These ranges are very similar to those found in complete genomes by related approaches. The two methods differ in sensitivity, probably because they target HGT events of different ages: the compositional method mostly identifies recent transfers, while the phylogenetic method is more suitable for the detection of older events. Nevertheless, the number of HGT events in metagenomic sequences from different communities shows a consistent trend for both methods: the lowest amount is found in the sequences of the Sargasso Sea metagenome, while the highest is found in the whale fall metagenome from the bottom of the ocean. The significance of these observations is discussed. CONCLUSION: The computational approaches that are used to find possible HGT events in complete genomes can be adapted to work with metagenomic samples, and perform well across different metagenomic samples. The percentage of possible HGT events observed is close to that found for complete genomes, and different microbiomes show diverse ratios of putative HGT events. This is probably related to both environmental factors and the species composition of each particular community.
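
The general idea behind a DNA compositional approach is that recently acquired sequence tends to look compositionally unlike its surroundings. The abstract does not say which compositional statistic the authors used, so the sketch below substitutes a plain GC-content z-score as a stand-in; the sequence names, the sequences, and the 1.5 cutoff are hypothetical.

```python
from statistics import mean, pstdev


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    acgt = sum(seq.count(base) for base in "ACGT")
    return gc / acgt if acgt else 0.0


def flag_atypical(sequences, z_cutoff=1.5):
    """Return names of sequences whose GC content deviates from the set mean."""
    gc = {name: gc_content(seq) for name, seq in sequences.items()}
    mu, sd = mean(gc.values()), pstdev(gc.values())
    if sd == 0:
        return []  # all sequences have identical composition
    return [name for name, value in gc.items() if abs(value - mu) / sd > z_cutoff]


# Hypothetical gene set: five genes at 50% GC and one at 100% GC.
genes = {f"gene_{i}": "ATGC" * 5 for i in range(1, 6)}
genes["gene_x"] = "GCGC" * 5

print(flag_atypical(genes))
```

A score of this kind fades as a transferred gene gradually adapts to the composition of its host, which is consistent with the abstract's remark that the compositional method mostly identifies recent transfers.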