49 research outputs found

    Investigation of factors affecting prediction of protein-protein interaction networks by phylogenetic profiling

    Background: The use of computational methods for predicting protein interaction networks will continue to grow with the number of fully sequenced genomes available. The Co-Conservation method, also known as the Phylogenetic profiles method, is a well-established computational tool for predicting functional relationships between proteins. Results: Here, we examined how various aspects of this method affect the accuracy and topology of protein interaction networks. We have shown that the choice of reference genome influences the number of predictions involving proteins of previously unknown function, the accuracy of predicted interactions, and the topology of predicted interaction networks. We show that while such results are relatively insensitive to the E-value threshold used in defining homologs, predicted interactions are influenced by the similarity metric that is employed. We show that differences in predicted protein interactions are biologically meaningful, where judicious selection of reference genomes, or use of a new scoring scheme that explicitly considers reference genome relatedness, produces known protein interactions as well as predicted protein interactions involving coordinated biological processes that are not accessible using currently available databases. Conclusion: These studies should prove valuable for future studies seeking to further improve phylogenetic profiling methodologies as well as for efforts to efficiently employ such methods to develop new biological insights.
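
    As a rough illustration of the general phylogenetic-profile idea described in this abstract, the sketch below turns best-hit E-values into presence/absence profiles and links proteins whose profiles are similar. It is not the authors' implementation: the protein and genome names, the E-values, the 1e-5 threshold, the 0.75 score cutoff, and the choice of Jaccard similarity as the metric are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical best-hit E-values of each query protein against each
# reference genome (None means no hit). All names and numbers are made up.
EVALUES = {
    "protA": {"g1": 1e-40, "g2": 1e-12, "g3": None, "g4": 1e-30},
    "protB": {"g1": 1e-35, "g2": 1e-8, "g3": None, "g4": 1e-25},
    "protC": {"g1": None, "g2": 1e-3, "g3": 1e-50, "g4": None},
}


def profile(hits, genomes, threshold=1e-5):
    """Binary profile: 1 if a homolog is called in the genome, else 0."""
    return tuple(
        1 if hits.get(g) is not None and hits[g] <= threshold else 0
        for g in genomes
    )


def jaccard(p, q):
    """Jaccard similarity of two binary profiles (assumed metric)."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


def predict_links(evalues, threshold=1e-5, min_score=0.75):
    """Return (protein, protein, score) for pairs with similar profiles."""
    genomes = sorted({g for hits in evalues.values() for g in hits})
    profiles = {
        name: profile(hits, genomes, threshold)
        for name, hits in evalues.items()
    }
    links = []
    for a, b in combinations(sorted(profiles), 2):
        score = jaccard(profiles[a], profiles[b])
        if score >= min_score:
            links.append((a, b, score))
    return links


if __name__ == "__main__":
    for a, b, score in predict_links(EVALUES):
        print(f"{a} - {b}: {score:.2f}")
```

    Changing the set of reference genomes or swapping `jaccard` for another metric changes which pairs pass the cutoff, which is the kind of sensitivity the abstract reports.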

    The topology of the bacterial co-conserved protein network and its implications for predicting protein function

    Background: Protein-protein interaction networks are most often generated from physical protein-protein interaction data. Co-conservation, also known as phylogenetic profiles, is an alternative source of information for generating protein interaction networks. Co-conservation methods generate interaction networks among proteins that are gained or lost together through evolution. Co-conservation is a particularly useful technique in the compact bacterial genomes. Prior studies in yeast suggest that the topology of protein-protein interaction networks generated from physical interaction assays can offer important insight into protein function. Here, we hypothesize that in bacteria, the topology of protein interaction networks derived via co-conservation information could similarly improve methods for predicting protein function. Since the topology of bacterial co-conservation protein-protein interaction networks has not previously been studied in depth, we first perform such an analysis for co-conservation networks in E. coli K12. Next, we demonstrate one way in which network connectivity measures and global and local function distribution can be exploited to predict protein function for previously uncharacterized proteins. Results: Our results showed that, like most biological networks, our bacterial co-conserved protein-protein interaction networks had scale-free topologies. Our results indicated that some properties of the physical yeast interaction network hold in our bacterial co-conservation networks, such as high connectivity for essential proteins. However, the high connectivity among protein complexes in the yeast physical network was not seen in the co-conservation network which uses all bacteria as the reference set. We found that the distribution of node connectivity varied by functional category and could be informative for function prediction. By integrating functional information from different annotation sources and using the network topology, we were able to infer function for uncharacterized proteins. Conclusion: Interaction networks based on co-conservation can contain information distinct from networks based on physical or other interaction types. Our study has shown co-conservation-based networks to exhibit a scale-free topology, as expected for biological networks. We also revealed ways that connectivity in our networks can be informative for the functional characterization of proteins.

    Cross-species cluster co-conservation: a new method for generating protein interaction networks

    Cluster Co-Conservation (CCC) has been extended into Cross-Species Cluster Co-Conservation (CS-CCC), a method for developing protein interaction networks based on co-conservation between protein pairs across multiple species.

    Predicting protein linkages in bacteria: Which method is best depends on task

    Background: Applications of computational methods for predicting protein functional linkages are increasing. In recent years, several bacteria-specific methods for predicting linkages have been developed. The four major genomic context methods are: Gene cluster, Gene neighbor, Rosetta Stone, and Phylogenetic profiles. These methods have been shown to be powerful tools, and this paper provides guidelines for when each method is appropriate by exploring different features of each method and potential improvements offered by their combination. We also review many previous treatments of these prediction methods, use the latest available annotations, and offer a number of new observations. Results: Using Escherichia coli K12 and Bacillus subtilis, linkage predictions made by each of these methods were evaluated against three benchmarks: functional categories defined by COG and KEGG, known pathways listed in EcoCyc, and known operons listed in RegulonDB. Each evaluated method had strengths and weaknesses, with no one method dominating all aspects of predictive ability studied. For functional categories, as previous studies have shown, the Rosetta Stone method was individually best at detecting linkages and predicting functions among proteins with shared KEGG categories, while the Phylogenetic profile method was best for linkage detection and function prediction among proteins with common COG functions. Differences in performance under COG versus KEGG may be attributable to the presence of paralogs. Better function prediction was observed when using a weighted combination of linkages based on reliability versus using a simple unweighted union of the linkage sets. For pathway reconstruction, 99 complete metabolic pathways in E. coli K12 (out of the 209 known, non-trivial pathways) and 193 pathways with 50% of their proteins were covered by linkages from at least one method. Gene neighbor was most effective individually on pathway reconstruction, with 48 complete pathways reconstructed. For operon prediction, Gene cluster completely predicted 59% of the known operons in E. coli K12 and 88% (333/418) in B. subtilis. Comparing two versions of the E. coli K12 operon database, many of the unannotated predictions in the earlier version were updated to true predictions in the later version. Using only linkages found by both Gene cluster and Gene neighbor improved the precision of operon predictions. Additionally, as previous studies have shown, combining features based on intergenic region and protein function improved the specificity of operon prediction. Conclusion: A common problem for computational methods is the generation of a large number of false positives that might be caused by an incomplete source of validation. By comparing two versions of a database, we demonstrated the dramatic differences in reported results. We used several benchmarks on which we have shown the comparative effectiveness of each prediction method, as well as provided guidelines as to which method is most appropriate for a given prediction task.

    Alu insertion polymorphisms shared by Papio baboons and Theropithecus gelada reveal an intertwined common ancestry

    © 2019 The Author(s). Background: Baboons (genus Papio) and geladas (Theropithecus gelada) are now generally recognized as close phylogenetic relatives, though morphologically quite distinct and generally classified in separate genera. Primate-specific Alu retrotransposons are well-established genomic markers for the study of phylogenetic and population genetic relationships. We previously reported a computational reconstruction of Papio phylogeny using large-scale whole genome sequence (WGS) analysis of Alu insertion polymorphisms. Recently, high coverage WGS was generated for Theropithecus gelada. The objective of this study was to apply the high-throughput poly-Detect method to computationally determine the number of Alu insertion polymorphisms shared by T. gelada and Papio, and vice versa, by each individual Papio species and T. gelada. Secondly, we performed locus-specific polymerase chain reaction (PCR) assays on a diverse DNA panel to complement the computational data. Results: We identified 27,700 Alu insertions from T. gelada WGS that were also present among six Papio species, with nearly half (12,956) remaining unfixed among 12 Papio individuals. Similarly, each of the six Papio species had species-indicative Alu insertions that were also present in T. gelada. In general, P. kindae shared more insertion polymorphisms with T. gelada than did any of the other five Papio species. PCR-based genotype data provided additional support for the computational findings. Conclusions: Our discovery that several thousand Alu insertion polymorphisms are shared by T. gelada and Papio baboons suggests a much more permeable reproductive barrier between the two genera than previously suspected. Their intertwined evolution likely involves a long history of admixture, gene flow and incomplete lineage sorting.

    Increased Mutation Frequency in Redox-Impaired Escherichia coli Due to RelA- and RpoS-Mediated Repression of DNA Repair

    Balancing of reducing equivalents is a fundamental issue in bacterial metabolism and metabolic engineering. Mutations in the key metabolic genes ldhA and pflB of Escherichia coli are known to stall anaerobic growth and fermentation due to a buildup of intracellular NADH. We observed that the rate of spontaneous mutation in E. coli BW25113 (ΔldhA ΔpflB) was an order of magnitude higher than that in wild-type (WT) E. coli BW25113. We hypothesized that the increased mutation frequency was due to an increased NADH/NAD+ ratio in this strain. Using several redox-impaired strains of E. coli and different redox conditions, we confirmed a significant correlation (P < 0.01) between intracellular NADH/NAD+ ratio and mutation frequency. To identify the genetic basis for this relationship, whole-genome transcriptional profiles were compared between BW25113 WT and BW25113 (ΔldhA ΔpflB). This analysis revealed that the genes involved in DNA repair were expressed at significantly lower levels in BW25113 (ΔldhA ΔpflB). Direct measurements of the extent of DNA repair in BW25113 (ΔldhA ΔpflB) subjected to UV exposure confirmed that DNA repair was inhibited. To identify a direct link between DNA repair and intracellular redox ratio, the stringent-response-regulatory gene relA and the global-stress-response-regulatory gene rpoS were deleted. In both cases, the mutation frequencies were restored to BW25113 WT levels.

    A survey of analysis software for array-comparative genomic hybridisation studies to detect copy number variation

    Copy number variants (CNVs) create a major source of variation among individuals and populations. Array-based comparative genomic hybridisation (aCGH) is a powerful method used to detect and compare the copy numbers of DNA sequences at high resolution along the genome. In recent years, several informatics tools for accurate and efficient CNV detection and assessment have been developed. In this paper, most of the well-known algorithms, analysis software and the limitations of that software will be briefly reviewed.