
    Protein–Protein Interactions More Conserved within Species than across Species

    Experimental high-throughput studies of protein–protein interactions are beginning to provide enough data for comprehensive computational studies. Today, about ten large data sets, each with thousands of interacting pairs, coarsely sample the interactions in fly, human, worm, and yeast. About 55,000 further pairs of interacting proteins have been identified by more careful, detailed biochemical experiments. Most interactions are experimentally observed in prokaryotes and simple eukaryotes; very few interactions are observed in higher eukaryotes such as mammals. It is commonly assumed that pathways in mammals can be inferred through homology to model organisms, e.g. the experimental observation that two yeast proteins interact is transferred to infer that the two corresponding proteins in human also interact. Two pairs for which the interaction is conserved are often described as interologs. The goal of this investigation was a large-scale comprehensive analysis of such inferences, i.e. of the evolutionary conservation of interologs. Here, we introduced a novel score for measuring the overlap between protein–protein interaction data sets. This measure appeared to reflect the overall quality of the data and was the basis for the two surprising results of our large-scale analysis. Firstly, homology-based inferences of physical protein–protein interactions appeared far less successful than expected. In fact, such inferences were accurate only for extremely high levels of sequence similarity. Secondly, and most surprisingly, the identification of interacting partners through sequence similarity was significantly more reliable for protein pairs within the same organism than for pairs between species. Our analysis underlined that the discrepancies between different datasets are large, even when using the same type of experiment on the same organism. This reality considerably constrains the power of homology-based transfer of interactions. In particular, the experimental probing of interactions in distant model organisms has to be undertaken with some caution. More comprehensive images of protein–protein networks will require the combination of many high-throughput methods, including in silico inferences and predictions. http://www.rostlab.org/results/2006/ppi_homology
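The homology-based transfer described in this abstract can be illustrated with a minimal sketch. It assumes a precomputed homolog mapping and a set of experimentally observed target-species pairs; the function names, the mapping format, and the simple accuracy ratio are hypothetical illustrations and are not the overlap score introduced by the authors.

```python
from itertools import product


def transfer_interologs(source_interactions, homologs):
    """Predict target-species interactions from source-species interactions.

    source_interactions: iterable of (protein_a, protein_b) pairs observed
        in the source species (e.g. yeast).
    homologs: dict mapping each source protein to a set of homologous
        proteins in the target species (assumed to be precomputed).
    Returns a set of unordered target pairs (frozensets).
    """
    predicted = set()
    for a, b in source_interactions:
        for ta, tb in product(homologs.get(a, ()), homologs.get(b, ())):
            # Simplification for this sketch: self-pairs (homodimers) are skipped.
            if ta != tb:
                predicted.add(frozenset((ta, tb)))
    return predicted


def transfer_accuracy(predicted, observed):
    """Fraction of predicted pairs that were also observed experimentally."""
    observed_pairs = {frozenset(pair) for pair in observed}
    if not predicted:
        return 0.0
    return len(predicted & observed_pairs) / len(predicted)
```

With toy data, two yeast interactions and a one-to-many homolog mapping yield three candidate human pairs; comparing them against observed pairs gives the fraction of transfers that are supported. Because observed interaction sets are incomplete, a ratio of this kind understates how many transferred pairs are real.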

    What Evidence Is There for the Homology of Protein-Protein Interactions?

    The notion that sequence homology implies functional similarity underlies much of computational biology. In the case of protein-protein interactions, an interaction can be inferred between two proteins on the basis that sequence-similar proteins have been observed to interact. The use of transferred interactions is common, but the legitimacy of such inferred interactions is not clear. Here we investigate transferred interactions and whether data incompleteness explains the lack of evidence found for them. Using definitions of homology associated with functional annotation transfer, we estimate that conservation rates of interactions are low even after taking interactome incompleteness into account. For example, at a blastp E-value threshold of , we estimate the conservation rate to be about between S. cerevisiae and H. sapiens. Our method also produces estimates of interactome sizes (which are similar to those previously proposed). Using our estimates of interaction conservation we estimate the rate at which protein-protein interactions are lost across species. To our knowledge, this is the first such study based on large-scale data. Previous work has suggested that interactions transferred within species are more reliable than interactions transferred across species. By controlling for factors that are specific to within-species interaction prediction, we propose that the transfer of interactions within species might be less reliable than transfers between species. Protein-protein interactions appear to be very rarely conserved unless very high sequence similarity is observed. Consequently, inferred interactions should be used with care.

    Biological interaction networks are conserved at the module level

    Background: Orthologous genes are highly conserved between closely related species and biological systems often utilize the same genes across different organisms. However, while sequence similarity often implies functional similarity, interaction data is not well conserved even for proteins with high sequence similarity. Several recent studies comparing high-throughput data including expression, protein-protein, protein-DNA, and genetic interactions between close species show conservation at a much lower rate than expected.
    Results: In this work we collected comprehensive high-throughput interaction datasets for four model organisms (S. cerevisiae, S. pombe, C. elegans, and D. melanogaster) and carried out systematic analyses in order to explain the apparent lower conservation of interaction data when compared to the conservation of sequence data. We first showed that several previously proposed hypotheses only provide a limited explanation for such lower conservation rates. We combined all interaction evidence into an integrated network for each species and identified functional modules from these integrated networks. We then demonstrate that interactions that are part of functional modules are conserved at much higher rates than previous reports in the literature, while interactions that connect distinct functional modules are conserved at lower rates.
    Conclusions: We show that conservation is maintained between species, but mainly at the module level. Our results indicate that interactions within modules are much more likely to be conserved than interactions between proteins in different modules. This provides a network-based explanation for the observed conservation rates that can also help explain why so many biological processes are well conserved despite the lower levels of conservation for the interactions of proteins participating in these processes.
    Accompanying website: http://www.sb.cs.cmu.edu/CrossSP

    Protein Complexes in Bacteria

    Large-scale analyses of protein complexes have recently become available for Escherichia coli and Mycoplasma pneumoniae, yielding 443 and 116 heteromultimeric soluble protein complexes, respectively. We have coupled the results of these mass spectrometry-characterized protein complexes with the 285 “gold standard” protein complexes identified by EcoCyc. A comparison with databases of gene orthology, conservation, and essentiality identified proteins conserved or lost in complexes of other species. For instance, of 285 “gold standard” protein complexes in E. coli, fewer than 10% are fully conserved among a set of 7 distantly related bacterial “model” species. Complex conservation follows one of three models: well-conserved complexes, complexes with a conserved core, and complexes with partial conservation but no conserved core. Expanding the comparison to 894 distinct bacterial genomes illustrates fractional conservation and the limits of co-conservation among components of protein complexes: just 14 out of 285 model protein complexes are perfectly conserved across 95% of the genomes used, yet we predict more than 180 may be partially conserved across at least half of the genomes. No clear relationship between gene essentiality and protein complex conservation is observed, as even poorly conserved complexes contain a significant number of essential proteins. Finally, we identify 183 complexes containing well-conserved components and uncharacterized proteins which will be interesting targets for future experimental studies.
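The notion of fractional conservation in this abstract can be made concrete with a small sketch. It assumes each genome is represented as the set of complex components for which an ortholog was detected; the function names and this representation are hypothetical and are not taken from the paper.

```python
def conservation_fraction(complex_members, genome_orthologs):
    """Fraction of a complex's components that have an ortholog in one genome."""
    members = set(complex_members)
    if not members:
        raise ValueError("complex has no components")
    return len(members & set(genome_orthologs)) / len(members)


def conservation_profile(complex_members, genomes):
    """Summarize conservation of one complex across many genomes.

    genomes: non-empty list of sets, each holding the components with a
        detected ortholog in that genome (assumed input format).
    Returns (share of genomes with every component present,
             share of genomes with at least one component present).
    """
    fractions = [conservation_fraction(complex_members, g) for g in genomes]
    full = sum(1 for f in fractions if f == 1.0) / len(fractions)
    partial_or_full = sum(1 for f in fractions if f > 0.0) / len(fractions)
    return full, partial_or_full
```

Under these assumptions, a complex would count as perfectly conserved across 95% of genomes when the first returned value is at least 0.95, and as at least partially conserved across half of the genomes when the second value is at least 0.5.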

    Adenovirus type 5 E4 Orf3 protein targets promyelocytic leukaemia (PML) protein nuclear domains for disruption via a sequence in PML isoform II that is predicted as a protein interaction site by bioinformatic analysis

    Human adenovirus type 5 infection causes the disruption of structures in the cell nucleus termed promyelocytic leukaemia (PML) protein nuclear domains or ND10, which contain the PML protein as a critical component. This disruption is achieved through the action of the viral E4 Orf3 protein, which forms track-like nuclear structures that associate with the PML protein. This association is mediated by a direct interaction of Orf3 with a specific PML isoform, PMLII. We show here that the Orf3 interaction properties of PMLII are conferred by a 40-amino-acid segment of the unique C-terminal domain of the protein. This segment was sufficient to confer interaction on a heterologous protein. The analysis was informed by prior application of a bioinformatic tool for the prediction of potential protein interaction sites within unstructured protein sequences (predictors of naturally disordered region analysis; PONDR). This tool predicted three potential molecular recognition elements (MoRE) within the C-terminal domain of PMLII, one of which was found to form the core of the Orf3 interaction site, thus demonstrating the utility of this approach. The sequence of the mapped Orf3-binding site on PML protein was found to be relatively poorly conserved across other species; however, the overall organization of MoREs within unstructured sequence was retained, suggesting the potential for conservation of functional interactions.

    Systematic identification of functional plant modules through the integration of complementary data sources

    A major challenge is to unravel how genes interact and are regulated to exert specific biological functions. The integration of genome-wide functional genomics data, followed by the construction of gene networks, provides a powerful approach to identify functional gene modules. Large-scale expression data, functional gene annotations, experimental protein-protein interactions, and transcription factor-target interactions were integrated to delineate modules in Arabidopsis (Arabidopsis thaliana). The different experimental input data sets showed little overlap, demonstrating the advantage of combining multiple data types to study gene function and regulation. In the set of 1,563 modules covering 13,142 genes, most modules displayed strong coexpression, but functional and cis-regulatory coherence was less prevalent. Highly connected hub genes showed a significant enrichment toward embryo lethality and evidence for cross talk between different biological processes. Comparative analysis revealed that 58% of the modules showed conserved coexpression across multiple plants. Using module-based functional predictions, 5,562 genes were annotated, and an evaluation experiment disclosed that, based on 197 recently experimentally characterized genes, 38.1% of these functions could be inferred through the module context. Examples of confirmed genes of unknown function related to cell wall biogenesis, xylem and phloem pattern formation, cell cycle, hormone stimulus, and circadian rhythm highlight the potential to identify new gene functions. The module-based predictions offer new biological hypotheses for functionally unknown genes in Arabidopsis (1,701 genes) and six other plant species (43,621 genes). Furthermore, the inferred modules provide new insights into the conservation of coexpression and coregulation as well as a starting point for comparative functional annotation

    N-TERMINAL PROCESSING OF RIBOSOMAL PROTEIN L27 IN STAPHYLOCOCCUS AUREUS

    The bacterial ribosome is essential to cell growth, yet little is known about how its proteins attain their mature structures. Recent studies indicate that certain Staphylococcus aureus bacteriophage protein sequences contain specific sites that may be cleaved by a non-bacteriophage enzyme (Poliakov et al. 2008). The phage cleavage site was found to bear sequence similarity to the N-terminus of S. aureus ribosomal protein L27. Previous studies in E. coli (Wower et al. 1998; Maguire et al. 2005) found that L27 is situated adjacent to the ribosomal peptidyl transferase site, where it likely aids in new peptide formation. The predicted S. aureus L27 protein contains an additional N-terminal sequence not observed within the N-terminus of the otherwise similar E. coli L27; this sequence appears to be cleaved, indicating yet-unobserved ribosomal protein post-translational processing and use of host processes by phage. Phylogenetic analysis shows that L27 processing has the potential to be highly conserved. Further study of this phenomenon may aid antibiotic development.

    Graph theoretic analysis of protein interaction networks of eukaryotes

    Thanks to recent progress in high-throughput experimental techniques, large-scale protein interaction datasets have been obtained for two prototypical multicellular species, the nematode worm Caenorhabditis elegans and the fruit fly Drosophila melanogaster. The datasets are obtained mainly by using the yeast two-hybrid method, which yields both false positives and false negatives. Accordingly, while it is desirable to test such datasets through further wet experiments, here we invoke recently developed network theory to test such high-throughput datasets in a simple way. Based on the fact that the key biological processes indispensable to maintaining life are universal across eukaryotic species, and on a comparison of the structural properties of the protein interaction networks (PINs) of the two species with those of the yeast PIN, we find that while the worm and the yeast PIN datasets exhibit similar structural properties, the current fly dataset, though the most comprehensively screened to date, does not correctly reflect generic structural properties as it stands. The modularity is suppressed and the connectivity correlation is lacking. Addition of interologs to the current fly dataset increases the modularity and also enhances the occurrence of triangular motifs. The connectivity correlation function of the fly, however, remains distinct under such addition of interologs, for which we present a possible scenario through in silico modeling.
    Comment: 7 pages, 6 figures, 2 tables
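One of the structural properties mentioned in this abstract, the occurrence of triangular motifs, can be counted with a short sketch. It assumes an undirected interaction network given as an edge list; the function name is hypothetical, and this is a generic counting routine, not the analysis pipeline used in the paper.

```python
from itertools import combinations


def count_triangles(edges):
    """Count triangles (three mutually interacting proteins) in an undirected network.

    edges: iterable of (u, v) pairs; self-loops and duplicate edges are ignored.
    """
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    closed = 0
    for neighbours in adjacency.values():
        # A pair of neighbours that are themselves linked closes a triangle.
        for a, b in combinations(neighbours, 2):
            if b in adjacency[a]:
                closed += 1
    # Each triangle is found once from each of its three corners.
    return closed // 3
```

Comparing the count before and after adding predicted interolog edges to an edge list gives a rough sense of how such additions change the number of triangular motifs.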