58 research outputs found

    What properties characterize the hub proteins of the protein-protein interaction network of Saccharomyces cerevisiae?

    BACKGROUND: Most proteins interact with only a few other proteins, while a small number of proteins (hubs) have many interaction partners. Hub proteins and non-hub proteins differ in several respects; however, it is not fully understood which properties characterize hubs and set them apart from proteins of low connectivity. Therefore, we have investigated what differentiates hubs from non-hubs, and static hubs (party hubs) from dynamic hubs (date hubs), in the protein-protein interaction network of Saccharomyces cerevisiae. RESULTS: The many interactions of hub proteins can only partly be explained by binding to similar proteins or domains. Domain repeats, which are associated with binding, are enriched in hubs. Moreover, multi-domain proteins and long proteins are overrepresented among the hubs. In addition, there are clear differences between party hubs and date hubs. Fewer party hubs than date hubs contain long disordered regions, indicating that these regions are important for flexible binding but less so for static interactions. Furthermore, party hubs interact to a large extent with each other, supporting the idea of party hubs as the cores of highly clustered functional modules. In addition, hub proteins, and in particular party hubs, are more often ancient. Finally, the more recent paralogs of party hubs are underrepresented. CONCLUSION: Our results indicate that multiple and repeated domains are enriched in hub proteins and, further, that long disordered regions, which are common in date hubs, are particularly important for flexible binding.

    Expansion of Protein Domain Repeats

    Many proteins, especially in eukaryotes, contain tandem repeats of several domains from the same family. These repeats have a variety of binding properties and are involved in protein–protein interactions as well as binding to other ligands such as DNA and RNA. The rapid expansion of protein domain repeats is assumed to have evolved through internal tandem duplications. However, the exact mechanisms behind these tandem duplications are not well understood. Here, we have studied the evolution, function, protein structure, gene structure, and phylogenetic distribution of domain repeats. For this purpose we have assigned Pfam-A domain families to 24 proteomes with more sensitive domain assignments in the repeat regions. These assignments confirmed previous findings that eukaryotes, and in particular vertebrates, contain a much higher fraction of proteins with repeats compared with prokaryotes. The internal sequence similarity in each protein revealed that the domain repeats are often expanded through duplications of several domains at a time, while the duplication of one domain is less common. Many of the repeats appear to have been duplicated in the middle of the repeat region. This is in strong contrast to the evolution of other proteins that mainly works through additions of single domains at either terminus. Further, we found that some domain families show distinct duplication patterns, e.g., nebulin domains have mainly been expanded with a unit of seven domains at a time, while duplications of other domain families involve varying numbers of domains. Finally, no common mechanism for the expansion of all repeats could be detected. We found that the duplication patterns show no dependence on the size of the domains. Further, repeat expansion in some families can possibly be explained by shuffling of exons. However, exon shuffling could not have created all repeats.

    Nebulin: A Study of Protein Repeat Evolution

    Protein domain repeats are common in proteins that are central to the organization of a cell, in particular in eukaryotes. They are known to evolve through internal tandem duplications. However, the understanding of the underlying mechanisms is incomplete. To shed light on repeat expansion mechanisms, we have studied the evolution of the muscle protein Nebulin, a protein that contains a large number of actin-binding nebulin domains. Nebulin proteins have evolved from an invertebrate precursor containing two nebulin domains. Repeat regions have expanded through duplications of single domains, as well as duplications of a super repeat (SR) consisting of seven nebulins. We show that the SR has evolved independently into large regions in at least three instances: twice in the invertebrate Branchiostoma floridae and once in vertebrates. In-depth analysis reveals several recent tandem duplications in the Nebulin gene. The events involve both single-domain and multidomain SR units or several SR units. There are single events, but frequently the same unit is duplicated multiple times. For instance, an ancestor of human and chimpanzee underwent two tandem duplications. The duplication junction coincides with an Alu transposon, thus suggesting duplication through Alu-mediated homologous recombination. Duplications in the SR region consistently involve multiples of seven domains. However, the exact unit that is duplicated varies both between species and within species. Thus, multiple tandem duplications of the same motif did not create the large Nebulin protein. Finally, analysis of segmental duplications in the human genome reveals that duplications are more common in genes containing domain repeats than in those coding for nonrepeated proteins. In fact, segmental duplications are found three to six times more often in long repeated genes than expected by chance.

    Pattern of Internal Domain Duplications in the Chicken Protein ENSGALP00000020382, with 66 Repeating Nebulin Domains (Pfam)

    (A) The intensity of the squares is related to alignment scores, and the numbers on both axes indicate the domains in N-to-C terminal orientation. As there were gaps in the repeat sequence (Figure 1, http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0020114#pcbi-0020114-g001), these were introduced as domains at positions 6, 18, 25, and 32.
    (B) ACV calculated from the alignment scores in (A) with the average similarity to domains at distance 1, 2, 3, etc. The ACV are normalized around zero, hence the dotted line at zero is the mean score between all domains in the protein. The ACV was calculated before introducing the gaps as domains (dashed line) and after (solid line). When the regions with no domain assignments were regarded as domains, the pattern of seven repeating units became much clearer, indicating that the gaps are also domains.

    Overview of the Methodology

    (A) In a protein with five domains, a unit of three N-terminal domains has been duplicated in tandem.
    (B) To identify this evolutionary event, alignment of all domain pairs in the protein is performed.
    (C) The alignment scores between the domains are displayed in a matrix with increasing color intensity for higher scores. The diagonal shows alignment scores for each domain to itself, while square 1,2 gives the score between the first and the second domain. A pattern where domain pairs 3–6, 4–7, and 5–8 have the highest alignment scores can be seen.
    (D) From the alignment scores, an ACV is calculated as the mean alignment score at each distance normalized around zero. The distance between the domains is defined as one for neighbouring domains, while domain pairs with one domain between them have distance two, etc. In this example a peak at distance three can be seen. Hence, we assume that this protein has evolved through the duplication of three domains.
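
    The calculation in (D) can be illustrated with a minimal sketch. It is not the authors' code: the function name `acv_from_scores`, the toy score values, the exclusion of the diagonal self-scores, and the choice to centre the vector by subtracting the mean over all domain pairs are assumptions made for illustration.

```python
def acv_from_scores(scores):
    """Return {distance: mean alignment score at that distance, centred on zero}.

    `scores` is a square matrix of pairwise domain alignment scores.
    Assumption: self-scores on the diagonal are ignored, and centring is
    done by subtracting the mean score over all domain pairs.
    """
    n = len(scores)
    by_distance = {d: [] for d in range(1, n)}
    # Group the score of every domain pair (i, j), i < j, by its distance j - i.
    for i in range(n):
        for j in range(i + 1, n):
            by_distance[j - i].append(scores[i][j])
    all_scores = [s for values in by_distance.values() for s in values]
    overall_mean = sum(all_scores) / len(all_scores)
    return {d: sum(v) / len(v) - overall_mean for d, v in by_distance.items()}


# Toy matrix for eight domains: background score 10 everywhere, with
# high scores (50) for domain pairs 3-6, 4-7 and 5-8 as in panel (C).
# The values 10 and 50 are made up; indices below are zero-based.
toy = [[10] * 8 for _ in range(8)]
for i, j in [(2, 5), (3, 6), (4, 7)]:
    toy[i][j] = toy[j][i] = 50

acv = acv_from_scores(toy)
peak = max(acv, key=acv.get)
print(peak)  # distance with the highest centred mean score: 3
```

    In this toy case the only positive entry of the vector is at distance three, matching the three-domain duplication unit described in the caption.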

    ACVs for Proteins with Repeats of Eight Different Domain Families

    Solid line shows ACVs for proteins with repeats of eight different domain families. In the bottom right diagram, the ACV for all proteins with repeats is displayed. The ACV for each family was normalized around zero, hence the dashed line at zero is the mean bit score between all domains in the family. The p-value for each datapoint was calculated from random shuffling of domains, and peaks with p-values below 10⁻⁵ are indicated with an asterisk. The dotted line illustrates the fraction of repeats of the domain family with each repeat length, i.e., nonrepeated proteins have length one. The number of proteins/domains that goes into each figure can be found in Materials and Methods. Data for the remaining domain families can be found in Figure S2 (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0020114#pcbi-0020114-sg002).
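
    The caption does not say how the shuffling was carried out, so the following is only a sketch of one common way to obtain such a p-value: permute the order of the domains within a protein and count how often the mean score at a given distance is at least as high as the observed one. The function names, the toy matrix, the symmetric-matrix assumption, and the add-one correction are illustrative choices, not details from the source.

```python
import random


def mean_score_at_distance(scores, order, distance):
    """Mean alignment score over domains that sit `distance` apart in `order`."""
    pairs = [(order[i], order[i + distance]) for i in range(len(order) - distance)]
    return sum(scores[a][b] for a, b in pairs) / len(pairs)


def shuffle_p_value(scores, distance, n_shuffles=10000, seed=0):
    """Empirical one-sided p-value for a peak at `distance`.

    Assumes `scores` is symmetric. Centring around zero is skipped because
    the mean over all domain pairs does not change when domains are reordered.
    """
    rng = random.Random(seed)
    order = list(range(len(scores)))
    observed = mean_score_at_distance(scores, order, distance)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(order)  # random reordering of the domains
        if mean_score_at_distance(scores, order, distance) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical toy matrix: eight domains, high scores at distance three.
toy = [[10] * 8 for _ in range(8)]
for i, j in [(2, 5), (3, 6), (4, 7)]:
    toy[i][j] = toy[j][i] = 50

print(shuffle_p_value(toy, distance=3, n_shuffles=2000))
```

    With this estimator the smallest reportable p-value is 1/(n_shuffles + 1), so resolving the 10⁻⁵ threshold mentioned in the caption would require more than 10⁵ shuffles.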

    Hierarchical Clustering of the ACVs from Each Protein

    (A) Dendrogram of the 20 clusters. Each cluster is indicated by a cluster number followed by the number of proteins in the cluster.
    (B) The average ACV for each cluster with red color for values below the average and green for values above.
    (C) Distribution of the ten largest domain families, as well as nebulin, in the different clusters. The expected number of proteins from a domain family in each cluster was calculated using random shuffling, and Z-scores for overrepresentation (green) and underrepresentation (red) in the cluster were calculated. The number after each domain family name is the number of repeats of the family.
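
    The Z-score in (C) can be sketched as follows, under assumptions the caption does not confirm: cluster labels are shuffled across proteins, the expected count is the mean over shuffles, and the spread is the population standard deviation of the shuffled counts. The function name and the toy labels are hypothetical.

```python
import random
import statistics


def family_cluster_z_score(families, clusters, family, cluster,
                           n_shuffles=1000, seed=0):
    """Z-score for the number of `family` proteins found in `cluster`.

    `families[i]` and `clusters[i]` are the domain family and cluster label
    of protein i. The null distribution comes from shuffling cluster labels.
    """
    rng = random.Random(seed)

    def count(labels):
        return sum(1 for f, c in zip(families, labels)
                   if f == family and c == cluster)

    observed = count(clusters)
    shuffled = list(clusters)
    null_counts = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null_counts.append(count(shuffled))
    expected = statistics.mean(null_counts)
    spread = statistics.pstdev(null_counts)
    if spread == 0:
        return 0.0  # no variation under shuffling, so no meaningful Z-score
    return (observed - expected) / spread


# Hypothetical toy data: all four "nebulin" proteins fall in cluster 1.
families = ["nebulin"] * 4 + ["other"] * 8
clusters = [1] * 4 + [2] * 8
z_over = family_cluster_z_score(families, clusters, "nebulin", 1)
z_under = family_cluster_z_score(families, clusters, "nebulin", 2)
print(z_over > 0, z_under < 0)
```

    A positive score corresponds to overrepresentation (green in the figure) and a negative score to underrepresentation (red).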