25 research outputs found

    SAAR repeats lengths and occurrences in the human proteome.

    (A) For each amino acid, the number of SAAR events was calculated with a minimum length of 5 residues. The heat map shows the number of events at each repeat length, with amino acids grouped by their physical properties (hydrophobic: orange; hydrophilic: blue). Empty cells indicate 0 events. (B) For each amino acid in the human proteome, the longest repeat present and the maximum number of SAAR events within a single protein were calculated. The scatter plot shows the longest repeat length (x-axis) against the maximum number of events tolerated in a single protein (y-axis); the grey shaded region indicates the 95% confidence interval of the linear regression.
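As an illustration of the counting described in this caption, the following is a minimal sketch of how SAAR events of at least 5 residues could be located and tallied by repeat length. The regular-expression approach and the names `find_saar_events` and `length_table` are assumptions made for this example, not the authors' published pipeline.

```python
import re
from collections import Counter

MIN_LEN = 5  # minimum repeat length stated in the caption

# One residue followed by at least MIN_LEN - 1 copies of itself.
# The quantifier is greedy, so each match covers the whole run.
SAAR_PATTERN = re.compile(r"([A-Z])\1{%d,}" % (MIN_LEN - 1))


def find_saar_events(sequence):
    """Return (residue, start, length) for every SAAR event in one protein."""
    return [
        (m.group(1), m.start(), len(m.group(0)))
        for m in SAAR_PATTERN.finditer(sequence.upper())
    ]


def length_table(sequences):
    """Count events per (residue, repeat length) across many proteins."""
    counts = Counter()
    for seq in sequences:
        for residue, _start, length in find_saar_events(seq):
            counts[(residue, length)] += 1
    return counts
```

For a hypothetical sequence such as `"MKQQQQQQAAAALLLLLK"`, `find_saar_events` reports the run of six Q and the run of five L, and skips the run of four A because it is below the 5-residue threshold.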

    Single Amino Acid Repeats in the Proteome World: Structural, Functional, and Evolutionary Insights

    Microsatellites or simple sequence repeats (SSRs) are abundant, highly diverse stretches of short DNA repeats present in all genomes. Tandem mono/tri/hexanucleotide repeats in the coding regions contribute to single amino acid repeats (SAARs) in the proteome. While SSRs in the coding region always result in amino acid repeats, a majority of SAARs arise from a combination of various codons representing the same amino acid and not as a consequence of SSR events. Certain amino acids are abundant in repeat regions, indicating a positive selection pressure behind the accumulation of SAARs. By analysing 22 proteomes, including the human proteome, we explored the functional and structural relationship of amino acid repeats in an evolutionary context. Only ~15% of repeats are present in any known functional domain, while ~74% of repeats are present in disordered regions, suggesting that SAARs add to the functionality of proteins by providing flexibility and stability and by acting as linker elements between domains. Comparison of SAAR-containing proteins across species reveals that while shorter repeats are conserved among orthologs, proteins with longer repeats (>15 amino acids) are unique to the respective organism. Lysine repeats are well conserved among orthologs with respect to their length and number of occurrences in a protein. Repeats of other amino acids, such as glutamic acid, proline, serine and alanine, are generally conserved among orthologs but with varying repeat lengths. These findings suggest that SAARs have accumulated in the proteome under positive selection pressure and that they provide flexibility for optimal folding of the functional/structural domains of proteins. The insights gained from our observations can help in the effective design and engineering of proteins with novel features.

    Number of repeats in genes associated with repeat expansion diseases.

    The genes PABPN1, ATXN3 and HTT are associated with repeat expansion diseases. Abnormal expansion of a polyA (GCG) tract in PABPN1 causes OPMD, while abnormal expansion of polyQ (CAG) tracts in ATXN3 and HTT causes SCA3 and Huntington's disease (HD), respectively. (A) Length of the SSRs present in the CDS of each gene. (B) Length of the SAARs in the corresponding proteins.

    Invertebrate and vertebrate species under study.


    Codon fraction for SAAR coding regions against codon fraction of CDS in the human genome.


    Protein orthologs for SAAR containing proteins among animals (one event is a mono aminoacid stretch of ≥5 residues).


    Group 1 and group 2 orthologs.

    The data show SAAR conservation between orthologous proteins of several vertebrate and invertebrate proteomes. Amino acids are ranked by the frequency of conservation (a lower value indicates better conservation). Group 1 (top) contains human ortholog pairs in which the SAAR events are conserved in both number of events and repeat length. Group 2 (bottom) contains human ortholog pairs that are conserved in the number of events but not in repeat length. Group 1 is two-way clustered to group amino acids and species with similar conservation. Group 2 follows the order of amino acids and species of group 1 to allow easier comparison between plots.

    SAAR density and longest SAAR among all proteomes.

    (A) SAAR density was calculated and normalized to one million residues for the indicated proteomes and plotted as a heat map in which the X-axis shows the repeats of each amino acid and the Y-axis lists the organisms under study. The plot is two-way clustered to group amino acids and species with similar densities. (B) A heat map of the longest repeat length for each amino acid in each of the vertebrate and invertebrate proteomes under study. The plot is two-way clustered between the longest SAARs (X-axis) and the proteomes (Y-axis).
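The normalization mentioned in panel (A) can be sketched as follows. The caption does not say whether density counts repeat events or repeat residues, so this example assumes events per million residues of proteome; the function name `saar_density` and the 5-residue threshold carried over from the other captions are likewise assumptions.

```python
import re
from collections import Counter

# Runs of at least 5 identical residues (threshold assumed from the other captions).
SAAR_PATTERN = re.compile(r"([A-Z])\1{4,}")


def saar_density(sequences, per=1_000_000):
    """Return SAAR events per `per` residues, keyed by amino acid.

    Assumption: density = number of events / total proteome residues * per.
    """
    total_residues = sum(len(seq) for seq in sequences)
    if total_residues == 0:
        raise ValueError("proteome contains no residues")
    events = Counter(
        match.group(1)
        for seq in sequences
        for match in SAAR_PATTERN.finditer(seq.upper())
    )
    # Multiply before dividing to limit floating-point rounding.
    return {aa: n * per / total_residues for aa, n in events.items()}
```

Computing one such dictionary per proteome would give the rows of a species-by-amino-acid matrix, which could then be clustered and drawn as a heat map.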

    Comparison of amino acid and SAAR density in the human proteome.

    The amino acid density and SAAR density, normalized to 1 million residues, were calculated for all 20 amino acids. (A) The percentage of each amino acid in the whole proteome and in SAARs is represented as a vertical bar. The black dot marks the SAAR percentage, and the opposite end of the bar marks the amino acid percentage in the whole proteome. The bars are coloured to indicate the three distinct patterns observed (see text): blue, group 1; red, group 2; green, group 3. (B) A distance-based dendrogram of the amino acid density and SAAR density values for all 20 amino acids. The preference for SAARs versus proteome density clusters into the three groups described in (A) (see text).

    Distribution of various molecular activities associated with SAAR containing proteins in the human proteome.

    Molecular function classes for proteins were annotated using the Panther database. Results are reported for proteins with any amino acid repeat and for proteins with a repeat of a particular abundant amino acid: proline, alanine, leucine, glutamine, threonine, serine, glycine or glutamic acid. Each cell in the heat map shows the fold enrichment of the observed frequency over the expected frequency, with the human proteome as reference. The colour scale runs from red (fold enrichment of 0) to green (fold enrichment of 5).
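To make the fold-enrichment values concrete, here is a minimal sketch of the usual observed-over-expected calculation. It assumes the expected count is the reference-proteome frequency of a class scaled to the size of the protein set; the function and parameter names are hypothetical, and this is not a call into any Panther API.

```python
def fold_enrichment(hits_in_set, set_size, hits_in_reference, reference_size):
    """Observed count divided by the count expected from the reference proteome.

    Assumption: expected = set_size * (hits_in_reference / reference_size).
    """
    if set_size <= 0 or reference_size <= 0:
        raise ValueError("set_size and reference_size must be positive")
    expected = set_size * hits_in_reference / reference_size
    if expected == 0:
        raise ValueError("class is absent from the reference; enrichment undefined")
    return hits_in_set / expected
```

For example, if 30 of 100 SAAR-containing proteins fall in a class that covers 2,000 of 20,000 reference proteins, the expected count is 10 and the fold enrichment is 3.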