RR operating principle.
<p>(A) To check for repeats of a given length, RR breaks a protein sequence into a series of adjacent fragments (shown here as black rectangles) and counts the mismatches (lightning symbols) between neighboring fragments. We also refer to this number as the linear or raw distance, to distinguish it from the register-corrected or cyclic distance. (B) RR counts the number <i>N</i> of pairs of adjacent fragments that (without any alignment) differ in at most <i>m</i> positions. The table shows an example analysis for the protein fragments depicted in (A). (C) The <i>N(m)</i> fingerprint of a given protein contains information about the repeat content of the protein. The graph shows the relation between the number of mismatches (<i>m</i>) and the number of fragment pairs that differ in <i>m</i> or fewer positions (<i>N(m)</i>) for the protein fragments depicted in (A).</p>
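The counting described in (A) to (C) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RR implementation: the function names `adjacent_mismatches` and `fingerprint` are hypothetical, fragments are assumed to start at position 0, a trailing partial fragment is dropped, and the register-corrected (cyclic) distance is not modelled.

```python
from typing import List


def adjacent_mismatches(seq: str, length: int) -> List[int]:
    """Raw (linear) distances between consecutive fragments of a fixed length."""
    if length <= 0:
        raise ValueError("length must be positive")
    # Assumption: fragments start at position 0; a trailing partial fragment is dropped.
    frags = [seq[i:i + length] for i in range(0, len(seq) - length + 1, length)]
    # Compare each fragment with its right-hand neighbor, position by position.
    return [sum(a != b for a, b in zip(f, g)) for f, g in zip(frags, frags[1:])]


def fingerprint(seq: str, length: int) -> List[int]:
    """N(m) for m = 0..length: number of adjacent pairs with at most m mismatches."""
    dists = adjacent_mismatches(seq, length)
    return [sum(d <= m for d in dists) for m in range(length + 1)]


if __name__ == "__main__":
    # Toy sequence: three near-identical 4-residue fragments followed by an unrelated one.
    print(adjacent_mismatches("ABCDABCEABCDXYZW", 4))
    print(fingerprint("ABCDABCEABCDXYZW", 4))
```

By construction the fingerprint is non-decreasing in m and reaches the total number of adjacent pairs at m equal to the fragment length, so the information sits in how quickly the curve rises at small m.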
Identification of tandem repeats as a function of number of mismatches allowed.
<p>Distribution of tandem repeat scores for a set of TALEs and a random subset of tested proteins from the UniRef100 database. The fingerprint function <i>N(m)</i> was calculated for 117376 randomly chosen UniRef100 sequences (grey lines) and for all but the atypical TALEs (149 out of 182) (black lines). For every allowed number of mismatches <i>m</i>, the medians and the 10%, 25%, 70% and 80% percentiles of <i>N(m)</i> are plotted. The <i>N(m)</i> fingerprints of “typical” proteins and TALE proteins have very different shapes.</p>
Locations, identities, and covariation of variable residues.
<p>Sequence logos for the repeats of (A) TALE, (B) #32, (C) #37 and (D) #38, with secondary structure annotation (expressed as frequency plots). All residues in a repeat are numbered and the variable residues are boxed. Heat maps show the frequencies of particular pairs of variable residues at the given positions, calculated over the total number of repeats, which was (A) 2733, (B) 16, (C) 44 and (D) 105. Covariation of the variable residues is also described by mutual information (MI).</p>
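As a rough illustration of the MI mentioned above, the sketch below computes the mutual information between two variable positions from a list of observed residue pairs, one pair per repeat. The function name `mutual_information`, the use of base-2 logarithms, and the plain frequency estimate (no pseudocounts or small-sample correction) are assumptions; the caption does not state how MI was computed.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def mutual_information(pairs: Iterable[Tuple[str, str]]) -> float:
    """Mutual information (in bits) between the two members of each residue pair."""
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("at least one residue pair is required")
    joint = Counter(pairs)                # counts of (a, b) pairs
    first = Counter(a for a, _ in pairs)  # marginal counts at the first position
    second = Counter(b for _, b in pairs) # marginal counts at the second position
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # Assumption: plain frequency estimates, no pseudocounts.
        mi += p_ab * math.log2(p_ab / ((first[a] / n) * (second[b] / n)))
    return mi
```

Two positions that always change together give a high value, while positions that vary independently give a value near zero.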
Novel tandem repeats identified in the search.
<p>Novel tandem repeats identified in the search.</p>
Numbers of tandem repeat sequences identified in the search.
<p>Number of protein sequences containing at least 10 repeat pairs (11 similar adjacent fragments) of 30–43 aa residues (blue), and number of remaining sequences after excluding composite (red) and then identical repeats (green). Purple bars show the number of repeat families after clustering.</p>
Schematic structure of tandem repeat proteins.
<p>Proteins: (A) C9ZJS6, (B) Q586F2, (C) Q586F1 and (D) C9ZJS7 contain repeats of type #38 and TAD; (E) Protein G5AAP8 contains tandem repeats of type #37 and TAD; (F) Protein W1I7I9 contains tandem repeats of type #32, HET domain, MPN domain and triple TAD. Please refer to <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0179173#pone.0179173.t001" target="_blank">Table 1</a> for repeat sequences and descriptions.</p>
Jensen-Shannon divergence scores for pairwise alignments of tandem repeat arrays.
<p>Jensen-Shannon divergence (JSD) of the tested tandem repeats. The matrix shows the resulting JSD values for the pairwise alignments of the repeat arrays from the tested proteins: TAL effectors (TALE), the TALE-likes RipTAL, BurrH, MOrTL1 and MOrTL2, and proteins that contained repeats #32, #37 and #38 (as described in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0179173#pone.0179173.t001" target="_blank">Table 1</a>). Larger JSD values indicate higher divergence.</p>
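For readers unfamiliar with the measure, the sketch below computes the Jensen-Shannon divergence between two residue frequency distributions, for example the residues observed at one aligned position in two repeat arrays. It is only a generic illustration: the helper names `frequencies` and `jsd`, the base-2 logarithm, and the per-column framing are assumptions, and the caption does not specify how the values in the matrix were aggregated over an alignment.

```python
import math
from collections import Counter
from typing import Dict


def frequencies(column: str) -> Dict[str, float]:
    """Relative residue frequencies in one alignment column (given as a string)."""
    counts = Counter(column)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def jsd(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits; 0 for identical, 1 for disjoint distributions."""
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a: Dict[str, float]) -> float:
        # Kullback-Leibler divergence of a from the mixture M; zero-probability terms are skipped.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)
```

With base-2 logarithms the value is bounded between 0 and 1, which is consistent with reading larger values as higher divergence.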
Cumulative Shannon scores for tandem repeats.
<p>Cumulative Shannon complexity of the representative tandem repeats from the TALEs identified in the search (black line) and all the other (non-TALE) hits (colored lines, clustered by repeat length). The dotted red line marks the minimal complexity of repeats considered further.</p>
Cellular localization prediction.
<p>Cellular localization of the proteins was predicted using NucPred. The cumulative scores for protein sequences containing tandem repeats (red) and TALEs (black) are shown. Scale: 0 means no nuclear localization, 1 means exclusively nuclear localization.</p>
Taxonomy of the hits.
<p>(A) Participation of all identified filtered hits in taxonomic domains. (B) Participation of individual classes of hits in taxonomic domains. (C) Participation of sequences of different kingdoms within Eukaryota.</p>