Disordered Binding Regions and Linear Motifs—Bridging the Gap between Two Models of Molecular Recognition
<div><p>Intrinsically disordered proteins (IDPs) lack a stable tertiary structure in isolation. These proteins are often involved in molecular recognition processes via their disordered binding regions, which can recognize partner molecules by undergoing a coupled folding and binding process. The specific properties of disordered binding regions give rise to specific, yet transient interactions that enable IDPs to play central roles in signaling pathways and act as hubs of protein interaction networks. An alternative model of protein-protein interactions with largely overlapping functional properties is offered by the concept of linear interaction motifs. This approach focuses on distilling a short consensus sequence pattern from proteins with a common interaction partner. These motifs often reside in disordered regions and are considered to mediate the interaction largely independently of the rest of the protein. Although a connection between linear motifs and disordered binding regions has been established through common examples, the complementary nature of the two concepts has yet to be fully explored. In many cases the sequence-based definition of linear motifs and the structural-context-based definition of disordered binding regions describe two aspects of the same phenomenon. To gain insight into the connection between the two models, prediction methods were utilized. We combined the regular-expression-based prediction of linear motifs with the disordered binding region prediction method ANCHOR, each specialized for one of the two models, to get the best of both worlds. The thorough analysis of the overlap of the two methods offers a bioinformatics tool for more efficient binding site prediction that can serve a wide range of practical applications. At the same time it can also shed light on the theoretical connection between the two co-existing interaction models.</p> </div>
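The combination described in the abstract, a regular-expression motif scan whose hits are kept only where they overlap a predicted disordered binding region, can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the `L..LL` pattern is a simplified stand-in for a nuclear receptor box motif, the function names are hypothetical, and the binding regions are assumed to be supplied by a separate predictor such as ANCHOR as plain residue intervals.

```python
import re
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based residue indices

# Simplified LxxLL pattern, for illustration only (assumption); curated motif
# definitions are typically more restrictive. The lookahead lets overlapping
# matches be reported as well.
NRBOX_LIKE = re.compile(r"(?=(L..LL))")


def scan_motif(sequence: str, pattern: "re.Pattern[str]") -> List[Interval]:
    """Return every match of the motif pattern as a half-open interval."""
    return [(m.start(1), m.end(1)) for m in pattern.finditer(sequence)]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one residue."""
    return a[0] < b[1] and b[0] < a[1]


def filter_hits(hits: List[Interval], binding_regions: List[Interval]) -> List[Interval]:
    """Keep only motif hits that overlap a predicted binding region.

    binding_regions is assumed to come from an external predictor;
    no predictor is called here.
    """
    return [h for h in hits if any(overlaps(h, r) for r in binding_regions)]
```

With a toy sequence such as `"AAALKALLGGGGGGGLPPLLAA"` and a single hypothetical binding region `(0, 10)`, the scan reports two hits and the filter keeps only the first.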
Efficiency of ANCHOR for individual LIG motifs.
<p>The total number of annotated instances for each of the ligand binding motifs that have at least three independent instances in the ELM database. Dark red bars show the number of instances overlapping ANCHOR predicted binding regions. Stars mark the motifs for which the recovery rate is significantly higher than that expected by chance alone (see Methods).</p>
Efficiency of ANCHOR on linear motifs with respect to structural context.
<p>Instances are classified according to the predicted disorder status of their flanking sequential environment. Motif instances with both N- and C-terminal flanking regions predicted by IUPred as ordered are classified as ‘Ordered’, instances with one or both flanking regions predicted to be disordered are classified as ‘Mixed’ or ‘Disordered’, respectively.</p>
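The three-way classification in this caption can be written down as a small rule. The sketch below is illustrative only: the caption does not say how a flank is judged disordered, so the mean-score criterion, the treatment of an empty flank and the function names are all assumptions. The 0.5 cutoff is the conventional IUPred threshold.

```python
from typing import Sequence


def flank_is_disordered(scores: Sequence[float], cutoff: float = 0.5) -> bool:
    """Assumed criterion: a flank counts as disordered if its mean score
    reaches the cutoff. An empty flank (motif at a terminus) is treated
    as ordered here, which is an arbitrary choice for the sketch."""
    if not scores:
        return False
    return sum(scores) / len(scores) >= cutoff


def classify_context(n_flank_disordered: bool, c_flank_disordered: bool) -> str:
    """Ordered: neither flank disordered; Mixed: exactly one; Disordered: both."""
    count = int(n_flank_disordered) + int(c_flank_disordered)
    return ("Ordered", "Mixed", "Disordered")[count]
```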
The predictive power of ANCHOR as a filter in motif searches.
<p>Left: fraction of known instances of ligand binding motifs recognized by ANCHOR. Right: the reduction in the number of ligand binding motif hits in the eukaryotic sequences of UniProt.</p>
Results of motif scans in the three domains of life.
<p>A: the number of motif hits found for the four different motif groups (CLV – cleavage sites, LIG – generic ligand binding motifs, MOD – modification sites, TRG – target signals) in the eukaryotic (blue), bacterial (green) and archaeal (red) proteins included in the UniProt database. As the sizes of the three databases differ, the numbers of actual hits in the prokaryotic sets were scaled by the ratio of the numbers of residues in the datasets. B: The average number of motif hits per protein for the three databases covering the three domains of life. Again, hit numbers in the prokaryotic sets are corrected for the different numbers of residues compared to the eukaryotic dataset. Coloring is identical to that of part A (red – archaea, green – bacteria, blue – eukaryotes). C: The upper bars show the number of hits found in the three domains of life for PCNA, PDZ and cyclin binding motifs (the average hits per protein for the three motifs are shown with vertical lines in part B; note that there are three different PDZ binding motifs, each shown with a separate line in part B, but only their cumulative numbers are shown in part C). Lower bars show the actual number of corresponding partner domains that can serve as interaction partners for these motifs in the same datasets. Domain occurrences were taken from the PFAM database. Prokaryotic hit numbers are corrected for the different numbers of proteins, and the coloring scheme follows that of parts A and B.</p>
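The size correction mentioned in this caption amounts to rescaling a raw hit count by the ratio of dataset sizes. A minimal sketch, assuming the eukaryotic set is the reference and using a hypothetical function name and made-up counts:

```python
def scale_hits(hits: int, residues_in_set: int, residues_in_reference: int) -> float:
    """Rescale a raw hit count to the size of the reference dataset.

    All arguments are illustrative; the real residue counts are not
    given in the caption.
    """
    if residues_in_set <= 0:
        raise ValueError("residues_in_set must be positive")
    return hits * residues_in_reference / residues_in_set
```

For example, 1,000 hits in a set of 2,000,000 residues scale to 5,000 hits against a reference set of 10,000,000 residues. The same function applies to part C if protein counts are passed instead of residue counts.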
Efficiency of ANCHOR on linear motifs with respect to bound secondary structure.
<p>Motifs are classified according to the secondary structure adopted upon binding to their partner domain. The efficiencies of ANCHOR for the separate structural classes were calculated and compared to the average efficiency calculated on all instances. The differences between the average and the secondary structure-specific efficiencies were assessed using a standard χ<sup>2</sup> test. The resulting p-values are quoted for all 4 separate structural classes.</p>
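One way to compare a class-specific efficiency with the overall average is a χ² goodness-of-fit test with one degree of freedom. The caption does not specify the exact test design, so the sketch below is an assumption: it treats the average recovery rate as the expected rate for the class, and the function name and inputs are hypothetical.

```python
import math
from typing import Tuple


def chi2_vs_average(recovered: int, total: int, average_rate: float) -> Tuple[float, float]:
    """Chi-square goodness-of-fit (1 degree of freedom) of the recovered and
    missed counts in one class against the counts expected from the average
    recovery rate. Returns (chi2 statistic, p-value)."""
    if not 0.0 < average_rate < 1.0:
        raise ValueError("average_rate must lie strictly between 0 and 1")
    missed = total - recovered
    expected_hit = total * average_rate
    expected_miss = total * (1.0 - average_rate)
    chi2 = ((recovered - expected_hit) ** 2 / expected_hit
            + (missed - expected_miss) ** 2 / expected_miss)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

The closed-form p-value holds only for one degree of freedom; a design with more categories would need a general χ² survival function.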
Application to whole proteome scans.
<p>Results of applying ANCHOR as a filter for scanning the human proteome for instances of the nuclear receptor interacting motif (LIG_NRBOX). A: number of proteins matching the motif; B–D: fraction of proteins containing NRBOX matches with biological process, cellular component and molecular function GO annotations (B, C and D, respectively) matching the annotations of true NRBOX instances (black boxes), with other annotations (grey boxes), and with no annotations (white boxes). The heights of the bars in B–D represent 100% of all motifs found; thus in each sub-figure the complete left bar stands for 7,897 proteins and the complete right bar stands for 1,623. The two different numbers of hits are scaled to accurately represent enrichments of correctly annotated proteins.</p>
Examples of true motif instances with ANCHOR predictions.
<p>A: Three instances of the nuclear receptor binding motif (LIG_NRBOX) in the human nuclear receptor coactivator 2 protein (NCOA2). Left: IUPred (red) and ANCHOR (blue) predictions for the 601–800 region of NCOA2. Red bars mark the motif instances, with the black box showing the instance for which the corresponding bound structure is shown. Right: the structure of NCOA2 (salmon) with the motif shown in red bound to the glucocorticoid receptor (grey) (structure 1m2z). B: MAP kinase binding motif (LIG_MAPK_1) in the rhodanese domain of the human DUS6 protein. Left: IUPred (red) and ANCHOR (blue) predictions with the red bar and black box indicating the position of the motif. Right: the structure of DUS6 in monomeric form (structure 1hzm) with the motif shown in red.</p>
Alignment of six representative members of the Mgm101p sequence family.
<p>The C-terminal extensions of <i>Dictyostelium discoideum</i> and <i>Naegleria gruberi</i>, which lack sequence conservation, were omitted from the alignment.</p>
Complementation of the temperature sensitive mutant.
<p>A GlyYP plate with (1) M2915-7C <i>mgm101-1</i>ts, and M2915-7C transformed with pCXJ22 plasmids containing (2) <i>S. cerevisiae MGM101</i>, (3) <i>A. millepora MGM101</i> with a <i>S. cerevisiae</i> mitochondrial targeting signal (A.m.ID-A.m.C) (<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0056465#pone-0056465-g003" target="_blank">Fig. 3</a>), (4) the <i>S. cerevisiae</i> intrinsically disordered (ID) domain joined to the <i>A. millepora</i> core region (S.c.ID-A.m.C) and (5) the <i>A. millepora</i> ID region joined to the <i>S. cerevisiae</i> core region (A.m.ID-S.c.C). All constructs have a mitochondrial targeting signal sequence, as shown in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0056465#pone.0056465.s002" target="_blank">Figure S2</a>. The plate was incubated at 35°C for 3 days before being photographed.</p>