Interactome-Wide Prediction of Protein-Protein Binding Sites Reveals Effects of Protein Sequence Variation in <em>Arabidopsis thaliana</em>
<div><p>The specificity of protein-protein interactions is encoded in those parts of the sequence that compose the binding interface. Therefore, understanding how changes in protein sequence influence interaction specificity, and possibly the phenotype, requires knowing the location of binding sites in those sequences. However, large-scale detection of protein interfaces remains a challenge. Here, we present a sequence- and interactome-based approach to mine interaction motifs from the recently published <em>Arabidopsis thaliana</em> interactome. The resultant proteome-wide predictions are available via <a href="http://www.ab.wur.nl/sliderbio">www.ab.wur.nl/sliderbio</a> and set the stage for further investigations of protein-protein binding sites. To assess our method, we first show that, by using <em>a priori</em> information calculated from protein sequences, such as evolutionary conservation and residue surface accessibility, we improve the performance of interface prediction compared to using only interactome data. Next, we present evidence for the functional importance of the predicted sites, which are under stronger selective pressure than the rest of the protein sequence. We also observe a tendency for compensatory mutations in the binding sites of interacting proteins. Subsequently, we interrogated the interactome data to formulate testable hypotheses for the molecular mechanisms underlying effects of protein sequence mutations. Examples include proteins relevant for various developmental processes. Finally, we observed, by analysing pairs of paralogs, a correlation between functional divergence and sequence divergence in interaction sites. This analysis suggests that large-scale prediction of binding sites can cast light on evolutionary processes that shape protein-protein interaction networks.</p> </div>
Putative molecular mechanisms underlying effects of amino acid mutagenesis.
<p>A, C and E show the interacting partners of the proteins ZTL, CXIP1 and SHY2, respectively (interactions shown as dashed lines are not covered in the Arabidopsis Interactome data). B, D and F show a schematic representation of the sequences of the three proteins, including predicted binding sites (coloured box, using the same colour as the proteins predicted to bind to it), mutagenesis sites (triangles for experimental mutagenesis sites, circles for naturally occurring sequence variants) and their positions, and residue surface accessibility (RSA) and conservation (bar plots) as predicted based on the sequence. A–B, in the protein ZTL, alanine mutagenesis of residues 200 and 213 independently eliminates the interaction with ASK1; for ZTL, the stretch of residues from 208 to 220 is predicted as the interaction site for binding with ASK2 and ASK4. This leads to the hypothesis that mutation of ZTL, specifically of residue Leu213, would disrupt not only its interaction with ASK1 but also its interactions with other SKP1-like proteins, such as ASK2 and ASK4. C–D, in CXIP1, alanine mutagenesis of two highly conserved motifs (residues from 133 to 137; and residues from 97 to 100) leads to loss of the ability to activate CAX1. For CXIP1, the stretch of residues from 125 to 136 was predicted as a binding site, which overlaps the mutated motif SNWPT. The interaction of CXIP1 and the other interacting partners identified in the Arabidopsis interactome, i.e. AT5G09830, AT3G50780, AT1G70410 and TCP13 (AT3G02150), may also be mediated by the same motif. E–F, in the sequence of SHY2, three motifs were predicted as binding sites. The first (residues from 59 to 69; represented in grey) overlaps the position of two naturally occurring mutations (residues 67 and 69) and is predicted to be responsible for binding of TOPLESS (TPL, AT5G27030). A second motif (residues from 180 to 187; represented in brown) is predicted to be responsible for the interactions of SHY2 with six other IAA proteins.
This leads to the hypothesis that the two known mutations disrupt the interaction of SHY2 with TPL, but that the same mutations do not impede its interaction with other IAA proteins.</p>
Overall description of the predicted binding sites in the Arabidopsis interactome.
<p>(A) Network representation of the Arabidopsis interactome and predicted interaction sites. The vertices and edges in black represent, respectively, the 985 proteins and the 1498 interactions to which predicted motifs are mapped. (B) Degree distributions from the complete protein-protein interaction dataset (grey) and from the subset with only proteins and interactions that have a predicted motif (black). A and B suggest that our method is not biased to predict motifs that can be mapped only to proteins with high degree (<i>i.e.</i> number of interactions); moreover, the proteins with predicted motifs are distributed in different positions in the network. (C) Percentage of residues in the interfaces, either in the predicted interfaces or those observed in the structurally mapped dataset. Standard deviation is indicated.</p>
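<p>The degree distributions compared in panel B can be illustrated with a minimal sketch. It assumes each interaction is an undirected pair of protein identifiers and that degrees in the subset are counted within the subset itself; the function name and the toy edge lists are hypothetical and are not data from the study.</p>

```python
from collections import Counter


def degree_distribution(edges):
    """Return {degree: number of proteins with that degree} for an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        if b != a:  # a self-interaction is counted once for that protein
            degree[b] += 1
    return dict(Counter(degree.values()))


# Hypothetical toy interactome; protein names are placeholders.
all_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P5", "P5")]
# Hypothetical subset of interactions to which a predicted motif is mapped.
motif_edges = [("P1", "P2"), ("P2", "P3")]

full = degree_distribution(all_edges)
subset = degree_distribution(motif_edges)
```

<p>Comparing the two dictionaries, for example as normalised histograms, would indicate whether the subset is skewed towards high-degree proteins.</p>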
Overall performance of the SLIDERBio algorithm in different datasets.
<p>(A–C) Coverage of protein-protein interfaces and accuracy of predicted motifs. Each dot represents the result of SLIDERBio using one of the 180 tested sets of parameters, for the (A) human, (B) yeast and (C) Arabidopsis structurally mapped subsets. The grey arrows indicate the dot corresponding to the result of the previous SLIDER algorithm. (D–F) Comparison of the performance of each SLIDERBio parameter setting between datasets of different species: (D) human vs. yeast; (E) human vs. Arabidopsis; and (F) yeast vs. Arabidopsis. The Pearson Correlation Coefficient (PCC) is indicated.</p>
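<p>The Pearson Correlation Coefficient reported in the cross-species panels can be computed as in the sketch below. It assumes one performance score per parameter setting in each of two datasets, listed in the same order; the function name is hypothetical, and the caption does not say which performance score was correlated.</p>

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

<p>A value close to 1 would mean that parameter settings performing well on one species' dataset also tend to perform well on the other.</p>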
Functionally annotated protein sites that coincide with predicted interaction sites.
Network of flowering time integrator genes.
<p>Green indicates expression in leaf tissue, blue in meristem tissue. Red arrows represent repression, blue arrows activation. Most interactions were taken as given based on literature information, but for regulation of <i>LFY</i> by AGL24 and SOC1, different ways of combining the two inputs were tested (indicated by the light blue arrows). Dashed arrow represents FT transport. Junction symbol next to <i>AP1</i> indicates cooperativity predicted for regulation of <i>AP1</i> by LFY. As indicated, <i>AP1</i> expression is used as a marker for the moment of the floral transition. This network was used to fit expression time-course data and to predict the effect of perturbations. Gene names are given in full in the text.</p>
Model predictions and experiments in various mutant backgrounds.
<p>(A) Predicted vs. experimentally observed flowering time for mutants used in training the model (black) and for double mutants used for validation (red). Wild type flowering time is indicated in green. RL, rosette leaves: the more rosette leaves, the later flowering. (B) Prediction of expression changes; total change in expression over the simulated time-course is calculated, normalized against wild type; absolute value is reported to focus on the magnitude of the predicted expression change. Horizontal axis, mutants; vertical axis, genes for which expression change in mutant background is simulated. Note that FLC and SVP are not regulated by other genes in the model and hence, their expression level does not change upon any mutation. For comparison between predictions and experiments, see Figures C and D in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0116973#pone.0116973.s001" target="_blank">S1 File</a>.</p>
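<p>One possible reading of the quantity in panel B is sketched below. The caption does not give the exact formula, so this assumes that expression is summed over the simulated time points, the difference from wild type is divided by the wild-type total, and the absolute value is taken; the function name is hypothetical.</p>

```python
def relative_expression_change(wild_type, mutant):
    """Absolute total expression change in a mutant, relative to the wild-type total.

    Both arguments are simulated expression values at the same time points.
    Assumed formula (not stated in the caption): |sum(mutant) - sum(wt)| / sum(wt).
    """
    if len(wild_type) != len(mutant):
        raise ValueError("time-courses must cover the same time points")
    total_wt = sum(wild_type)
    if total_wt == 0:
        raise ValueError("wild-type expression sums to zero; cannot normalise")
    return abs(sum(mutant) - total_wt) / total_wt
```

<p>Under this reading, a gene whose simulated expression is unaffected by a mutation scores 0, which matches the statement that FLC and SVP do not change upon any mutation.</p>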
A Quantitative and Dynamic Model of the Arabidopsis Flowering Time Gene Regulatory Network
<div><p>Various environmental signals integrate into a network of floral regulatory genes leading to the final decision on when to flower. Although a wealth of qualitative knowledge is available on how flowering time genes regulate each other, only a few studies have incorporated this knowledge into predictive models. Such models are invaluable as they make it possible to investigate how various types of inputs are combined to give a quantitative readout. To investigate the effect of gene expression disturbances on flowering time, we developed a dynamic model for the regulation of flowering time in <i>Arabidopsis thaliana</i>. Model parameters were estimated based on expression time-courses for relevant genes, and a consistent set of flowering times for plants of various genetic backgrounds. Validation was performed by predicting changes in expression level in mutant backgrounds and comparing these predictions with independent expression data, and by comparison of predicted and experimental flowering times for several double mutants. Remarkably, the model predicts that a disturbance in a particular gene does not necessarily have the largest impact on directly connected genes. For example, the model predicts that a <i>SUPPRESSOR OF OVEREXPRESSION OF CONSTANS</i> (<i>SOC1</i>) mutation has a larger impact on <i>APETALA1</i> (<i>AP1</i>), which is not directly regulated by SOC1, than on <i>LEAFY</i> (<i>LFY</i>), which is under direct control of SOC1. This was confirmed by expression data. Another model prediction involves the importance of cooperativity in the regulation of <i>AP1</i> by LFY, a prediction supported by experimental evidence. In conclusion, our model for flowering time gene regulation makes it possible to address how different quantitative inputs are combined into one quantitative output, flowering time.</p></div>
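<p>The role of cooperativity in such a dynamic model can be illustrated with a minimal sketch. The abstract does not give the model equations, so the following is an assumption throughout: <i>AP1</i> is activated by a constant LFY level through a Hill term with exponent n, decays linearly, and the floral transition is taken as the time at which <i>AP1</i> first reaches a threshold. All function names and parameter values are hypothetical.</p>

```python
def hill(x, k, n):
    """Hill activation term: fraction of maximal activation at regulator level x."""
    return x**n / (k**n + x**n)


def time_to_threshold(lfy_level, n, beta=1.0, k=0.5, decay=0.1,
                      threshold=2.0, dt=0.01, t_max=100.0):
    """Integrate d(AP1)/dt = beta * hill(LFY, k, n) - decay * AP1 with Euler steps.

    Returns the first time at which AP1 reaches `threshold`, or None if it
    never does within t_max. All parameter values are illustrative only.
    """
    ap1 = 0.0
    t = 0.0
    while t < t_max:
        ap1 += dt * (beta * hill(lfy_level, k, n) - decay * ap1)
        t += dt
        if ap1 >= threshold:
            return t
    return None


# n = 1: no cooperativity; n = 3: cooperative, switch-like response to LFY.
high_lfy = (time_to_threshold(1.0, n=1), time_to_threshold(1.0, n=3))
low_lfy = (time_to_threshold(0.25, n=1), time_to_threshold(0.25, n=3))
```

<p>In this toy setting, a higher Hill exponent sharpens the response: above the half-saturation level the threshold is reached sooner, while below it the threshold may never be reached.</p>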
Effect of knockout mutations (agl24, soc1 and soc1/agl24) on LFY expression and on flowering time.
<p><b>(A)</b> Number of rosette leaves counted at the onset of flowering for wild type and mutants. The plants were grown in long-day conditions at 23°C. <b>(B-C)</b> <i>LFY</i> expression in wild type and mutants from simulations (B) or microarray experiments (C). The simulations show the expression time-course over 20 days after germination; the microarray data consist of four time-points after transfer of plants grown in short-day to long-day conditions. <b>(D)</b> Effect of the efficiency with which <i>LFY</i> expression is activated by AGL24 (β<sub>6</sub>) and SOC1 (β<sub>7</sub>) on predicted flowering time. Flowering time, predicted flowering time for given values of parameters. Blue boxes in heatmap indicate best-fit model parameters and the two mutants <i>soc1</i> and <i>agl24</i>; arrows point from wild type model to mutants.</p>
Experimental and simulated expression time-course of the genes in the integration network model.
<p>Gene expression was measured by qRT-PCR (shown as dots) of wild type plants grown under long-day conditions at 23°C (average and standard deviation are shown). The continuous lines show the simulated gene expression using the parameters estimated by data fitting. Note that <i>FLC</i> and <i>SVP</i> are not regulated by other components of the network and hence are present as input factors only, and their expression level is not simulated by the model. qRT-PCR data for <i>FT</i> was obtained from leaves; for the other genes, qRT-PCR data was obtained from meristem enriched material.</p>