Computational Approaches to Predict Protein Interaction

Author: Tien-Hao Chang
Publication venue: 'IntechOpen'
Publication date: 30/03/2012

IntechOpen

Predicting the Impact of Alternative Splicing on Plant MADS Domain Protein Function

Author: Busscher-Lange J.
Dijk A.D.J., van
Ham R.C.H.J., van
Immink G.H.
Morabito G.
Severing E.I.

Several genome-wide studies demonstrated that alternative splicing (AS) significantly increases the transcriptome complexity in plants. However, the impact of AS on the functional diversity of proteins is difficult to assess using genome-wide approaches. The availability of detailed sequence annotations for specific genes and gene families allows for a more detailed assessment of the potential effect of AS on their function. One example is the plant MADS-domain transcription factor family, members of which interact to form protein complexes that function in transcription regulation. Here, we perform an in silico analysis of the potential impact of AS on the protein-protein interaction capabilities of MIKC-type MADS-domain proteins. We first confirmed the expression of transcript isoforms resulting from predicted AS events. Expressed transcript isoforms were considered functional if they were likely to be translated and if their corresponding AS events either had an effect on predicted dimerisation motifs or occurred in regions known to be involved in multimeric complex formation, or otherwise, if their effect was conserved in different species. Nine out of twelve MIKC MADS-box genes predicted to produce multiple protein isoforms harbored putative functional AS events according to those criteria. AS events with conserved effects were only found at the borders of or within the K-box domain. We illustrate how AS can contribute to the evolution of interaction networks through an example of selective inclusion of a recently evolved interaction motif in the MADS AFFECTING FLOWERING1-3 (MAF1–3) subclade. Furthermore, we demonstrate the potential effect of an AS event in SHORT VEGETATIVE PHASE (SVP), resulting in the deletion of a short sequence stretch including a predicted interaction motif, by overexpression of the fully spliced and the alternatively spliced SVP transcripts. 
For most of the AS events we were able to formulate hypotheses about the potential impact on the interaction capabilities of the encoded MIKC proteins.

Continuous-time modeling of cell fate determination in Arabidopsis flowers

Background: The genetic control of floral organ specification is currently being investigated by various approaches, both experimentally and through modeling. Models and simulations have mostly involved Boolean or related methods, and so far a quantitative, continuous-time approach has not been explored. Results: We propose an ordinary differential equation (ODE) model that describes the gene expression dynamics of a gene regulatory network that controls floral organ formation in the model plant Arabidopsis thaliana. In this model, the dimerization of MADS-box transcription factors is incorporated explicitly. The unknown parameters are estimated from (known) experimental expression data. The model is validated by simulation studies of known mutant plants. Conclusions: The proposed model gives realistic predictions with respect to independent mutation data. A simulation study is carried out to predict the effects of a new type of mutation that has so far not been made in Arabidopsis, but that could be used as a severe test of the validity of the model. According to our predictions, the role of dimers is surprisingly important. Moreover, the functional loss of any dimer leads to one or more phenotypic alterations.
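To make the idea of an ODE model with explicit dimerization concrete, here is a minimal sketch. It is not the published model: the two-gene network, the function name `simulate`, the forward-Euler integration and all rate constants are assumptions chosen only for illustration. Two monomers A and B form a dimer D, and the dimer activates transcription of both genes.

```python
# Toy continuous-time model (hypothetical, not the published network):
# monomers A and B form a dimer D = A:B, and D activates both genes.
# All rate constants below are made-up illustration values.

def simulate(a0, b0, steps=20000, dt=0.01,
             basal=0.05, vmax=1.0, k_half=0.5, decay=0.5,
             k_on=1.0, k_off=0.5, b_active=True):
    """Integrate the toy system with forward Euler and return (A, B, D)."""
    a, b, d = a0, b0, 0.0
    for _ in range(steps):
        activation = vmax * d / (k_half + d)   # saturating activation by the dimer
        binding = k_on * a * b - k_off * d     # net rate of dimer formation
        da = basal + activation - decay * a - binding
        # A loss-of-function mutant of gene B produces no B at all.
        db = (basal + activation if b_active else 0.0) - decay * b - binding
        dd = binding - decay * d
        a += dt * da
        b += dt * db
        d += dt * dd
    return a, b, d

wild_type = simulate(0.1, 0.1)
mutant = simulate(0.1, 0.0, b_active=False)  # gene B knocked out
```

In this sketch the knockout of B removes the dimer entirely, so A falls back to its basal level; comparing `wild_type` and `mutant` mimics, in a very reduced form, the kind of mutant simulation the abstract describes.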

Springer - Publisher Connector

Directory of Open Access Journals

Conserved and variable correlated mutations in the plant MADS protein network

Author: A Bairoch
A Becker
A Fuchs
A Lupas
A Sali
AA Fodor
Aalt DJ van Dijk
AD Han
ADJ van Dijk
AH Paterson
AK Ramani
AS Veron
AT Brunger
BA Krizek
C Espinosa-soto
CM Buslje
CS Goh
CS Miller
D Altschuh
D Juan
DA Afonnikov
DS Horner
E Santelli
EA Merritt
F Fornara
F Pazos
G Angenent
GA Tuskan
H Ashkenazy
HB Fraser
HY Shan
HY Yu
I Halperin
J Lim
J Sundstrom
JD Thompson
JG Caporaso
JL Riechmann
JMG Izarzugaza
K Hill
K Huang
K Kaufmann
L Hakes
L Mendoza
L Parenicova
L Pellegrini
LC Martin
LJ Cseke
LP Martinez-Castilla
M Hassler
M Ng
M Socolich
MA Fares
MJ Buck
N Shitsukawa
NA Kane
NJ Mulder
O Noivirt
PJ Kraulis
PJ Waddell
R Melzer
R Ming
R Velasco
RC Edgar
RGH Immink
RKP Kuipers
RM Clark
Roeland CHJ van Ham
S Ciannamea
S De Bodt
S de Folter
S Henikoff
S Mika
SA Goff
SA Rensing
SAA Travers
SR Eddy
T Hernandez-Hernandez
T Sato
Y Mo
YZ Yang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Plant MADS domain proteins are involved in a variety of developmental processes for which their ability to form various interactions is a key requisite. However, not much is known about the structure of these proteins or their complexes, whereas such knowledge would be valuable for a better understanding of their function. Here, we analyze those proteins and the complexes they form using a correlated mutation approach in combination with available structural, bioinformatics and experimental data. Results: Correlated mutations are affected by several types of noise, which is difficult to disentangle from the real signal. In our analysis of the MADS domain proteins, we apply for the first time a correlated mutation analysis to a family of interacting proteins. This provides a unique way to investigate the amount of signal that is present in correlated mutations because it allows direct comparison of mutations in various family members and assessing their conservation. We show that correlated mutations in general are conserved within the various family members, and if not, the variability at the respective positions is less in the proteins in which the correlated mutation does not occur. Also, intermolecular correlated mutation signals for interacting pairs of proteins display clear overlap with other bioinformatics data, which is not the case for non-interacting protein pairs, an observation which validates the intermolecular correlated mutations. Having validated the correlated mutation results, we apply them to infer the structural organization of the MADS domain proteins. Conclusion: Our analysis enables understanding of the structural organization of the MADS domain proteins, including support for predicted helices based on correlated mutation patterns, and evidence for a specific interaction site in those proteins.
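As a rough illustration of what a correlated mutation signal looks like, the sketch below scores pairs of alignment columns by mutual information. This is one common covariation score and is used here only as an assumption; the abstract does not say which score the authors used. The toy alignment and the function names are hypothetical.

```python
from collections import Counter
from math import log2

def column(alignment, i):
    """Return the residues found at position i across all aligned sequences."""
    return [seq[i] for seq in alignment]

def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    col_i, col_j = column(alignment, i), column(alignment, j)
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    return sum(
        (c / n) * log2((c / n) / ((count_i[x] / n) * (count_j[y] / n)))
        for (x, y), c in count_ij.items()
    )

# Hypothetical toy alignment: columns 1 and 2 change together (K/L vs R/V),
# column 3 varies independently of column 1, column 0 is fully conserved.
toy_alignment = [
    "AKLE",
    "AKLD",
    "ARVE",
    "ARVD",
]

covarying = mutual_information(toy_alignment, 1, 2)
independent = mutual_information(toy_alignment, 1, 3)
```

A fully conserved column carries no covariation signal under this score, which is one reason why conservation and correlated mutation are usually analysed together.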

Public Library of Science (PLOS)

Sequence Motifs in MADS Transcription Factors Responsible for Specificity and Diversification of Protein-Protein Interaction

Author: A Becker
A Sali
A Shmygelska
Aalt D. J. van Dijk
AD Han
ADJ van Dijk
AH Paterson
AJM Walhout
B Causier
B Davies
BA Krizek
BA Shoemaker
BJ Adamczyk
C Landgraf
CH Yeang
Christos Ouzounis
CT Rollins
D Li
D Weigel
DH Erwin
DJ Reiss
E Akiva
E Ferraro
E Santelli
ES Coen
G Ditta
G Grigoryan
G Theissen
GD Amoutzias
Gerco C. Angenent
Giuseppa Morabito
H Ma
H Wang
HY Yu
IE Sanchez
J DeBartolo
JD Klemm
JL Riechmann
JM Skerker
JR Chen
K Kaufmann
KB Levin
KL Morrison
L Breiman
L Burger
L Parenicova
L Yant
M Egea-Cortines
M Ng
M Socolich
M Vandenbussche
M Weigt
MA Fares
Martijn Fiers
ME Cusick
NJ Marianayagam
O Keller
OJ Ratcliffe
P Bradley
R Arora
R Diaz-Uriate
R Favaro
R Ming
R Velasco
RB Jones
RC Edgar
RD Finn
RD Gietz
RGH Immink
Richard G. H. Immink
Roeland C. H. J. van Ham
S Ciannamea
S De Bodt
S de Folter
S Drea
S Ferrario
S Mika
S Pelaz
SA Kempin
SH Tan
SJ Liljegren
SJ Nurrish
SR Eddy
T Honma
U Hartmann
WP Lehrach
WP Russ
X Daura
Y Hanzawa
Y Ofran
YZ Yang
Z Schwarzsommer
Z Wunderlich
Publication venue: Public Library of Science
Publication date: 01/01/2010

Protein sequences encompass tertiary structures and contain information about specific molecular interactions, which in turn determine biological functions of proteins. Knowledge about how protein sequences define interaction specificity is largely missing, in particular for paralogous protein families with high sequence similarity, such as the plant MADS domain transcription factor family. In comparison to the situation in mammalian species, this important family of transcription regulators has expanded enormously in plant species and contains over 100 members in the model plant species Arabidopsis thaliana. Here, we provide insight into the mechanisms that determine protein-protein interaction specificity for the Arabidopsis MADS domain transcription factor family, using an integrated computational and experimental approach. Plant MADS proteins have highly similar amino acid sequences, but their dimerization patterns vary substantially. Our computational analysis uncovered small sequence regions that explain observed differences in dimerization patterns with reasonable accuracy. Furthermore, we show the usefulness of the method for prediction of MADS domain transcription factor interaction networks in other plant species. Introduction of mutations in the predicted interaction motifs demonstrated that single amino acid mutations can have a large effect and lead to loss or gain of specific interactions. In addition, the various bioinformatics analyses performed shed light on the way evolution has shaped MADS domain transcription factor interaction specificity. Identified protein-protein interaction motifs appeared to be strongly conserved among orthologs, indicating their evolutionary importance. We also provide evidence that mutations in these motifs can be a source for sub- or neo-functionalization. The analyses presented here take us a step forward in understanding protein-protein interactions and the interplay between protein sequences and network evolution.

Interactome-Wide Prediction of Protein-Protein Binding Sites Reveals Effects of Protein Sequence Variation in Arabidopsis thaliana

Author: Boyen P.
Dijk A.D.J., van
Neven F.
Valentim F.L.
Publication date: 01/01/2012

The specificity of protein-protein interactions is encoded in those parts of the sequence that compose the binding interface. Therefore, understanding how changes in protein sequence influence interaction specificity, and possibly the phenotype, requires knowing the location of binding sites in those sequences. However, large-scale detection of protein interfaces remains a challenge. Here, we present a sequence- and interactome-based approach to mine interaction motifs from the recently published Arabidopsis thaliana interactome. The resultant proteome-wide predictions are available via www.ab.wur.nl/sliderbio and set the stage for further investigations of protein-protein binding sites. To assess our method, we first show that, by using a priori information calculated from protein sequences, such as evolutionary conservation and residue surface accessibility, we improve the performance of interface prediction compared to using only interactome data. Next, we present evidence for the functional importance of the predicted sites, which are under stronger selective pressure than the rest of the protein sequence. We also observe a tendency for compensatory mutations in the binding sites of interacting proteins. Subsequently, we interrogated the interactome data to formulate testable hypotheses for the molecular mechanisms underlying effects of protein sequence mutations. Examples include proteins relevant for various developmental processes. Finally, we observed, by analysing pairs of paralogs, a correlation between functional divergence and sequence divergence in interaction sites. This analysis suggests that large-scale prediction of binding sites can cast light on evolutionary processes that shape protein-protein interaction networks.
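The core idea of mining interaction motifs from an interactome can be sketched as counting how many interacting protein pairs are "explained" by a pair of sequence motifs. The snippet below is only an illustration under stated assumptions: the sequences, protein names and motifs are invented, a lowercase `x` is assumed to act as a wildcard, and the conservation and surface-accessibility priors mentioned in the abstract are not modelled.

```python
import re

def contains(motif, sequence):
    """True if the motif occurs in the sequence; 'x' matches any residue."""
    pattern = motif.replace("x", ".")
    return re.search(pattern, sequence) is not None

def support(motif_a, motif_b, sequences, interactions):
    """Count interactions in which one partner carries motif_a and the other motif_b."""
    count = 0
    for p, q in interactions:
        seq_p, seq_q = sequences[p], sequences[q]
        forward = contains(motif_a, seq_p) and contains(motif_b, seq_q)
        reverse = contains(motif_a, seq_q) and contains(motif_b, seq_p)
        if forward or reverse:
            count += 1
    return count

# Hypothetical toy data, not taken from the Arabidopsis interactome.
sequences = {
    "P1": "MKLRVE",
    "P2": "GGDEAF",
    "P3": "TTLRIE",
    "P4": "SSDEQW",
    "P5": "AAAAAA",
}
interactions = [("P1", "P2"), ("P3", "P4"), ("P1", "P5")]

pair_support = support("LRxE", "DE", sequences, interactions)
```

A real search would compare this support against what is expected by chance and explore a very large space of candidate motif pairs, which is why heuristic search strategies are used for this problem.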

SLIDER: A Generic Metaheuristic for the Discovery of Correlated Motifs in Protein-Protein Interaction Networks

Author: A. D. J. van Dijk
D. Van Dyck
F. Neven
P. Boyen
R. C. H. J. van Ham
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Predicting and understanding transcription factor interactions based on sequence level determinants of combinatorial control

Author: Angenent G.C.
Immink G.H.
ter Braak C.J.F.
van Dijk A.D.J.
van Ham R.C.H.J.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2008

Motivation: Transcription factor interactions are the cornerstone of combinatorial control, which is a crucial aspect of the gene regulatory system. Understanding and predicting transcription factor interactions based on their sequence alone is difficult since they are often part of families of factors sharing high sequence identity. Given the scarcity of experimental data on interactions compared to available sequence data, however, it would be most useful to have accurate methods for the prediction of such interactions. Results: We present a method consisting of a Random Forest-based feature-selection procedure that selects relevant motifs out of a set found using a correlated motif search algorithm. Prediction accuracy for several transcription factor families (bZIP, MADS, homeobox and forkhead) reaches 60–90%. In addition, we identified those parts of the sequence that are important for the interaction specificity, and show that these are in agreement with available data. We also used the predictors to perform genome-wide scans for interaction partners and recovered both known and putative new interaction partners.

Systems biology of plant molecular networks: from networks to models

Author: Valentim F.L.
Publication venue: 'Wageningen University and Research'
Publication date: 01/01/2015

Developmental processes are controlled by gene regulatory networks (GRNs), which are tightly coordinated networks of transcription factors (TFs) that activate and repress gene expression within a spatial and temporal context. In Arabidopsis thaliana, the key components and network structures of the GRNs controlling major plant reproduction processes, such as floral transition and floral organ identity specification, have been comprehensively unveiled. This is thanks to advances in ‘omics’ technologies combined with genetic approaches. Yet, because of the multidimensional nature of the data and because of the complexity of the regulatory mechanisms, there is a clear need to analyse these data in such a way that we can understand how TFs control complex traits. The use of mathematical modelling facilitates the representation of the dynamics of a GRN and enables better insight into GRN complexity, while multidimensional data analysis enables the identification of properties that connect different layers from genotype to phenotype. Mathematical modelling and multidimensional data analysis are both parts of a systems biology approach, and this thesis presents the application of both types of systems biology approaches to flowering GRNs. Chapter 1 comprehensively reviews advances in understanding of GRNs underlying plant reproduction processes, as well as mathematical models and multidimensional data analysis approaches to study plant systems biology. As discussed in Chapter 1, an important aspect of understanding these GRNs is how perturbations in one part of the network are transmitted to other parts, and ultimately how this results in changes in phenotype. Given the complexity of recent versions of Arabidopsis GRNs - which involve highly-connected, non-linear networks of TFs, microRNAs, movable factors, hormones and chromatin modifying proteins - it is not possible to predict the effect of gene perturbations on e.g. flowering time in an intuitive way by just looking at the network structure. Therefore, mathematical modelling plays an important role in providing a quantitative understanding of GRNs. In addition, aspects of multidimensional data analysis for understanding GRNs underlying plant reproduction are also discussed in the first chapter. This includes not only the integration of experimental data, e.g. transcriptomics with protein-DNA binding profiling, but also the integration of different types of networks identified by ‘omics’ approaches, e.g. protein-protein interaction networks and gene regulatory networks. Chapter 2 describes a mathematical model for representing the dynamics of key genes in the GRN of flowering time control. We modelled with ordinary differential equations (ODEs) the physical interactions and regulatory relationships of a set of core genes controlling Arabidopsis flowering time in order to quantitatively analyse the relationship between their expression levels and the flowering time response. We considered a core GRN composed of eight TFs: SHORT VEGETATIVE PHASE (SVP), FLOWERING LOCUS C (FLC), AGAMOUS-LIKE 24 (AGL24), SUPPRESSOR OF OVEREXPRESSION OF CONSTANS 1 (SOC1), APETALA1 (AP1), FLOWERING LOCUS T (FT), LEAFY (LFY) and FD. The connections and interactions amongst these components are justified based on experimental data, and the model is parameterised by fitting the equations to quantitative data on gene expression and flowering time. Then the model is validated with transcript data from a range of mutants. We verify that the model is able to describe some quantitative patterns seen in expression data under genetic perturbations, which supports the credibility of the model and its dynamic properties. The proposed model is able to predict the flowering time by assessing changes in the expression of the orchestrator of floral transition AP1.
Overall, the work presents a framework that allows addressing how different quantitative inputs are combined into a single quantitative output, i.e. the timing of flowering. The model allowed studying the established genetic regulations, and we discuss in Chapter 5 the steps towards using the proposed framework to zoom in and obtain new insights about the molecular mechanisms underlying the regulations. Systems biology does not only involve the use of dynamic modelling but also the development of approaches for multidimensional data analysis that are able to integrate multiple levels of systems organization. In Chapter 3, we aimed at comprehensively identifying and characterizing cis-regulatory mutations that have an effect on the GRN of flowering time control. By using ChIP-seq data and information about known DNA binding motifs of TFs involved in plant reproduction, we identified single-nucleotide polymorphisms (SNPs) that are highly discriminative in the classification of the flowering time phenotypes. Often, SNPs that overlap the position of experimentally determined binding sites (e.g. by ChIP-seq) are considered putative regulatory SNPs. We showed that regulatory SNPs are difficult to pinpoint among the sea of polymorphisms localized within binding sites determined by ChIP-seq studies. To overcome this, we narrowed the resolution by focusing on the subset of SNPs that are located within ChIP-seq peaks but that are also part of known regulatory motifs. These SNPs were used as input in a classification algorithm that could predict flowering time of Arabidopsis accessions relative to Col-0. Our strategy is able to identify SNPs that have a biological link with changes in flowering time. We then surveyed the literature to formulate hypotheses that explain the regulatory mechanism underlying the difference in phenotype conferred by a SNP.
Examples include SNPs that disrupt the flowering time gene FT, in which the mutation presumably disrupts the binding region of SVP. In Chapter 5 we discuss the steps towards extending our approach to obtain a more comprehensive survey of variants that have an effect on the flowering time control. In Chapter 4, we propose a method for genome-wide prediction of protein-protein interaction (PPI) sites from the Arabidopsis interactome. Our method, named SLIDERbio, uses features encoded in the sequence of proteins and their interactions to predict PPI sites. More specifically, our method mines PPI networks to find over-represented sequence motifs in pairs of interacting proteins. In addition, the inter-species conservation of these over-represented motifs, as well as their predicted surface accessibility, are taken into account to compute the likelihood of these motifs being located in a PPI site. Our results suggested that motifs overrepresented in pairs of interacting proteins that are conserved across orthologs and that have high predicted surface accessibility are in general good putative interaction sites. We applied our method to obtain interactome-wide predictions for Arabidopsis proteins. The results were explored to formulate testable hypotheses for the molecular mechanisms underlying effects of spontaneous or induced mutagenesis on e.g. ZEITLUPE, CXIP1 and SHY2 (proteins relevant for flowering time). In addition, we showed that the binding sites are under stronger selective pressure than the overall protein sequence, and that this may be used to link sequence variability to functional divergence. Finally, Chapter 5 concludes this thesis and describes future perspectives in systems biology applied to the study of GRNs underlying plant reproduction processes.
Two key directions are often followed in systems biology: 1) compiling systems-wide snapshots in which the relationships and interactions between the molecules of a system are comprehensively represented; and 2) generating accurate experimental data that can be used as input for the modelling concepts and techniques or multi-dimensional data analysis. Highlighted in Chapter 5 are the limitations in key steps within the systems biology framework applied to GRN studies. In addition, I discuss improvements and extensions that we envision for our model related to the GRN underlying the control of flowering time. Future steps for multi-dimensional data analysis are also discussed. To sum up, I discuss how to connect the different technologies developed in this thesis towards understanding the interplay between the roles of the genes, developmental stages and environmental conditions.
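The SNP filtering step described for Chapter 3 of this thesis (keep only SNPs that lie inside a ChIP-seq peak and also inside a known regulatory motif) can be sketched as a simple interval lookup. The coordinates, function names and the single-chromosome, half-open interval convention below are assumptions for illustration, not details taken from the thesis.

```python
from bisect import bisect_right

def build_index(intervals):
    """Sort and merge half-open [start, end) intervals for fast lookup."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    starts = [s for s, _ in merged]
    return starts, merged

def covered(pos, index):
    """True if position pos falls inside any interval of the index."""
    starts, merged = index
    k = bisect_right(starts, pos) - 1
    return k >= 0 and pos < merged[k][1]

def candidate_regulatory_snps(snps, peaks, motifs):
    """Keep SNP positions that are inside a peak AND inside a motif."""
    peak_index = build_index(peaks)
    motif_index = build_index(motifs)
    return [p for p in snps if covered(p, peak_index) and covered(p, motif_index)]

# Hypothetical coordinates on a single chromosome.
peaks = [(100, 200), (500, 650)]
motifs = [(120, 130), (300, 310), (600, 612)]
snps = [105, 125, 305, 611, 612]

selected = candidate_regulatory_snps(snps, peaks, motifs)
```

The selected positions would then serve as features for a classifier of flowering time; that classification step is not shown here.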