    Simplified Method to Predict Mutual Interactions of Human Transcription Factors Based on Their Primary Structure

    Background: Physical interactions between transcription factors (TFs) are necessary for forming regulatory protein complexes and thus play a crucial role in gene regulation. Currently, knowledge about the mechanisms of these TF interactions is incomplete, and the number of known TF interactions is limited. Computational prediction of such interactions can help identify potential new TF interactions and contribute to a better understanding of the complex machinery involved in gene regulation. Methodology: We propose such a method for the prediction of TF interactions. The method uses only the primary sequence information of the interacting TFs, which greatly simplifies the prediction algorithm. Through an advanced feature selection process, we determined a subset of 97 features that constitutes the optimized model within the feature set we considered. The model, based on quadratic discriminant analysis, achieves a prediction accuracy of 85.39% on a blind set of interactions. This result is achieved even though the negative data set was restricted to TFs of the same protein type, i.e. TFs that function in the same cellular compartment (nucleus) and in the same type of molecular process (transcription initiation). Such a selection makes it significantly harder to develop models with high specificity, but at the same time it better reflects real-world problems. Conclusions: The performance of our predictor compares well with that of much more complex approaches for predicting TF and general protein-protein interactions, particularly when the reduced complexity of using the model is taken into account.
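The abstract does not list the 97 selected features, so the following is only a minimal Python sketch of quadratic discriminant analysis itself, not the authors' model. Each class (interacting vs non-interacting TF pairs) gets its own mean vector and covariance matrix, and a pair is assigned to the class with the highest Gaussian log-posterior. The two-dimensional feature vectors, the labels and the small ridge term added to each covariance diagonal are hypothetical choices for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def mean(rows: List[Vector]) -> List[float]:
    """Component-wise mean of the feature vectors of one class."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance(rows: List[Vector], mu: List[float], ridge: float = 1e-3) -> List[List[float]]:
    """Unbiased sample covariance plus a small ridge on the diagonal (hypothetical choice)."""
    n, d = len(rows), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (r[i] - mu[i]) * (r[j] - mu[j]) / (n - 1)
    for i in range(d):
        cov[i][i] += ridge
    return cov


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


class QDA:
    """Quadratic discriminant analysis: one Gaussian with its own covariance per class."""

    def fit(self, X: List[Vector], y: List[int]) -> "QDA":
        self.params: Dict[int, Tuple[List[float], List[List[float]], float, float]] = {}
        for label in sorted(set(y)):
            rows = [x for x, c in zip(X, y) if c == label]
            mu = mean(rows)
            L = cholesky(covariance(rows, mu))
            log_det = 2.0 * sum(math.log(L[i][i]) for i in range(len(mu)))
            log_prior = math.log(len(rows) / len(y))
            self.params[label] = (mu, L, log_det, log_prior)
        return self

    def log_score(self, x: Vector, label: int) -> float:
        """Log-posterior up to a shared constant: -0.5*log|S| - 0.5*Mahalanobis^2 + log prior."""
        mu, L, log_det, log_prior = self.params[label]
        # Forward substitution solves L z = x - mu, so z.z = (x - mu)^T S^-1 (x - mu).
        z: List[float] = []
        for i in range(len(mu)):
            z.append((x[i] - mu[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
        return -0.5 * log_det - 0.5 * sum(v * v for v in z) + log_prior

    def predict(self, x: Vector) -> int:
        return max(self.params, key=lambda c: self.log_score(x, c))


# Hypothetical two-feature vectors for TF pairs: 1 = interacting, 0 = non-interacting.
X = [(0.9, 1.1), (1.2, 0.8), (1.0, 1.3), (0.7, 0.9),
     (-1.0, -1.5), (-2.0, -0.8), (-0.6, -1.9), (-1.5, -1.2), (-1.1, -0.6)]
y = [1, 1, 1, 1, 0, 0, 0, 0, 0]

model = QDA().fit(X, y)
print(model.predict((1.0, 1.0)), model.predict((-1.2, -1.2)))  # 1 0
```

Unlike linear discriminant analysis, QDA does not pool the covariance across classes, so the decision boundary between interacting and non-interacting pairs can be curved.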

    Computational Approaches to Predict Protein Interaction

    Predicting the Impact of Alternative Splicing on Plant MADS Domain Protein Function

    Several genome-wide studies have demonstrated that alternative splicing (AS) significantly increases transcriptome complexity in plants. However, the impact of AS on the functional diversity of proteins is difficult to assess using genome-wide approaches. The availability of detailed sequence annotations for specific genes and gene families allows a more detailed assessment of the potential effect of AS on their function. One example is the plant MADS-domain transcription factor family, members of which interact to form protein complexes that function in transcription regulation. Here, we perform an in silico analysis of the potential impact of AS on the protein-protein interaction capabilities of MIKC-type MADS-domain proteins. We first confirmed the expression of transcript isoforms resulting from predicted AS events. Expressed transcript isoforms were considered functional if they were likely to be translated and if their corresponding AS events either affected predicted dimerisation motifs or occurred in regions known to be involved in multimeric complex formation or, failing that, if their effect was conserved in different species. Nine out of twelve MIKC MADS-box genes predicted to produce multiple protein isoforms harbored putative functional AS events according to those criteria. AS events with conserved effects were found only at the borders of, or within, the K-box domain. We illustrate how AS can contribute to the evolution of interaction networks through an example of selective inclusion of a recently evolved interaction motif in the MADS AFFECTING FLOWERING1-3 (MAF1–3) subclade. Furthermore, we demonstrate the potential effect of an AS event in SHORT VEGETATIVE PHASE (SVP), which deletes a short sequence stretch including a predicted interaction motif, by overexpressing the fully spliced and the alternatively spliced SVP transcripts. For most of the AS events, we were able to formulate hypotheses about their potential impact on the interaction capabilities of the encoded MIKC protein.
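The criteria for calling an expressed isoform functional can be written as a small decision rule. The Python sketch below is only an illustration under stated assumptions: AS events, dimerisation motifs and complex-formation regions are represented as hypothetical half-open residue intervals, "affecting" a region is approximated as interval overlap, and conservation across species is supplied as a precomputed flag rather than derived from alignments.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Half-open residue interval [start, end) on the full-length protein (hypothetical coordinates).
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


@dataclass
class ASEvent:
    name: str
    affected: Interval      # residues deleted or altered by the AS event
    translated: bool        # isoform judged likely to be translated
    conserved_effect: bool  # same effect seen in other species (precomputed flag here)


def is_putatively_functional(event: ASEvent, motifs: List[Interval],
                             complex_regions: List[Interval]) -> bool:
    """Translated AND (hits a dimerisation motif OR a complex-formation region OR conserved)."""
    if not event.translated:
        return False
    hits_motif = any(overlaps(event.affected, m) for m in motifs)
    hits_complex = any(overlaps(event.affected, r) for r in complex_regions)
    return hits_motif or hits_complex or event.conserved_effect


# Hypothetical annotation of one MIKC protein.
motifs = [(140, 150)]
complex_regions = [(170, 200)]
events = [
    ASEvent("skips_motif", (145, 160), translated=True, conserved_effect=False),
    ASEvent("n_terminal", (10, 20), translated=True, conserved_effect=False),
    ASEvent("n_terminal_conserved", (10, 20), translated=True, conserved_effect=True),
    ASEvent("untranslated", (145, 160), translated=False, conserved_effect=False),
]
functional = [e.name for e in events if is_putatively_functional(e, motifs, complex_regions)]
print(functional)  # ['skips_motif', 'n_terminal_conserved']
```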

    Continuous-time modeling of cell fate determination in Arabidopsis flowers

    Background: The genetic control of floral organ specification is currently being investigated by various approaches, both experimentally and through modeling. Models and simulations have mostly involved Boolean or related methods, and so far a quantitative, continuous-time approach has not been explored. Results: We propose an ordinary differential equation (ODE) model that describes the gene expression dynamics of a gene regulatory network that controls floral organ formation in the model plant Arabidopsis thaliana. In this model, the dimerization of MADS-box transcription factors is incorporated explicitly. The unknown parameters are estimated from (known) experimental expression data. The model is validated by simulation studies of known mutant plants. Conclusions: The proposed model gives realistic predictions with respect to independent mutation data. A simulation study is carried out to predict the effects of a new type of mutation that has so far not been made in Arabidopsis, but that could be used as a severe test of the validity of the model. According to our predictions, the role of dimers is surprisingly important. Moreover, the functional loss of any dimer leads to one or more phenotypic alterations.
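The paper's equations and parameters are not given in this abstract. As a hedged illustration of what incorporating dimerization explicitly can mean in an ODE model, the Python sketch below integrates a hypothetical toy network with forward Euler: two MADS monomers A and B are produced and degraded, reversibly form a dimer D, and only D activates a target gene T through a Hill function. All rate constants are invented. Setting the association rate to zero mimics the loss of a single dimer while both monomers are still expressed.

```python
from typing import Dict


def simulate(k_on: float, t_end: float = 50.0, dt: float = 0.01) -> Dict[str, float]:
    """Forward-Euler integration of a hypothetical monomer/dimer/target model.

    dA/dt = pA - degA*A - (k_on*A*B - k_off*D)
    dB/dt = pB - degB*B - (k_on*A*B - k_off*D)
    dD/dt = (k_on*A*B - k_off*D) - degD*D
    dT/dt = v_max * D^n / (K^n + D^n) - degT*T
    """
    # Hypothetical parameters, chosen only so that the example behaves sensibly.
    p_a = p_b = 1.0                        # monomer production rates
    deg_a = deg_b = 0.1                    # monomer degradation rates
    k_off, deg_d = 0.1, 0.1                # dimer dissociation and degradation rates
    v_max, K, n, deg_t = 1.0, 2.0, 2, 0.5  # Hill activation of the target by the dimer

    a = b = d = t = 0.0
    for _ in range(int(round(t_end / dt))):
        binding = k_on * a * b - k_off * d  # net dimer formation flux
        rate_a = p_a - deg_a * a - binding
        rate_b = p_b - deg_b * b - binding
        rate_d = binding - deg_d * d
        rate_t = v_max * d ** n / (K ** n + d ** n) - deg_t * t
        a, b, d, t = a + dt * rate_a, b + dt * rate_b, d + dt * rate_d, t + dt * rate_t
    return {"A": a, "B": b, "D": d, "T": t}


wild_type = simulate(k_on=1.0)
no_dimer = simulate(k_on=0.0)  # hypothetical mutation: monomers present, dimer cannot form
print(round(wild_type["T"], 2), no_dimer["T"])
```

In the toy "mutant" the monomers accumulate to higher levels than in the wild type, yet the target stays off, because in this sketch only the dimer carries regulatory activity.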

    Conserved and variable correlated mutations in the plant MADS protein network

    Background: Plant MADS domain proteins are involved in a variety of developmental processes for which their ability to form various interactions is a key requisite. However, not much is known about the structure of these proteins or their complexes, whereas such knowledge would be valuable for a better understanding of their function. Here, we analyze these proteins and the complexes they form using a correlated mutation approach in combination with available structural, bioinformatics and experimental data. Results: Correlated mutations are affected by several types of noise that are difficult to disentangle from the real signal. In our analysis of the MADS domain proteins, we apply for the first time a correlated mutation analysis to a family of interacting proteins. This provides a unique way to investigate the amount of signal present in correlated mutations, because it allows direct comparison of mutations in various family members and assessment of their conservation. We show that correlated mutations are in general conserved within the various family members, and if not, the variability at the respective positions is lower in the proteins in which the correlated mutation does not occur. Also, intermolecular correlated mutation signals for interacting pairs of proteins display clear overlap with other bioinformatics data, which is not the case for non-interacting protein pairs, an observation that validates the intermolecular correlated mutations. Having validated the correlated mutation results, we apply them to infer the structural organization of the MADS domain proteins. Conclusion: Our analysis enables understanding of the structural organization of the MADS domain proteins, including support for predicted helices based on correlated mutation patterns, and evidence for a specific interaction site in those proteins.
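The abstract does not specify which correlated-mutation score was used. Mutual information (MI) between alignment columns is one common choice, and the Python sketch below uses it as an assumption. For the intermolecular case, the rows of the two alignments are assumed to be paired, i.e. row k of one alignment and row k of the other represent a pair of proteins assumed to interact; the tiny alignments are hypothetical. Raw MI is sensitive to exactly the kinds of noise the abstract mentions, so real analyses usually apply further corrections.

```python
import math
from collections import Counter
from typing import List


def column(alignment: List[str], i: int) -> List[str]:
    """Residues at position i across all aligned sequences."""
    return [seq[i] for seq in alignment]


def mutual_information(xs: List[str], ys: List[str]) -> float:
    """MI (in nats) between two paired columns; higher means stronger co-variation."""
    if len(xs) != len(ys):
        raise ValueError("columns must be paired row by row")
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


# Hypothetical paired alignments: row k of protein_x is assumed to interact with row k of protein_y.
protein_x = ["AKL", "AKL", "ERL", "ERL"]
protein_y = ["QD", "QD", "QE", "QE"]

intra = mutual_information(column(protein_x, 0), column(protein_x, 1))      # within one protein
inter = mutual_information(column(protein_x, 0), column(protein_y, 1))      # across partners
conserved = mutual_information(column(protein_x, 0), column(protein_y, 0))  # invariant column
print(round(intra, 3), round(inter, 3), round(conserved, 3))  # 0.693 0.693 0.0
```

A fully conserved column carries no co-variation signal, which is why conservation has to be assessed alongside the correlated mutations themselves.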

    Sequence Motifs in MADS Transcription Factors Responsible for Specificity and Diversification of Protein-Protein Interaction

    Protein sequences encode tertiary structures and contain information about specific molecular interactions, which in turn determine the biological functions of proteins. Knowledge about how protein sequences define interaction specificity is largely missing, in particular for paralogous protein families with high sequence similarity, such as the plant MADS domain transcription factor family. Compared with mammalian species, this important family of transcription regulators has expanded enormously in plants and contains over 100 members in the model plant species Arabidopsis thaliana. Here, we provide insight into the mechanisms that determine protein-protein interaction specificity for the Arabidopsis MADS domain transcription factor family, using an integrated computational and experimental approach. Plant MADS proteins have highly similar amino acid sequences, but their dimerization patterns vary substantially. Our computational analysis uncovered small sequence regions that explain observed differences in dimerization patterns with reasonable accuracy. Furthermore, we show the usefulness of the method for predicting MADS domain transcription factor interaction networks in other plant species. Introducing mutations in the predicted interaction motifs demonstrated that single amino acid mutations can have a large effect and lead to loss or gain of specific interactions. In addition, several bioinformatics analyses shed light on the way evolution has shaped MADS domain transcription factor interaction specificity. The identified protein-protein interaction motifs appeared to be strongly conserved among orthologs, indicating their evolutionary importance. We also provide evidence that mutations in these motifs can be a source of sub- or neo-functionalization. The analyses presented here take us a step forward in understanding protein-protein interactions and the interplay between protein sequences and network evolution.

    Interactome-Wide Prediction of Protein-Protein Binding Sites Reveals Effects of Protein Sequence Variation in Arabidopsis thaliana

    The specificity of protein-protein interactions is encoded in those parts of the sequence that compose the binding interface. Therefore, understanding how changes in protein sequence influence interaction specificity, and possibly the phenotype, requires knowing the location of binding sites in those sequences. However, large-scale detection of protein interfaces remains a challenge. Here, we present a sequence- and interactome-based approach to mine interaction motifs from the recently published Arabidopsis thaliana interactome. The resulting proteome-wide predictions are available via www.ab.wur.nl/sliderbio and set the stage for further investigations of protein-protein binding sites. To assess our method, we first show that, by using a priori information calculated from protein sequences, such as evolutionary conservation and residue surface accessibility, we improve the performance of interface prediction compared with using only interactome data. Next, we present evidence for the functional importance of the predicted sites, which are under stronger selective pressure than the rest of the protein sequence. We also observe a tendency for compensatory mutations in the binding sites of interacting proteins. Subsequently, we interrogated the interactome data to formulate testable hypotheses for the molecular mechanisms underlying the effects of protein sequence mutations. Examples include proteins relevant for various developmental processes. Finally, by analysing pairs of paralogs, we observed a correlation between functional divergence and sequence divergence in interaction sites. This analysis suggests that large-scale prediction of binding sites can shed light on the evolutionary processes that shape protein-protein interaction networks.
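The actual scoring used for these predictions is not described in this abstract. As a simplified, hypothetical illustration of mining an interactome for over-represented motif pairs, the Python sketch below measures how often a pair of k-mers (u in one partner, v in the other) occurs among interacting pairs and divides this by its frequency among all possible protein pairs, i.e. the expectation under random pairing. It ignores the evolutionary conservation and surface accessibility priors mentioned above; protein names, sequences and k are invented.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def has_motif_pair(a: Set[str], b: Set[str], u: str, v: str) -> bool:
    """True if one protein carries motif u and its partner carries motif v."""
    return (u in a and v in b) or (v in a and u in b)


def motif_pair_enrichment(proteins: Dict[str, str], interactions: List[Tuple[str, str]],
                          u: str, v: str, k: int) -> float:
    """Frequency of (u, v) in interacting pairs divided by its frequency over all protein pairs."""
    km = {name: kmers(seq, k) for name, seq in proteins.items()}
    observed = sum(has_motif_pair(km[p], km[q], u, v) for p, q in interactions) / len(interactions)
    all_pairs = list(combinations(sorted(proteins), 2))
    background = sum(has_motif_pair(km[p], km[q], u, v) for p, q in all_pairs) / len(all_pairs)
    return observed / background if background > 0 else float("inf")


# Hypothetical toy interactome: KR-containing proteins interact with DE-containing proteins.
proteins = {"P1": "MKRLEE", "P2": "GDEFLL", "P3": "AKRTAA", "P4": "SDESTT", "P5": "AAAAAA"}
interactions = [("P1", "P2"), ("P3", "P4")]
print(motif_pair_enrichment(proteins, interactions, "KR", "DE", k=2))  # 2.5
```

Here the motif pair occurs in both interacting pairs but in only 4 of the 10 possible protein pairs, giving a 2.5-fold enrichment.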

    Predicting and understanding transcription factor interactions based on sequence level determinants of combinatorial control

    Motivation: Transcription factor interactions are the cornerstone of combinatorial control, which is a crucial aspect of the gene regulatory system. Understanding and predicting transcription factor interactions based on sequence alone is difficult, since these factors often belong to families sharing high sequence identity. Given the scarcity of experimental data on interactions compared with available sequence data, however, accurate methods for predicting such interactions would be highly useful. Results: We present a method consisting of a Random Forest-based feature-selection procedure that selects relevant motifs out of a set found using a correlated motif search algorithm. Prediction accuracy for several transcription factor families (bZIP, MADS, homeobox and forkhead) reaches 60–90%. In addition, we identified those parts of the sequence that are important for interaction specificity, and show that these are in agreement with available data. We also used the predictors to perform genome-wide scans for interaction partners and recovered both known and putative new interaction partners.

    Systems biology of plant molecular networks: from networks to models

    Developmental processes are controlled by gene regulatory networks (GRNs), which are tightly coordinated networks of transcription factors (TFs) that activate and repress gene expression within a spatial and temporal context. In Arabidopsis thaliana, the key components and network structures of the GRNs controlling major plant reproduction processes, such as floral transition and floral organ identity specification, have been comprehensively unveiled, thanks to advances in 'omics' technologies combined with genetic approaches. Yet, because of the multidimensional nature of the data and the complexity of the regulatory mechanisms, there is a clear need to analyse these data in a way that helps us understand how TFs control complex traits. Mathematical modelling facilitates the representation of GRN dynamics and gives better insight into GRN complexity, while multidimensional data analysis enables the identification of properties that connect different layers from genotype to phenotype. Mathematical modelling and multidimensional data analysis are both parts of a systems biology approach, and this thesis presents the application of both types of systems biology approaches to flowering GRNs. Chapter 1 comprehensively reviews advances in understanding of GRNs underlying plant reproduction processes, as well as mathematical models and multidimensional data analysis approaches to study plant systems biology. As discussed in Chapter 1, an important aspect of understanding these GRNs is how perturbations in one part of the network are transmitted to other parts, and ultimately how this results in changes in phenotype. Given the complexity of recent versions of Arabidopsis GRNs, which involve highly connected, non-linear networks of TFs, microRNAs, mobile factors, hormones and chromatin-modifying proteins, it is not possible to predict the effect of gene perturbations on e.g.
flowering time in an intuitive way by just looking at the network structure. Therefore, mathematical modelling plays an important role in providing a quantitative understanding of GRNs. In addition, aspects of multidimensional data analysis for understanding GRNs underlying plant reproduction are also discussed in Chapter 1. These include not only the integration of experimental data, e.g. transcriptomics with protein-DNA binding profiling, but also the integration of different types of networks identified by 'omics' approaches, e.g. protein-protein interaction networks and gene regulatory networks. Chapter 2 describes a mathematical model representing the dynamics of key genes in the GRN of flowering time control. We used ordinary differential equations (ODEs) to model the physical interactions and regulatory relationships of a set of core genes controlling Arabidopsis flowering time, in order to quantitatively analyse the relationship between their expression levels and the flowering time response. We considered a core GRN composed of eight TFs: SHORT VEGETATIVE PHASE (SVP), FLOWERING LOCUS C (FLC), AGAMOUS-LIKE 24 (AGL24), SUPPRESSOR OF OVEREXPRESSION OF CONSTANS 1 (SOC1), APETALA1 (AP1), FLOWERING LOCUS T (FT), LEAFY (LFY) and FD. The connections and interactions amongst these components are justified by experimental data, and the model is parameterised by fitting the equations to quantitative data on gene expression and flowering time. The model is then validated with transcript data from a range of mutants. We verified that the model is able to describe some quantitative patterns seen in expression data under genetic perturbations, which supports the credibility of the model and its dynamic properties. The proposed model is able to predict flowering time by assessing changes in the expression of AP1, the orchestrator of the floral transition.
Overall, the work presents a framework that allows us to address how different quantitative inputs are combined into a single quantitative output, i.e. the timing of flowering. The model allowed us to study the established genetic regulations, and in Chapter 5 we discuss the steps towards using the proposed framework to zoom in and obtain new insights into the molecular mechanisms underlying these regulations. Systems biology involves not only dynamic modelling but also the development of approaches for multidimensional data analysis that can integrate multiple levels of system organization. In Chapter 3, we aimed to comprehensively identify and characterize cis-regulatory mutations that affect the GRN of flowering time control. By using ChIP-seq data and information about known DNA binding motifs of TFs involved in plant reproduction, we identified single-nucleotide polymorphisms (SNPs) that are highly discriminative in the classification of flowering time phenotypes. SNPs that overlap experimentally determined binding sites (e.g. from ChIP-seq) are often considered putative regulatory SNPs. We showed that regulatory SNPs are difficult to pinpoint among the sea of polymorphisms located within binding sites determined by ChIP-seq studies. To overcome this, we narrowed the search by focusing on the subset of SNPs that are located within ChIP-seq peaks and also fall within known regulatory motifs. These SNPs were used as input to a classification algorithm that could predict the flowering time of Arabidopsis accessions relative to Col-0. Our strategy is able to identify SNPs that have a biological link with changes in flowering time. We then surveyed the literature to formulate hypotheses that explain the regulatory mechanism underlying the difference in phenotype conferred by a SNP.
Examples include SNPs in the flowering time gene FT that presumably disrupt a binding region of SVP. In Chapter 5 we discuss the steps towards extending our approach to obtain a more comprehensive survey of variants that affect flowering time control. In Chapter 4, we propose a method for genome-wide prediction of protein-protein interaction (PPI) sites from the Arabidopsis interactome. Our method, named SLIDERbio, uses features encoded in protein sequences and their interactions to predict PPI sites. More specifically, it mines PPI networks to find over-represented sequence motifs in pairs of interacting proteins. In addition, the inter-species conservation of these over-represented motifs, as well as their predicted surface accessibility, is taken into account to compute the likelihood that these motifs are located in a PPI site. Our results suggested that motifs that are over-represented in pairs of interacting proteins, conserved across orthologs and highly surface-accessible are, in general, good putative interaction sites. We applied our method to obtain interactome-wide predictions for Arabidopsis proteins. The results were explored to formulate testable hypotheses for the molecular mechanisms underlying the effects of spontaneous or induced mutations in, e.g., ZEITLUPE, CXIP1 and SHY2 (proteins relevant for flowering time). In addition, we showed that the binding sites are under stronger selective pressure than the overall protein sequence, and that this may be used to link sequence variability to functional divergence. Finally, Chapter 5 concludes this thesis and describes future perspectives in systems biology applied to the study of GRNs underlying plant reproduction processes.
Two key directions are often followed in systems biology: 1) compiling system-wide snapshots in which the relationships and interactions between the molecules of a system are comprehensively represented; and 2) generating accurate experimental data that can be used as input for modelling or multidimensional data analysis. Chapter 5 highlights the limitations of key steps within the systems biology framework applied to GRN studies. In addition, I discuss improvements and extensions that we envision for our model of the GRN controlling flowering time. Future steps for multidimensional data analysis are also discussed. Finally, I discuss how to connect the different technologies developed in this thesis towards understanding the interplay between the roles of genes, developmental stages and environmental conditions.
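The Chapter 3 filtering step, which keeps only SNPs that lie both within a ChIP-seq peak and within a known TF binding-motif occurrence, amounts to a double interval-membership test. The Python sketch below illustrates it with hypothetical coordinates, SNP identifiers and 0-based half-open intervals. It is not the thesis pipeline: a genome-scale version would use sorted intervals or an interval tree instead of this linear scan, and the subsequent classification step is not shown.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


class SNP(NamedTuple):
    snp_id: str
    chrom: str
    pos: int    # 0-based position


def contains(iv: Interval, snp: SNP) -> bool:
    return iv.chrom == snp.chrom and iv.start <= snp.pos < iv.end


def candidate_regulatory_snps(snps: List[SNP], peaks: List[Interval],
                              motif_hits: List[Interval]) -> List[SNP]:
    """Keep SNPs inside at least one ChIP-seq peak AND at least one motif occurrence."""
    return [s for s in snps
            if any(contains(p, s) for p in peaks)
            and any(contains(m, s) for m in motif_hits)]


# Hypothetical data; coordinates are invented and do not correspond to real loci.
peaks = [Interval("Chr1", 100, 300)]
motif_hits = [Interval("Chr1", 150, 160), Interval("Chr1", 400, 410)]
snps = [SNP("snp_a", "Chr1", 155),  # in peak and in motif -> kept
        SNP("snp_b", "Chr1", 200),  # in peak only -> dropped
        SNP("snp_c", "Chr1", 405),  # in motif only -> dropped
        SNP("snp_d", "Chr2", 155)]  # different chromosome -> dropped
print([s.snp_id for s in candidate_regulatory_snps(snps, peaks, motif_hits)])  # ['snp_a']
```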