19 research outputs found

    Machine learning for regulatory analysis and transcription factor target prediction in yeast

    High throughput technologies, including array-based chromatin immunoprecipitation, have rapidly increased our knowledge of transcriptional maps—the identity and location of regulatory binding sites within genomes. Still, the full identification of sites, even in lower eukaryotes, remains largely incomplete. In this paper we develop a supervised learning approach to site identification using support vector machines (SVMs) to combine 26 different data types. A comparison with the standard approach to site identification using position specific scoring matrices (PSSMs) for a set of 104 Saccharomyces cerevisiae regulators indicates that our SVM-based target classification is more sensitive (73 vs. 20%) when specificity and positive predictive value are the same. We have applied our SVM classifier for each transcriptional regulator to all promoters in the yeast genome to obtain thousands of new targets, which are currently being analyzed and refined to limit the risk of classifier over-fitting. For the purpose of illustration we discuss several results, including biochemical pathway predictions for Gcn4 and Rap1. For both transcription factors SVM predictions match well with the known biology of control mechanisms, and possible new roles for these factors are suggested, such as a function for Rap1 in regulating fermentative growth. We also examine the promoter melting temperature curves for the targets of YJR060W, and show that targets of this TF have potentially unique physical properties which distinguish them from other genes. The SVM output automatically provides the means to rank dataset features to identify important biological elements. We use this property to rank classifying k-mers, thereby reconstructing known binding sites for several TFs, and to rank expression experiments, determining the conditions under which Fhl1, the factor responsible for expression of ribosomal protein genes, is active. 
We can see that targets of Fhl1 are differentially expressed in the chosen conditions as compared to the expression of average and negative set genes. SVM-based classifiers provide a robust framework for analysis of regulatory networks. Processing of classifier outputs can provide high quality predictions and biological insight into functions of particular transcription factors. Future work on this method will focus on increasing the accuracy and quality of predictions using feature reduction and clustering strategies. Since predictions have been made on only 104 TFs in yeast, new classifiers will be built for the remaining 100 factors that have available binding data.
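The abstract above compares classifiers by sensitivity at matched specificity and positive predictive value. As a minimal sketch of how these three quantities are defined from confusion-matrix counts (the function name and the counts in the usage line are hypothetical and are not taken from the paper):

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, ppv) from confusion-matrix counts.

    tp/fn: known targets predicted as targets / missed.
    tn/fp: non-targets predicted as non-targets / wrongly called targets.
    """
    sensitivity = tp / (tp + fn)   # fraction of true targets recovered
    specificity = tn / (tn + fp)   # fraction of non-targets correctly rejected
    ppv = tp / (tp + fp)           # fraction of predicted targets that are true
    return sensitivity, specificity, ppv


# Hypothetical counts for one regulator, for illustration only.
sens, spec, ppv = classification_metrics(tp=8, fp=2, tn=88, fn=2)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} ppv={ppv:.2f}")
```

Reporting sensitivity at a fixed specificity and PPV, as the abstract does, keeps the comparison between two classifiers from being skewed by different false-positive rates.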

    Estimating the evidence of selection and the reliability of inference in unigenic evolution

    Background: Unigenic evolution is a large-scale mutagenesis experiment used to identify residues that are potentially important for protein function. Both currently-used methods for the analysis of unigenic evolution data analyze 'windows' of contiguous sites, a strategy that increases statistical power but incorrectly assumes that functionally-critical sites are contiguous. In addition, both methods require the questionable assumption of asymptotically-large sample size due to the presumption of approximate normality. Results: We develop a novel approach, termed the Evidence of Selection (EoS), removing the assumption that functionally important sites are adjacent in sequence and explicitly modelling the effects of limited sample-size. Precise statistical derivations show that the EoS score can be easily interpreted as an expected log-odds-ratio between two competing hypotheses, namely, the hypothetical presence or absence of functional selection for a given site. Using the EoS score, we then develop selection criteria by which functionally-important yet non-adjacent sites can be identified. An approximate power analysis is also developed to estimate the reliability of inference given the data. We validate and demonstrate the practical utility of our method by analysis of the homing endonuclease I-BmoI, comparing our predictions with the results of existing methods. Conclusions: Our method is able to assess both the evidence of selection at individual amino acid sites and estimate the reliability of those inferences. Experimental validation with I-BmoI proves its utility to identify functionally-important residues of poorly characterized proteins, demonstrating increased sensitivity over previous methods without loss of specificity. With the ability to guide the selection of precise experimental mutagenesis conditions, our method helps make unigenic analysis a more broadly applicable technique with which to probe protein function. Availability: Software to compute, plot, and summarize EoS data is available as an open-source package called 'unigenic' for the 'R' programming language at http://www.fernandes.org/txp/article/13/an-analytical-framework-for-unigenic-evolution.

    Rap1p requires Gcr1p and Gcr2p homodimers to activate ribosomal protein and glycolytic genes, respectively.

    Efficient transcription of ribosomal protein (RP) and glycolytic genes requires the Rap1p/Gcr1p regulatory complex. A third factor, Gcr2p, is required for only the glycolytic (specialized) mode of transcriptional activation. It is recruited to the complex by Gcr1p and likely mediates a change in the phosphorylation state and/or conformation of the latter. We show here that leucine zipper motifs in Gcr1p and Gcr2p (1LZ and 2LZ) are each specific to one of the two activation mechanisms: mutations in 1LZ and 2LZ impair transcription of RP and glycolytic genes, respectively. Although neither class of mutations causes more than a mild growth defect, simultaneous impairment of 1LZ and 2LZ results in a severe synthetic defect and a reduction in the expression of both sets of genes. Intracistronic complementation by point mutations in the charged e and g positions confirmed that Gcr1p/Gcr1p and Gcr2p/Gcr2p homodimers are the forms required for the different roles of the activator complex. Direct heterodimerization between 1LZ and 2LZ apparently does not occur. Dichotomous Rap1p activation and its striking requirement for distinct homodimeric subunits give cells the capacity to switch between coordinated and uncoupled RP and glycolytic gene regulation.

    Specialized Rap1p/Gcr1p Transcriptional Activation through Gcr1p DNA Contacts Requires Gcr2p, as Does Hyperphosphorylation of Gcr1p

    The multifunctional regulatory factor Rap1p of Saccharomyces cerevisiae accomplishes one of its tasks, transcriptional activation, by complexing with Gcr1p. An unusual feature of this heteromeric complex is its apparent capacity to contact simultaneously two adjacent DNA elements (UAS(RPG) and the CT box, bound specifically by Rap1p and Gcr1p, respectively). The complex can activate transcription through isolated UAS(RPG) but not CT elements. In promoters that contain both DNA signals its activity is enhanced, provided the helical spacing between the two elements is appropriate; this suggests that at least transient DNA loop formation is involved. We show here that this CT box-dependent augmentation of Rap1p/Gcr1p activation requires the presence of a third protein, Gcr2p; the Gcr2(-) growth defect appears to result from a genome-wide loss of the CT box effect. Interestingly, a hyperphosphorylated form of Gcr1p disappears in Δgcr2 cells but reappears if they harbor a doubly point-mutated GCR1 allele that bypasses the Gcr2(-) growth defect. Gcr2p therefore appears to induce a conformational change in Gcr1p and/or stimulate its hyperphosphorylation; one or both of these effects can be mimicked in the absence of GCR2 by mutation of GCR1. This improved view of Rap1p/Gcr1p/Gcr2p function reveals a new aspect of eukaryotic gene regulation: modification of an upstream activator, accompanied by at least transient DNA loop formation, mediates its improved capacity to activate transcription.

    The Tor and PKA signaling pathways independently target the Atg1/Atg13 protein kinase complex to control autophagy

    Macroautophagy (or autophagy) is a conserved degradative pathway that has been implicated in a number of biological processes, including organismal aging, innate immunity, and the progression of human cancers. This pathway was initially identified as a cellular response to nutrient deprivation and is essential for cell survival during these periods of starvation. Autophagy is highly regulated and is under the control of a number of signaling pathways, including the Tor pathway, that coordinate cell growth with nutrient availability. These pathways appear to target a complex of proteins that contains the Atg1 protein kinase. The data here show that autophagy in Saccharomyces cerevisiae is also controlled by the cAMP-dependent protein kinase (PKA) pathway. Elevated levels of PKA activity inhibited autophagy, and inactivation of the PKA pathway was sufficient to induce a robust autophagy response. We show that in addition to Atg1, PKA directly phosphorylates Atg13, a conserved regulator of Atg1 kinase activity. This phosphorylation regulates Atg13 localization to the preautophagosomal structure, the nucleation site from which autophagy pathway transport intermediates are formed. Atg13 is also phosphorylated in a Tor-dependent manner, but these modifications appear to occur at positions distinct from the PKA phosphorylation sites identified here. In all, our data indicate that the PKA and Tor pathways function independently to control autophagy in S. cerevisiae, and that the Atg1/Atg13 kinase complex is a key site of signal integration within this degradative pathway.

    An evolutionary proteomics approach identifies substrates of the cAMP-dependent protein kinase

    Protein kinases are important mediators of much of the signal transduction that occurs in eukaryotic cells. Unfortunately, the identification of protein kinase substrates has proven to be a difficult task, and we generally know few, if any, of the physiologically relevant targets of any particular kinase. Here, we describe a sequence-based approach that simplified this substrate identification process for the cAMP-dependent protein kinase (PKA) in Saccharomyces cerevisiae. In this method, the evolutionary conservation of all PKA consensus sites in the S. cerevisiae proteome was systematically assessed within a group of related yeasts. The basic premise was that a higher degree of conservation would identify those sites that are functional in vivo. This method identified 44 candidate PKA substrates, 5 of which had been described. A phosphorylation analysis showed that all of the identified candidates were phosphorylated by PKA and that the likelihood of phosphorylation was strongly correlated with the degree of target site conservation. Finally, as proof of principle, the activity of one particular target, Atg1, a key regulator of autophagy, was shown to be controlled by PKA phosphorylation in vivo. These data therefore suggest that this evolutionary proteomics approach identified a number of PKA substrates that had not been uncovered by other methods. Moreover, these data show how this approach could be generally used to identify the physiologically relevant occurrences of any protein motif identified in a eukaryotic proteome.
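The conservation idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the motif R-R-x-S/T is a commonly cited PKA consensus and may differ from the definition used in the paper, the ortholog sequences are assumed to be pre-aligned to the reference without gaps, and all names and sequences are hypothetical.

```python
import re

# Assumption: R-R-x-S/T as the PKA consensus; [A-Z] excludes alignment gaps ("-").
SITE = r"RR[A-Z][ST]"
SITE_SCAN = re.compile(rf"(?=({SITE}))")  # lookahead so overlapping sites are found


def find_sites(seq):
    """Return start positions of every consensus match in a protein sequence."""
    return [m.start() for m in SITE_SCAN.finditer(seq)]


def site_conservation(reference, orthologs):
    """Map each reference site to the fraction of orthologs keeping the motif.

    Assumes every ortholog is aligned column-for-column with the reference.
    """
    scores = {}
    for start in find_sites(reference):
        kept = sum(
            1 for seq in orthologs
            if re.fullmatch(SITE, seq[start:start + 4])
        )
        scores[start] = kept / len(orthologs) if orthologs else 0.0
    return scores


# Hypothetical toy sequences, for illustration only.
reference = "MARRASLKRRGTQ"
orthologs = ["MARRASLKRKGTQ", "MARRVTLKRRGSQ"]
print(site_conservation(reference, orthologs))
```

Sites retained in a larger fraction of related species would be ranked as stronger candidates, which mirrors the premise stated in the abstract that conservation tracks in vivo function.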