
    Role of antisense RNAs in evolution of yeast regulatory complexity

    Antisense RNAs (asRNAs) are known to regulate gene expression. However, the genome-wide mechanism of asRNA regulation is unclear, and there is no good explanation of why partial asRNAs are not functional. Because genome-wide experimental data are limited, we investigated the regulatory role of asRNAs using an evolutionary approach. We found that the percentage of genes coupled with asRNAs in Saccharomyces cerevisiae is negatively associated with regulatory complexity and evolutionary age. Nevertheless, asRNAs evolve more slowly when their sense genes are under more complex regulation. Older genes coupled with asRNAs are more likely to show inverse expression, reflecting the role of these asRNAs as repressors. Our analyses provide novel evidence suggesting that asRNAs make only a minor contribution to the development of regulatory complexity. Although our results support the leaky hypothesis for asRNA transcription, our evidence also suggests that partial asRNAs may have evolved as repressors. Our study deepens the understanding of asRNA regulatory evolution.

    Borders of Cis-Regulatory DNA Sequences Preferentially Harbor the Divergent Transcription Factor Binding Motifs in the Human Genome

    Changes in cis-regulatory DNA sequences and transcription factor (TF) repertoires provide major sources of phenotypic diversity that shape the evolution of gene regulation in eukaryotes. The DNA-binding specificities of TFs may diversify, giving rise to new variants in different eukaryotic species. However, it is currently unclear how TF binding motifs with different levels of divergence were introduced into the cis-regulatory regions of the genome over evolutionary time. Here, we first estimated the evolutionary divergence levels of TF binding motifs and quantified their occurrence at DNase I-hypersensitive sites. Results from our in silico motif scan and from experimentally derived TF chromatin immunoprecipitation (TF-ChIP) data show that divergent motifs tend to be introduced at the edges of cis-regulatory regions, probably accompanying the expansion of the accessible core of promoter-associated regulatory elements during evolution. We also find that the genes neighboring the expanded cis-regulatory regions with the most divergent motifs are associated with functions such as development and morphogenesis. Accordingly, we propose that the accumulation of divergent motifs at the edges of cis-regulatory regions provides a functional mechanism for the evolution of divergent regulatory circuits.
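
    The abstract does not define where the "edge" of a cis-regulatory region ends and its "core" begins. A minimal sketch, assuming a motif hit counts as an edge hit when its midpoint lies within an arbitrary outer fraction of a DNase I-hypersensitive region, might look like this; `edge_fraction`, the function names, and the coordinates are hypothetical.

```python
def classify_motif_position(region_start, region_end, motif_start, motif_end, edge_fraction=0.2):
    """Label a motif hit 'edge' or 'core' by the relative position of its midpoint.

    Coordinates are half-open. `edge_fraction` (hypothetical) is the outer share of
    the region length, on each side, that counts as an edge.
    """
    if not (region_start <= motif_start < motif_end <= region_end):
        raise ValueError("motif must lie within the region")
    length = region_end - region_start
    midpoint = (motif_start + motif_end) / 2
    rel = (midpoint - region_start) / length  # 0 at the left border, 1 at the right border
    distance_to_border = min(rel, 1 - rel)
    return "edge" if distance_to_border < edge_fraction else "core"


def edge_share(hits, region):
    """Fraction of motif hits that fall in the edges of one region."""
    labels = [classify_motif_position(*region, *hit) for hit in hits]
    return labels.count("edge") / len(labels)


# Hypothetical 1,000-bp accessible region and three motif hits.
region = (5000, 6000)
hits = [(5010, 5022), (5490, 5502), (5980, 5992)]
print(edge_share(hits, region))
```

    Comparing `edge_share` for highly divergent versus conserved motifs would be one way to frame the edge-enrichment question, though the study's own statistic is not given here.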

    Impact of DNA-binding position variants on yeast gene expression

    Transcription factors (TFs) regulate gene expression by binding to specific transcription factor binding sites (TFBSs) in gene promoters. TFBS motifs may contain one or more variable positions. Although the prevailing assumption is that nucleotide variants at such positions are functionally equivalent, there is increasing evidence that such variants play a role in the regulation of gene expression. In this article, we propose a method for studying, at a genome-wide scale in Saccharomyces cerevisiae, the relationship between the expression of target genes and nucleotide variants in TFBS motifs, especially the combinatorial effects of variants at two positions. Our analysis shows that nucleotide variation at more than one-third of variable positions and at 20% of dependent position pairs is highly correlated with gene expression. We define such positions as ‘functional’. However, some positions are functional only as dependent pairs, not individually. In addition, a significant proportion of the functional positions are well conserved across all the yeast-related species studied. We also find that some positions require the presence of co-occurring TFs, whereas others remain functional in the absence of a co-occurring TF. Our analysis supports the importance of nucleotide variants at variable positions of TFBSs in gene regulation.
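
    The abstract does not name the statistic used to relate variants to expression. As a hedged illustration only, one way to ask how much target-gene expression is explained by the nucleotide at a variable position is the correlation ratio (eta squared), treating a dependent pair as one combined category. This is a swapped-in statistic, not necessarily the authors' method, and the sites and expression values are invented.

```python
from collections import defaultdict


def correlation_ratio(categories, values):
    """Correlation ratio (eta squared): share of expression variance explained by category."""
    groups = defaultdict(list)
    for c, v in zip(categories, values):
        groups[c].append(v)
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    return ss_between / ss_total


# Hypothetical TFBS instances upstream of six target genes, with target expression levels.
sites = ["ACGTGA", "ACGTGA", "ATGTGA", "ATGTGC", "ACGTGC", "ATGTGC"]
expression = [5.1, 4.9, 2.0, 1.2, 4.0, 1.0]

single = [s[1] for s in sites]        # one variable position (index 1: C or T)
pair = [s[1] + s[5] for s in sites]   # dependent pair (indices 1 and 5) as one category

eta_single = correlation_ratio(single, expression)
eta_pair = correlation_ratio(pair, expression)
print(round(eta_single, 3), round(eta_pair, 3))
```

    Because a pair refines the single-position grouping, eta squared for a pair can never be lower than for either of its positions. Deciding whether a pair is functional "only as a pair" therefore needs a significance test that accounts for the extra categories.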

    Comparative genomic analyses highlight the contribution of pseudogenized protein-coding genes to human lincRNAs

    Background: The regulatory roles of long intergenic noncoding RNAs (lincRNAs) in humans have been revealed through advanced sequencing technology. Recently, three possible scenarios of lincRNA origin have been proposed: de novo origination from intergenic regions, duplication from other long noncoding RNAs, and pseudogenization of protein-coding genes. The first two scenarios have been widely studied and supported, yet few studies have focused on the evolution of pseudogenized protein-coding sequences into lincRNAs. Because these three scenarios are not mutually exclusive and lincRNA origination needs systematic investigation, we conducted a comparative genomics study of the evolution of human lincRNAs.
    Results: Combining syntenic analysis with a stringent BLASTN e-value cutoff, we found that the majority of lincRNAs align to intergenic regions of other species. Interestingly, 193 human lincRNAs may have protein-coding orthologs in at least two of nine vertebrates. These conserved regions of the human genome contain far fewer transposable elements than expected. Moreover, 19% of these lincRNAs overlap with or lie close to pseudogenes in the human genome.
    Conclusions: We suggest that a notable portion of lincRNAs could be derived from pseudogenized protein-coding genes. Furthermore, based on our computational analysis, we hypothesize that a subset of these lincRNAs could regulate their paralogs by functioning as competing endogenous RNAs. Our results provide evolutionary evidence of the relationship between human lincRNAs and protein-coding genes.
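
    The abstract does not state what distance counts as "close to" a pseudogene. A minimal sketch of the overlap-or-proximity check, assuming half-open genomic intervals and a hypothetical 10-kb cutoff (`max_distance`), could look like this; the coordinates are invented.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def gap(a: Interval, b: Interval) -> Optional[int]:
    """Distance in bp between two intervals: 0 if they overlap or touch, None across chromosomes."""
    if a.chrom != b.chrom:
        return None
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def near_pseudogene(linc: Interval, pseudogenes, max_distance: int = 10_000) -> bool:
    """True if the lincRNA overlaps, or lies within max_distance bp of, any pseudogene."""
    for p in pseudogenes:
        d = gap(linc, p)
        if d is not None and d <= max_distance:
            return True
    return False


lincs = [Interval("chr1", 1000, 5000), Interval("chr2", 0, 100)]
pseudos = [Interval("chr1", 12000, 13000)]
fraction = sum(near_pseudogene(l, pseudos) for l in lincs) / len(lincs)
print(fraction)
```

    For genome-scale inputs, an interval tree or sorted sweep would replace the linear scan over pseudogenes.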

    Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast

    Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and by chromatin accessibility, which is in turn influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, no predictive model that jointly considers CS and DS has been developed to predict either TF-specific binding or the general binding properties of TFs. Using budding yeast as a model, we found that machine learning classifiers trained with either CS or DS features alone outperform SM-based classifiers in predicting TF-specific binding. In addition, considering CS and DS simultaneously further improves the accuracy of TF binding predictions, indicating that these two properties are highly complementary. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. Moreover, a "TF-agnostic" predictive model based on three DNA "intrinsic properties" (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy), which can be calculated from genomic sequence alone, performs comparably to the model incorporating experimentally derived data. This intrinsic property model allows prediction of binding regions not only across TFs but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model predicts binding regions across DNA-binding domain families, it is TF agnostic and likely describes the general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF-agnostic model for identifying functional regulatory regions in potentially any sequenced genome.
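
    The abstract stresses that the intrinsic-property features can be computed from genomic sequence alone. The three properties it names depend on published parameter tables and predictors that are not reproduced here, so as a simpler, hedged stand-in this sketch derives a different sequence-only feature vector: overlapping dinucleotide frequencies. A dinucleotide free-energy feature would typically sum a tabulated value per dinucleotide step instead of counting steps.

```python
from itertools import product

DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]  # AA, AC, ..., TT


def dinucleotide_frequencies(seq):
    """Return the 16 overlapping dinucleotide frequencies of a DNA sequence.

    Steps containing characters other than A/C/G/T (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        if step in counts:
            counts[step] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


features = dinucleotide_frequencies("ACGTNACGT")
print(dict(zip(DINUCLEOTIDES, features)))
```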

    The relative importance of features for predicting binding regions of different TFs.

    Importance was defined as the decrease in accuracy after dropping a feature. For each TF, the range of accuracy decreases was normalized to [0, 1], where 0 is blue and 1 is red. The TFs were grouped into three classes as shown in Fig 3 (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004418#pcbi.1004418.g003). Arrowheads indicate the features most important for predicting the binding regions of most TFs.
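
    The importance measure in this caption (drop in accuracy when a feature is removed, then rescaled to [0, 1] per TF) can be sketched as follows, assuming a caller-supplied evaluation function that retrains and scores a classifier on a feature subset. The feature names and the additive `toy_evaluate` are hypothetical stand-ins for a real classifier.

```python
def drop_feature_importance(evaluate, features):
    """Importance of each feature = accuracy with all features minus accuracy without it.

    `evaluate` (hypothetical) trains and scores a classifier on a list of feature
    names and returns an accuracy.
    """
    baseline = evaluate(features)
    return {f: baseline - evaluate([g for g in features if g != f]) for f in features}


def min_max_normalize(scores):
    """Rescale one TF's importance scores to [0, 1] (0 = blue, 1 = red in the heatmap)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


# Toy stand-in for retraining: each feature adds a fixed accuracy gain.
GAIN = {"motif_score": 0.05, "nucleosome_occupancy": 0.20, "groove_width": 0.10}


def toy_evaluate(feature_subset):
    return 0.5 + sum(GAIN[f] for f in feature_subset)


importance = drop_feature_importance(toy_evaluate, list(GAIN))
normalized = min_max_normalize(importance)
print(normalized)
```

    With a real classifier, dropping a feature can even raise accuracy (negative importance); min-max scaling still maps the least useful feature to 0.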

    Expression patterns of genes with accessible motif occurrences are more highly correlated than those with inaccessible motifs.

    1. The number of time points in the time-series gene expression experiment.
    2. The significance of the difference between the two correlation coefficient distributions (within-group pairwise correlations of the bound and unbound sets), assessed by a one-sided Kolmogorov-Smirnov (KS) test.

    Evaluation of features distinguishing between bound and unbound regions and between regions bound by a single TF compared to the other TFs.

    (A, B) The p-values (color scale shown; adjusted for multiple testing by false discovery rate control) from two-sided Wilcoxon rank sum tests of differences in feature values (A) between bound and unbound regions for all 40 analyzed TFs, jointly (ALL) and separately, and (B) between the regions bound by a single TF and those bound by the remaining TFs. The p-values for (A) and (B) are shown in S1 Fig (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004418#pcbi.1004418.s001) and S2 Fig (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004418#pcbi.1004418.s002), respectively. (C, D) The value distributions of the 23 features for regions bound (black) and not bound (white) by (C) RAP1 and (D) ZAP1. The values were normalized to [0, 1] for each feature. The p-values of two-sided Wilcoxon rank sum tests are shown below the boxplots: red, p < 10⁻³; white, p = 10⁻³; blue, p > 10⁻³.
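
    The caption says p-values were adjusted by false discovery rate control but does not name the procedure. Assuming the common Benjamini-Hochberg step-up method, the adjustment can be sketched as follows; the input p-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw Wilcoxon p-values for four features.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
print(adj)
```

    The running minimum is what distinguishes the adjusted values from the raw `p * m / rank` ratios: it guarantees that a smaller raw p-value never receives a larger adjusted value.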

    Performance improvement in binding region prediction models by incorporating chromatin state (CS) and DNA structure (DS) features.

    (A, C) The relationship between the binding region prediction performance of models using sequence motif (SM) features only and of models using SM+CS+DS features for each TF, when contrasting (A) bound and unbound regions of a TF and (C) regions bound by one TF and regions bound by the other TFs. The triangle indicates the average performance. The line indicates the 1:1 relationship. (B, D) The relationship between the improvement in F-measure when incorporating CS and DS and the F-measure of random forest classifiers using SM only, when contrasting (B) bound and unbound regions of a TF and (D) regions bound by one TF and regions bound by the other TFs.
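
    The caption reports F-measures and the improvement gained by adding CS and DS features. Assuming the balanced F-measure (F1, the harmonic mean of precision and recall), the quantities can be sketched from confusion counts; the counts below are hypothetical.

```python
def f_measure(tp, fp, fn):
    """F1 score (harmonic mean of precision and recall) from confusion counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion counts for one TF's bound-vs-unbound classifiers.
f_sm = f_measure(tp=60, fp=40, fn=30)     # SM features only
f_full = f_measure(tp=75, fp=20, fn=15)   # SM + CS + DS features
improvement = f_full - f_sm               # gain from adding CS and DS

print(round(f_sm, 3), round(f_full, 3), round(improvement, 3))
```

    Equivalently, F1 = 2TP / (2TP + FP + FN), which avoids computing precision and recall separately.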