1,996 research outputs found

    Local sequence features that influence AP-1 cis-regulatory activity

    In the genome, most occurrences of transcription factor binding sites (TFBS) have no cis-regulatory activity, which suggests that flanking sequences contain information that distinguishes functional from nonfunctional TFBS. We interrogated the role of flanking sequences near Activator Protein 1 (AP-1) binding sites that reside in DNase I Hypersensitive Sites (DHS) and regions annotated as Enhancers. In these regions, we found that sequence features directly adjacent to the core motif distinguish high from low activity AP-1 sites. Some nearby features are motifs for other TFs that genetically interact with the AP-1 site. Other features are extensions of the AP-1 core motif, which cause the extended sites to match motifs of multiple AP-1 binding proteins. Computational models trained on these data distinguish between sequences with high and low activity AP-1 sites and also predict changes in cis-regulatory activity due to mutations in AP-1 core sites and their flanking sequences. Our results suggest that extended AP-1 binding sites, together with adjacent binding sites for additional TFs, encode part of the information that governs TFBS activity in the genome.

    PTRE-seq reveals mechanism and interactions of RNA binding proteins and miRNAs

    A large number of RNA binding proteins (RBPs) and miRNAs bind to the 3′ untranslated regions of mRNA, but methods to dissect their function and interactions are lacking. Here the authors introduce post-transcriptional regulatory element sequencing (PTRE-seq) to dissect the sequence preferences, interactions and consequences of RBP and miRNA binding.

    Single nucleotide variants in transcription factors associate more tightly with phenotype than with gene expression

    Mapping the polymorphisms responsible for variation in gene expression, known as Expression Quantitative Trait Loci (eQTL), is a common strategy for investigating the molecular basis of disease. Despite numerous eQTL studies, the relationship between the explanatory power of variants on gene expression versus their power to explain ultimate phenotypes remains to be clarified. We addressed this question using four naturally occurring Quantitative Trait Nucleotides (QTN) in three transcription factors that affect sporulation efficiency in wild strains of the yeast Saccharomyces cerevisiae. We compared the ability of these QTN to explain the variation in both gene expression and sporulation efficiency. We find that the amount of gene expression variation explained by the sporulation QTN is not predictive of the amount of phenotypic variation explained. The QTN are responsible for 98% of the phenotypic variation in our strains but the median gene expression variation explained is only 49%. The alleles that are responsible for most of the variation in sporulation efficiency do not explain most of the variation in gene expression. The balance between the main effects and gene-gene interactions on gene expression variation is not the same as on sporulation efficiency. Finally, we show that nucleotide variants in the same transcription factor explain the expression variation of different sets of target genes depending on whether the variant alters the level or activity of the transcription factor. Our results suggest that a subset of gene expression changes may be more predictive of ultimate phenotypes than the number of genes affected or the total fraction of gene expression variation explained by causative variants, and that the downstream phenotype is buffered against variation in the gene expression network.

    Causal variation in yeast sporulation tends to reside in a pathway bottleneck

    There has been extensive debate over whether certain classes of genes are more likely than others to contain the causal variants responsible for phenotypic differences in complex traits between individuals. One hypothesis states that input/output genes positioned in signal transduction bottlenecks are more likely than other genes to contain causal natural variation. The IME1 gene resides at such a signaling bottleneck in the yeast sporulation pathway, suggesting that it may be more likely to contain causal variation than other genes in the sporulation pathway. Through crosses between natural isolates of yeast, we demonstrate that the specific causal nucleotides responsible for differences in sporulation efficiencies reside not only in IME1 but also in the genes that surround IME1 in the signaling pathway, including RME1, RSF1, RIM15, and RIM101. Our results support the hypothesis that genes at the critical decision-making points in signaling cascades will be enriched for causal variants responsible for phenotypic differences.

    A quantitative metric of pioneer activity reveals that HNF4A has stronger in vivo pioneer activity than FOXA1

    BACKGROUND: We and others have suggested that pioneer activity - a transcription factor's (TF's) ability to bind and open inaccessible loci - is not a qualitative trait limited to a select class of pioneer TFs. We hypothesize that most TFs display pioneering activity that depends on the TF concentration and the motif content at their target loci. RESULTS: Here, we present a quantitative in vivo measure of pioneer activity that captures the relative difference in a TF's ability to bind accessible versus inaccessible DNA. The metric is based on experiments that use CUT&Tag to measure the binding of doxycycline-inducible TFs. For each location across the genome, we determine the concentration of doxycycline required for a TF to reach half-maximal occupancy; lower concentrations reflect higher affinity. We propose that the relative difference in a TF's affinity between ATAC-seq labeled accessible and inaccessible binding sites is a measure of its pioneer activity. We estimate binding affinities at tens of thousands of genomic loci for the endodermal TFs FOXA1 and HNF4A and show that HNF4A has stronger pioneer activity than FOXA1. We show that both FOXA1 and HNF4A display higher binding affinity at inaccessible sites with more copies of their respective motifs. The quantitative analysis of binding suggests different modes of binding for FOXA1, including an anti-cooperative mode of binding at certain accessible loci. CONCLUSIONS: Our results suggest that relative binding affinities are reasonable measures of pioneer activity and support the model wherein most TFs have some degree of context-dependent pioneer activity.
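
    The half-maximal-occupancy idea in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the interpolation on a log10 dose axis, the function names, and the use of a log2 ratio to compare an inaccessible site with an accessible one are all assumptions made for illustration.

```python
import math


def half_max_concentration(doses, occupancy):
    """Estimate the dose at which occupancy first reaches half of its maximum.

    Assumes `doses` are positive and ascending, and interpolates linearly
    on a log10 dose axis between the two measurements that bracket the
    half-maximal level (an illustrative choice, not the published method).
    """
    half = max(occupancy) / 2.0
    for i in range(1, len(doses)):
        lo, hi = occupancy[i - 1], occupancy[i]
        if lo < half <= hi:
            frac = (half - lo) / (hi - lo)
            log_lo, log_hi = math.log10(doses[i - 1]), math.log10(doses[i])
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    # Occupancy is already at or above half-maximal at the lowest dose.
    return doses[0]


def log2_dose_ratio(inaccessible_half_max, accessible_half_max):
    """Hypothetical comparison: values near 0 mean the TF needs a similar
    dose at inaccessible and accessible sites; larger values mean the
    inaccessible site needs more inducer."""
    return math.log2(inaccessible_half_max / accessible_half_max)


doses = [1.0, 10.0, 100.0]                      # made-up inducer doses
accessible = half_max_concentration(doses, [0.0, 0.5, 1.0])
inaccessible = half_max_concentration(doses, [0.0, 0.0, 1.0])
print(accessible, inaccessible, log2_dose_ratio(inaccessible, accessible))
```

    A lower half-maximal dose corresponds to higher apparent affinity, as the abstract states; the dose values above are invented purely to exercise the functions.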

    Discrimination between thermodynamic models of cis-regulation using transcription factor occupancy data

    Many studies have identified binding preferences for transcription factors (TFs), but few have yielded predictive models of how combinations of transcription factor binding sites generate specific levels of gene expression. Synthetic promoters have emerged as powerful tools for generating quantitative data to parameterize models of combinatorial cis-regulation. We sought to improve the accuracy of such models by quantifying the occupancy of TFs on synthetic promoters in vivo and incorporating these data into statistical thermodynamic models of cis-regulation. Using chromatin immunoprecipitation-seq, we measured the occupancy of Gcn4 and Cbf1 in synthetic promoter libraries composed of binding sites for Gcn4, Cbf1, Met31/Met32 and Nrg1. We measured the occupancy of these two TFs and the expression levels of all promoters in two growth conditions. Models parameterized using only expression data predicted expression but failed to identify several interactions between TFs. In contrast, models parameterized with occupancy and expression data predicted expression data, and also revealed Gcn4 self-cooperativity and a negative interaction between Gcn4 and Nrg1. Occupancy data also allowed us to distinguish between competing regulatory mechanisms for the factor Gcn4. Our framework for combining occupancy and expression data produces predictive models that better reflect the mechanisms underlying combinatorial cis-regulation of gene expression.
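
    As a rough illustration of what a statistical thermodynamic model of occupancy looks like, the sketch below uses the generic textbook two-site partition function. It is not the model fitted in the paper; the function name, the dimensionless binding weights `q1` and `q2`, and the interaction term `omega` are assumptions introduced here.

```python
def site1_occupancy(q1, q2, omega=1.0):
    """Equilibrium probability that site 1 is bound in a two-site promoter.

    q1, q2 : statistical weights of binding at each site
             (TF concentration divided by dissociation constant).
    omega  : interaction term; > 1 is cooperative, < 1 is a negative
             interaction, 1 means the sites bind independently.
    """
    # Partition function over the four states:
    # empty, site 1 only, site 2 only, both bound.
    z = 1.0 + q1 + q2 + omega * q1 * q2
    return (q1 + omega * q1 * q2) / z


for omega in (0.0, 1.0, 2.0):
    print(omega, site1_occupancy(1.0, 1.0, omega))
```

    Because the interaction term changes predicted occupancy even when the weights are fixed, measured occupancy can constrain it directly, which is the intuition behind combining occupancy and expression data.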

    Phylogeny based discovery of regulatory elements

    BACKGROUND: Algorithms that locate evolutionarily conserved sequences have become powerful tools for finding functional DNA elements, including transcription factor binding sites; however, most methods do not take advantage of an explicit model for the constrained evolution of functional DNA sequences. RESULTS: We developed a probabilistic framework that combines an HKY85 model, which assigns probabilities to different base substitutions between species, and weight matrix models of transcription factor binding sites, which describe the probabilities of observing particular nucleotides at specific positions in the binding site. The method incorporates the phylogenies of the species under consideration and takes into account the position specific variation of transcription factor binding sites. Using our framework we assessed the suitability of alignments of genomic sequences from commonly used species as substrates for comparative genomic approaches to regulatory motif finding. We then applied this technique to Saccharomyces cerevisiae and related species by examining all possible six base pair DNA sequences (hexamers) and identifying sequences that are conserved in a significant number of promoters. By combining similar conserved hexamers we reconstructed known cis-regulatory motifs and made predictions of previously unidentified motifs. We tested one prediction experimentally, finding it to be a regulatory element involved in the transcriptional response to glucose. CONCLUSION: The experimental validation of a regulatory element prediction missed by other large-scale motif finding studies demonstrates that our approach is a useful addition to the current suite of tools for finding regulatory motifs.
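
    The weight matrix half of this framework can be sketched as a standard log-odds score. The example below covers only that part and does not attempt the HKY85 substitution model or the phylogenetic component; the toy matrix, the uniform background of 0.25, and the function name are assumptions for illustration.

```python
import math


def log_odds_score(seq, pwm, background=0.25):
    """Sum of per-position log2(probability / background) for a sequence.

    `pwm` is a list with one dict per motif position, mapping each base
    to the probability of observing it there. A uniform background is
    assumed for simplicity.
    """
    if len(seq) != len(pwm):
        raise ValueError("sequence and weight matrix lengths differ")
    return sum(math.log2(column[base] / background)
               for base, column in zip(seq, pwm))


# Toy two-position matrix: position 1 prefers A, position 2 is uninformative.
toy_pwm = [
    {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
print(log_odds_score("AA", toy_pwm), log_odds_score("GA", toy_pwm))
```

    Positive scores indicate a sequence that resembles the motif more than the background; a real application would use longer matrices and guard against zero probabilities with pseudocounts.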

    A cis-regulatory logic simulator

    BACKGROUND: A major goal of computational studies of gene regulation is to accurately predict the expression of genes based on the cis-regulatory content of their promoters. The development of computational methods to decode the interactions among cis-regulatory elements has been slow, in part, because it is difficult to know, without extensive experimental validation, whether a particular method identifies the correct cis-regulatory interactions that underlie a given set of expression data. There is an urgent need for test expression data in which the interactions among cis-regulatory sites that produce the data are known. The ability to rapidly generate such data sets would facilitate the development and comparison of computational methods that predict gene expression patterns from promoter sequence. RESULTS: We developed a gene expression simulator which generates expression data using user-defined interactions between cis-regulatory sites. The simulator can incorporate additive, cooperative, competitive, and synergistic interactions between regulatory elements. Constraints on the spacing, distance, and orientation of regulatory elements and their interactions may also be defined and Gaussian noise can be added to the expression values. The simulator allows for a data transformation that simulates the sigmoid shape of expression levels from real promoters. We found good agreement between sets of simulated promoters and predicted regulatory modules from real expression data. We present several data sets that may be useful for testing new methodologies for predicting gene expression from promoter sequence. CONCLUSION: We developed a flexible gene expression simulator that rapidly generates large numbers of simulated promoters and their corresponding transcriptional output based on specified interactions between cis-regulatory sites. When appropriate rule sets are used, the data generated by our simulator faithfully reproduce experimentally derived data sets. We anticipate that using simulated gene expression data sets will facilitate the direct comparison of computational strategies to predict gene expression from promoter sequence. The source code is available online and as additional material. The test sets are available as additional material.
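
    To make the idea of rule-based expression simulation concrete, here is a toy sketch. It is not the published simulator: the function name, the rule format (per-site additive weights plus pairwise bonuses), and the choice to add Gaussian noise before the sigmoid transform are all assumptions, and spacing and orientation constraints are left out.

```python
import math
import random


def simulate_expression(sites, weights, pair_bonus, noise_sd=0.0, rng=None):
    """Return a simulated expression level in (0, 1) for one promoter.

    sites      : list of site names present in the promoter.
    weights    : additive contribution of each site name.
    pair_bonus : extra contribution when both sites of a pair are present
                 (positive for cooperative, negative for competitive rules).
    noise_sd   : standard deviation of optional Gaussian noise.
    """
    rng = rng or random.Random(0)
    raw = sum(weights[s] for s in sites)
    present = set(sites)
    for (a, b), bonus in pair_bonus.items():
        if a in present and b in present:
            raw += bonus
    if noise_sd > 0:
        raw += rng.gauss(0.0, noise_sd)
    # Sigmoid transform to mimic saturating promoter output.
    return 1.0 / (1.0 + math.exp(-raw))


weights = {"A": 1.0, "B": -1.0}                 # hypothetical site effects
print(simulate_expression(["A", "B"], weights, {}))
print(simulate_expression(["A", "B"], weights, {("A", "B"): 2.0}))
```

    Because the rules that generated each value are known, output like this can serve as ground truth when comparing methods that try to recover interactions from expression data.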

    Massively parallel synthetic promoter assays reveal the in vivo effects of binding site variants

    Gene promoters typically contain multiple transcription factor binding sites (TFBSs), which may vary in affinity for their cognate transcription factors (TFs). One major challenge in studying cis-regulation is to understand how TFBS variants affect gene expression. We studied the in vivo effects of TFBS variants on cis-regulation using synthetic promoters coupled with a thermodynamic model of TF binding. We measured expression driven by each promoter with RNA-seq of transcribed sequence barcodes. This allowed reporter genes to be highly multiplexed and increased our statistical power to detect the effects of TFBS variants. We analyzed the effects of TFBS variants using a thermodynamic framework that models both TF-DNA interactions and TF-TF interactions. We found that this system accurately estimates the in vivo relative affinities of TFBSs and predicts unexpected interactions between several TFBSs. Our results reveal that binding site variants can have complex effects on gene expression due to differences in TFBS affinity for cognate TFs and differences in TFBS specificity for noncognate TFs.

    A genome-integrated massively parallel reporter assay reveals DNA sequence determinants of cis-regulatory activity in neural cells

    Recent large-scale genomics efforts to characterize the cis-regulatory sequences that orchestrate genome-wide expression patterns have produced impressive catalogues of putative regulatory elements. Most of these sequences have not been functionally tested, and our limited understanding of the non-coding genome prevents us from predicting which sequences are bona fide cis-regulatory elements. Recently, massively parallel reporter assays (MPRAs) have been deployed to measure the activity of putative cis-regulatory sequences in several biological contexts, each with specific advantages and distinct limitations. We developed LV-MPRA, a novel lentiviral-based, massively parallel reporter gene assay, to study the function of genome-integrated regulatory elements in any mammalian cell type, thus making it possible to apply MPRAs in more biologically relevant contexts. We measured the activity of 2,600 sequences in U87 glioblastoma cells and human neural progenitor cells (hNPCs) and explored how regulatory activity is encoded in DNA sequence. We demonstrate that LV-MPRA can be applied to estimate the effects of local DNA sequence and regional chromatin on regulatory activity. Our data reveal that primary DNA sequence features, such as GC content and dinucleotide composition, accurately distinguish sequences with high activity from sequences with low activity in a full chromosomal context, and may also function in combination with different transcription factor binding sites to determine cell type specificity. We conclude that LV-MPRA will be an important tool for identifying cis-regulatory elements and stimulating new understanding about how the non-coding genome encodes information.