18 research outputs found

    TherMos: Estimating protein-DNA binding energies from in vivo binding profiles

    Accurately characterizing transcription factor (TF)-DNA affinity is a central goal of regulatory genomics. Although thermodynamics provides the most natural language for describing the continuous range of TF-DNA affinity, traditional motif discovery algorithms focus instead on classification paradigms that aim to discriminate 'bound' and 'unbound' sequences. Moreover, these algorithms do not directly model the distribution of tags in ChIP-seq data. Here, we present a new algorithm named Thermodynamic Modeling of ChIP-seq (TherMos), which directly estimates a position-specific binding energy matrix (PSEM) from ChIP-seq/exo tag profiles. In cross-validation tests on seven genome-wide TF-DNA binding profiles, one of which we generated via ChIP-seq on a complex developing tissue, TherMos predicted quantitative TF-DNA binding with greater accuracy than five well-known algorithms. We experimentally validated TherMos binding energy models for Klf4 and Esrrb, using a novel protocol to measure PSEMs in vitro. Strikingly, our measurements revealed strong nonadditivity at multiple positions within the two PSEMs. Among the algorithms tested, only TherMos was able to model the entire binding energy landscape of Klf4 and Esrrb. Our study reveals new insights into the energetics of TF-DNA binding in vivo and provides an accurate first-principles approach to binding energy inference from ChIP-seq and ChIP-exo data. © 2013 The Author(s).
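
The thermodynamic view described in this abstract can be illustrated with a minimal sketch. It is not the TherMos implementation: it assumes a purely additive PSEM (the abstract itself reports nonadditivity at several positions), a standard two-state occupancy formula, and energies expressed in units of kT. The matrix values, the chemical potential `mu`, and all function names are hypothetical and chosen only for illustration.

```python
import math

BASES = "ACGT"

def site_energy(psem, site):
    """Binding energy of one site, assuming positions contribute additively.

    psem is a list of rows, one per motif position, each holding the
    energy penalty for A, C, G, T (hypothetical values, units of kT).
    """
    return sum(psem[i][BASES.index(base)] for i, base in enumerate(site))

def occupancy(energy, mu, kT=1.0):
    """Two-state occupancy probability: 1 / (1 + exp((E - mu) / kT))."""
    return 1.0 / (1.0 + math.exp((energy - mu) / kT))

def sequence_occupancies(psem, seq, mu):
    """Occupancy of every window of motif width along one strand of seq."""
    width = len(psem)
    return [
        occupancy(site_energy(psem, seq[i:i + width]), mu)
        for i in range(len(seq) - width + 1)
    ]

# Toy 3-bp PSEM whose lowest-energy (consensus) site is ACG.
toy_psem = [
    [0.0, 2.0, 2.0, 2.0],
    [2.0, 0.0, 2.0, 2.0],
    [2.0, 2.0, 0.0, 2.0],
]
profile = sequence_occupancies(toy_psem, "AACGT", mu=0.0)
```

In this toy example the window `ACG` has energy 0 and therefore the highest occupancy of the three windows; a continuous profile of this kind, rather than a bound/unbound label, is what an energy-based model would compare against ChIP-seq tag density.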

    A Genome-Wide Screen for Genetic Variants That Modify the Recruitment of REST to Its Target Genes

    Increasing numbers of human diseases are being linked to genetic variants, but our understanding of the mechanistic links leading from DNA sequence to disease phenotype is limited. The majority of disease-causing nucleotide variants fall within the non-protein-coding portion of the genome, making it likely that they act by altering gene regulatory sequences. We hypothesised that SNPs within the binding sites of the transcriptional repressor REST alter the degree of repression of target genes. Given that changes in the effective concentration of REST contribute to several pathologies (various cancers, Huntington's disease, cardiac hypertrophy, vascular smooth muscle proliferation), these SNPs should alter disease susceptibility in carriers. We devised a strategy to identify SNPs that affect the recruitment of REST to target genes through the alteration of its DNA recognition element, the RE1. A multi-step screen combining genetic, genomic, and experimental filters yielded 56 polymorphic RE1 sequences with robust and statistically significant differences of affinity between alleles. These SNPs have a considerable effect on the functional recruitment of REST to DNA in a range of in vitro, reporter gene, and in vivo analyses. Furthermore, we observe allele-specific biases in deeply sequenced chromatin immunoprecipitation data, consistent with predicted differences in RE1 affinity. Amongst the targets of polymorphic RE1 elements are important disease genes including NPPA, PTPRT, and CDH4. Thus, considerable genetic variation exists in the DNA motifs that connect gene regulatory networks. Recently available ChIP-seq data allow the annotation of human genetic polymorphisms with regulatory information to generate prior hypotheses about their disease-causing mechanism.

    Co-motif discovery identifies an Esrrb-Sox2-DNA ternary complex as a mediator of transcriptional differences between mouse embryonic and epiblast stem cells

    Transcription factors (TFs) often bind in heterodimeric complexes, with each TF recognizing a specific neighboring cis element in the regulatory region of the genome. This DNA motif grammar remains poorly understood, yet recent developments have allowed the interrogation of genome-wide TF binding sites. We reasoned that within these data novel motif grammars could be identified that control distinct biological programs. For this purpose, we developed a novel motif-discovery tool termed fexcom that systematically interrogates ChIP-seq data to discover spatially constrained TF-TF composite motifs occurring over short DNA distances. We applied this to the extensive ChIP-seq data available from mouse embryonic stem cells (ESCs). In addition to the well-known and most prevalent sox-oct motif, we also discovered a novel constrained spacer motif for Esrrb and Sox2, with a gap of 2 to 8 bp, which Esrrb and Sox2 co-bind in a selective fashion. Through the use of knockdown experiments, we argue that the Esrrb-Sox2 complex is an arbiter of gene expression differences between ESCs and epiblast stem cells (EpiSCs). A number of genes downregulated upon dual Esrrb/Sox2 knockdown (e.g., Klf4, Klf5, Jam2, Pecam1) are similarly downregulated in the ESC to EpiSC transition and contain the esrrb-sox motif. The prototypical Esrrb-Sox2 target gene, containing an esrrb-sox element conserved throughout eutherian and metatherian mammals, is Nr0b1. Through positive regulation of this transcriptional repressor, we argue that the Esrrb-Sox2 complex promotes the ESC state through inhibition of the EpiSC transcriptional program, and that the same trio may also function to maintain trophoblast stem cells. © 2012 AlphaMed Press.

    SNPs may increase or decrease affinity of RE1s.

    Examples of SNPs that decrease (A) and increase (B) the affinity of an RE1 sequence. On the left are diagrams of the genomic location of polymorphic RE1s, their target genes and the REST ChIP-seq read density taken from ENCODE data. On the right is corresponding quantitative EMSA data. In (A), the well-studied RE1 that lies within the 3′ UTR of the NPPA gene contains the SNP rs12565 that strongly decreases its affinity for REST. In (B), the RE1 lying distally upstream of the CDH4 gene contains the SNP rs6093022 that strongly increases its affinity. (C) The SNP rs1040480 within an intron of PTPRT reduces the affinity of REST. B = Bound complex of REST with probe; U = Unbound probe; D = Degradation product. The latter band represents a fraction of purified REST protein that is partially degraded, as was observed previously [20].

    Electrophoretic mobility shift assay to measure affinity differences between RE1 alleles.

    To measure RE1 affinity in vitro, we employed a competition EMSA method. We tested the ability of unlabelled competitor sequences to compete for REST binding with a fluorescently labelled DNA probe. (A) Various control oligonucleotides were used to validate the sensitivity and selectivity of the comparative EMSA assay. The Ideal RE1 motif is a high affinity synthetic sequence we used previously [20]. By swapping two highly conserved dinucleotides in the sequence, the affinity of the Ideal RE1 can be completely abolished (Mutated RE1). We also designed four pairs of RE1 alleles (N1-4), where SNPs lie outside the RE1 half sites, and thus would not be expected to alter binding affinity. (B) The results of control EMSAs are shown, where replicate EMSA gels have been quantitated and plotted. The data are displayed in units of Fraction Bound (see Materials and Methods), where a low Fraction Bound value indicates high binding affinity, and vice versa. Example raw data are shown in panels: (C) Ideal/Mutated RE1s and (D) N1 RE1s. (E) Summary results of EMSA for all pRE1s in this study. The y-axis plots the difference in Fraction Bound between Major and Minor alleles, where the arrow begins at the value for Major, and ends at Minor. All RE1s are ranked by their change in Fraction Bound.
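
The Fraction Bound readout in this caption can be sketched as a simple ratio of band intensities. This is an illustrative assumption, not the definition given in the paper's Materials and Methods: here Fraction Bound is taken as the bound-probe intensity divided by the total (bound plus unbound) labelled-probe intensity, and the allele difference is taken as Minor minus Major, mirroring the arrow direction described for panel (E). The function names and intensity values are hypothetical.

```python
def fraction_bound(bound, unbound):
    """Fraction of labelled probe in the bound band (assumed definition)."""
    total = bound + unbound
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return bound / total

def delta_fraction_bound(major, minor):
    """Change in Fraction Bound from the Major to the Minor allele.

    Each argument is a (bound, unbound) pair of band intensities measured
    with that allele as the unlabelled competitor. The Minor-minus-Major
    sign convention is an assumption.
    """
    return fraction_bound(*minor) - fraction_bound(*major)

# Hypothetical intensities: the Major competitor displaces more labelled probe.
change = delta_fraction_bound(major=(30.0, 70.0), minor=(60.0, 40.0))
```

Because a strong competitor removes REST from the labelled probe, a lower Fraction Bound indicates higher competitor affinity; under these assumptions a positive `change` would indicate that the Minor allele binds REST less tightly than the Major allele.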

    Polymorphic RE1s where the minor allele has increased affinity.

    FB: Change in Fraction Bound value between the Major and Minor alleles; Frequency indicates Hapmap populations where the Minor SNP allele occurs at 5% (ND-Not determined, -5% in all populations) (Note: genotype data come from Hapmap, except for genotyping carried out on Hapmap CEU set in this study, denoted CEU*); Dist: Distance from RE1 to gene transcriptional start site (negative values indicate upstream). Known REST target genes are underlined; see File S1 for more information.

    Experimental pipeline to discover SNPs that affect gene repression by REST.

    (A) The structure of the RE1 motif, illustrating its two strongly constrained half sites and weakly constrained spacer and 3′ regions. The spacer region may have a “canonical” size of two nucleotides, or other “non-canonical” sizes. (B) Cartoon illustrating the hypothetical effect of a SNP in an RE1 element. In the upper panel, the Major (i.e., more frequent) allele contains a high-affinity RE1 sequence that strongly recruits REST, resulting in target gene repression. The presence of the SNP reduces REST binding affinity, and results in an increase in target gene transcription. (C) The flowchart illustrates the pipeline employed in this study to discover pRE1 SNPs.

    Allele-specific recruitment to pRE1s in GM12878 cells.

    The data shown correspond to the 5 heterozygous SNPs discovered in GM12878 cells. In all cases blue indicates the Major allele, and red the Minor allele. EMSA data are shown in the left panel (note that Fraction Bound units correlate inversely with binding affinity), allele-specific ChIP-seq read density from the ENCODE project is shown in the central panel, and allele-specific ChIP enrichment (where available) is shown in the right panel. Statistical significance was calculated using Student's t test (EMSA, allele-specific ChIP) and Binomial statistics (ChIP-seq).
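
The binomial statistics mentioned for allele-specific ChIP-seq reads can be illustrated with a minimal exact test. The caption does not specify the test details, so the following is a sketch under stated assumptions: a two-sided exact binomial test against a null in which reads at a heterozygous SNP come from either allele with equal probability (p = 0.5). The function name and the example read counts are hypothetical.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k out of n under success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so ties are not lost to floating-point error.
    total = sum(x for x in pmf if x <= observed * (1 + 1e-7))
    return min(1.0, total)

# Hypothetical counts: 10 reads overlap the SNP, all carrying one allele.
p_value = binomial_two_sided_p(k=10, n=10)
```

A small p-value under this null would indicate that reads are drawn unevenly from the two alleles, which is the kind of allele-specific bias the caption refers to; real analyses would also need to account for mapping bias and multiple testing, which this sketch ignores.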