5 research outputs found

    Power of existing measures to predict the effect of regulatory polymorphism.

    <p>ROC curve evaluating the power of two measures on the 1,368 SNPs in this study found within the region of protein-DNA contact, 186 of which significantly affect occupancy. Dotted blue line represents predictions by ranking SNPs in decreasing order of inferred purifying selection (phyloP per-nucleotide conservation score) at the location of the SNP. Solid red line represents predictions by ranking SNPs based on the difference in log-odds scores between alleles. Area under the curve (AUC) summarizes overall predictive power. Gray line indicates a random predictor and has an AUC of 50%. A perfect predictor would be plotted as a right angle, ranking all functional SNPs ahead of all nonfunctional SNPs, and would have an AUC of 100%. While per-nucleotide conservation performs little better than chance, consideration of binding energetics substantially improves performance.</p>
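The AUC described in this caption can be read as the probability that a randomly chosen functional SNP is ranked ahead of a randomly chosen nonfunctional one. The following is a minimal sketch of that calculation, not the authors' code; the function name `roc_auc` and the toy inputs are assumptions made for illustration.

```python
def roc_auc(scores, labels):
    """Rank-based AUC: the fraction of (functional, nonfunctional) pairs in
    which the functional SNP has the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one functional and one nonfunctional SNP")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores (not data from the study): label 1 = affects occupancy.
perfect = roc_auc([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0])
uninformative = roc_auc([1.0, 1.0], [1, 0])
```

A perfect ranking gives 1.0 (100%) and a score that cannot separate the two classes gives 0.5 (50%), matching the right-angle and gray-line cases in the caption.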

    Sequence context buffers effect of polymorphism on occupancy.

    <p>(A) Average effect of SNPs on occupancy across 1,368 different sites, broken down by genotypes (panels) and position (x-axis) relative to the canonical motif (top). Y-axis, proportion of sites where a change is associated with differences in occupancy (FDR 1%). In comparison, 1% of changes observed outside this 44 bp region affected binding (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1002599#pgen.1002599.s011" target="_blank">Table S4</a>). Only changes observed at 3 or more sites are considered; in particular, few A–T transversions were observed due to the GC-rich nature of the motif. (B) SNPs at the weakest and strongest sites are less likely to affect occupancy. X-axis, decile of ChIP-seq signal for the heterozygote genotype according to the regression model; each decile represents 583 sites. Y-axis, proportion of sites at which SNPs are associated with differential occupancy. (C) SNPs affecting occupancy despite stronger motif contexts involve more severe perturbations. X-axis, log-odds score of motif match, with stronger matches at the right; each label represents the lower limit of its bin. Y-axis, magnitude of perturbation, represented by the difference in log-odds scores between the two alleles. Error bars indicate standard deviation. In contrast, SNPs not affecting occupancy show no such trend (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1002599#pgen.1002599.s016" target="_blank">Table S9</a>). (D) Each cell measures the mutual information between the base pair at positions in the core motif (x-axis) and whether a SNP at another position in the motif (y-axis) affects occupancy (FDR 5%). (E) Sequence context at sites with SNPs (arrows) at position 1 (above) and position 6 (below), divided by whether the SNP affected occupancy. Red stars highlight significant sequence differences (q<0.05, see <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1002599#s3" target="_blank">Materials and Methods</a>) between buffered and unbuffered sites at positions with elevated mutual information along the x-axis in (D).</p>
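Each cell in panel (D) is a mutual information value between two discrete variables: the base at one motif position and whether a SNP elsewhere affects occupancy. As a rough sketch of that quantity (the plug-in estimator from observed counts, which may differ from the estimator the authors used), with a hypothetical function name and made-up inputs:

```python
import math
from collections import Counter


def mutual_information(bases, affected):
    """Plug-in mutual information (in bits) between two paired discrete
    variables, e.g. base at a motif position vs. SNP-affects-occupancy flag."""
    n = len(bases)
    if n == 0 or n != len(affected):
        raise ValueError("inputs must be non-empty and of equal length")
    joint = Counter(zip(bases, affected))
    p_base = Counter(bases)
    p_flag = Counter(affected)
    mi = 0.0
    for (b, f), count in joint.items():
        # p(b,f) * log2( p(b,f) / (p(b) * p(f)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (p_base[b] * p_flag[f]))
    return mi


# Hypothetical toy sites (not data from the study).
fully_informative = mutual_information("GGAA", [1, 1, 0, 0])
uninformative = mutual_information("GAGA", [1, 1, 0, 0])
```

In the first toy case the base determines the outcome exactly and the result is 1 bit; in the second the base carries no information and the result is 0.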

    Systematic identification of the effect of genetic variation on transcription factor occupancy.

    <p>(A) We performed ChIP-seq for the transcription factor CTCF followed by targeted resequencing of its complete occupancy landscape in 12 members of CEPH pedigree 1459 (CEU). (B) Three qualitative levels of occupancy correspond to three genotypes of a SNP located at the binding site, with G/G homozygotes having the highest occupancy (region shown: chr1:151,853,500–151,859,700 [hg18]). (C) The SNP shown in (B) disrupts a critical position in the CTCF consensus sequence (note that G better matches the consensus recognition sequence). (D) Regression of ChIP-seq signal on genotype at the site in (B) quantifies the effect of SNPs on occupancy. We applied this strategy genome-wide to identify sites where SNPs are associated with differences in occupancy. At this site, the Akaike information criterion favored a dominant effect model (GT and GG coded identically) over an additive model.</p>
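The model comparison in panel (D) can be illustrated with a small sketch: regress signal on genotype under an additive coding (0, 1, 2 copies of G) and under a dominant coding (GT and GG coded identically), then compare AIC. This is an assumption-laden illustration using ordinary least squares and the Gaussian AIC up to a constant; the function names and the twelve signal values are invented, and the authors' actual regression procedure may differ.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual SS)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("genotype coding has no variation")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, rss


def gaussian_aic(rss, n, k):
    """AIC for a Gaussian linear model, up to an additive constant."""
    if rss <= 0:
        raise ValueError("perfect fit; this AIC formula is undefined")
    return n * math.log(rss / n) + 2 * k


def compare_models(g_allele_counts, signal):
    """Return the lower-AIC coding ('additive' or 'dominant') and both AICs."""
    n = len(signal)
    codings = {
        "additive": [float(c) for c in g_allele_counts],
        "dominant": [1.0 if c >= 1 else 0.0 for c in g_allele_counts],
    }
    # Both models estimate intercept, slope and error variance (k = 3),
    # so here the AIC comparison reduces to comparing residual sums of squares.
    aics = {name: gaussian_aic(fit_line(x, signal)[2], n, k=3)
            for name, x in codings.items()}
    return min(aics, key=aics.get), aics


# Hypothetical signal for 12 individuals (not data from the study).
counts = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
signal = [1.0, 1.2, 0.9, 1.1, 3.0, 3.1, 2.9, 3.2, 3.1, 2.9, 3.0, 3.2]
best, aics = compare_models(counts, signal)
```

With these toy values the heterozygotes look like the G/G homozygotes, so the dominant coding fits better and has the lower AIC.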

    Genome-wide survey of the effect of genetic variation.

    <p>(A) Filtering strategy for testable CTCF binding sites. A number of binding sites were excluded from the analysis due to microarray probe design constraints, poor mappability, differing mappability between two alleles, or insufficient resequencing coverage. (B) Summary of the prevalence of SNPs that affect CTCF occupancy at an FDR of 1%. Some sites overlapping SNPs were excluded for having insufficient data points per genotype to perform a robust regression. The model explained a substantial amount of the variance at significant sites (median r<sup>2</sup> of 0.61).</p>
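The caption reports significance at an FDR of 1% but does not say which FDR procedure was used. As one common possibility, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of per-site p-values; the choice of procedure, the function name, and the example p-values are all assumptions.

```python
def benjamini_hochberg(pvalues, fdr=0.01):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, one per input p-value, marking the tests
    declared significant at the requested false discovery rate."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its BH threshold.
        if pvalues[i] <= fdr * rank / m:
            cutoff_rank = rank
    significant = [False] * m
    for i in order[:cutoff_rank]:
        significant[i] = True
    return significant


# Hypothetical p-values for four sites (not data from the study).
calls = benjamini_hochberg([0.001, 0.5, 0.004, 0.9], fdr=0.01)
```

Here the two smallest p-values fall under their rank-scaled thresholds (0.0025 and 0.005), so only those two sites are called significant.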

    Functional SNPs recapitulate the CTCF binding motif.

    <p>(A) 4,428 SNPs identified by resequencing at as many sites. Y-axis indicates the number of SNPs identified at a given position (x-axis) relative to the aligned and strand-oriented CTCF motif (below). Bar color indicates alleles of SNPs. Gray shading indicates the 44-bp extent of protein-DNA interaction. Note that SNPs are uniformly distributed throughout the entire window, except for a slight reduction in diversity corresponding to the high-information content positions of the motif. (B) Of the SNPs in (A), 218 are significantly associated with ChIP-seq occupancy (FDR 1%). Color indicates SNPs for which the higher-occupancy allele (according to association analysis) also had a higher log-odds score in the known motif. Gray indicates SNPs that affected occupancy, but the higher-occupancy allele had a lower score in the motif. See <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1002599#pgen.1002599.s002" target="_blank">Figure S2C</a> for full color. Note that these SNPs are concentrated in the region of protein-DNA contact, and 84% match the allele predicted by the canonical motif (above).</p>
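Comparing alleles by their log-odds score in a motif can be sketched as follows: score each allele's sequence against a position weight matrix relative to a background frequency, then take the difference. The three-column matrix, the uniform background of 0.25, and the function names below are hypothetical; they are not the CTCF motif or the scoring pipeline used in the study.

```python
import math


def log_odds_score(seq, pwm, background=0.25):
    """Sum over positions of log2(motif probability / background probability)."""
    if len(seq) != len(pwm):
        raise ValueError("sequence and matrix must have the same length")
    return sum(math.log2(col[base] / background) for base, col in zip(seq, pwm))


def allele_score_difference(seq, pos, allele_a, allele_b, pwm, background=0.25):
    """Log-odds score with allele_a at pos minus the score with allele_b."""
    with_a = seq[:pos] + allele_a + seq[pos + 1:]
    with_b = seq[:pos] + allele_b + seq[pos + 1:]
    return (log_odds_score(with_a, pwm, background)
            - log_odds_score(with_b, pwm, background))


# Hypothetical 3-position matrix (each column sums to 1); not the CTCF motif.
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
strong_position = allele_score_difference("AGC", 1, "G", "T", toy_pwm)
neutral_position = allele_score_difference("AGC", 2, "C", "T", toy_pwm)
```

A positive difference means the first allele is the better motif match. In this toy matrix a G-to-T change at the informative second position changes the score by log2(7) bits, while a change at the uninformative third position changes nothing.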