
    Identification of breast cancer associated variants that modulate transcription factor binding

    No full text
    <div><p>Genome-wide association studies (GWAS) have discovered thousands of loci associated with disease risk and quantitative traits, yet most of the variants responsible for risk remain uncharacterized. The majority of GWAS-identified loci are enriched for non-coding single-nucleotide polymorphisms (SNPs), and defining the molecular mechanism of risk is challenging. Many non-coding causal SNPs are hypothesized to alter transcription factor (TF) binding sites as the mechanism by which they affect organismal phenotypes. We employed an integrative genomics approach to identify candidate TF binding motifs that confer breast cancer-specific phenotypes identified by GWAS. We performed <i>de novo</i> motif analysis of regulatory elements, analyzed evolutionary conservation of identified motifs, and assayed TF footprinting data to identify sequence elements that recruit TFs and maintain the chromatin landscape in breast cancer-relevant tissue and cell lines. We identified candidate causal SNPs that are predicted to alter TF binding within breast cancer-relevant regulatory regions and that are in strong linkage disequilibrium with significantly associated GWAS SNPs. We confirm that the TFs bind with predicted allele-specific preferences using CTCF ChIP-seq data. We used The Cancer Genome Atlas breast cancer patient data to identify ANKLE1 and ZNF404 as the target genes of candidate TF binding site SNPs in the 19p13.11 and 19q13.31 GWAS-identified loci. These SNPs are associated with the expression of ZNF404 and ANKLE1 in breast tissue. This integrative analysis pipeline is a general framework to identify candidate causal variants within regulatory regions and TF binding sites that confer phenotypic variation and disease risk.</p></div>

    Composite footprints and directional patterns of enzyme accessibility indicate TF occupancy.

    No full text
    <p>(A) The CTCF motif exhibits one of the most striking composite footprints and directional patterns of accessibility among sequence-specific binding proteins. (B) We do not observe a composite footprint for this orphan motif; however, the enzyme accessibility pattern around this motif is directional, with a higher degree of cleavage downstream from the motif.</p>

    <i>De novo</i> identified motifs within open chromatin are evolutionarily conserved.

    No full text
    <p>(A) CTCF motifs within hypersensitive sites are, on average, conserved; note the peak of phastCons (blue trace) and phyloP (red trace) intensity at the motif compared to the flanking region. The right panel is an average of 20 scrambled CTCF weight matrices. We do not observe any conservation peak after scrambling. (B) A novel orphan motif, which does not have a known cognate TF partner, is also evolutionarily conserved within open chromatin regions.</p>

    Six examples of SNPs that are associated with breast cancer susceptibility and predicted to affect TF binding and gene expression regulation.

    No full text
    <p>Candidate SNPs: 1) are in strong LD (<i>r</i><sup>2</sup> ≥ 0.08) with the most associated breast cancer GWAS SNP; 2) lie within a DNase/ATAC-seq-defined regulatory region of MCF7, MCF10A, T47D, HMEC or vHMEC cell lines and tissue; 3) fall at a position with high information content (IC) in the TF binding PSWM (<i>IC</i> ≥ 0.5); and 4) are eQTLs in breast cancer patient solid tumor samples and GTEx breast tissue.</p>
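
The four criteria in this caption can be read as a simple conjunctive filter. The sketch below is only an illustration of that reading, not the authors' pipeline: the function names, the uniform background, and the definition of IC as per-position relative entropy in bits are all assumptions, and the thresholds are passed in as parameters rather than fixed.

```python
import math

def position_information_content(probs, background=0.25):
    """Relative entropy (bits) of one PSWM column against a uniform background.

    Assumption: this is one common definition of information content; the
    caption does not say which definition or scale was used.
    """
    return sum(p * math.log2(p / background) for p in probs if p > 0)

def is_candidate_snp(r2, in_regulatory_region, ic, is_eqtl, r2_min, ic_min):
    """Hypothetical filter: True only if all four caption criteria hold."""
    return (
        r2 >= r2_min              # 1) LD with the lead GWAS SNP
        and in_regulatory_region  # 2) inside a DNase/ATAC-seq-defined region
        and ic >= ic_min          # 3) informative position in the PSWM
        and is_eqtl               # 4) eQTL in tumor samples and breast tissue
    )

# Usage with made-up values: a fully conserved position carries 2 bits,
# while a position with uniform base frequencies carries none.
conserved = position_information_content([1.0, 0.0, 0.0, 0.0])
uniform = position_information_content([0.25, 0.25, 0.25, 0.25])
```

Keeping the thresholds as arguments makes it easy to see how the candidate set changes when any single criterion is tightened or relaxed.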

    Candidate causal SNP rs3760982 is one of the top eQTLs for gene ZNF404.

    No full text
    <p>(A) The A/A genotype at rs3760982 is correlated with higher expression of ZNF404 in breast cancer patient solid tumor samples. (B) GTEx data confirm that the A/A genotype at rs3760982 is correlated with higher expression of ZNF404 in breast tissue. (C) rs3760982 is one of the top eQTL SNPs of ZNF404 and rs3760982 is the most associated GWAS hit for breast cancer susceptibility.</p>

    The most highly expressed TFs with paralogous DNA binding domains are most relevant to breast cancer.

    No full text
    <p>The relative expression of TFs that recognize the same sequence motif can identify the top candidate functional TFs. FOXA1 and ESR1 are the most highly expressed TFs in their respective TF families. We quantified gene expression using TCGA breast cancer patient solid tumor samples [<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1006761#pgen.1006761.ref057" target="_blank">57</a>].</p>

    Top SNPs from the Raine cohort GWAS that were genotyped in our GWAS of COME/ROM.

    No full text
    <p>We report our OR and allele frequency for the risk allele listed in the Raine cohort GWAS <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0104212#pone.0104212-Rye1" target="_blank">[9]</a>.</p><p>*Indicates that the SNP is intergenic, so the nearest gene is reported.</p><p>∧Indicates that the SNP was a top genotyped SNP from a subset of Raine study participants with full covariate data.</p>

    Iterative <i>de novo</i> motif analysis identified a set of 37 overrepresented motif families within the regulatory elements in MCF7, MCF10A, T47D, HMEC and vHMEC cells.

    No full text
    <p>TFs that recognize similar regulatory sequences are clustered into families. Each row contains one TF family, and a check mark denotes the cell lines/tissues in which that TF family was identified.</p>

    Reference SNP rs4414128 affects CTCF binding as measured by allele-specific ChIP-seq among many diploid, heterozygous cell lines.

    No full text
    <p>We analyzed allele-specific binding of all ENCODE cell lines with reported normal karyotype that are heterozygous at rs4414128. CTCF binding is unbalanced in favor of the C allele, which conforms more strongly to the consensus sequence.</p>
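
One common way to quantify this kind of allelic imbalance at a heterozygous site is an exact binomial test of the ChIP-seq read counts against an expected 50:50 split. The caption does not state which test was used, so the sketch below is an assumption for illustration; the function name and the read counts are hypothetical.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probability of every outcome that is no more likely than the
    observed one (the usual 'minimum likelihood' two-sided definition).
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so ties are not lost to floating-point error.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical counts: reads carrying the C allele versus the other allele
# at a heterozygous SNP in one cell line.
c_reads, other_reads = 30, 10
p_value = binomial_two_sided_p(c_reads, c_reads + other_reads)
```

With several heterozygous cell lines, as in this caption, the same test could be applied per cell line or to pooled counts; which is appropriate depends on how comparable the samples are.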

    DNase-seq and ATAC-seq quantify differential open chromatin in breast cancer-relevant tissue and cell lines.

    No full text
    <p>We show smoothed DNase-seq and ATAC-seq tracks at a locus on chromosome 20. MCF7, T47D, HMEC and vHMEC DNase-seq data share several chromatin accessibility regions, and MCF10A ATAC-seq data identify a peak of differential chromatin accessibility that is distinct from the other data sets.</p>