16 research outputs found

    Erosion of Conserved Binding Sites in Personal Genomes Points to Medical Histories

    <div><p>Although many human diseases have a genetic component involving many loci, the majority of studies are statistically underpowered to isolate the many contributing variants, raising the question of the existence of alternate processes to identify disease mutations. To address this question, we collect ancestral transcription factor binding sites disrupted by an individual’s variants and then look for their most significant congregation next to a group of functionally related genes. Strikingly, when the method is applied to five different full human genomes, the top enriched function for each is invariably reflective of their very different medical histories. For example, our method implicates “abnormal cardiac output” for a patient with a longstanding family history of heart disease, “decreased circulating sodium level” for an individual with hypertension, and other biologically appealing links for medical histories spanning narcolepsy to axonal neuropathy. Our results suggest that erosion of gene regulation by mutation load significantly contributes to observed heritable phenotypes that manifest in the medical history. The test we developed exposes a hitherto hidden layer of personal variants that promise to shed new light on human disease penetrance, expressivity and the sensitivity with which we can detect them.</p></div>

    Frequency distribution of CoBELs in relation to population structure.

    <p>(A) Principal component analysis (PCA) of the five genomes with respect to the genomes in the 1,000 genomes project revealed clustering with the European population, as expected. (B-F) Comparison of the five individuals' enrichment-specific CoBEL frequencies in all 1,000 genomes data and in the two populations with which the five genomes cluster by PCA. Both this and additional frequency distribution analysis (see text) reveal that the top CoBEL enrichments are composed of both common and rare variants, as expected of low-pathogenicity mutations that exert a noticeable effect only in aggregate. The similarity of the frequency distributions for the full 1,000 genomes and the two sub-populations further suggests the lack of any population-specific bias in our enrichments.</p>

    Schematic of conserved binding site eroding loci method.

    <p>(A) Method for inferring conserved binding site eroding loci (CoBELs) and hypothesizing functional consequences of erosions. (B) Conserved binding site eroding loci (CoBELs) are human reference transcription factor binding sites, conserved across multiple mammals, that are disrupted by a sequenced individual’s derived variant. Shown is a CoBEL upstream of ADRA1B contributing to the Quake genome “abnormal cardiac output” prediction in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004711#pcbi.1004711.t001" target="_blank">Table 1</a>. (C) Conserved binding site eroding loci (CoBELs) are checked for enrichment of function and the functional phenotypes are matched to medical histories via literature survey. Each step is evaluated for statistical significance (see text).</p>

    Top predicted phenotype and matching medical phenotype.

    <p>The set of conserved binding site eroding loci (CoBELs) for each individual is searched for the most significant congregation of binding site erosion events next to a group of genes sharing the same function or phenotype (see text). Per personal genome, the top row columns 2–7 describe the obtained top prediction from personal genome data and its properties. The fold enrichment and FDR <i>q-value</i> are both reported by GREAT’s binomial enrichment test; the fraction of relevant genes is the number of genes annotated for the phenotype (those listed in affected target genes) divided by all genes annotated with the phenotype. Column 8 highlights the matching personal medical phenotype. The bottom row for each personal genome spanning columns 2–7 provides exact quotes from references that confirm the link between the predicted and observed phenotypes (columns 2 and 8 for each personal genome).</p>
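A minimal sketch of the two quantities named in this caption, assuming the usual formulation of a binomial enrichment test over genomic regions: out of `n_regions` regions, `n_hits` fall in the regulatory domains of genes carrying a term, and those domains cover a fraction `genome_fraction` of the genome. The function and parameter names are hypothetical, and this is an illustration, not GREAT's own code.

```python
from math import exp, lgamma, log


def binomial_enrichment(n_regions, n_hits, genome_fraction):
    """Return (fold enrichment, upper-tail binomial p-value).

    Assumes 0 < genome_fraction < 1 and 0 <= n_hits <= n_regions.
    """
    p = genome_fraction
    # Fold enrichment: observed hits divided by hits expected by chance.
    fold = n_hits / (n_regions * p)

    def log_pmf(k):
        # log of C(n, k) * p^k * (1 - p)^(n - k), computed in log space
        # so that large region counts do not overflow.
        return (
            lgamma(n_regions + 1) - lgamma(k + 1) - lgamma(n_regions - k + 1)
            + k * log(p) + (n_regions - k) * log(1 - p)
        )

    # P(X >= n_hits) for X ~ Binomial(n_regions, p).
    p_value = sum(exp(log_pmf(k)) for k in range(n_hits, n_regions + 1))
    return fold, p_value


def fraction_of_relevant_genes(hit_genes, annotated_genes):
    """Genes annotated with the phenotype that are hit, over all annotated genes."""
    annotated = set(annotated_genes)
    return len(set(hit_genes) & annotated) / len(annotated)
```

For example, `binomial_enrichment(10, 10, 0.5)` gives a fold of 2.0, since 5 hits are expected by chance and 10 are observed. The FDR q-value in the table would come from correcting such p-values across all tested terms, which this sketch does not attempt.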

    Enrichment distribution of hypothesized phenotypes in ‘control’ genomes.

    <p>(A-E) Comparison of personal genome enrichments of 1,094 genomes from the 1,000 genomes project and the five genomes analyzed in this report. Dashed lines indicate GREAT’s default binomial fold (greater than or equal to two) and FDR (less than or equal to 0.05) significance thresholds. The lower left corner holds the mass of genomes that were not significant by GREAT’s default hypergeometric FDR (less than or equal to 0.05). The red markers indicate that an analyzed personal genome’s prediction is significant and distinguish it from the 1,000 genomes cohort, indicating that such associations do not spuriously appear at a high frequency in control individuals. Panel A indicates that the enrichment of “abnormal cardiac output” is fairly common in the background 1,000 genomes cohort, which is not unexpected since predisposition to mild forms of heart disease is common in otherwise normal populations.</p>

    Neocortex development and evolution.

    <p>A) A coronal plane of section through an embryo. One hemisphere is shown diagrammatically. The neocortex develops from the dorsal telencephalon. At E14.5 progenitor cells from the ventricular zone (VZ) are producing intermediate progenitor cells that migrate to form the subventricular and intermediate zones (SVZ-IZ); daughter cells from both areas migrate past the SVZ-IZ to form the cortical plate (CP), from which the neocortex develops (adapted from <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003728#pgen.1003728-Molyneaux1" target="_blank">[7]</a>). B) Absolute distance of the 6,629 p300 peaks (midpoint) to the canonical transcription start site of the nearest gene.</p>
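The measurement in panel B can be sketched as follows, assuming each peak is given as a start and end coordinate and the canonical transcription start sites on the same chromosome are available as a sorted, non-empty list. The function name and input layout are hypothetical and only illustrate the midpoint-to-nearest-TSS idea.

```python
from bisect import bisect_left


def distance_to_nearest_tss(peak_start, peak_end, sorted_tss):
    """Absolute distance from a peak's midpoint to the closest TSS.

    sorted_tss must be a non-empty, ascending list of TSS positions
    on the same chromosome as the peak.
    """
    midpoint = (peak_start + peak_end) // 2
    # Index of the first TSS at or after the midpoint.
    i = bisect_left(sorted_tss, midpoint)
    # The nearest TSS is either that one or the one just before it.
    candidates = sorted_tss[max(i - 1, 0): i + 1]
    return min(abs(midpoint - tss) for tss in candidates)
```

With TSS positions `[1000, 5000, 9000]`, a peak spanning 4000 to 4200 has its midpoint at 4100 and lies 900 bases from the nearest TSS.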

    Top GREAT enrichments for the E14.5 dorsal cerebral wall p300 ChIP-seq set.

    <p>P-value and fold are for the binomial test. Theiler stage 22 corresponds to E13.5–E15 <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003728#pgen.1003728-Kaufman1" target="_blank">[13]</a>.</p>

    The Enhancer Landscape during Early Neocortical Development Reveals Patterns of Dense Regulation and Co-option

    <div><p>Genetic studies have identified a core set of transcription factors and target genes that control the development of the neocortex, the region of the human brain responsible for higher cognition. The specific regulatory interactions between these factors, many key upstream and downstream genes, and the enhancers that mediate all these interactions remain mostly uncharacterized. We perform p300 ChIP-seq to identify over 6,600 candidate enhancers active in the dorsal cerebral wall of embryonic day 14.5 (E14.5) mice. Over 95% of the peaks we measure are conserved to human. Eight of ten (80%) candidates tested using mouse transgenesis drive activity in restricted laminar patterns within the neocortex. GREAT-based computational analysis reveals highly significant correlation with genes expressed at E14.5 in key areas for neocortex development, and allows the grouping of enhancers by known biological functions and pathways for further studies. We find that multiple genes are flanked by dozens of candidate enhancers each, including well-known key neocortical genes as well as suspected and novel genes. Nearly a quarter of our candidate enhancers are conserved well beyond mammals. Human and zebrafish regions orthologous to our candidate enhancers are shown to most often function in other aspects of central nervous system development. Finally, we find strong evidence that specific interspersed repeat families have contributed potentially key developmental enhancers via co-option. Our analysis expands the methodologies available for extracting the richness of information found in genome-wide functional maps.</p></div>

    The ten genes most enriched for the abundance of p300 peaks in their GREAT gene regulatory domains.

    <p>In two cases, a gene desert rich in p300 regions is flanked by two poorly studied genes in the context of neocortical development. By examining gene function and expression, our literature support points to the flanking gene more likely regulated by the peaks.</p>*<p>Shown in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003728#pgen-1003728-g004" target="_blank">Figure 4</a>.</p>