
    Example of determination of parental origin and parent-of-origin specific association testing for a hypothetical SNP.

    <p>(A) For the given SNP (alleles <i>A</i> and <i>B</i>), the homozygous mother can only contribute allele <i>A</i> to her offspring, so the child's <i>A</i> allele is maternally inherited and the child's <i>B</i> allele must therefore be paternally inherited. Maternal (<sub>MAT</sub>) and paternal (<sub>PAT</sub>) alleles are shown in pink and blue, respectively. (B) Example data showing annotation of the parental origin of this SNP in 10 individuals. (C) <i>cis</i>-association study using SNPs within ±1 Mb of the gene. (D) Comparison of a standard eQTL study and an imprinted eQTL study (see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0041695#s4" target="_blank">Methods</a> for details).</p>
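The Mendelian rule in panel (A) generalizes to any biallelic SNP: try both ways of assigning the child's two alleles to the mother and father, and keep only the assignments in which each parent actually carries the allele attributed to them. The sketch below illustrates that rule under this assumption; it is not the study's own code, and the function name and genotype encoding are hypothetical. It returns `None` for the ambiguous case (mother, father and child all heterozygous) and for Mendelian errors.

```python
from typing import Optional, Tuple

# Hypothetical encoding: an unordered genotype, e.g. ("A", "B") for a heterozygote.
Genotype = Tuple[str, str]


def assign_parental_origin(mother: Genotype, father: Genotype,
                           child: Genotype) -> Optional[Tuple[str, str]]:
    """Return the child's (maternal_allele, paternal_allele) using Mendelian rules.

    Returns None when the origin cannot be resolved: either mother, father and
    child are all heterozygous, or the trio is Mendelian-inconsistent.
    """
    a, b = child
    # Try both ways of splitting the child's alleles between the parents and keep
    # the splits in which each parent carries the allele it supposedly passed on.
    # A set collapses the two identical splits of a homozygous child into one.
    splits = {(mat, pat) for mat, pat in ((a, b), (b, a))
              if mat in mother and pat in father}
    if len(splits) == 1:
        return splits.pop()
    return None  # zero splits: Mendelian error; two splits: ambiguous
```

For the trio in panel (A), `assign_parental_origin(("A", "A"), ("A", "B"), ("A", "B"))` yields `("A", "B")`: maternal *A*, paternal *B*.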

    Comparison of separate association using maternal and paternal alleles with the Likelihood Ratio Test (LRT) for 30 putative ieQTLs.

    <p>(A) Correlation between −log<sub>10</sub> p-values generated using a standard association test and the LRT (r<sup>2</sup> = −0.87). (B) Correlation between −log<sub>10</sub> p-values generated using the heterozygote test and the LRT (r<sup>2</sup> = 0.71). Note that 6 of the 30 SNP-gene pairs did not have enough heterozygotes to perform the heterozygote test.</p>
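The caption compares the separate-parent analysis with an LRT but does not spell out the LRT model. One common way to frame a parent-of-origin LRT, assumed here rather than taken from the study, is to compare nested linear models: a reduced model with a single shared allelic effect (expression ~ maternal + paternal dosage as one term) against a full model with separate maternal and paternal effects. With Gaussian errors the statistic is n·ln(RSS_reduced / RSS_full), referred to a chi-square distribution with 1 df. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _centered(values: Sequence[float]) -> List[float]:
    """Subtract the mean so the intercept drops out of the normal equations."""
    mu = sum(values) / len(values)
    return [v - mu for v in values]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def pofo_lrt(mat: Sequence[int], pat: Sequence[int],
             expr: Sequence[float]) -> Tuple[float, float]:
    """LRT of separate maternal/paternal effects versus one shared effect.

    mat, pat: 0/1 indicators of the tested allele on each child's maternal and
    paternal chromosome (parental origin already resolved).
    Returns (LR statistic, p-value from chi-square with 1 df).
    """
    y, m, p = _centered(expr), _centered(mat), _centered(pat)
    syy = _dot(y, y)

    # Reduced model: expr ~ (mat + pat), a single additive allelic effect.
    t = [a + b for a, b in zip(m, p)]
    rss_reduced = syy - _dot(t, y) ** 2 / _dot(t, t)

    # Full model: expr ~ mat + pat, solved via the 2x2 normal equations.
    smm, spp, smp = _dot(m, m), _dot(p, p), _dot(m, p)
    smy, spy = _dot(m, y), _dot(p, y)
    det = smm * spp - smp ** 2  # zero if maternal and paternal indicators are collinear
    b_mat = (spp * smy - smp * spy) / det
    b_pat = (smm * spy - smp * smy) / det
    rss_full = syy - b_mat * smy - b_pat * spy

    # Gaussian linear models: 2 * delta(log-likelihood) = n * ln(RSS_reduced / RSS_full).
    stat = max(0.0, len(y) * math.log(rss_reduced / rss_full))
    # Chi-square(1) survival function: P(X > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```

A maternal-only effect inflates `rss_reduced` relative to `rss_full` and gives a small p-value; equal maternal and paternal effects leave the two models nearly indistinguishable.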

    Detection of Parent-of-Origin Specific Expression Quantitative Trait Loci by <em>Cis</em>-Association Analysis of Gene Expression in Trios

    <div><p>Parent-of-origin (PofO) effects, such as imprinting, are phenomena in which homologous chromosomes exhibit differential gene expression and epigenetic modifications according to their parental origin. Such non-Mendelian inheritance patterns are generally ignored by conventional association studies, because these tests treat the maternal and paternal alleles as equivalent. To identify regulatory regions that show PofO effects on gene expression (imprinted expression Quantitative Trait Loci, ieQTLs), we developed a novel method in which we associate SNP genotypes of defined parental origin with gene expression levels. We applied this method to 59 HapMap phase II parent-offspring trios. Because we analyzed mother/father/child trios, the rules of Mendelian inheritance allowed the parental origin to be defined for ∼95% of SNPs in each child. We used 680,475 informative SNPs and corresponding expression data for 92,167 probe sets from Affymetrix GeneChip Human Exon 1.0 ST arrays, and performed four independent <em>cis</em>-association analyses with the expression levels of RefSeq genes within 1 Mb using PLINK. Independent analyses of maternal and paternal genotypes identified two significant <em>cis</em>-ieQTLs (p<10<sup>−7</sup>) at which expression of the genes <em>SFT2D2</em> and <em>SRRT</em> was associated exclusively with the maternally inherited SNPs rs3753292 and rs6945374, respectively. A further 28 suggestive <em>cis</em>-associations with only maternal or only paternal SNPs were found at a lower stringency threshold of p<10<sup>−6</sup>, including associations with two known imprinted genes, <em>PEG10</em> and <em>TRAPPC9</em>, demonstrating the efficacy of our method. Furthermore, a comparison of our method, which analyzes maternal and paternal genotypes independently, with the Likelihood Ratio Test (LRT) showed that our method is more effective than the LRT for detecting imprinting effects. 
Our method represents a novel approach that can identify imprinted regulatory elements that control gene expression, and our results suggest novel PofO effects in the human genome.</p> </div>

    Multiple strong confounders contribute to artifactual associations between CNVs and hypomethylation.

    <p>(a) Hypomethylated regions of the human genome are highly enriched for satellite repeats. We observed a strong enrichment for satellite repeats in genomic regions below the 1<sup>st</sup> percentile of mean methylation level. Satellites make up a mean of 16.6% of these hypomethylated windows, compared with only 0.26% in the rest of the genome (∼64-fold enrichment, <i>p</i> = 1.4×10<sup>−29</sup>, Mann-Whitney Rank Sum Test). Previous analysis has shown that satellites tend to be strongly hypomethylated in human sperm <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003332#pgen.1003332-Molaro1" target="_blank">[10]</a>. Furthermore, given their highly repetitive and dynamic nature, loci rich in satellites are enriched for CNVs (51.7% of windows containing satellites overlap HapMap CNVs <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003332#pgen.1003332-Conrad1" target="_blank">[7]</a>, compared with 20.5% in the rest of the genome), creating an inherent confounder between CNVs and hypomethylation. (b) No enrichment for CNVs in hypomethylated regions after removal of confounding genomic features. Li et al. reported significant enrichments for overlap with multiple CNV datasets in “methylation deserts” (the windows with the lowest 1% mean methylation) and in regions of the genome with MI = 0 <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003332#pgen.1003332-Li1" target="_blank">[9]</a>. However, after excluding regions of extreme repeat content (all windows containing satellite repeats, and those above the 99<sup>th</sup> percentile for LINE, SINE, LTR, or total repeat content, <i>n</i> = 1,716) and/or windows in which only a minority of CpGs were sampled (<i>n</i> = 430), all of the reported CNV enrichments are substantially reduced, and in most cases disappear entirely. The dashed grey line represents equal prevalence of CNVs in hypomethylated regions and in the rest of the genome. 
(c) Bisulfite reads within “methylation deserts” preferentially map to CpG islands/shores. We observed that windows scored as “methylation deserts” by Li et al. (those with the lowest 1% mean methylation) show a strong bias for bisulfite reads to map within ±2 kb of CGIs. Because CGIs, especially those associated with the promoters of expressed genes, are typically unmethylated, this bias causes the mean methylation of the wider region to be underestimated. Data shown represent, for each window, the fraction of CpGs with at least one overlapping read that map within ±2 kb of CGIs, after first excluding all windows containing satellite repeats or above the 99<sup>th</sup> percentile for LINE, SINE, LTR, or total repeat content. (d) A large reduction in SNP density in windows with MI = 0. We observed a greatly reduced density of HapMap SNPs in windows with MI = 0 (mean, 25; median, 13) compared with the genome average (mean, 143; median, 137). As mSNPs represent only 8.2% of all SNPs in the genome, and the formula used by Li et al. to calculate MI reports MI = 0 when no mSNPs are present, a methylation index based on SNP content is inherently biased toward scoring windows that contain only a small number of SNPs as MI = 0. Because of stringent quality filtering, ∼98% of HapMap SNP assays map uniquely within the genome <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003332#pgen.1003332-The3" target="_blank">[20]</a>, so SNPs are underrepresented in duplicated sequence. 
Therefore, SNP density shows a significant negative correlation with segmental duplications (<i>r</i> = −0.337, <i>p</i><10<sup>−323</sup>), which make up a fraction of the genome that is highly enriched for structural variation <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003332#pgen.1003332-Sharp2" target="_blank">[2]</a>, <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003332#pgen.1003332-Tuzun1" target="_blank">[3]</a>, <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003332#pgen.1003332-Conrad1" target="_blank">[7]</a>. (e) No enrichment for CNVs in regions with MI = 0 after removal of windows with low SNP density. Li et al. reported that windows with MI = 0 are enriched for CNVs identified in several different studies <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003332#pgen.1003332-Li1" target="_blank">[9]</a>. However, power calculations (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003332#pgen.1003332.s004" target="_blank">Figure S4</a>) show that at least 28 SNPs per window are required to achieve a <10% false discovery rate for MI = 0. After excluding windows containing <28 SNPs (<i>n</i> = 811), all enrichments for CNVs in the remaining regions with MI = 0 disappear, indicating that the conclusions of Li et al. are likely artifacts of the low SNP density in many CNV regions.</p>
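The bias in panel (d) can be made concrete with a toy model. Assume, as a simplification not taken from the authors' power calculation, that each SNP in a window is independently an mSNP with probability 0.082 (the genome-wide fraction quoted above). Since MI = 0 whenever a window contains no mSNPs, the chance of scoring MI = 0 by luck alone is (1 − 0.082)^n for n SNPs. This is a chance probability, not the false discovery rate reported in Figure S4. Function names are hypothetical.

```python
MSNP_FRACTION = 0.082  # mSNPs as a fraction of all SNPs, from the caption


def p_mi_zero_by_chance(n_snps: int, f: float = MSNP_FRACTION) -> float:
    """Probability that a window with n_snps SNPs contains no mSNP, assuming each
    SNP is independently an mSNP with probability f. Such a window is scored
    MI = 0 by the formula described in the caption."""
    return (1.0 - f) ** n_snps


def min_snps_for(max_prob: float, f: float = MSNP_FRACTION) -> int:
    """Smallest SNP count at which a chance MI = 0 becomes less likely than max_prob."""
    n = 0
    while p_mi_zero_by_chance(n, f) >= max_prob:
        n += 1
    return n
```

Under this model a window with 13 SNPs (the median for MI = 0 windows) scores MI = 0 by chance about a third of the time, whereas a window with 137 SNPs (the genome median) does so with probability below 10⁻⁵. `min_snps_for(0.10)` returns 27, close to the 28-SNP threshold from the authors' power calculations, which presumably rest on assumptions this toy model does not capture.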

    Comment on “Genomic Hypomethylation in the Human Germline Associates with Selective Structural Mutability in the Human Genome”


    Global assessment of methylation levels and confounders contributing to hypomethylation in common CNV regions.

    <p>(a) Mean methylation levels and (b) mean CpG density per base within and flanking 5,360 nonredundant HapMap CNVs. To directly assess the relationship between DNA methylation and structural variation, we used published 15× bisulfite sequencing data <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003332#pgen.1003332-Molaro1" target="_blank">[10]</a> to calculate mean methylation per base both within and flanking a high-quality set of HapMap CNVs <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003332#pgen.1003332-Conrad1" target="_blank">[7]</a>. We first merged the 8,599 CNVs defined by Conrad et al. into 6,142 nonredundant regions, and then removed those >20 kb in size to form a filtered set of 5,360 nonredundant regions (mean size, 3,789 bp). A 100 kb window was then centered on the midpoint of each CNV, and mean methylation levels and CpG count per base in these 100 kb windows were calculated using the 15× sperm bisulfite sequencing data <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003332#pgen.1003332-Molaro1" target="_blank">[10]</a>. Each plot shows a 100 bp moving average. Although a small decrease in methylation level is evident within CNVs compared with flanking regions, overall mean methylation within CNV regions (69%) is very similar to the genome average (70%). Furthermore, this dip in methylation corresponds precisely with an increase in CpG density and an enrichment for CGIs within CNVs. As most CGIs are unmethylated in sperm <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003332#pgen.1003332-Molaro1" target="_blank">[10]</a>, <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003332#pgen.1003332-Weber1" target="_blank">[17]</a>, this likely accounts for the small overall reduction in methylation levels associated with CNVs. (c) Regions classified as “methylation deserts” by Li et al. 
represent an extremely nonrandom subset of the genome that is highly enriched for common repeats and for preferential mapping of bisulfite reads to CpG islands. We classified all 100 kb windows defined by Li et al. by their content of common repeats and by the fraction of assayed CpGs that map within ±2 kb of CGIs. One hundred and eighty-three of the 285 (64%) windows classified as “methylation deserts” by Li et al. are above the 95<sup>th</sup> percentile for satellite, LINE, or LTR content and/or above the 99<sup>th</sup> percentile for total repeat content. A further 80 windows (28%) are above the 95<sup>th</sup> percentile for the fraction of assayed CpGs that map to CGIs or shores. Overall, only 22 of the 285 (8%) windows defined by Li et al. as “methylation deserts” show neither extreme repeat content nor highly biased sampling of CpG islands. In contrast, in the rest of the genome, 84% of windows fall into none of these categories. Furthermore, windows that overlap a high-quality dataset of HapMap CNVs <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003332#pgen.1003332-Conrad1" target="_blank">[7]</a> show a repeat content and a proportion of reads mapping to CGIs similar to the genome average. Thus, the set of regions defined as “methylation deserts” by Li et al. represents an extreme subset of the genome that is likely to be highly enriched for unusual epigenetic and structural features.</p>
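The processing steps in panels (a) and (b) (merging overlapping CNV calls into nonredundant regions, centering a 100 kb window on each region's midpoint, and smoothing per-base values with a 100 bp moving average) can be sketched as below. This is an illustration, not the authors' pipeline: the half-open, 0-based coordinate convention and the function names are assumptions, and the caption's size filter is left out (it would be a one-line comprehension over the merged regions).

```python
from typing import List, Sequence, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), assumed half-open and 0-based


def merge_intervals(intervals: Sequence[Interval]) -> List[Interval]:
    """Merge overlapping or abutting intervals into nonredundant regions."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):  # groups by chromosome, then start
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, m_start, m_end = merged[-1]
            merged[-1] = (chrom, m_start, max(m_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def centered_window(region: Interval, width: int = 100_000) -> Interval:
    """Window of `width` bp centered on the region midpoint; shifted right if it
    would otherwise start before position 0."""
    chrom, start, end = region
    lo = max(0, (start + end) // 2 - width // 2)
    return (chrom, lo, lo + width)


def moving_average(values: Sequence[float], k: int = 100) -> List[float]:
    """Mean of each run of k consecutive per-base values (k = 100 for 100 bp)."""
    out: List[float] = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= k:
            total -= values[i - k]  # drop the value that left the window
        if i >= k - 1:
            out.append(total / k)
    return out
```

The moving average returns only fully covered positions, so its output is k − 1 values shorter than its input.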

    The interplay between the genome and the methylome.

    <p>A) Methylated cytosines tend to deaminate to thymine over evolutionary time, so the methylation state of cytosines in different species influences the evolution of the underlying genome sequence. B) Species-specific nucleotide changes that disrupt transcription factor (TF) binding sites can alter the methylation state of nearby CpG dinucleotides and, as a consequence, establish species-specific differentially methylated regions (DMRs). C) The insertion of transposable elements in a particular lineage, along with the accumulation of nucleotide changes, can lead to the emergence of novel CpG dinucleotides, creating species-specific regulatory regions.</p>

    Digital Genotyping of Macrosatellites and Multicopy Genes Reveals Novel Biological Functions Associated with Copy Number Variation of Large Tandem Repeats

    <div><p>Tandem repeats are common in eukaryotic genomes, but because they are difficult to assay, they remain poorly studied. Here, we demonstrate the utility of NanoString technology as a targeted approach for accurately measuring tandem repeats, even at extremely high copy number, and apply this technology to genotype 165 HapMap samples from three different populations and five species of non-human primates. We observed extreme variability in the copy number of tandemly repeated genes, with many loci showing 5–10-fold variation in copy number among humans. Many of these loci show hallmarks of genome assembly errors, and many large tandem repeats are represented by substantially fewer copies than their true copy number, even in the high-quality ‘finished’ human reference assembly. Importantly, we demonstrate that most large tandem repeat variants are not tagged by nearby SNPs and are therefore essentially invisible to SNP-based GWAS approaches. Using association analysis, we identify many <i>cis</i> correlations of large tandem repeat variants with nearby gene expression and DNA methylation levels, indicating that variation in tandem repeat length is associated with functional effects on the local genomic environment. This includes an example in which expansion of a macrosatellite repeat is associated with increased DNA methylation and suppression of nearby gene expression, suggesting a mechanism termed “repeat induced gene silencing”, which has previously been observed only in transgenic organisms. We also observed multiple signatures consistent with altered selective pressures at tandemly repeated loci, suggesting important biological functions. Our studies show that tandemly repeated loci represent a highly variable fraction of the genome that has been systematically ignored by most previous studies, and that copy number variation at these loci can exert functionally significant effects. 
We suggest that future studies of tandem repeat loci will lead to many novel insights into their role in modulating both genomic and phenotypic diversity.</p></div>

    Multicopy genes show evidence of altered selective pressures on amino acid sequence during recent primate evolution.

    <p>Density plots showing the distribution of dN/dS ratios for multicopy genes (<i>green</i>) compared with all RefSeq genes (<i>red</i>) for human versus chimpanzee. There is a significant enrichment for elevated rates of non-synonymous substitution in multicopy genes relative to the genome average (p = 3.3×10<sup>−7</sup>, Kolmogorov-Smirnov test). This excess of non-synonymous amino-acid changes at multicopy genes during recent primate evolution is consistent with reduced selective constraint, selection for proteins with altered function, or both. Similar results are obtained when comparing human with orangutan and macaque (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004418#pgen.1004418.s005" target="_blank">Figure S5</a>).</p>
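The two-sample Kolmogorov-Smirnov test compares whole distributions: its statistic D is the largest vertical gap between the two empirical CDFs (here, dN/dS for multicopy genes versus all RefSeq genes). The sketch below computes D and a large-sample p-value from the Kolmogorov distribution. It is a minimal illustration, not the study's implementation; in practice `scipy.stats.ks_2samp` provides this test, including exact p-values for small samples.

```python
import math
from typing import Sequence


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D: the largest vertical gap between the
    empirical CDFs of x and y. Tied values are stepped over together."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted the gap can only shrink, so d is final.
    return d


def ks_pvalue_asymptotic(d: float, n: int, m: int, terms: int = 100) -> float:
    """Large-sample p-value from the Kolmogorov distribution:
    P(D > d) ~ 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lam^2), lam = sqrt(nm/(n+m)) * d."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # tail probability is essentially 1 here; the series converges poorly
    total = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * total))
```

Because D depends only on ranks, the test is insensitive to the skewed, heavy-tailed shape typical of dN/dS ratios.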

    <i>REXO1L1</i> and <i>TCEB3C</i> show extreme variation in copy number among primate species.

    <p>(<b>a</b>) <i>REXO1L1</i> is one of the most extreme examples of a copy number variable gene in humans, with 108–266 copies of the ∼12.2 kb repeat unit observed in the 165 HapMap individuals studied. However, even more extreme variation is observed among different primates. We observed ∼450 and ∼550 copies in bonobo and chimpanzee, respectively, and copy numbers of ∼400 and ∼860 in two different gorilla individuals. In contrast, macaque has an estimated 22 copies, while gibbon falls within the same range seen in humans. (<b>b</b>) While <i>TCEB3C</i> ranges from 9 to 59 copies among HapMap individuals (mean, 29 copies), all five primate species studied show higher copy numbers, indicating a reduction of <i>TCEB3C</i> copy number specifically in the human lineage. As with <i>REXO1L1</i>, gorilla and chimpanzee showed the highest copy numbers, with 115 copies in chimpanzee and ∼270 copies in both gorillas studied.</p>
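A back-of-envelope conversion of <i>REXO1L1</i> copy number into total repeat sequence helps convey the scale of this variation. It assumes, as a simplification not stated in the caption, that every copy is a full-length ∼12.2 kb unit; partial or diverged units would lower these figures.

```latex
\begin{aligned}
108 \times 12.2\ \text{kb} &\approx 1.32\ \text{Mb} && \text{(human minimum)} \\
266 \times 12.2\ \text{kb} &\approx 3.25\ \text{Mb} && \text{(human maximum)} \\
860 \times 12.2\ \text{kb} &\approx 10.5\ \text{Mb} && \text{(gorilla, highest estimate)}
\end{aligned}
```

Under this assumption, the extremes among HapMap individuals differ by roughly 1.9 Mb of sequence at this single locus.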