19 research outputs found

    Digital Genotyping of Macrosatellites and Multicopy Genes Reveals Novel Biological Functions Associated with Copy Number Variation of Large Tandem Repeats

    No full text
    Tandem repeats are common in eukaryotic genomes, but remain poorly studied because they are difficult to assay. Here, we demonstrate the utility of Nanostring technology as a targeted approach to perform accurate measurement of tandem repeats even at extremely high copy number, and apply this technology to genotype 165 HapMap samples from three different populations and five species of non-human primates. We observed extreme variability in copy number of tandemly repeated genes, with many loci showing 5–10 fold variation in copy number among humans. Many of these loci show hallmarks of genome assembly errors, and the true copy number of many large tandem repeats is significantly under-represented even in the high quality ‘finished’ human reference assembly. Importantly, we demonstrate that most large tandem repeat variations are not tagged by nearby SNPs, and are therefore essentially invisible to SNP-based GWAS approaches. Using association analysis we identify many cis correlations of large tandem repeat variants with nearby gene expression and DNA methylation levels, indicating that variations of tandem repeat length are associated with functional effects on the local genomic environment. This includes an example where expansion of a macrosatellite repeat is associated with increased DNA methylation and suppression of nearby gene expression, suggesting a mechanism termed “repeat induced gene silencing”, which has previously been observed only in transgenic organisms. We also observed multiple signatures consistent with altered selective pressures at tandemly repeated loci, suggesting important biological functions. Our studies show that tandemly repeated loci represent a highly variable fraction of the genome that has been systematically ignored by most previous studies, and that copy number variation at these loci can exert functionally significant effects. We suggest that future studies of tandem repeat loci will lead to many novel insights into their role in modulating both genomic and phenotypic diversity.

    Mutation Screening of Candidate Genes in Patients with Nonsyndromic Sagittal Craniosynostosis

    No full text
    Background: Craniosynostosis is a condition that includes the premature fusion of one or multiple cranial sutures. Among various craniosynostosis forms, midline sagittal nonsyndromic craniosynostosis (sNSC) is the most prevalent. Although different gene mutations have been identified in some craniosynostosis syndromes, the etiology of sNSC remains largely unknown. Methods: To screen for candidate genes for sagittal nonsyndromic craniosynostosis, the authors sequenced DNA of 93 sagittal nonsyndromic craniosynostosis patients from a population-based study conducted in Iowa and New York states. FGFR1-3 mutational hotspots and the entire TWIST1, RAB23, and BMP2 coding regions were screened because of their known roles in human nonsyndromic or syndromic sagittal craniosynostosis, expression patterns, and/or animal model studies. Results: The authors identified two rare variants in their cohort. A FGFR1 insertion c.730_731insG, which led to a premature stop codon, was predicted to abolish the entire immunoglobulin-like III domain, including the ligand-binding region. A c.439C>G variant was observed in TWIST1 at its highly conserved loop domain in another patient. The patient’s mother harbored the same variant and was reported to have jaw abnormalities. These two variants were not detected in 116 alleles from unaffected controls or in several databases; however, the TWIST1 variant was found at a low frequency of 0.000831 percent in the Exome Aggregation Consortium database. Conclusions: The low mutation detection rate indicates that these genes account for only a small proportion of sagittal nonsyndromic craniosynostosis patients. The authors’ results add to the perception that sagittal nonsyndromic craniosynostosis is a complex developmental defect with considerable genetic heterogeneity.

    Polymorphic tandem repeats within gene promoters act as modifiers of gene expression and DNA methylation in humans

    No full text
    Despite representing an important source of genetic variation, tandem repeats (TRs) remain poorly studied due to technical difficulties. We hypothesized that TRs can operate as expression (eQTLs) and methylation (mQTLs) quantitative trait loci. To test this we analyzed the effect of variation at 4849 promoter-associated TRs, genotyped in 120 individuals, on neighboring gene expression and DNA methylation. Polymorphic promoter TRs were associated with increased variance in local gene expression and DNA methylation, suggesting functional consequences related to TR variation. We identified >100 TRs associated with expression/methylation levels of adjacent genes. These potential eQTL/mQTL TRs were enriched for overlaps with transcription factor binding and DNaseI hypersensitivity sites, providing a rationale for their effects. Moreover, we showed that most TR variants are poorly tagged by nearby single nucleotide polymorphism (SNP) markers, indicating that many functional TR variants are not effectively assayed by SNP-based approaches. Our study assigns biological significance to TR variations in the human genome, and suggests that a significant fraction of TR variations exert functional effects via alterations of local gene expression or epigenetics. We conclude that targeted studies that focus on genotyping TR variants are required to fully ascertain functional variation in the genome.

    Multicopy genes show evidence of altered selective pressures on amino acid sequence during recent primate evolution.

    No full text
    Density plots showing the distribution of dN/dS ratios for multicopy genes (green) compared to all RefSeq genes (red) for human versus chimpanzee. There is a significant enrichment for elevated rates of non-synonymous substitution in multicopy genes versus the genome average (p = 3.3×10⁻⁷, Kolmogorov-Smirnov test). This excess of non-synonymous amino-acid changes in recent primate evolution at multicopy genes is consistent with either reduced selective constraint and/or selection for proteins with altered function. Similar results are obtained when comparing human with orangutan and macaque (Figure S5: http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004418#pgen.1004418.s005).

    High frequency of population stratification for CNV of multicopy genes.

    No full text
    17 of 116 (14.7%) multicopy genes show high levels of differentiation in copy number (V_st > 0.2) among European, African and Asian populations. Note that probe counts on the y-axis are shown on a log2 scale.
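    The V_st threshold in this caption can be illustrated with a minimal sketch. It assumes the commonly used definition V_st = (V_T − V_S) / V_T, where V_T is the variance of the copy-number measure across all individuals and V_S is the within-population variance weighted by population size. The caption does not state which exact variant the authors used (for example population versus sample variance, or raw versus log2-transformed counts), so the function name `vst`, the use of population variance, and the example values are all assumptions for illustration, not data or code from the study.

```python
from statistics import pvariance


def vst(groups):
    """Return V_st = (V_T - V_S) / V_T for {population: [copy-number values]}.

    Assumption: population variance is used throughout; the source caption
    does not specify this detail.
    """
    all_values = [v for values in groups.values() for v in values]
    total_var = pvariance(all_values)  # V_T: variance over all individuals
    if total_var == 0:
        return 0.0  # no variation at all, so no differentiation
    n_total = len(all_values)
    # V_S: within-population variance, weighted by population size
    within_var = sum(
        pvariance(values) * len(values) for values in groups.values()
    ) / n_total
    return (total_var - within_var) / total_var


# Made-up probe counts for three hypothetical populations (not study data).
example = {
    "pop_A": [10, 11, 12, 11],
    "pop_B": [20, 21, 19, 20],
    "pop_C": [10, 12, 11, 11],
}

score = vst(example)
print(f"V_st = {score:.2f}; above the 0.2 threshold: {score > 0.2}")
```

    With this definition, V_st is 0 when the populations have identical copy-number distributions and approaches 1 when nearly all of the variance lies between populations.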

    Association of MSat10 copy number with neighboring gene expression and epigenetic marks.

    No full text
    (a) MSat10 is a 5.2 kb GC-rich tandem repeat that lies ∼4 kb distal to the gene ZFP37. Although 6 copies of this 5.2 kb repeat are present in the hg18 assembly, this macrosatellite is highly polymorphic in size, varying from 4–42 copies in HapMap. ChIP-seq analysis shows the presence of histone marks characteristic of heterochromatin, such as trimethylation of histone H3 at lysine 9 and trimethylation of histone H4 at lysine 20. Screenshot from the UCSC Genome Browser shows ZFP37 (Zinc Finger Protein 37), the adjacent MSat10 repeat (red arrows), and the results of ChIP-seq analysis. (b) In 58 unrelated CEU HapMap individuals we observed an inverse correlation between copy number of the MSat10 repeat and expression level of the adjacent gene ZFP37, demonstrating suppression of ZFP37 expression associated with larger repeat sizes. (c) Using a targeted Sequenom assay, we confirm that variable methylation of MSat10 is highly correlated with repeat number (R² = 0.76, p = 4.4×10⁻¹²), showing a strong relationship between repeat size and local epigenetic state. (d) Proposed model of repeat induced gene silencing at the MSat10 locus. At low repeat numbers the region is euchromatic and the expression of the neighboring ZFP37 gene is high. However, expansions of the macrosatellite result in an accumulation of heterochromatic marks in the region, including repressive histone modifications and DNA methylation, resulting in the suppression of local gene expression. Although our model shows methylation on all MSat10 copies, our data do not exclude the possibility that on expanded MSat10 alleles DNA methylation is limited to a subset of the repeat units. Lollipops represent DNA methylation, with open circles being low and filled black circles high DNA methylation, and grey ‘Me’ bubbles represent repressive histone methylation.

    Variation in copy number of tandem repeats and multicopy genes is associated with alterations of local DNA methylation.

    No full text
    (a and c) Shown are correlation values between copy number of (a) MSat10 and (c) CCL4 with all methylation probes within ±500 kb in 118 CEU and YRI HapMap individuals. (b and d) Scatter plots showing individual level data for the methylation probes showing the strongest correlations with copy number of MSat10 and CCL4. (b) Increasing copy number of MSat10 is associated with increased methylation levels of cg14316660 (R = 0.63, permutation p<0.001). This association was replicated using a Sequenom assay targeted to the MSat10 locus (Figure 5c: http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004418#pgen-1004418-g005), confirming that it is not simply due to a technical artifact related to CNV of the underlying probe binding sites. (d) Increasing copy number of CCL4 is associated with reduced methylation levels of cg11728928 (R = −0.59, permutation p<0.001). In (a) and (c), black bars indicate the interval to which each Nanostring probe maps, CpGs showing correlation p<0.01 are indicated in blue, while the CpG showing the strongest correlation is shown as a filled blue circle and labeled with a grey arrow (with individual data plotted in (b) and (d), respectively).
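    The "permutation p" values quoted in this caption can be illustrated with a generic sketch of a permutation test on a Pearson correlation. The caption does not describe the authors' exact permutation scheme, so the number of permutations, the two-sided comparison, the add-one correction, and the function names `pearson_r` and `permutation_p` are assumptions for illustration. The input values below are invented and are not data from the study.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (both must vary)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p(x, y, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y.

    Shuffling y breaks any real pairing with x, so the shuffled correlations
    approximate the null distribution. The add-one correction keeps the
    estimate above zero.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Invented copy numbers and methylation fractions for ten individuals.
copy_number = [4, 6, 9, 12, 15, 20, 24, 30, 35, 42]
methylation = [0.10, 0.15, 0.12, 0.30, 0.28, 0.45, 0.50, 0.62, 0.70, 0.85]

print("R =", round(pearson_r(copy_number, methylation), 3))
print("permutation p =", permutation_p(copy_number, methylation))
```

    With 1000 permutations and the add-one correction, the smallest reportable value is 1/1001, which is why a result with no shuffled correlation as extreme as the observed one would be reported as p<0.001.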

    Most multicopy genes show very low levels of linkage disequilibrium with nearby SNPs.

    No full text
    Correlation analysis for each of the 121 polymorphic probes targeting multicopy genes and macrosatellites with SNP markers within ±250 kb yielded a median R² = 0.18 between the highest ranked filtered SNP and probe count. Only 3 of 116 (∼3%) multicopy genes showed an R² ≥ 0.8 with any SNP in the three populations studied. Therefore the vast majority of tandem repeat variations lack informative tag SNPs, and thus association studies of multicopy loci require specific genotyping of each locus to gain accurate copy number information of these regions.
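    The tag-SNP analysis described here can be sketched as a search for the nearby SNP whose genotype best correlates with the probe count. This is only an illustration under stated assumptions: SNP genotypes are coded as 0/1/2 allele counts, the window and SNP filtering are taken as already applied, and the names `r_squared` and `best_tag_snp`, the SNP identifiers, and all values are hypothetical rather than taken from the study.

```python
from math import sqrt


def r_squared(x, y):
    """Squared Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy / sqrt(sxx * syy)) ** 2


def best_tag_snp(probe_counts, snp_genotypes):
    """Return (snp_id, R^2) for the SNP most correlated with probe count.

    snp_genotypes maps a SNP id to per-individual allele counts (0, 1 or 2),
    in the same individual order as probe_counts, which must itself vary.
    """
    best_id, best_r2 = None, 0.0
    for snp_id, genotypes in snp_genotypes.items():
        if len(set(genotypes)) < 2:
            continue  # monomorphic SNP: correlation is undefined
        r2 = r_squared(probe_counts, genotypes)
        if r2 > best_r2:
            best_id, best_r2 = snp_id, r2
    return best_id, best_r2


# Invented data for six individuals (not from the study).
probe_counts = [2, 4, 6, 8, 10, 12]
snps = {
    "snp_1": [0, 0, 1, 1, 2, 2],
    "snp_2": [1, 0, 2, 0, 1, 2],
    "snp_3": [0, 0, 0, 0, 0, 0],
}

snp_id, r2 = best_tag_snp(probe_counts, snps)
print(snp_id, round(r2, 3), "tagged" if r2 >= 0.8 else "not tagged")
```

    Applying the R² ≥ 0.8 cutoff to the best SNP per locus is what separates "tagged" from "untagged" loci in this sketch.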

    REXO1L1 and TCEB3C show extreme variation in copy number among primate species.

    No full text
    (a) REXO1L1 is one of the most extreme examples of copy number variable genes in human, with 108–266 copies of the ∼12.2 kb repeat unit observed in the 165 HapMap individuals studied. However, even more extreme variation is observed among different primates. We observed ∼450 and ∼550 copies in bonobo and chimpanzee, respectively, and copy numbers of ∼400 and ∼860 in two different gorilla individuals. In contrast, while macaque has an estimated 22 copies, gibbon falls within the same range seen in human. (b) While TCEB3C ranges from 9–59 copies among HapMap individuals (mean 29 copies), all five species of primate studied show increased copy number, indicating a reduction of TCEB3C copy number specifically in the human lineage. As with REXO1L1, gorilla and chimpanzee showed the highest copy numbers, with 115 in chimpanzee and ∼270 copies in both gorillas studied.