105 research outputs found

    Duplication events downstream of IRX1 cause North Carolina macular dystrophy at the MCDR3 locus

    Autosomal dominant North Carolina macular dystrophy (NCMD) is believed to represent a failure of macular development. The disorder has been linked to two loci, MCDR1 (chromosome 6q16) and MCDR3 (chromosome 5p15-p13). Recently, non-coding variants upstream of PRDM13 (MCDR1) and a duplication including IRX1 (MCDR3) have been identified. However, the underlying disease-causing mechanism remains uncertain. Through a combination of sequencing studies on eighteen NCMD families, we report two novel overlapping duplications at the MCDR3 locus, in a gene desert downstream of IRX1 and upstream of ADAMTS16. One duplication of 43 kb was identified in nine families (with evidence for a shared ancestral haplotype), and another one of 45 kb was found in a single family. Three families carry the previously reported V2 variant (MCDR1), while five remain unsolved. The MCDR3 locus is thus refined to a shared region of 39 kb that contains DNAse hypersensitive sites active at a restricted time window during retinal development. Publicly available data confirmed expression of IRX1 and ADAMTS16 in human fetal retina, with IRX1 preferentially expressed in fetal macula. These findings represent a major advance in our understanding of the molecular genetics of NCMD and provide insights into the genetic pathways involved in human macular development

    Population genetic analysis of bi-allelic structural variants from low-coverage sequence data with an expectation-maximization algorithm

    Background: Population genetics and association studies usually rely on a set of known variable sites that are then genotyped in subsequent samples, because it is easier to genotype than to discover the variation. This is also true for structural variation detected from sequence data. However, the genotypes at known variable sites can only be inferred with uncertainty from low coverage data. Thus, statistical approaches that infer genotype likelihoods, test hypotheses, and estimate population parameters without requiring accurate genotypes are becoming popular. Unfortunately, the current implementations of these methods are intended to analyse only single nucleotide and short indel variation, and they usually assume that the two alleles in a heterozygous individual are sampled with equal probability. This is generally false for structural variants detected with paired ends or split reads. Therefore, the population genetics of structural variants cannot be studied, unless a painstaking and potentially biased genotyping is performed first. Results: We present svgem, an expectation-maximization implementation to estimate allele and genotype frequencies, calculate genotype posterior probabilities, and test for Hardy-Weinberg equilibrium and for population differences, from the numbers of times the alleles are observed in each individual. Although applicable to single nucleotide variation, it aims at bi-allelic structural variation of any type, observed by either split reads or paired ends, with arbitrarily high allele sampling bias. We test svgem with simulated and real data from the 1000 Genomes Project. Conclusions: svgem makes it possible to use low-coverage sequencing data to study the population distribution of structural variants without having to know their genotypes. Furthermore, this advance allows the combined analysis of structural and nucleotide variation within the same genotype-free statistical framework, thus preventing biases introduced by genotype imputation
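
The idea in the abstract above, estimating an allele frequency from per-individual allele observation counts without calling genotypes, can be illustrated with a minimal sketch. This is not svgem's code or interface: the function names, the binomial read model, the `het_alt_prob` parameter (the assumed probability that an observation from a heterozygote shows the alternative allele, standing in for allele sampling bias) and the `error` rate are all assumptions made for illustration, and Hardy-Weinberg proportions are assumed for the genotype prior.

```python
from math import comb


def genotype_likelihoods(ref_count, alt_count, het_alt_prob=0.5, error=0.01):
    """P(observed counts | genotype) for 0, 1 and 2 copies of the alt allele.

    Assumed model: each observation shows the alt allele with probability
    `error` (hom-ref), `het_alt_prob` (het) or 1 - `error` (hom-alt).
    """
    n = ref_count + alt_count
    alt_probs = (error, het_alt_prob, 1.0 - error)
    return [comb(n, alt_count) * q ** alt_count * (1.0 - q) ** ref_count
            for q in alt_probs]


def em_allele_frequency(counts, het_alt_prob=0.5, error=0.01,
                        max_iter=200, tol=1e-9):
    """EM estimate of the alt allele frequency from (ref, alt) count pairs."""
    liks = [genotype_likelihoods(r, a, het_alt_prob, error) for r, a in counts]
    p = 0.5  # starting value
    for _ in range(max_iter):
        # Genotype prior under Hardy-Weinberg proportions.
        prior = ((1.0 - p) ** 2, 2.0 * p * (1.0 - p), p ** 2)
        expected_alt = 0.0
        for lik in liks:
            # E-step: unnormalised genotype posteriors for this individual.
            w = [l * pr for l, pr in zip(lik, prior)]
            expected_alt += (w[1] + 2.0 * w[2]) / sum(w)
        # M-step: expected alt allele copies over total allele copies.
        new_p = expected_alt / (2.0 * len(liks))
        converged = abs(new_p - p) < tol
        p = new_p
        if converged:
            break
    return p


# Example: five individuals with (ref, alt) observation counts, and an
# assumed bias where only 30% of heterozygote observations show the alt allele.
counts = [(8, 0), (5, 2), (0, 6), (7, 1), (3, 3)]
print(em_allele_frequency(counts, het_alt_prob=0.3))
```

Setting `het_alt_prob` below 0.5 lets heterozygotes with few alternative-allele observations still receive substantial posterior weight, which is the kind of unequal allele sampling the abstract describes for paired-end and split-read evidence.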

    Processing and analyzing multiple genomes alignments with MafFilter

    As the number of available genome sequences from both closely related species and individuals within species increased, theoretical and methodological convergences between the fields of phylogenomics and population genomics emerged. Population genomics typically focuses on the analysis of variants, while phylogenomics heavily relies on genome alignments. However, these are playing an increasingly important role in studies at the population level. Multiple genome alignments of individuals are used when structural variation is of primary interest and when genome architecture permits the de novo assembly of genome sequences. Here I describe MafFilter, a command-line-driven program for processing genome alignments in the Multiple Alignment Format (MAF). Using concrete examples based on publicly available datasets, I demonstrate how MafFilter can be used to develop efficient and reproducible pipelines with quality assurance for downstream analyses. I further show how MafFilter can be used to perform both basic and advanced population genomic analyses in order to infer the patterns of nucleotide diversity along genomes

    Mutations causing medullary cystic kidney disease type 1 lie in a large VNTR in MUC1 missed by massively parallel sequencing

    Although genetic lesions responsible for some mendelian disorders can be rapidly discovered through massively parallel sequencing of whole genomes or exomes, not all diseases readily yield to such efforts. We describe the illustrative case of the simple mendelian disorder medullary cystic kidney disease type 1 (MCKD1), mapped more than a decade ago to a 2-Mb region on chromosome 1. Ultimately, only by cloning, capillary sequencing and de novo assembly did we find that each of six families with MCKD1 harbors an equivalent but apparently independently arising mutation in sequence markedly under-represented in massively parallel sequencing data: the insertion of a single cytosine in one copy (but a different copy in each family) of the repeat unit comprising the extremely long (~1.5–5 kb), GC-rich (>80%) coding variable-number tandem repeat (VNTR) sequence in the MUC1 gene encoding mucin 1. These results provide a cautionary tale about the challenges in identifying the genes responsible for mendelian, let alone more complex, disorders through massively parallel sequencing.
    Funding: National Institutes of Health (U.S.) (Intramural Research Program); National Human Genome Research Institute (U.S.); Charles University (program UNCE 204011); Charles University (program PRVOUK-P24/LF1/3); Czech Republic. Ministry of Education, Youth, and Sports (grant NT13116-4/2012); Czech Republic. Ministry of Health (grant NT13116-4/2012); Czech Republic. Ministry of Health (grant LH12015); National Institutes of Health (U.S.) (Harvard Digestive Diseases Center, grant DK34854)

    Using population admixture to help complete maps of the human genome

    Tens of millions of base pairs of euchromatic human genome sequence, including many protein-coding genes, have no known location in the human genome. We describe an approach for localizing the human genome's missing pieces by utilizing the patterns of genome sequence variation created by population admixture. We mapped the locations of 70 scaffolds spanning four million base pairs of the human genome's unplaced euchromatic sequence, including more than a dozen protein-coding genes, and identified eight large novel inter-chromosomal segmental duplications. We find that most of these sequences are hidden in the genome's heterochromatin, particularly its pericentromeric regions. Many cryptic, pericentromeric genes are expressed in RNA and have been maintained intact for millions of years while their expression patterns diverged from those of paralogous genes elsewhere in the genome. We describe how knowledge of the locations of these sequences can inform disease association and genome biology studies

    A Comprehensive Map of Mobile Element Insertion Polymorphisms in Humans

    As a consequence of the accumulation of insertion events over evolutionary time, mobile elements now comprise nearly half of the human genome. The Alu, L1, and SVA mobile element families are still duplicating, generating variation between individual genomes. Mobile element insertions (MEI) have been identified as causes for genetic diseases, including hemophilia, neurofibromatosis, and various cancers. Here we present a comprehensive map of 7,380 MEI polymorphisms from the 1000 Genomes Project whole-genome sequencing data of 185 samples in three major populations detected with two detection methods. This catalog enables us to systematically study mutation rates, population segregation, genomic distribution, and functional properties of MEI polymorphisms and to compare MEI to SNP variation from the same individuals. Population allele frequencies of MEI and SNPs are described, broadly, by the same neutral ancestral processes despite vastly different mutation mechanisms and rates, except in coding regions where MEI are virtually absent, presumably due to strong negative selection. A direct comparison of MEI and SNP diversity levels suggests a differential mobile element insertion rate among populations

    Analysis of copy number variation at DMBT1 and age-related macular degeneration

    BACKGROUND: DMBT1 is a gene that shows extensive copy number variation (CNV) that alters the number of bacteria-binding domains in the protein and has been shown to activate the complement pathway. It lies next to the ARMS2/HTRA1 genes in a region of chromosome 10q26, where single nucleotide variants have been strongly associated with age-related macular degeneration (AMD), the commonest cause of blindness in Western populations. Complement activation is thought to be a key factor in the pathogenesis of this condition. We sought to investigate whether DMBT1 CNV plays any role in the susceptibility to AMD. METHODS: We analysed long-range linkage disequilibrium of DMBT1 CNV1 and CNV2 with flanking single nucleotide polymorphisms (SNPs) using our previously published CNV and HapMap Phase 3 SNP data in the CEPH Europeans from Utah (CEU). We then typed a large cohort of 860 AMD patients and 419 examined age-matched controls for copy number at DMBT1 CNV1 and CNV2 and combined these data with copy numbers from a further 480 unexamined controls. RESULTS: We found weak linkage disequilibrium between DMBT1 CNV1 and CNV2 with the SNPs rs1474526 and rs714816 in the HTRA1/ARMS2 region. By directly analysing copy number variation, we found no evidence of association of CNV1 or CNV2 with AMD. CONCLUSIONS: We have shown that copy number variation at DMBT1 does not affect risk of developing age-related macular degeneration and can therefore be ruled out from future studies investigating the association of structural variation at 10q26 with AMD

    High mutation rates explain low population genetic divergence at copy-number-variable loci in Homo sapiens

    Copy-number-variable (CNV) loci differ from single nucleotide polymorphic (SNP) sites in size, mutation rate, and mechanisms of maintenance in natural populations. It is therefore hypothesized that population genetic divergence at CNV loci will differ from that found at SNP sites. Here, we test this hypothesis by analysing 856 CNV loci from the genomes of 1184 healthy individuals from 11 HapMap populations with a wide range of ancestry. The results show that population genetic divergence at the CNV loci is generally more than three times lower than at genome-wide SNP sites. Populations generally exhibit very small genetic divergence (G(st) = 0.05 ± 0.049). The smallest divergence is among African populations (G(st) = 0.0081 ± 0.0025), with increased divergence among non-African populations (G(st) = 0.0217 ± 0.0109) and then among African and non-African populations (G(st) = 0.0324 ± 0.0064). Genetic diversity is high in African populations (~0.13), low in Asian populations (~0.11), and intermediate in the remaining 11 populations. Few significant linkage disequilibria (LDs) occur between the genome-wide CNV loci. Patterns of gametic and zygotic LDs indicate the absence of epistasis among CNV loci. Mutation rate is about twice as large as the migration rate in the non-African populations, suggesting that the high mutation rates play dominant roles in producing the low population genetic divergence at CNV loci
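
The G(st) values quoted in the abstract above measure the share of total gene diversity that lies between populations. As a rough illustration only, the sketch below computes Nei's basic G_ST = (H_T - H_S) / H_T for one locus from per-population allele frequencies. The function name, the equal weighting of populations, and the absence of any sample-size correction are assumptions; the abstract does not say which estimator the authors used.

```python
def gst(pop_freqs):
    """Nei's G_ST for one locus.

    `pop_freqs` holds one allele-frequency list per population; every list
    covers the same alleles in the same order and sums to 1.
    Populations are weighted equally (an assumption of this sketch).
    """
    k = len(pop_freqs)
    # H_S: mean within-population expected heterozygosity.
    h_s = sum(1.0 - sum(f * f for f in freqs) for freqs in pop_freqs) / k
    # H_T: expected heterozygosity of the pooled (mean) allele frequencies.
    mean_freqs = [sum(col) / k for col in zip(*pop_freqs)]
    h_t = 1.0 - sum(f * f for f in mean_freqs)
    # A monomorphic locus has no diversity to partition.
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Two populations with slightly different frequencies at a bi-allelic locus.
print(gst([[0.6, 0.4], [0.4, 0.6]]))
```

With frequencies of 0.6 and 0.4 in the two populations, H_S is 0.48 and H_T is 0.5, so about 4% of the diversity is between populations, which is the same order of magnitude as the values reported in the abstract.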