68 research outputs found

    Landscape of multi-nucleotide variants in 125,748 human exomes and 15,708 genomes.

    Get PDF
    Multi-nucleotide variants (MNVs), defined as two or more nearby variants existing on the same haplotype in an individual, are a clinically and biologically important class of genetic variation. However, existing tools typically do not accurately classify MNVs, and understanding of their mutational origins remains limited. Here, we systematically survey MNVs in 125,748 whole exomes and 15,708 whole genomes from the Genome Aggregation Database (gnomAD). We identify 1,792,248 MNVs across the genome with constituent variants falling within 2 bp distance of one another, including 18,756 variants with a novel combined effect on protein sequence. Finally, we estimate the relative impact of known mutational mechanisms - CpG deamination, replication error by polymerase zeta, and polymerase slippage at repeat junctions - on the generation of MNVs. Our results demonstrate the value of haplotype-aware variant annotation, and refine our understanding of genome-wide mutational mechanisms of MNVs

    Mapping and phasing of structural variation in patient genomes using nanopore sequencing

    Get PDF
    Despite improvements in genomics technology, the detection of structural variants (SVs) from short-read sequencing still poses challenges, particularly for complex variation. Here we analyse the genomes of two patients with congenital abnormalities using the MinION nanopore sequencer and a novel computational pipeline—NanoSV. We demonstrate that nanopore long reads are superior to short reads with regard to detection of de novo chromothripsis rearrangements. The long reads also enable efficient phasing of genetic variations, which we leveraged to determine the parental origin of all de novo chromothripsis breakpoints and to resolve the structure of these complex rearrangements. Additionally, genome-wide surveillance of inherited SVs reveals novel variants, missed in short-read data sets, a large proportion of which are retrotransposon insertions. We provide a first exploration of patient genome sequencing with a nanopore sequencer and demonstrate the value of long-read sequencing in mapping and phasing of SVs for both clinical and research applications

    Large scale variation in the rate of germ-line de novo mutation, base composition, divergence and diversity in humans

    Get PDF
    It has long been suspected that the rate of mutation varies across the human genome at a large scale based on the divergence between humans and other species. However, it is now possible to directly investigate this question using the large number of de novo mutations (DNMs) that have been discovered in humans through the sequencing of trios. We investi- gate a number of questions pertaining to the distribution of mutations using more than 130,000 DNMs from three large datasets. We demonstrate that the amount and pattern of variation differs between datasets at the 1MB and 100KB scales probably as a consequence of differences in sequencing technology and processing. In particular, datasets show differ- ent patterns of correlation to genomic variables such as replication time. Never-the-less there are many commonalities between datasets, which likely represent true patterns. We show that there is variation in the mutation rate at the 100KB, 1MB and 10MB scale that can- not be explained by variation at smaller scales, however the level of this variation is modest at large scales–at the 1MB scale we infer that ~90% of regions have a mutation rate within 50% of the mean. Different types of mutation show similar levels of variation and appear to vary in concert which suggests the pattern of mutation is relatively constant across the genome. We demonstrate that variation in the mutation rate does not generate large-scale variation in GC-content, and hence that mutation bias does not maintain the isochore struc- ture of the human genome. We find that genomic features explain less than 40% of the explainable variance in the rate of DNM. As expected the rate of divergence between spe- cies is correlated to the rate of DNM. However, the correlations are weaker than expected if all the variation in divergence was due to variation in the mutation rate. We provide evidence that this is due the effect of biased gene conversion on the probability that a mutation will become fixed. In contrast to divergence, we find that most of the variation in diversity can be explained by variation in the mutation rate. Finally, we show that the correlation between divergence and DNM density declines as increasingly divergent species are considered

    De novo single-nucleotide and copy number variation in discordant monozygotic twins reveals disease-related genes

    Get PDF
    Recent studies have demonstrated genetic differences between monozygotic (MZ) twins. To test the hypothesis that early post-twinning mutational events associate with phenotypic discordance, we investigated a cohort of 13 twin pairs (n = 26) discordant for various clinical phenotypes using whole-exome sequencing and screened for copy number variation (CNV). We identified a de novo variant in PLCB1, a gene involved in the hydrolysis of lipid phosphorus in milk from dairy cows, associated with lactase non-persistence, and a variant in the mitochondrial complex I gene MT-ND5 associated with amyotrophic lateral sclerosis (ALS). We also found somatic variants in multiple genes (TMEM225B, KBTBD3, TUBGCP4, TFIP11) in another MZ twin pair discordant for ALS. Based on the assumption that discordance between twins could be explained by a common variant with variable penetrance or expressivity, we screened the twin samples for known pathogenic variants that are shared and identified a rare deletion overlapping ARHGAP11B, in the twin pair manifesting with either schizotypal personality disorder or schizophrenia. Parent-offspring trio analysis was implemented for two twin pairs to assess potential association of variants of parental origin with susceptibility to disease. We identified a de novo variant in RASD2 shared by 8-year-old male twins with a suspected diagnosis of autism spectrum disorder (ASD) manifesting as different traits. A de novo CNV duplication was also identified in these twins overlapping CD38, a gene previously implicated in ASD. In twins discordant for Tourette's syndrome, a paternally inherited stop loss variant was detected in AADAC, a known candidate gene for the disorder

    Sequencing and de novo assembly of 150 genomes from Denmark as a population reference

    Get PDF
    Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set of structural variants including many novel insertions and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the way for precision medicine initiatives, which now are being launched in many countries including Denmark

    A framework for the detection of de novo mutations in family-based sequencing data

    No full text
    Germline mutation detection from human DNA sequence data is challenging due to the rarity of such events relative to the intrinsic error rates of sequencing technologies and the uneven coverage across the genome. We developed PhaseByTransmission (PBT) to identify de novo single nucleotide variants and short insertions and deletions (indels) from sequence data collected in parent-offspring trios. We compute the joint probability of the data given the genotype likelihoods in the individual family members, the known familial relationships and a prior probability for the mutation rate. Candidate de novo mutations (DNMs) are reported along with their posterior probability, providing a systematic way to prioritize them for validation. Our tool is integrated in the Genome Analysis Toolkit and can be used together with the ReadBackedPhasing module to infer the parental origin of DNMs based on phase-informative reads. Using simulated data, we show that PBT outperforms existing tools, especially in low coverage data and on the X chromosome. We further show that PBT displays high validation rates on empirical parent-offspring sequencing data for whole-exome data from 104 trios and X-chromosome data from 249 parent-offspring families. Finally, we demonstrate an association between father’s age at conception and the number of DNMs in female offspring’s X chromosome, consistent with previous literature reports
    • …
    corecore