    Sequencing and de novo assembly of 150 genomes from Denmark as a population reference

    Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set of structural variants including many novel insertions and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the way for precision medicine initiatives, which are now being launched in many countries including Denmark.

    The time scale of recombination rate evolution in great apes

    We present three linkage-disequilibrium (LD)-based recombination maps generated using whole-genome sequence data from 10 Nigerian chimpanzees, 13 bonobos, and 15 western gorillas, collected as part of the Great Ape Genome Project (Prado-Martinez J, et al. 2013. Great ape genetic diversity and population history. Nature 499:471-475). We also identified species-specific recombination hotspots in each group using a modified LDhot framework, which greatly improves statistical power to detect hotspots at varying strengths. We show that fewer hotspots are shared among chimpanzee subspecies than within human populations, further narrowing the time scale of complete hotspot turnover. Further, using species-specific PRDM9 sequences to predict potential binding sites (PBS), we show higher predicted PRDM9 binding in recombination hotspots as compared to matched cold spot regions in multiple great ape species, including at least one chimpanzee subspecies. We found that correlations between broad-scale recombination rates decline more rapidly than nucleotide divergence between species. We also compared the skew of recombination rates at centromeres and telomeres between species and show a skew from chromosome means extending as far as 10-15 Mb from chromosome ends. Further, we examined broad-scale recombination rate changes near a translocation in gorillas and found minimal differences as compared to other great ape species, perhaps because the coordinates relative to the chromosome ends were unaffected. Finally, on the basis of multiple linear regression analysis, we found that various correlates of recombination rate persist throughout the African great apes including repeats, diversity, and divergence. Our study is the first to analyze within- and between-species genome-wide recombination rate variation in several close relatives.

    Split time estimates for the three great ape genera.

    The box plots show the estimated split times using either the isolation model or the isolation-with-migration model for the three great ape comparisons. The box plots on the left show the split time estimate in the isolation model, while the box plots on the right show both the initial population divergence and the end of gene flow. The variation in estimates is from each 10 Mbp segment of the genome.

    The effect of using a random genotype phase.

    We simulated the situation where the genotype phase is unknown by simulating two genomes and selecting a random allele at every heterozygous site. The plot shows the effect on parameter estimates of not knowing the phase.
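
    The random-phase step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the two simulated genomes are equal-length strings and that "selecting a random allele" means building one sequence that takes either allele with equal probability wherever the two differ. The function name `random_phase` is hypothetical.

```python
import random


def random_phase(hap_a, hap_b, rng):
    """Merge two haplotypes into one sequence with unknown phase.

    Assumption: at homozygous sites the shared allele is kept; at
    heterozygous sites one of the two alleles is picked uniformly at random.
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must have the same length")
    merged = []
    for a, b in zip(hap_a, hap_b):
        merged.append(a if a == b else rng.choice((a, b)))
    return "".join(merged)


# Example: two short simulated haplotypes that differ at three sites.
rng = random.Random(1)
hap_a = "ACGTACGTAC"
hap_b = "ACGAACGTTG"
unphased = random_phase(hap_a, hap_b, rng)
```

    Passing an explicit `random.Random` instance keeps the simulation reproducible from a seed.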

    Parameter estimates for the human/chimpanzee split with the isolation-with-migration model.

    The histograms show the distribution of parameter estimates for the human/bonobo speciation (blue) and the human/chimpanzee speciation (red) using the isolation-with-migration model.

    The effect of mutation rate variation.

    The figure shows the effect on parameter estimation when the mutation rate is varied along the genome alignment. We split the alignment into segments geometrically distributed with mean length 500 bp and 2 kbp, and the mutation rate is then scaled by a random value chosen uniformly in the range 0.75 to 1.25 or 0.5 to 1.5. The dashed lines show the simulated values. The largest effect of varying the mutation rate is seen in the top-most parameters, the coalescence rate and the mutation rate. Varying the mutation rate increases the variance in coalescence times scaled with mutation rate, which is interpreted by the model as a decreased coalescence rate, while segments with low mutation rates are seen as having more recent coalescence times, which the model interprets as evidence for migration. Consequently, variation in mutation rate decreases our estimates of the coalescence rate and increases our estimates of migration rates.
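
    The segmentation scheme in this caption can be sketched as below. This is an illustrative reading under stated assumptions, not the authors' implementation: segment lengths are drawn from a geometric distribution on 1, 2, ... with the given mean, the last segment is truncated to fit the alignment, and each segment gets an independent uniform scaling factor. The function name `mutation_rate_segments` and the alignment length used in the example are hypothetical.

```python
import math
import random


def mutation_rate_segments(total_length, mean_length, low, high, rng):
    """Return a list of (segment_length, rate_scale) pairs covering the alignment.

    Assumes mean_length > 1 so that the geometric success probability
    p = 1 / mean_length is strictly below 1.
    """
    p = 1.0 / mean_length
    segments = []
    covered = 0
    while covered < total_length:
        # Inverse-transform sample of a geometric variable on {1, 2, ...}.
        u = 1.0 - rng.random()  # uniform on (0, 1]
        length = int(math.log(u) / math.log(1.0 - p)) + 1
        length = min(length, total_length - covered)  # truncate the final segment
        scale = rng.uniform(low, high)  # factor applied to the base mutation rate
        segments.append((length, scale))
        covered += length
    return segments


# Example: mean segment length 500 bp, scaling factors in 0.75 to 1.25.
rng = random.Random(42)
segments = mutation_rate_segments(100_000, 500, 0.75, 1.25, rng)
```

    The other settings in the caption correspond to `mean_length=2000` and `low=0.5, high=1.5`.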

    Chromosome-wise split time estimates.

    The box plots show the estimates of the initial split time and the end of gene flow in the isolation-with-migration model for each 10 Mbp segment for each chromosome.

    Model comparison between the isolation and the isolation-with-migration model.

    The box plots show the Akaike Information Criterion (AIC) for the isolation model against the isolation-with-migration model. For each 10 Mbp genomic segment we have plotted the AIC for the model including migration minus the AIC for the model without. The model with the smallest AIC should be preferred, so values below zero favor the isolation model while values above zero favor the migration model.
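
    The rule that the model with the smallest AIC is preferred can be sketched per segment as follows, using the standard definition AIC = 2k - 2 ln L. The log-likelihoods and parameter counts in the example are made-up placeholders, not values from the study, and the function names are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def preferred_model(loglik_iso, k_iso, loglik_mig, k_mig):
    """Return the name of the model with the smaller AIC for one segment."""
    aic_iso = aic(loglik_iso, k_iso)
    aic_mig = aic(loglik_mig, k_mig)
    # Ties go to the simpler isolation model.
    return "isolation" if aic_iso <= aic_mig else "isolation-with-migration"


# Placeholder values for a single 10 Mbp segment (assumed, for illustration only).
choice_small_gain = preferred_model(-100.0, 3, -99.5, 5)
choice_large_gain = preferred_model(-100.0, 3, -90.0, 5)
```

    In the first call the extra parameters of the migration model are not repaid by the small gain in likelihood; in the second they are.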