3 research outputs found

    Pushing the limits of HiFi assemblies reveals centromere diversity between two Arabidopsis thaliana genomes.

    Funder: DFG; Funder: Max Planck Society
    Although long-read sequencing can often enable chromosome-level reconstruction of genomes, it is still unclear how one can routinely obtain gapless assemblies. In the model plant Arabidopsis thaliana, all accessions other than the reference accession Col-0 that have been assembled de novo with long reads until now have used PacBio continuous long reads (CLR). Although these assemblies sometimes achieved chromosome-arm-level contigs, they inevitably broke near the centromeres, excluding megabases of DNA from analysis in pan-genome projects. Since PacBio high-fidelity (HiFi) reads circumvent the high error rate of CLR technologies, albeit at the expense of read length, we compared a CLR assembly of accession Eyach15-2 to HiFi assemblies of the same sample. The use of five different assemblers starting from subsampled data allowed us to evaluate the impact of coverage and read length. We found that centromeres and rDNA clusters are responsible for 71% of contig breaks in the CLR scaffolds, while relatively short stretches of GA/TC repeats are at the core of >85% of the unfilled gaps in our best HiFi assemblies. Since the HiFi technology consistently enabled us to reconstruct gapless centromeres and 5S rDNA clusters, we demonstrate the value of the approach by comparing these previously inaccessible regions of the genome between the Eyach15-2 accession and the reference accession Col-0.

    Monitoring rapid evolution of plant populations at scale with pool-sequencing

    The change in allele frequencies within a population over time represents a fundamental process of evolution. By monitoring allele frequencies, we can analyze the effects of natural selection and genetic drift on populations. To efficiently track time-resolved genetic change, large experimental or wild populations can be sequenced as pools of individuals sampled over time using high-throughput genome sequencing (the Evolve & Resequence approach, E&R). Here, we present a set of experiments using hundreds of natural genotypes of the model plant Arabidopsis thaliana to showcase the power of this approach to study rapid evolution at large scale. First, we validate that sequencing DNA directly extracted from pools of flowers from multiple plants (organs that are relatively consistent in size and easy to sample) produces results comparable to those of other, more expensive state-of-the-art approaches such as sampling and sequencing of individual leaves. Sequencing pools of flowers from 25-50 individuals at ∼40X coverage recovers genome-wide allele frequencies in diverse populations with accuracy r > 0.95. Second, to enable analyses of evolutionary adaptation using E&R approaches of plants in highly replicated environments, we provide open-source tools that streamline sequencing data curation and calculate various population genetic statistics two orders of magnitude faster than current software. To directly demonstrate the usefulness of our method, we conducted a two-year outdoor evolution experiment with A. thaliana to show signals of rapid evolution in multiple genomic regions. We demonstrate how these laboratory and computational Pool-seq-based methods can be scaled to study hundreds of populations across many climates.
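
    The accuracy figure in the abstract above (r between pooled estimates and known frequencies) can be illustrated with a minimal sketch. It is not the authors' tooling: the function names, the binomial read-sampling model, and the simulated frequencies are all assumptions made for illustration, and the only numbers taken from the abstract are the ∼40X coverage and the use of a correlation coefficient as the accuracy measure.

```python
import math
import random


def allele_frequency(alt_reads, total_reads):
    """Estimate the alternative-allele frequency at one site from pooled read counts."""
    if total_reads == 0:
        return None  # no coverage, so the frequency is undefined
    return alt_reads / total_reads


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_pool(true_freqs, coverage=40, seed=0):
    """Hypothetical model: each of `coverage` reads per site carries the
    alternative allele with probability equal to the pool's true frequency."""
    rng = random.Random(seed)
    estimates = []
    for p in true_freqs:
        alt = sum(1 for _ in range(coverage) if rng.random() < p)
        estimates.append(allele_frequency(alt, coverage))
    return estimates


if __name__ == "__main__":
    rng = random.Random(1)
    true_freqs = [rng.random() for _ in range(1000)]  # simulated, not real data
    estimated = simulate_pool(true_freqs, coverage=40)
    print(f"r = {pearson_r(true_freqs, estimated):.3f}")
```

    The sketch ignores sources of error that a real pool adds, such as unequal DNA contribution per individual and sequencing error, so the correlation it prints should not be read as a reproduction of the reported value.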