4 research outputs found
Pushing the limits of HiFi assemblies reveals centromere diversity between two Arabidopsis thaliana genomes.
Funder: DFG. Funder: Max Planck Society.
Although long-read sequencing can often enable chromosome-level reconstruction of genomes, it is still unclear how one can routinely obtain gapless assemblies. In the model plant Arabidopsis thaliana, other than the reference accession Col-0, all other accessions de novo assembled with long-reads until now have used PacBio continuous long reads (CLR). Although these assemblies sometimes achieved chromosome-arm level contigs, they inevitably broke near the centromeres, excluding megabases of DNA from analysis in pan-genome projects. Since PacBio high-fidelity (HiFi) reads circumvent the high error rate of CLR technologies, albeit at the expense of read length, we compared a CLR assembly of accession Eyach15-2 to HiFi assemblies of the same sample. The use of five different assemblers starting from subsampled data allowed us to evaluate the impact of coverage and read length. We found that centromeres and rDNA clusters are responsible for 71% of contig breaks in the CLR scaffolds, while relatively short stretches of GA/TC repeats are at the core of >85% of the unfilled gaps in our best HiFi assemblies. Since the HiFi technology consistently enabled us to reconstruct gapless centromeres and 5S rDNA clusters, we demonstrate the value of the approach by comparing these previously inaccessible regions of the genome between the Eyach15-2 accession and the reference accession Col-0.
Cycles of satellite and transposon evolution in Arabidopsis centromeres.
Centromeres are critical for cell division, loading CENH3 or CENPA histone variant nucleosomes, directing kinetochore formation and allowing chromosome segregation1,2. Despite their conserved function, centromere size and structure are diverse across species. To understand this centromere paradox3,4, it is necessary to know how centromeric diversity is generated and whether it reflects ancient trans-species variation or, instead, rapid post-speciation divergence. To address these questions, we assembled 346 centromeres from 66 Arabidopsis thaliana and 2 Arabidopsis lyrata accessions, which exhibited a remarkable degree of intra- and inter-species diversity. A. thaliana centromere repeat arrays are embedded in linkage blocks, despite ongoing internal satellite turnover, consistent with roles for unidirectional gene conversion or unequal crossover between sister chromatids in sequence diversification. Additionally, centrophilic ATHILA transposons have recently invaded the satellite arrays. To counter ATHILA invasion, chromosome-specific bursts of satellite homogenization generate higher-order repeats and purge transposons, in line with cycles of repeat evolution. Centromeric sequence changes are even more extreme in comparison between A. thaliana and A. lyrata. Together, our findings identify rapid cycles of transposon invasion and purging through satellite homogenization, which drive centromere evolution and ultimately contribute to speciation.
This work was supported by BBSRC grants BB/S006842/1, BB/S020012/1 and BB/V003984/1, European Research Council Consolidator Award ERC-2015-CoG-681987, Marie Curie International Training Network ‘MEICOM’ and Human Frontier Science Program award RGP0025/2021 to IRH; EMBO long term postdoctoral fellowship ALTF224-2022 to RB; Human Frontiers Science Program (HFSP) Long-Term Fellowship (LT000819/2018-L) to FAR; Max Planck Society to DW; ERA-CAPS 1001G+ grant to MNo and DW; Royal Society awards UF160222, URF\R\221024, RGF/R1/180006 and RGF/EA/201030 to AB; European Research Council Award ERC HOW2DOBLE 101041354 to PYN; Czech Science Foundation grant number 21-03909S to ML; a BBSRC DTP Studentship to NG; a Broodbank Fellowship to MN; and grant PID2022-136893NB-I00 from the Ministerio de Ciencia e Innovación of Spain/Agencia Estatal de Investigación/10.13039/501100011033/FEDER, EU, to CAB.
Monitoring rapid evolution of plant populations at scale with pool-sequencing
The change in allele frequencies within a population over time represents a fundamental process of evolution. By monitoring allele frequencies, we can analyze the effects of natural selection and genetic drift on populations. To efficiently track time-resolved genetic change, large experimental or wild populations can be sequenced as pools of individuals sampled over time using high-throughput genome sequencing (called the Evolve & Resequence approach, E&R). Here, we present a set of experiments using hundreds of natural genotypes of the model plant Arabidopsis thaliana to showcase the power of this approach to study rapid evolution at large scale. First, we validate that sequencing DNA directly extracted from pools of flowers from multiple plants -- organs that are relatively consistent in size and easy to sample -- produces comparable results to other, more expensive state-of-the-art approaches such as sampling and sequencing of individual leaves. Sequencing pools of flowers from 25-50 individuals at ∼40X coverage recovers genome-wide frequencies in diverse populations with accuracy r > 0.95. Secondly, to enable analyses of evolutionary adaptation using E&R approaches of plants in highly replicated environments, we provide open source tools that streamline sequencing data curation and calculate various population genetic statistics two orders of magnitude faster than current software. To directly demonstrate the usefulness of our method, we conducted a two-year outdoor evolution experiment with A. thaliana to show signals of rapid evolution in multiple genomic regions. We demonstrate how these laboratory and computational Pool-seq-based methods can be scaled to study hundreds of populations across many climates
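The core quantity in the abstract above, a pooled allele frequency and its correlation with the true population frequency, can be illustrated with a small simulation. This is only a sketch under stated assumptions and is not the authors' tooling: the function names (`pool_allele_frequency`, `simulate_pool_reads`, `pearson_r`) are hypothetical, each plant is assumed to contribute a single allele draw (a simplification that suits largely homozygous inbred lines), and reads are assumed to sample the pool without bias or sequencing error.

```python
import math
import random


def pool_allele_frequency(alt_reads, total_reads):
    """Estimate an allele frequency as the fraction of reads carrying the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def simulate_pool_reads(true_freq, n_individuals, coverage, rng):
    """Simulate one site: sample plants into a pool, then sample reads from the pool.

    Assumption: one allele draw per plant, unbiased reads, no sequencing error.
    Returns (alt_reads, total_reads).
    """
    alt_plants = sum(rng.random() < true_freq for _ in range(n_individuals))
    pool_freq = alt_plants / n_individuals
    alt_reads = sum(rng.random() < pool_freq for _ in range(coverage))
    return alt_reads, coverage


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical setup: 2000 sites, 50 plants per pool, 40 reads per site.
    true_freqs = [rng.random() for _ in range(2000)]
    estimates = []
    for f in true_freqs:
        alt, total = simulate_pool_reads(f, n_individuals=50, coverage=40, rng=rng)
        estimates.append(pool_allele_frequency(alt, total))
    print(f"Pearson r between true and pooled estimates: {pearson_r(true_freqs, estimates):.3f}")
```

In this toy model, both the number of pooled plants and the read depth add binomial sampling noise, so raising either should tighten the correlation; real Pool-seq data carry additional error sources that the sketch ignores.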