The large number of reference genomes available for different species, and comparisons among them, have made it possible to elucidate gene birth mechanisms that act over long evolutionary timescales. However, few species have reference-quality genomes for several individuals, and this has hampered the study of the mechanisms behind the birth of evolutionarily young genes. Despite the high throughput of second-generation sequencing technologies, their short read length has limited the study of genetic diversity to single nucleotide polymorphisms (SNPs) and short indels. To study gene-level events, however, we need to characterise the genetic diversity of a species comprehensively, including structural variants (SVs) (> 50 bp).
I present the most comprehensive set of genomes and SVs for Caenorhabditis elegans. I have assembled a high-quality genome for each of 20 wild isolates of the nematode using long- and short-read sequencing. I show that 1,587 transcripts are deleted in at least one of the wild isolates, and thereby sketch the first definition of the core genome of C. elegans. I describe a highly proliferative transposon harbouring a transcription factor binding site (TFBS) and use it to address the question of transposon co-option in this model organism. Using this dataset, I also show that tandem gene duplication is a prominent gene birth mechanism, whereas horizontal gene transfer (HGT) has played little or no role in the birth of recent C. elegans genes.
Additionally, I show that G protein-coupled receptors (GPCRs) exhibit high levels of presence/absence variation (PAV) and discuss the significance of this finding in light of the ecology of this little worm.