Duplication is a prominent mechanism of recent gene birth in Caenorhabditis elegans

Abstract

The large number of reference genomes available for different species, and their comparison, has enabled the elucidation of gene birth mechanisms that act over long evolutionary timescales. However, the lack of multiple reference-quality genomes for different individuals of the same species has hampered the study of the mechanisms behind evolutionarily younger gene births. Despite the high throughput brought about by second-generation sequencing technologies, their short read length has limited the study of genetic diversity to single nucleotide polymorphisms (SNPs) and short indels. To study gene-level events, however, we need to characterise the genetic diversity of a species comprehensively, including structural variants (SVs; > 50 bp). I present the most comprehensive set of genomes and SVs for Caenorhabditis elegans. I have assembled a high-quality genome for each of 20 wild isolates of the nematode using long- and short-read sequencing. I show that 1,587 transcripts are deleted among the wild isolates and thus sketch the first definition of the core genome of C. elegans. I present the case of a highly proliferative transposon harbouring a transcription factor binding site (TFBS) and use it to address the question of transposon co-option in this model organism. Finally, using this dataset, I show that tandem gene duplication is a prominent gene birth mechanism, whereas horizontal gene transfer (HGT) played little or no role in the birth of recent C. elegans genes. Additionally, I show that G protein-coupled receptors (GPCRs) have high levels of presence/absence variation (PAV) and discuss the significance of this finding in light of the ecology of this little worm.
