We have used whole-genome paired-end Illumina sequence data to identify
tandem duplications in 20 isofemale lines of D. yakuba and 20 isofemale lines
of D. simulans, and performed genome-wide validation with PacBio long molecule
sequencing. We identify 1,415 tandem duplications that are segregating in D.
yakuba as well as 975 duplications in D. simulans, indicating greater variation
in D. yakuba. Additionally, we observe high rates of secondary deletions at
duplicated sites, with 8% of duplicated sites in D. simulans and 17% of sites
in D. yakuba modified with deletions. These secondary deletions are consistent
with the large loop mismatch repair system acting to remove
polymorphic tandem duplications, resulting in rapid dynamics of gain and loss in
duplicated alleles and a richer substrate of genetic novelty than has been
previously reported. Most duplications are present in only a single strain,
suggesting that deleterious impacts are common. D. simulans shows larger numbers
of whole gene duplications, whereas D. yakuba shows larger proportions of gene
fragments. D. simulans displays an excess of high frequency variants on the
X chromosome, consistent with adaptive evolution through duplications on the D.
simulans X or demographic forces driving duplicates to high frequency. We
identify 78 chimeric genes in D. yakuba and 38 chimeric genes in D. simulans,
as well as 143 cases of recruited non-coding sequence in D. yakuba and 96 in D.
simulans, in agreement with rates of chimeric gene origination in D.
melanogaster. Together, these results suggest that tandem duplications often
result in complex variation beyond whole gene duplications, offering a rich
substrate of standing variation that is likely to contribute both to
detrimental phenotypes and disease and to adaptive evolutionary change.

Comment: Revised Version. Accepted at Molecular Biology and Evolution