Independent, Rapid and Targeted Loss of Highly Repetitive DNA in Natural and Synthetic Allopolyploids of Nicotiana tabacum
Allopolyploidy (interspecific hybridisation and polyploidy) has played a significant role in the evolutionary history of angiosperms and can result in genomic, epigenetic and transcriptomic perturbations. We examine the immediate effects of allopolyploidy on repetitive DNA by comparing the genomes of synthetic and natural Nicotiana tabacum with diploid progenitors N. tomentosiformis (paternal progenitor) and N. sylvestris (maternal progenitor). Using next generation sequencing, a recently developed graph-based repeat identification pipeline, Southern blot and fluorescence in situ hybridisation (FISH), we characterise two highly repetitive DNA sequences (NicCL3 and NicCL7/30). Analysis of two independent high-throughput DNA sequencing datasets indicates NicCL3 forms 1.6–1.9% of the genome in N. tomentosiformis, sequences that occur in multiple, discontinuous tandem arrays scattered over several chromosomes. Abundance estimates, based on sequencing depth, indicate NicCL3 is almost absent in N. sylvestris and has been dramatically reduced in copy number in the allopolyploid N. tabacum. Surprisingly, elimination of NicCL3 is repeated in some synthetic lines of N. tabacum in their fourth generation. The retroelement NicCL7/30, which occurs interspersed with NicCL3, is also under-represented but to a much lesser degree, revealing targeted elimination of the latter. Analysis of paired-end sequencing data indicates the tandem component of NicCL3 has been preferentially removed in natural N. tabacum, increasing the proportion of the dispersed component. This occurs across multiple blocks of discontinuous repeats and, based on the distribution of nucleotide similarity among NicCL3 units, was concurrent with rounds of sequence homogenisation.
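The abstract above reports repeat abundance as a percentage of the genome estimated from sequencing depth. A minimal sketch of that kind of estimate follows, assuming reads are sampled uniformly at low coverage so that the fraction of reads assigned to a repeat cluster approximates its genome proportion. The function names and all numeric inputs are hypothetical illustrations and are not taken from the study.

```python
def genome_proportion(cluster_reads: int, total_reads: int) -> float:
    """Fraction of the genome occupied by a repeat, approximated as the
    share of sampled reads assigned to that repeat cluster.
    Assumes unbiased, low-coverage random sampling of the genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= cluster_reads <= total_reads:
        raise ValueError("cluster_reads must lie between 0 and total_reads")
    return cluster_reads / total_reads


def estimated_copies(proportion: float, genome_size_bp: float, unit_length_bp: float) -> float:
    """Rough copy number: bases occupied by the repeat divided by unit length."""
    if unit_length_bp <= 0:
        raise ValueError("unit_length_bp must be positive")
    return proportion * genome_size_bp / unit_length_bp


if __name__ == "__main__":
    # Hypothetical counts: 17,000 of 1,000,000 sampled reads fall in one cluster.
    p = genome_proportion(17_000, 1_000_000)
    print(f"genome proportion: {p:.1%}")
    # Hypothetical genome size (2 Gb) and repeat unit length (2 kb).
    print(f"approximate copies: {estimated_copies(p, 2e9, 2_000):.0f}")
```

Comparing such proportions between a progenitor and an allopolyploid only makes sense if both libraries were sampled and clustered in the same way, which this sketch takes for granted.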
Genomic Diversity in Two Related Plant Species with and without Sex Chromosomes - Silene latifolia and S. vulgaris
Genome size evolution is a complex process influenced by polyploidization, satellite DNA accumulation, and expansion of retroelements. How this process could be affected by different reproductive strategies is still poorly understood. We analyzed differences in the number and distribution of major repetitive DNA elements in two closely related species, Silene latifolia and S. vulgaris. Both species are diploid and possess the same chromosome number (2n = 24), but differ in their genome size and mode of reproduction. The dioecious S. latifolia (1C = 2.70 pg DNA) possesses sex chromosomes and its genome is 2.5× larger than that of the gynodioecious S. vulgaris (1C = 1.13 pg DNA), which does not possess sex chromosomes. We discovered that the genome of S. latifolia is larger mainly due to the expansion of Ogre retrotransposons. Surprisingly, the centromeric STAR-C and TR1 tandem repeats were found to be more abundant in S. vulgaris, the species with the smaller genome. We further examined the distribution of major repetitive sequences in related species in the Caryophyllaceae family. The results of FISH (fluorescence in situ hybridization) on mitotic chromosomes with the Retand element indicate that large rearrangements occurred during the evolution of the Caryophyllaceae family. Our data demonstrate that the evolution of genome size in the genus Silene is accompanied by the expansion of different repetitive elements with specific patterns in the dioecious species possessing the sex chromosomes.
Linked read technology for assembling large complex and polyploid genomes
Background: Short read DNA sequencing technologies have revolutionized genome assembly by providing high accuracy and throughput data at low cost. But it remains challenging to assemble short read data, particularly for large, complex and polyploid genomes. The linked read strategy has the potential to enhance the value of short reads for genome assembly because all reads originating from a single long molecule of DNA share a common barcode. However, the majority of studies to date that have employed linked reads were focused on human haplotype phasing and genome assembly.
Results: Here we describe a de novo maize B73 genome assembly generated via linked read technology which contains ~ 172,000 scaffolds with an N50 of 89 kb that cover 50% of the genome. Based on comparisons to the B73 reference genome, 91% of linked read contigs are accurately assembled. Because it was possible to identify errors with > 76% accuracy using machine learning, it may be possible to identify and potentially correct systematic errors. Complex polyploids represent one of the last grand challenges in genome assembly. Linked read technology was able to successfully resolve the two subgenomes of the recent allopolyploid, proso millet (Panicum miliaceum). Our assembly covers ~ 83% of the 1 Gb genome and consists of 30,819 scaffolds with an N50 of 912 kb.
Conclusions: Our analysis provides a framework for future de novo genome assemblies using linked reads, and we suggest computational strategies that, if implemented, have the potential to further improve linked read assemblies, particularly for repetitive genomes.
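The Results above summarise assembly contiguity with the scaffold N50. As a minimal sketch of the standard definition: N50 is the length of the scaffold at which the cumulative length, counted from the longest scaffold downwards, first reaches half of the total. The function name, the optional `total` parameter and the toy lengths are illustrative assumptions. Passing an estimated genome size as `total` gives the related NG50 statistic. The abstract does not state which reference total the authors used.

```python
from typing import Iterable, Optional


def n50(lengths: Iterable[int], total: Optional[int] = None) -> Optional[int]:
    """Return the N50 of a collection of scaffold lengths.

    If `total` is None, half of the summed scaffold length is the target
    (classic N50). If `total` is an estimated genome size, the result is
    the NG50; None is returned when the scaffolds never reach half of it.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")
    target = (sum(ordered) if total is None else total) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return None  # only reachable when an explicit total is too large


if __name__ == "__main__":
    toy_scaffolds = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical lengths
    print(n50(toy_scaffolds))             # N50 relative to the assembly total
    print(n50(toy_scaffolds, total=200))  # NG50 against a hypothetical genome size
```

Because N50 depends on the total it is measured against, values from assemblies that cover different fractions of a genome are not directly comparable.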
In Depth Characterization of Repetitive DNA in 23 Plant Genomes Reveals Sources of Genome Size Variation in the Legume Tribe Fabeae
The differential accumulation and elimination of repetitive DNA are key drivers of genome size variation in flowering plants, yet there have been few studies which have analysed how different types of repeats in related species contribute to genome size evolution within a phylogenetic context. This question is addressed here by conducting large-scale comparative analysis of repeats in 23 species from four genera of the monophyletic legume tribe Fabeae, representing a 7.6-fold variation in genome size. Phylogenetic analysis and genome size reconstruction revealed that this diversity arose from genome size expansions and contractions in different lineages during the evolution of Fabeae. Employing a combination of low-pass genome sequencing with novel bioinformatic approaches resulted in identification and quantification of repeats making up 55–83% of the investigated genomes. In turn, this enabled an analysis of how each major repeat type contributed to the genome size variation encountered. Differential accumulation of repetitive DNA was found to account for 85% of the genome size differences between the species, and most (57%) of this variation was found to be driven by a single lineage of Ty3/gypsy LTR-retrotransposons, the Ogre elements. Although the amounts of several other lineages of LTR-retrotransposons and the total amount of satellite DNA were also positively correlated with genome size, their contributions to genome size variation were much smaller (up to 6%). Repeat analysis within a phylogenetic framework also revealed profound differences in the extent of sequence conservation between different repeat types across Fabeae. In addition to these findings, the study has provided a proof of concept for the approach combining recent developments in sequencing and bioinformatics to perform comparative analyses of repetitive DNAs in a large number of non-model species without the need to assemble their genomes.
CenH3 evolution in diploids and polyploids of three angiosperm genera
BACKGROUND: Centromeric DNA sequences alone are neither necessary nor sufficient for centromere specification. The centromere specific histone, CenH3, evolves rapidly in many species, perhaps as a coevolutionary response to rapidly evolving centromeric DNA. To gain insight into CenH3 evolution, we characterized patterns of nucleotide and protein diversity among diploids and allopolyploids within three diverse angiosperm genera, Brassica, Oryza, and Gossypium (cotton), with a focus on evidence for diversifying selection in the various domains of the CenH3 gene. In addition, we compare expression profiles and alternative splicing patterns for CenH3 in representatives of each genus. RESULTS: All three genera retain both duplicated CenH3 copies, while Brassica and Gossypium exhibit pronounced homoeologous expression level bias. Comparisons among genera reveal shared and unique aspects of CenH3 evolution, variable levels of diversifying selection in different CenH3 domains, and that alternative splicing contributes significantly to CenH3 diversity. CONCLUSIONS: Since the N terminus is subject to diversifying selection but the DNA binding domains do not appear to be, rapidly evolving centromere sequences are unlikely to be the primary driver of CenH3 sequence diversification. At present, the functional explanation for the diversity generated by both conventional protein evolution in the N terminal domain and alternative splicing remains unknown. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12870-014-0383-3) contains supplementary material, which is available to authorized users.