7,024 research outputs found
Genomic abundance is not predictive of tandem repeat localization in grass genomes.
Highly repetitive regions have historically posed a challenge when investigating sequence variation and content. High-throughput sequencing has enabled researchers to use whole-genome shotgun sequencing to estimate the abundance of repetitive sequence, and these methodologies have been recently applied to centromeres. Previous research has investigated variation in centromere repeats across eukaryotes, positing that the highest abundance tandem repeat in a genome is often the centromeric repeat. To test this assumption, we used shotgun sequencing and a bioinformatic pipeline to identify common tandem repeats across a number of grass species. We find that de novo assembly and subsequent abundance ranking of repeats can successfully identify tandem repeats with homology to known tandem repeats. Fluorescent in-situ hybridization shows that de novo assembly and ranking of repeats from non-model taxa identifies chromosome domains rich in tandem repeats both near pericentromeres and elsewhere in the genome
Analysis of the giant genomes of Fritillaria (Liliaceae) indicates that a lack of DNA removal characterizes extreme expansions in genome size.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. Plants exhibit an extraordinary range of genome sizes, varying by > 2000-fold between the smallest and largest recorded values. In the absence of polyploidy, changes in the amount of repetitive DNA (transposable elements and tandem repeats) are primarily responsible for genome size differences between species. However, there is ongoing debate regarding the relative importance of amplification of repetitive DNA versus its deletion in governing genome size. Using data from 454 sequencing, we analysed the most repetitive fraction of some of the largest known genomes for diploid plant species, from members of Fritillaria. We revealed that genomic expansion has not resulted from the recent massive amplification of just a handful of repeat families, as shown in species with smaller genomes. Instead, the bulk of these immense genomes is composed of highly heterogeneous, relatively low-abundance repeat-derived DNA, supporting a scenario where amplified repeats continually accumulate due to infrequent DNA removal. Our results indicate that a lack of deletion and low turnover of repetitive DNA are major contributors to the evolution of extremely large genomes and show that their size cannot simply be accounted for by the activity of a small number of high-abundance repeat families. This work was supported by the Natural Environment Research Council (grant no. NE/G017 24/1), the Czech Science Foundation (grant no. P501/12/G090), the AVCR (grant no. RVO:60077344) and a Beatriu de Pinos postdoctoral fellowship to J.P. (grant no. 2011-A-00292; Catalan Government-E.U. 7th F.P.)
Low coverage sequencing for repetitive DNA analysis in Passiflora edulis Sims: Cytogenomic characterization of transposable elements and satellite DNA
Background: The cytogenomic study of repetitive regions is fundamental for the understanding of morphofunctional mechanisms and genome evolution. Passiflora edulis is a species of relevant agronomic value; in this work, its genome was sequenced by next-generation sequencing and analyzed with the RepeatExplorer pipeline. The clusters allowed the identification and characterization of repetitive elements (predominant contributors to most plant genomes). The aim of this study was to identify, characterize and map the repetitive DNA of P. edulis, providing important cytogenomic markers, especially sequences associated with the centromere. Results: Three clusters of satellite DNAs (69, 118 and 207) and seven clusters of Long Terminal Repeat (LTR) retrotransposons of the superfamilies Ty1/Copia and Ty3/Gypsy and families Angela, Athila, Chromovirus and Maximus-Sire (6, 11, 36, 43, 86, 94 and 135) were characterized and analyzed. The chromosome mapping of satellite DNAs showed two hybridization sites co-located in the 5S rDNA region (PeSat_1), subterminal hybridizations (PeSat_3) and hybridization in four sites, co-located in the 45S rDNA region (PeSat_2). Most of the retroelement hybridizations showed signals scattered across the chromosomes, differing in abundance, and only cluster 6 marked pericentromeric regions. No satellite DNA or retroelement associated with the centromere was observed. Conclusion: P. edulis has a highly repetitive genome, with a predominance of Ty3/Gypsy LTR retrotransposons. The satellite DNAs and LTR retrotransposons characterized are promising markers for investigation of the evolutionary patterns and genetic distinction of species and hybrids of Passiflora
Next Generation Sequencing-Based Analysis of Repetitive DNA in the Model Dioecious Plant Silene latifolia
BACKGROUND: Silene latifolia is a dioecious plant with well-distinguished X and Y chromosomes that is used as a model to study sex determination and sex chromosome evolution in plants. However, efficient utilization of this species has been hampered by the lack of large-scale sequencing resources and detailed analysis of its genome composition, especially with respect to repetitive DNA, which makes up the majority of the genome. METHODOLOGY/PRINCIPAL FINDINGS: We performed low-pass 454 sequencing followed by similarity-based clustering of 454 reads in order to identify and characterize sequences of all major groups of S. latifolia repeats. Illumina sequencing data from male and female genomes were also generated and employed to quantify the genomic proportions of individual repeat families. The majority of identified repeats belonged to LTR-retrotransposons, constituting about 50% of genomic DNA, with Ty3/gypsy elements being more frequent than Ty1/copia. While there were differences between the male and female genome in the abundance of several repeat families, their overall repeat composition was highly similar. Specific localization patterns on sex chromosomes were found for several satellite repeats using in situ hybridization with probes based on k-mer frequency analysis of Illumina sequencing data. CONCLUSIONS/SIGNIFICANCE: This study provides comprehensive information about the sequence composition and abundance of repeats representing over 60% of the S. latifolia genome. The results revealed generally low divergence in repeat composition between the sex chromosomes, which is consistent with their relatively recent origin. In addition, the study generated various data resources that are available for future exploration of the S. latifolia genome
In Depth Characterization of Repetitive DNA in 23 Plant Genomes Reveals Sources of Genome Size Variation in the Legume Tribe Fabeae
The differential accumulation and elimination of repetitive DNA are key drivers of genome size variation in flowering plants, yet there have been few studies which have analysed how different types of repeats in related species contribute to genome size evolution within a phylogenetic context. This question is addressed here by conducting large-scale comparative analysis of repeats in 23 species from four genera of the monophyletic legume tribe Fabeae, representing a 7.6-fold variation in genome size. Phylogenetic analysis and genome size reconstruction revealed that this diversity arose from genome size expansions and contractions in different lineages during the evolution of Fabeae. Employing a combination of low-pass genome sequencing with novel bioinformatic approaches resulted in identification and quantification of repeats making up 55-83% of the investigated genomes. In turn, this enabled an analysis of how each major repeat type contributed to the genome size variation encountered. Differential accumulation of repetitive DNA was found to account for 85% of the genome size differences between the species, and most (57%) of this variation was found to be driven by a single lineage of Ty3/gypsy LTR-retrotransposons, the Ogre elements. Although the amounts of several other lineages of LTR-retrotransposons and the total amount of satellite DNA were also positively correlated with genome size, their contributions to genome size variation were much smaller (up to 6%). Repeat analysis within a phylogenetic framework also revealed profound differences in the extent of sequence conservation between different repeat types across Fabeae. In addition to these findings, the study has provided a proof of concept for the approach combining recent developments in sequencing and bioinformatics to perform comparative analyses of repetitive DNAs in a large number of non-model species without the need to assemble their genomes
A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data
Deep shotgun sequencing and analysis of genomes, transcriptomes, amplified
single-cell genomes, and metagenomes has enabled investigation of a wide range
of organisms and ecosystems. However, sampling variation in short-read data
sets and high sequencing error rates of modern sequencers present many new
computational challenges in data interpretation. These challenges have led to
the development of new classes of mapping tools and de novo assemblers.
These algorithms are challenged by the continued improvement in sequencing
throughput. We here describe digital normalization, a single-pass computational
algorithm that systematizes coverage in shotgun sequencing data sets, thereby
decreasing sampling variation, discarding redundant data, and removing the
majority of errors. Digital normalization substantially reduces the size of
shotgun data sets and decreases the memory and time requirements for de novo
sequence assembly, all without significantly impacting the content of the
generated contigs. We apply digital normalization to the assembly of microbial
genomic data, amplified single-cell genomic data, and transcriptomic data. Our
implementation is freely available for use and modification
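The single-pass idea described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the abstract does not say how coverage is estimated, so the sketch assumes that a read's coverage is approximated by the median count of its k-mers among the reads kept so far, and it uses an exact `Counter` where a real tool would need a memory-efficient counting structure. The function names and the default values of `k` and `cutoff` are hypothetical.

```python
from collections import Counter
from statistics import median


def kmers(read, k):
    """Return all overlapping k-mers of a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def digital_normalization(reads, k=4, cutoff=2):
    """Keep a read only while its estimated coverage is below `cutoff`.

    Assumption: coverage is estimated as the median count of the read's
    k-mers among reads kept so far. Each read is examined exactly once.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: no coverage estimate possible
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads contribute counts
    return kept
```

Because only kept reads update the counts, redundant reads from already well-covered regions are discarded, while reads from regions not yet seen are retained regardless of how late they appear in the input.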
The peculiar landscape of repetitive sequences in the olive (Olea europaea L.) genome
Analyzing genome structure in different species provides insight into the evolution of plant genome size. Olive (Olea europaea L.) has a medium-sized haploid genome of 1.4 Gb, whose structure is largely uncharacterized, despite the growing importance of this tree as an oil crop. Next-generation sequencing technologies and different computational procedures have been used to study the composition of the olive genome and its repetitive fraction. A total of 2.03 and 2.3 genome equivalents of Illumina and 454 reads from genomic DNA, respectively, were assembled following different procedures, which produced more than 200,000 differently redundant contigs, with a mean length greater than 1,000 nt. Mapping Illumina reads onto the assembled sequences was used to estimate their redundancy. The genome data set was subdivided into highly redundant, medium redundant and nonredundant contigs. By combining identification and mapping of repeated sequences, it was established that tandem repeats represent a very large portion of the olive genome (∼31% of the whole genome), consisting of six main families of different length, two of which were first discovered in these experiments. The other large redundant class in the olive genome is represented by transposable elements (especially long terminal repeat-retrotransposons). On the whole, the results of our analyses show the peculiar landscape of the olive genome, related to the massive amplification of tandem repeats, more than that reported for any other sequenced plant genome
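The figures quoted in this abstract can be turned into base-pair amounts with simple arithmetic. The short Python sketch below, using hypothetical helper names, converts genome equivalents into sequenced bases and the tandem-repeat share into an absolute amount, taking the stated haploid genome size of 1.4 Gb at face value. Under that assumption, the Illumina and 454 data sets correspond to roughly 2.84 Gb and 3.22 Gb of sequence, and tandem repeats to roughly 0.43 Gb of the genome.

```python
GENOME_SIZE_BP = 1.4e9  # haploid olive genome size as stated in the abstract


def bases_for_genome_equivalents(genome_equivalents, genome_size_bp=GENOME_SIZE_BP):
    """Total sequenced bases corresponding to a number of genome equivalents."""
    return genome_equivalents * genome_size_bp


def repeat_amount_bp(genome_fraction, genome_size_bp=GENOME_SIZE_BP):
    """Absolute amount of a repeat class given its fraction of the genome."""
    return genome_fraction * genome_size_bp


illumina_bp = bases_for_genome_equivalents(2.03)  # Illumina reads
roche_454_bp = bases_for_genome_equivalents(2.3)  # 454 reads
tandem_repeat_bp = repeat_amount_bp(0.31)         # ~31% of the genome
```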
Stretching the Rules: Monocentric Chromosomes with Multiple Centromere Domains
The centromere is a functional chromosome domain that is essential for faithful chromosome segregation during cell division and that can be reliably identified by the presence of the centromere-specific histone H3 variant CenH3. In monocentric chromosomes, the centromere is characterized by a single CenH3-containing region within a morphologically distinct primary constriction. This region usually spans up to a few Mbp composed mainly of centromere-specific satellite DNA common to all chromosomes of a given species. In holocentric chromosomes, there is no primary constriction; the centromere is composed of many CenH3 loci distributed along the entire length of a chromosome. Using correlative fluorescence light microscopy and high-resolution electron microscopy, we show that pea (Pisum sativum) chromosomes exhibit remarkably long primary constrictions that contain 3-5 explicit CenH3-containing regions, a novelty in centromere organization. In addition, we estimate that the size of the chromosome segment delimited by two outermost domains varies between 69 Mbp and 107 Mbp, several-fold larger than any known centromere length. These domains are almost entirely composed of repetitive DNA sequences belonging to 13 distinct families of satellite DNA and one family of centromeric retrotransposons, all of which are unevenly distributed among pea chromosomes. We present the centromeres of Pisum as novel "meta-polycentric" functional domains. Our results demonstrate that the organization and DNA composition of functional centromere domains can be far more complex than previously thought, do not require single repetitive elements, and do not require single centromere domains in order to segregate properly. Based on these findings, we propose Pisum as a useful model for investigation of centromere architecture and the still poorly understood role of repetitive DNA in centromere evolution, determination, and function
The Utility of Graph Clustering of 5S Ribosomal DNA Homoeologs in Plant Allopolyploids, Homoploid Hybrids, and Cryptic Introgressants
Introduction: Ribosomal DNA (rDNA) loci have been widely used for identification of
allopolyploids and hybrids, although few of these studies employed high-throughput
sequencing data. Here we use graph clustering implemented in the RepeatExplorer (RE)
pipeline to analyze homoeologous 5S rDNA arrays at the genomic level searching for
hybridogenic origin of species. Data were obtained from more than 80 plant species,
including several well-defined allopolyploids and homoploid hybrids of different
evolutionary ages and from widely dispersed taxonomic groups.
Results: (i) Diploids show simple circular-shaped graphs of their 5S rDNA clusters. In
contrast, most allopolyploids and other interspecific hybrids exhibit more complex graphs
composed of two or more interconnected loops representing intergenic spacers (IGS). (ii)
There was a relationship between graph complexity and locus numbers. (iii) The
sequences and lengths of the 5S rDNA units reconstituted in silico from k-mers were
congruent with those experimentally determined. (iv) Three-genomic comparative cluster
analysis of reads from allopolyploids and progenitor diploids allowed identification of
homoeologous 5S rRNA gene families even in relatively ancient (c. 1 Myr) Gossypium and
Brachypodium allopolyploids which already exhibit uniparental partial loss of rDNA
repeats. (v) Finally, species harboring introgressed genomes exhibit exceptionally
complex graph structures.
Conclusion: We found that the cluster graph shapes and graph parameters (k-mer
coverage scores and connected component index) closely reflect the organization and
intragenomic homogeneity of 5S rDNA repeats. We propose that the analysis of 5S rDNA
cluster graphs computed by the RE pipeline together with the cytogenetic analysis might
be a reliable approach for the determination of the hybrid or allopolyploid plant species
parentage and may also be useful for detecting historical introgression events
Reconstructing an ancestral genotype of two hexachlorocyclohexane-degrading Sphingobium species using metagenomic sequence data.
Over the last 60 years, the use of hexachlorocyclohexane (HCH) as a pesticide has resulted in the production of >4 million tons of HCH waste, which has been dumped in open sinks across the globe. Here, the combination of the genomes of two genetic subspecies (Sphingobium japonicum UT26 and Sphingobium indicum B90A; isolated from two discrete geographical locations, Japan and India, respectively) capable of degrading HCH, with metagenomic data from an HCH dumpsite (∼450 mg HCH per g soil), enabled the reconstruction and validation of the last-common ancestor (LCA) genotype. Mapping the LCA genotype (3128 genes) to the subspecies genomes demonstrated that >20% of the genes in each subspecies were absent in the LCA. This includes two enzymes from the 'upper' HCH degradation pathway, suggesting that the ancestor was unable to degrade HCH isomers, but descendants acquired lin genes by transposon-mediated lateral gene transfer. In addition, anthranilate and homogentisate degradation traits were found to be strain (selectively retained only by UT26) and environment (absent in the LCA and subspecies, but prevalent in the metagenome) specific, respectively. One draft secondary chromosome, two near complete plasmids and eight complete lin transposons were assembled from the metagenomic DNA. Collectively, these results reinforce the elastic nature of the genus Sphingobium, and describe the evolutionary acquisition mechanism of a xenobiotic degradation phenotype in response to environmental pollution. This also demonstrates for the first time the use of metagenomic data in ancestral genotype reconstruction, highlighting its potential to provide significant insight into the development of such phenotypes