
    Considerations and complications of mapping small RNA high-throughput data to transposable elements

    BACKGROUND: High-throughput sequencing (HTS) has revolutionized the way in which epigenetic research is conducted. When coupled with fully sequenced genomes, millions of small RNA (sRNA) reads are mapped to regions of interest and the results scrutinized for clues about epigenetic mechanisms. However, this approach requires careful consideration with regard to experimental design, especially when one investigates repetitive parts of genomes such as transposable elements (TEs), or when such genomes are large, as is often the case in plants. RESULTS: Here, in an attempt to shed light on complications of mapping sRNAs to TEs, we focus on the 2,300 Mb maize genome, 85% of which is derived from TEs, and scrutinize methodological strategies that are commonly employed in TE studies. These include choices for the reference dataset, the normalization of multiply mapping sRNAs, and the selection among sRNA metrics. We further examine how these choices influence the relationship between sRNAs and the critical feature of TE age, and contrast their effect on low copy genomic regions and other popular HTS data. CONCLUSIONS: Based on our analyses, we share a series of take-home messages that may help with the design, implementation, and interpretation of high-throughput TE epigenetic studies specifically, but our conclusions may also apply to any work that involves analysis of HTS data.

    Transcription-related mutations and GC content drive variation in nucleotide substitution rates across the genomes of Arabidopsis thaliana and Arabidopsis lyrata

    BACKGROUND: There has been remarkably little study of nucleotide substitution rate variation among plant nuclear genes, in part because orthology is difficult to establish. Orthology is even more problematic for intergenic regions of plant nuclear genomes, because plant genomes generally harbor a wealth of repetitive DNA. In theory, orthologous intergenic data are valuable for studying rate variation because nucleotide substitutions in these regions should be under little selective constraint compared to coding regions. As a result, evolutionary rates in intergenic regions may more accurately reflect genomic features, like recombination and GC content, that contribute to nucleotide substitution. RESULTS: We generated a set of 66 intergenic sequences in Arabidopsis lyrata, a close relative of Arabidopsis thaliana. The intergenic regions included transposable element (TE) remnants and regions flanking the TEs. We verified orthology of these amplified regions both by comparison of existing A. lyrata – A. thaliana genetic maps and by using molecular features. We compared substitution rates among the 66 intergenic loci, which exhibit ~5-fold rate variation, and compared intergenic rates to a set of 64 orthologous coding sequences. Our chief observations were that the average rate of nucleotide substitution is slower in intergenic regions than in synonymous sites, that rate variation in both intergenic and coding regions correlates with GC content, that GC content alone is not sufficient to explain differences in rates between intergenic and coding regions, and that rates of evolution in intergenic regions correlate negatively with gene density. CONCLUSION: Our observations indicated that mutation rates vary among genomic regions as a function of base composition, suggesting that previous observations of "selective constraint" on non-coding regions could more accurately be attributed to a GC effect instead of selection. The negative correlation between nucleotide substitution rate and gene density provides a potential neutral explanation for a previously documented correlation between gene density and polymorphism levels within A. thaliana. Finally, we discuss potential forces that could contribute to rapid synonymous rates, and provide evidence to suggest that transcription-related mutation contributes to rate differences between intergenic and synonymous sites.

    A comparative computational analysis of nonautonomous Helitron elements between maize and rice

    BACKGROUND: Helitrons are DNA transposable elements that are proposed to replicate via a rolling circle mechanism. Non-autonomous helitron elements have captured gene fragments from many genes in maize (Zea mays ssp. mays) but only a handful of genes in Arabidopsis (Arabidopsis thaliana). This observation suggests very different histories for helitrons in these two species, but it is unclear which species contains helitrons that are more typical of plants. RESULTS: We performed computational searches to identify helitrons in maize and rice genomic sequence data. Using 12 previously identified helitrons as a seed set, we identified 23 helitrons in maize, five of which were polymorphic among a sample of inbred lines. Our total sample of maize helitrons contained fragments of 44 captured genes. Twenty-one of 35 of these helitrons did not cluster with other elements into closely related groups, suggesting substantial diversity in the maize element complement. We identified over 552 helitrons in the japonica rice genome. More than 70% of these were found in a collinear location in the indica rice genome, and 508 clustered as a single large subfamily. The japonica rice elements contained fragments of only 11 genes, a number similar to that in Arabidopsis. Given differences in gene capture between maize and rice, we examined sequence properties that could contribute to differences in capture rates, focusing on 3' palindromes that are hypothesized to play a role in transposition termination. The free energy of folding for maize helitrons was significantly lower than that for rice helitrons, but the direction of the difference differed from our prediction. CONCLUSION: Maize helitrons are clearly unique relative to those of rice and Arabidopsis in the prevalence of gene capture, but the reasons for this difference remain elusive. Maize helitrons do not seem to be more polymorphic among individuals than those of Arabidopsis; they do not appear to be substantially older or younger than the helitrons in either species; and our analyses provided little evidence that the 3' hairpin plays a role.

    A role for palindromic structures in the cis-region of maize Sirevirus LTRs in transposable element evolution and host epigenetic response

    Transposable elements (TEs) proliferate within the genome of their host, which responds by silencing them epigenetically. Much is known about the mechanisms of silencing in plants, particularly the role of siRNAs in guiding DNA methylation. In contrast, little is known about siRNA targeting patterns along the length of TEs, yet this information may provide crucial insights into the dynamics between hosts and TEs. By focusing on 6456 carefully annotated, full-length Sirevirus LTR retrotransposons in maize, we show that their silencing associates with underlying characteristics of the TE sequence and also uncover three features of the host–TE interaction. First, siRNA mapping varies among families and among elements, but particularly along the length of elements. Within the cis-regulatory portion of the LTRs, a complex palindrome-rich region acts as a hotspot of both siRNA matching and sequence evolution. These patterns are consistent across leaf, tassel, and immature ear libraries, but particularly emphasized for floral tissues and 21- to 22-nt siRNAs. Second, this region has the ability to form hairpins, making it a potential template for the production of miRNA-like, hairpin-derived small RNAs. Third, Sireviruses are targeted by siRNAs as a decreasing function of their age, but the oldest elements remain highly targeted, partially by siRNAs that cross-map to the youngest elements. We show that the targeting of older Sireviruses reflects their conserved palindromes. Altogether, we hypothesize that the palindromes aid the silencing of active elements and influence transposition potential, siRNA targeting levels, and ultimately the fate of an element within the genome

    Experimental parasite infection causes genome-wide changes in DNA methylation

    Parasites are arguably among the strongest drivers of natural selection, constraining hosts to evolve resistance and tolerance mechanisms. Although the genetic basis of adaptation to parasite infection has been widely studied, little is known about how epigenetic changes contribute to parasite resistance and, eventually, adaptation. Here, we investigated the role of host DNA methylation modifications in the response to parasite infections. In a controlled infection experiment, we used the three-spined stickleback fish, a model species for host-parasite studies, and their nematode parasite Camallanus lacustris. We showed that the levels of DNA methylation are higher in infected fish. Results furthermore suggest correlations between DNA methylation and shifts in key fitness and immune traits between infected and control fish, including respiratory burst and functional trans-generational traits such as the concentration of motile sperm. We revealed that genes associated with metabolic, developmental and regulatory processes (cell death and apoptosis) were differentially methylated between infected and control fish. Interestingly, genes such as the neuropeptide FF receptor 2 and the integrin alpha 1 as well as molecular pathways including the Th1 and Th2 cell differentiation were hypermethylated in infected fish, suggesting parasite-mediated repression mechanisms of immune responses. Altogether, we demonstrate that parasite infection contributes to genome-wide DNA methylation modifications. Our study brings novel insights into the evolution of vertebrate immunity and suggests that epigenetic mechanisms are complementary to genetic responses against parasite-mediated selection.

    Evolutionary Genomics of Structural Variation in Asian Rice (Oryza sativa) Domestication

    Structural variants (SVs) are a largely unstudied feature of plant genome evolution, despite the fact that SVs contribute substantially to phenotypes. In this study, we discovered SVs across a population sample of 347 high-coverage, resequenced genomes of Asian rice (Oryza sativa) and its wild ancestor (O. rufipogon). In addition to this short-read data set, we also inferred SVs from whole-genome assemblies and long-read data. Comparisons among data sets revealed different features of genome variability. For example, genome alignment identified a large (∼4.3 Mb) inversion in indica rice varieties relative to japonica varieties, and long-read analyses suggest that ∼9% of genes from the outgroup (O. longistaminata) are hemizygous. We focused, however, on the resequencing sample to investigate the population genomics of SVs. Clustering analyses with SVs recapitulated the rice cultivar groups that were also inferred from SNPs. However, the site-frequency spectrum of each SV type (which included inversions, duplications, deletions, translocations, and mobile element insertions) was skewed toward lower frequency variants than synonymous SNPs, suggesting that SVs may be predominantly deleterious. Among transposable elements, SINE and mariner insertions were found at especially low frequency. We also used SVs to study domestication by contrasting between rice and O. rufipogon. Cultivated genomes contained ∼25% more derived SVs and mobile element insertions than O. rufipogon, indicating that SVs contribute to the cost of domestication in rice. Peaks of SV divergence were enriched for known domestication genes, but we also detected hundreds of genes gained and lost during domestication, some of which were enriched for traits of agronomic interest.

    Comparative genomics of the Liberibacter genus reveals widespread diversity in genomic content and positive selection history

    ‘Candidatus Liberibacter’ is a group of bacterial species that are obligate intracellular plant pathogens and cause Huanglongbing disease of citrus trees and Zebra Chip in potatoes. Here, we examined the extent of intra- and interspecific genetic diversity across the genus using comparative genomics. Our approach examined a wide set of Liberibacter genome sequences including five pathogenic species and one species not known to cause disease. By performing comparative genomics analyses, we sought to understand the evolutionary history of this genus and to identify genes or genome regions that may affect pathogenicity. With a set of 52 genomes, we performed comparative genomics, measured genome rearrangement, and completed statistical tests of positive selection. We explored markers of genetic diversity across the genus, such as average nucleotide identity across the whole genome. These analyses revealed the highest intraspecific diversity amongst the ‘Ca. Liberibacter solanacearum’ species, which also has the largest plant host range. We identified sets of core and accessory genes across the genus and within each species and measured the ratio of nonsynonymous to synonymous mutations (dN/dS) across genes. We identified ten genes with evidence of a history of positive selection in the Liberibacter genus, including genes in the Tad complex, which have been previously implicated as being highly divergent in the ‘Ca. L. capsica’ species based on high values of dN

    Retrogenes in Rice (Oryza sativa L. ssp. japonica) Exhibit Correlated Expression with Their Source Genes

    Gene duplication occurs by either DNA- or RNA-based processes; the latter duplicates single genes via retroposition of messenger RNA. The expression of a retroposed gene copy (retrocopy) is expected to be uncorrelated with its source gene because upstream promoter regions are usually not part of the retroposition process. In contrast, DNA-based duplication often encompasses both the coding and the intergenic (promoter) regions; hence, expression is often correlated, at least initially, between DNA-based duplicates. In this study, we identified 150 retrocopies in rice (Oryza sativa L. ssp. japonica), most of which represent ancient retroposition events. We measured their expression from high-throughput RNA sequencing (RNAseq) data generated from seven tissues. At least 66% of the retrocopies were expressed but at lower levels than their source genes. However, the tissue specificity of retrogenes was similar to that of their source genes, and expression between retrocopies and source genes was correlated across tissues. The level of correlation was similar between RNA- and DNA-based duplicates, and it decreased over time at statistically indistinguishable rates. We extended these observations to previously identified retrocopies in Arabidopsis thaliana, suggesting they may be general features of the process of retention of plant retrogenes.