
    Considerations and complications of mapping small RNA high-throughput data to transposable elements

    BACKGROUND: High-throughput sequencing (HTS) has revolutionized the way in which epigenetic research is conducted. When coupled with fully sequenced genomes, millions of small RNA (sRNA) reads are mapped to regions of interest and the results scrutinized for clues about epigenetic mechanisms. However, this approach requires careful consideration with regard to experimental design, especially when one investigates repetitive parts of genomes such as transposable elements (TEs), or when such genomes are large, as is often the case in plants. RESULTS: Here, in an attempt to shed light on the complications of mapping sRNAs to TEs, we focus on the 2,300 Mb maize genome, 85% of which is derived from TEs, and scrutinize methodological strategies that are commonly employed in TE studies. These include choices for the reference dataset, the normalization of multiply mapping sRNAs, and the selection among sRNA metrics. We further examine how these choices influence the relationship between sRNAs and the critical feature of TE age, and contrast their effect on low-copy genomic regions and other popular HTS data. CONCLUSIONS: Based on our analyses, we share a series of take-home messages that may help with the design, implementation, and interpretation of high-throughput TE epigenetic studies specifically, but our conclusions may also apply to any work that involves analysis of HTS data.
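The abstract names the normalization of multiply mapping sRNAs as one of the key methodological choices. The following is a minimal Python sketch, not the authors' pipeline. It assumes alignments have already been reduced to a dictionary from each read to the loci it hits; that input format, the function names and the TE IDs are all hypothetical. The sketch contrasts two common options: splitting each read's abundance equally among all of its loci, or discarding multi-mappers and counting only uniquely mapping reads.

```python
from collections import defaultdict


def weighted_locus_counts(alignments, read_counts):
    """Split each read's abundance equally among every locus it maps to.

    alignments: hypothetical dict {read_sequence: [locus_id, ...]}
    read_counts: dict {read_sequence: number of times the read was sequenced}
    """
    totals = defaultdict(float)
    for read, loci in alignments.items():
        if not loci:
            continue  # unmapped reads contribute nothing
        share = read_counts.get(read, 0) / len(loci)
        for locus in loci:
            totals[locus] += share
    return dict(totals)


def unique_locus_counts(alignments, read_counts):
    """Alternative: keep only reads that map to exactly one locus."""
    totals = defaultdict(float)
    for read, loci in alignments.items():
        if len(loci) == 1:
            totals[loci[0]] += read_counts.get(read, 0)
    return dict(totals)


if __name__ == "__main__":
    # Toy input: TE_A..TE_D stand in for annotated TE copies (hypothetical IDs).
    alignments = {
        "r1": ["TE_A"],
        "r2": ["TE_A", "TE_B"],
        "r3": ["TE_B", "TE_C", "TE_D"],
    }
    read_counts = {"r1": 4, "r2": 10, "r3": 3}
    print(weighted_locus_counts(alignments, read_counts))
    # {'TE_A': 9.0, 'TE_B': 6.0, 'TE_C': 1.0, 'TE_D': 1.0}
    print(unique_locus_counts(alignments, read_counts))
    # {'TE_A': 4.0}
```

Even on this toy input the two rules give very different pictures: under the unique-only rule, TE_B, TE_C and TE_D receive no signal at all. This is the kind of effect that is amplified in highly repetitive TE families.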

    Transcription-related mutations and GC content drive variation in nucleotide substitution rates across the genomes of Arabidopsis thaliana and Arabidopsis lyrata

    BACKGROUND: There has been remarkably little study of nucleotide substitution rate variation among plant nuclear genes, in part because orthology is difficult to establish. Orthology is even more problematic for intergenic regions of plant nuclear genomes, because plant genomes generally harbor a wealth of repetitive DNA. In theory, orthologous intergenic data are valuable for studying rate variation because nucleotide substitutions in these regions should be under little selective constraint compared with coding regions. As a result, evolutionary rates in intergenic regions may more accurately reflect genomic features, like recombination and GC content, that contribute to nucleotide substitution. RESULTS: We generated a set of 66 intergenic sequences in Arabidopsis lyrata, a close relative of Arabidopsis thaliana. The intergenic regions included transposable element (TE) remnants and regions flanking the TEs. We verified orthology of these amplified regions both by comparison of existing A. lyrata – A. thaliana genetic maps and by using molecular features. We compared substitution rates among the 66 intergenic loci, which exhibit ~5-fold rate variation, and compared intergenic rates to a set of 64 orthologous coding sequences. Our chief observations were that the average rate of nucleotide substitution is slower in intergenic regions than in synonymous sites, that rate variation in both intergenic and coding regions correlates with GC content, that GC content alone is not sufficient to explain differences in rates between intergenic and coding regions, and that rates of evolution in intergenic regions correlate negatively with gene density. CONCLUSION: Our observations indicated that mutation rates vary among genomic regions as a function of base composition, suggesting that previous observations of "selective constraint" on non-coding regions could more accurately be attributed to a GC effect instead of selection.
    The negative correlation between nucleotide substitution rate and gene density provides a potential neutral explanation for a previously documented correlation between gene density and polymorphism levels within A. thaliana. Finally, we discuss potential forces that could contribute to rapid synonymous rates, and provide evidence to suggest that transcription-related mutation contributes to rate differences between intergenic and synonymous sites.
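The abstract reports that substitution rates correlate with GC content but does not state which statistic was used. A minimal Python sketch of one reasonable way to ask that question follows. It computes per-locus GC content and then Spearman's rank correlation, which does not assume a linear relationship. The function names and the example sequences and rates are hypothetical, made up for illustration, and are not the paper's data or method.

```python
import math


def gc_content(seq):
    """Fraction of G+C among unambiguous bases; N, gaps and other symbols are ignored."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return (s.count("G") + s.count("C")) / acgt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-locus sequences and substitution rates (made-up numbers).
    loci = ["ATATATGC", "ATGCGCAT", "GCGCGCAT", "GGGCCCGA"]
    rates = [0.010, 0.014, 0.019, 0.025]
    gc = [gc_content(s) for s in loci]
    print(spearman_rho(gc, rates))  # 1.0
```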

    A comparative computational analysis of nonautonomous Helitron elements between maize and rice

    Background: Helitrons are DNA transposable elements that are proposed to replicate via a rolling-circle mechanism. Non-autonomous helitron elements have captured gene fragments from many genes in maize (Zea mays ssp. mays) but only a handful of genes in Arabidopsis (Arabidopsis thaliana). This observation suggests very different histories for helitrons in these two species, but it is unclear which species contains helitrons that are more typical of plants. Results: We performed computational searches to identify helitrons in maize and rice genomic sequence data. Using 12 previously identified helitrons as a seed set, we identified 23 helitrons in maize, five of which were polymorphic among a sample of inbred lines. Our total sample of maize helitrons contained fragments of 44 captured genes. Twenty-one of 35 of these helitrons did not cluster with other elements into closely related groups, suggesting substantial diversity in the maize element complement. We identified over 552 helitrons in the japonica rice genome. More than 70% of these were found in a collinear location in the indica rice genome, and 508 clustered as a single large subfamily. The japonica rice elements contained fragments of only 11 genes, a number similar to that in Arabidopsis. Given differences in gene capture between maize and rice, we examined sequence properties that could contribute to differences in capture rates, focusing on 3' palindromes that are hypothesized to play a role in transposition termination. The free energies of folding for maize helitrons were significantly lower than those in rice, but the direction of the difference differed from our prediction. Conclusion: Maize helitrons are clearly unique relative to those of rice and Arabidopsis in the prevalence of gene capture, but the reasons for this difference remain elusive.
Maize helitrons do not seem to be more polymorphic among individuals than those of Arabidopsis; they do not appear to be substantially older or younger than the helitrons in either species; and our analyses provided little evidence that the 3' hairpin plays a role

    A role for palindromic structures in the cis-region of maize Sirevirus LTRs in transposable element evolution and host epigenetic response

    Transposable elements (TEs) proliferate within the genome of their host, which responds by silencing them epigenetically. Much is known about the mechanisms of silencing in plants, particularly the role of siRNAs in guiding DNA methylation. In contrast, little is known about siRNA targeting patterns along the length of TEs, yet this information may provide crucial insights into the dynamics between hosts and TEs. By focusing on 6,456 carefully annotated, full-length Sirevirus LTR retrotransposons in maize, we show that their silencing is associated with underlying characteristics of the TE sequence, and we also uncover three features of the host–TE interaction. First, siRNA mapping varies among families and among elements, but particularly along the length of elements. Within the cis-regulatory portion of the LTRs, a complex palindrome-rich region acts as a hotspot of both siRNA matching and sequence evolution. These patterns are consistent across leaf, tassel, and immature ear libraries, but are particularly pronounced in floral tissues and for 21- to 22-nt siRNAs. Second, this region has the ability to form hairpins, making it a potential template for the production of miRNA-like, hairpin-derived small RNAs. Third, Sireviruses are targeted by siRNAs as a decreasing function of their age, but the oldest elements remain highly targeted, partially by siRNAs that cross-map to the youngest elements. We show that the targeting of older Sireviruses reflects their conserved palindromes. Altogether, we hypothesize that the palindromes aid the silencing of active elements and influence transposition potential, siRNA targeting levels, and ultimately the fate of an element within the genome.
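The abstract reports that siRNA mapping varies particularly along the length of elements. One simple way to summarize such a pattern, sketched here in Python under stated assumptions and not as the paper's pipeline, is to bin hits into equal-width windows of relative position. Relative position lets elements of different lengths be compared. The input format is a hypothetical one: each siRNA hit reduced to a 0-based start coordinate on a single element. The function name and bin count are also illustrative choices.

```python
def targeting_profile(hits, element_length, n_bins=10):
    """Fraction of siRNA hits in each of n_bins equal-width windows along one element.

    hits: hypothetical list of 0-based mapping start positions on the element
    element_length: element length in bp
    """
    if element_length <= 0 or n_bins <= 0:
        raise ValueError("element_length and n_bins must be positive")
    counts = [0] * n_bins
    for pos in hits:
        if not 0 <= pos < element_length:
            continue  # ignore coordinates that fall outside the element
        # integer arithmetic keeps the bin index in [0, n_bins - 1]
        counts[pos * n_bins // element_length] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


if __name__ == "__main__":
    # Toy element of 100 bp with five hits, summarized in four windows.
    print(targeting_profile([0, 5, 12, 95, 99], element_length=100, n_bins=4))
    # [0.6, 0.0, 0.0, 0.4]
```

Averaging such per-element profiles within a family would give a family-level view, at the cost of hiding the among-element variation the abstract also describes.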

    Experimental parasite infection causes genome-wide changes in DNA methylation

    Parasites are arguably among the strongest drivers of natural selection, constraining hosts to evolve resistance and tolerance mechanisms. Although the genetic basis of adaptation to parasite infection has been widely studied, little is known about how epigenetic changes contribute to parasite resistance and, eventually, adaptation. Here, we investigated the role of host DNA methylation modifications in responding to parasite infections. In a controlled infection experiment, we used the three-spined stickleback fish, a model species for host-parasite studies, and its nematode parasite Camallanus lacustris. We showed that the levels of DNA methylation are higher in infected fish. Results furthermore suggest correlations between DNA methylation and shifts in key fitness and immune traits between infected and control fish, including respiratory burst and functional trans-generational traits such as the concentration of motile sperm. We revealed that genes associated with metabolic, developmental and regulatory processes (cell death and apoptosis) were differentially methylated between infected and control fish. Interestingly, genes such as the neuropeptide FF receptor 2 and the integrin alpha 1, as well as molecular pathways including Th1 and Th2 cell differentiation, were hypermethylated in infected fish, suggesting parasite-mediated repression mechanisms of immune responses. Altogether, we demonstrate that parasite infection contributes to genome-wide DNA methylation modifications. Our study brings novel insights into the evolution of vertebrate immunity and suggests that epigenetic mechanisms are complementary to genetic responses against parasite-mediated selection.

    Evolutionary Genomics of Structural Variation in Asian Rice (Oryza sativa) Domestication

    Structural variants (SVs) are a largely unstudied feature of plant genome evolution, despite the fact that SVs contribute substantially to phenotypes. In this study, we discovered SVs across a population sample of 347 high-coverage, resequenced genomes of Asian rice (Oryza sativa) and its wild ancestor (O. rufipogon). In addition to this short-read data set, we also inferred SVs from whole-genome assemblies and long-read data. Comparisons among data sets revealed different features of genome variability. For example, genome alignment identified a large (∼4.3 Mb) inversion in indica rice varieties relative to japonica varieties, and long-read analyses suggest that ∼9% of genes from the outgroup (O. longistaminata) are hemizygous. We focused, however, on the resequencing sample to investigate the population genomics of SVs. Clustering analyses with SVs recapitulated the rice cultivar groups that were also inferred from SNPs. However, the site-frequency spectrum of each SV type (which included inversions, duplications, deletions, translocations, and mobile element insertions) was skewed toward lower-frequency variants than that of synonymous SNPs, suggesting that SVs may be predominantly deleterious. Among transposable elements, SINE and mariner insertions were found at especially low frequency. We also used SVs to study domestication by contrasting rice with O. rufipogon. Cultivated genomes contained ∼25% more derived SVs and mobile element insertions than O. rufipogon, indicating that SVs contribute to the cost of domestication in rice. Peaks of SV divergence were enriched for known domestication genes, but we also detected hundreds of genes gained and lost during domestication, some of which were enriched for traits of agronomic interest.

    Comparative genomics of the Liberibacter genus reveals widespread diversity in genomic content and positive selection history

    ‘Candidatus Liberibacter’ is a group of bacterial species that are obligate intracellular plant pathogens and cause Huanglongbing disease of citrus trees and Zebra Chip in potatoes. Here, we examined the extent of intra- and interspecific genetic diversity across the genus using comparative genomics. Our approach examined a wide set of Liberibacter genome sequences including five pathogenic species and one species not known to cause disease. By performing comparative genomics analyses, we sought to understand the evolutionary history of this genus and to identify genes or genome regions that may affect pathogenicity. With a set of 52 genomes, we performed comparative genomics, measured genome rearrangement, and completed statistical tests of positive selection. We explored markers of genetic diversity across the genus, such as average nucleotide identity across the whole genome. These analyses revealed the highest intraspecific diversity amongst the ‘Ca. Liberibacter solanacearum’ species, which also has the largest plant host range. We identified sets of core and accessory genes across the genus and within each species and measured the ratio of nonsynonymous to synonymous mutations (dN/dS) across genes. We identified ten genes with evidence of a history of positive selection in the Liberibacter genus, including genes in the Tad complex, which have been previously implicated as being highly divergent in the ‘Ca. L. capsica’ species based on high values of dN

    Retrogenes in Rice (Oryza sativa L. ssp. japonica) Exhibit Correlated Expression with Their Source Genes

    Gene duplication occurs by either DNA- or RNA-based processes; the latter duplicates single genes via retroposition of messenger RNA. The expression of a retroposed gene copy (retrocopy) is expected to be uncorrelated with its source gene because upstream promoter regions are usually not part of the retroposition process. In contrast, DNA-based duplication often encompasses both the coding and the intergenic (promoter) regions; hence, expression is often correlated, at least initially, between DNA-based duplicates. In this study, we identified 150 retrocopies in rice (Oryza sativa L. ssp. japonica), most of which represent ancient retroposition events. We measured their expression from high-throughput RNA sequencing (RNAseq) data generated from seven tissues. At least 66% of the retrocopies were expressed, but at lower levels than their source genes. However, the tissue specificity of retrogenes was similar to that of their source genes, and expression of retrocopies and source genes was correlated across tissues. The level of correlation was similar for RNA- and DNA-based duplicates, and in both cases it decreased over time at statistically indistinguishable rates. We extended these observations to previously identified retrocopies in Arabidopsis thaliana, suggesting they may be general features of the process of retention of plant retrogenes.
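The central measurement here is the correlation of a retrocopy's expression with its source gene's expression across tissues. The Python sketch below shows that computation for one gene pair, using a Pearson correlation on log2(x + 1)-transformed per-tissue values. It is an illustration under stated assumptions, not the authors' method. The log transform, the function names and the seven example expression values are all hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series of length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def expression_correlation(retrocopy_expr, source_expr):
    """Correlate per-tissue expression profiles after a log2(x + 1) transform.

    The transform is an assumption, used to damp the influence of a few
    very highly expressed tissues; it is not taken from the paper.
    """
    lx = [math.log2(v + 1) for v in retrocopy_expr]
    ly = [math.log2(v + 1) for v in source_expr]
    return pearson_r(lx, ly)


if __name__ == "__main__":
    # Hypothetical expression in seven tissues: the retrocopy is expressed at
    # lower levels than its source gene but tracks it across tissues.
    retrocopy = [2, 5, 1, 8, 0, 3, 4]
    source = [20, 48, 9, 85, 2, 30, 41]
    print(round(expression_correlation(retrocopy, source), 3))
```

Because correlation ignores overall scale, a retrocopy can be expressed at much lower levels than its source gene yet still be highly correlated with it, which matches the combination of observations reported in the abstract.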

    Variation in Sphingomonas traits across habitats and phylogenetic clades

    Whether microbes show habitat preferences is a fundamental question in microbial ecology. If different microbial lineages have distinct traits, those lineages may occur more frequently in habitats where their traits are advantageous. Sphingomonas is an ideal bacterial clade in which to investigate how habitat preference relates to traits because these bacteria inhabit diverse environments and hosts. Here we downloaded 440 publicly available Sphingomonas genomes, assigned them to habitats based on isolation source, and examined their phylogenetic relationships. We sought to address (1) whether there is a relationship between Sphingomonas habitat and phylogeny, and (2) whether there is a phylogenetic correlation between key genome-based traits and habitat preference. We hypothesized that Sphingomonas strains from similar habitats would cluster together in phylogenetic clades, and that key traits that improve fitness in specific environments would correlate with habitat. Genome-based traits were categorized into the Y-A-S trait-based framework for high growth yield, resource acquisition, and stress tolerance. We selected 252 high-quality genomes and constructed a phylogenetic tree with 12 well-defined clades based on an alignment of 404 core genes. Sphingomonas strains from the same habitat clustered together within the same clades, and strains within clades shared similar clusters of accessory genes. Additionally, key genome-based trait frequencies varied across habitats. We conclude that Sphingomonas gene content reflects habitat preference. This knowledge of how environment and host relate to phylogeny may also help with future functional predictions about Sphingomonas and facilitate applications in bioremediation.

    Pollination Biology and Adaptive Radiation of Agavaceae, with Special Emphasis on the Genus Agave

    Agavaceae are an American family that comprises nine genera and ca. 300 species distributed in arid and semiarid environments, mainly in Mexico. The family is very successful and displays a wide array of ecological, reproductive, and morphological adaptations. Many of its members play important roles as keystone species, because they produce abundant resources during the reproductive season. In this paper we analyze the current knowledge about the pollination ecology of the different genera in the family and the role that pollination systems have played in the ecological and phylogenetic success of the group. After providing an overview of each of the genera in the family, we discuss in detail aspects of the reproductive ecology of species in the genus Agave s.l., which is composed of ca. 208 species and includes the subgenera of Agave (Agave and Littaea), Manfreda, Polianthes, and Prochnyanthes. Finally, we describe the results of analyses to test the hypothesis that there has been an adaptive radiation in the genus Agave. Using chloroplast and nuclear DNA sequences, we estimate the ages of the Agavaceae family and the genus Agave to be 12-26 million years (MYA) and 10 MYA, respectively, and show that mean rates of diversification were higher in the genus Agave than in the genus Yucca. The values we report for rates of diversification in Agave s.l. are high when compared to other radiations in plants and animals. We suggest that the desertification of North America, which started ca. 15 MYA, was critical in the radiation of agaves, and that the generalist pollination system of Agave has been more successful in generating new species than the extreme specialization of Yucca.
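The abstract compares mean diversification rates between Agave and Yucca but does not say which estimator was used. One simple and widely used option is the pure-birth (zero-extinction) net diversification rate, sketched here in Python as an illustration only. For a clade of N extant species, it uses r = ln(N) / t when t is the stem age and r = (ln(N) - ln(2)) / t when t is the crown age. The function name is hypothetical, and the example numbers are made up, not values from the paper.

```python
import math


def net_diversification_rate(n_species, age_my, crown=False):
    """Pure-birth net diversification rate, in lineages per million years.

    Assumes zero extinction. With crown=False, age_my is the stem age and the
    rate is ln(N) / t; with crown=True, age_my is the crown age and the rate is
    (ln(N) - ln(2)) / t, because a crown group starts from two lineages.
    """
    if n_species < 1 or age_my <= 0:
        raise ValueError("need n_species >= 1 and a positive age")
    if crown:
        if n_species < 2:
            raise ValueError("a crown group has at least two species")
        return (math.log(n_species) - math.log(2)) / age_my
    return math.log(n_species) / age_my


if __name__ == "__main__":
    # Hypothetical clade: 100 species, 10 million years old.
    print(round(net_diversification_rate(100, 10), 3))  # 0.461
    print(round(net_diversification_rate(100, 10, crown=True), 3))  # 0.391
```

Ignoring extinction yields a lower bound on the speciation rate. Estimators that add a nonzero extinction fraction give different values from the same N and t, so rates are only comparable across clades when computed the same way.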