18 research outputs found

    Understanding angiosperm genome interactions and evolution: insights from sacred lotus (Nelumbo nucifera) and the carrot family (Apiaceae)

    Horizontal and intracellular gene transfers are driving forces in plant evolution. The transfer of DNA into a genome adds genetic diversity, and successfully incorporated genes can retain their original function or develop new functions through mutation. While there are trends and hypotheses for the frequency of transfers, age of transfers, and potential mechanisms of transfer, each system has its own evolutionary history. The major goal of this study was to investigate gene transfer events and organelle rare genomic changes in two plant systems – Nelumbo (Nelumbonaceae) and the apioid superclade of Apiaceae subfamily Apioideae. Genome sequences from the early diverging angiosperm Nelumbo nucifera ‘China Antique’ were used to describe both intra- and interspecific patterns of variation and investigate intracellular gene transfers (IGT). A percent similarity approach was used to compare DNA from each genome and determine a possible mechanism of DNA transfer, if it occurred. The mechanisms investigated included recombination and double-strand break repair, as evidenced by repeat DNA and the presence of transposable elements. The ‘China Antique’ plastome retains the ancestral gene synteny of Amborella and has no evidence of IGT. ‘China Antique’ has more small repeats in its mitochondrial genomes than reported for other angiosperms, but does not contain any large repeats, and its nuclear genome does not have as much organelle DNA as the other angiosperms investigated, including Arabidopsis. The lack of large repeats within the Nelumbo mitochondrial genome may explain the few instances of IGT detected. The few instances of organelle IGTs into its nucleus may be the result of its history of vegetative propagation, low nucleotide substitution rate, and lack of several paleo-duplications.
Unlike N. nucifera and the majority of other angiosperms, the plastomes of several members of the apioid superclade within the carrot family (Apiaceae or Umbelliferae) have instances of IGT into the plastome, in addition to other rare genomic changes (RGCs). To investigate the distribution and mechanism of IGT in species of the apioid superclade and the variable boundary between the two single copy regions and the IR, the complete plastomes of Anethum graveolens, Foeniculum vulgare, Carum carvi, and Coriandrum sativum were sequenced. To determine the distribution of and mechanisms causing these RGCs, the extent of IGT, and changes in gene synteny, the large single copy (LSC)–inverted repeat (IR) boundary in 34 additional species was also sequenced. Analyses of these sequence data suggest that there are several mechanisms at work creating these dynamic IR changes. There is evidence of double-strand break repair in Coriandrum, as well as repeat-mediated changes near its IR boundaries. Short dispersed repeats are also implicated as a mechanism of IR change in the 34 additional species investigated. In Carum (tribe Careae) there is an IR boundary expansion, in addition to two small inversions. One of these inversions is near JLA and the other is between psbM and trnT. Anethum and Foeniculum plastomes contain evidence of double-strand break repair causing IGT of mtDNA into these plastomes. For the 34 additional species investigated, data support double-strand break repair as a mechanism of plastid evolution and as the likely cause of novel DNA insertions at LSC–IR boundaries. However, without a resolved phylogeny there is no context for how many gene transfer events there were or a timeline for when these events occurred. Molecular phylogenetic studies to date have been unable to produce a well-resolved apioid superclade phylogeny.
To resolve relationships among the tribes and other higher-level clades within the group, determine the phylogenetic utility of RGCs, and determine the extent and timing of plastome RGCs in the group, the plastid regions psbM–psbD and psbA–trnH and the nuclear gene PHYA were sequenced. To these sequence data four RGCs were added, as were previously available data from the nrDNA internal transcribed spacer (ITS) region. These molecular data were analyzed separately and in various combinations using maximum likelihood and Bayesian inference methods. While these data were unable to fully resolve higher-level relationships in the apioid superclade, conclusions can be made regarding the distribution and number of RGC events that have occurred in the group. The IR boundary expansion into rps3 occurred only once in the lineage leading to tribes Careae and Pyramidoptereae. In addition, Careae is supported as monophyletic by the presence of the inversion of psbA and trnH. The contraction of the IR to rpl2 and the presence of putative mtDNA adjacent to JLA also likely occurred only once. Alternatively, while not as parsimonious, a maximum of six events is possible if each lineage gained these RGCs independently. Other major lineages within the group are not as strongly delimited, and for these clades RGCs cannot unambiguously support monophyly. Further study of the apioid superclade is necessary to resolve relationships and make further inferences into the evolution of plastomes within the clade.

    Fully automated sequence alignment methods are comparable to, and much faster than, traditional methods in large data sets: an example with hepatitis B virus

    Aligning sequences for phylogenetic analysis (multiple sequence alignment; MSA) is an important, but increasingly computationally expensive step with the recent surge in DNA sequence data. Much of this sequence data is publicly available, but can be extremely fragmentary (i.e., a combination of full genomes and genomic fragments), which can compound the computational issues related to MSA. Traditionally, alignments are produced with automated algorithms and then checked and/or corrected “by eye” prior to phylogenetic inference. However, this manual curation is inefficient at the data scales required of modern phylogenetics and results in alignments that are not reproducible. Recently, methods have been developed for fully automating alignments of large data sets, but it is unclear if these methods produce alignments that result in compatible phylogenies when compared to more traditional alignment approaches that combined automated and manual methods. Here we use approximately 33,000 publicly available sequences from the hepatitis B virus (HBV), a globally distributed and rapidly evolving virus, to compare different alignment approaches. Using one data set comprised exclusively of whole genomes and a second that also included sequence fragments, we compared three MSA methods: (1) a purely automated approach using traditional software, (2) an automated approach including by eye manual editing, and (3) more recent fully automated approaches. To understand how these methods affect phylogenetic results, we compared resulting tree topologies based on these different alignment methods using multiple metrics. We further determined if the monophyly of existing HBV genotypes was supported in phylogenies estimated from each alignment type and under different statistical support thresholds. Traditional and fully automated alignments produced similar HBV phylogenies. 
Although there was variability between branch support thresholds, allowing lower support thresholds tended to result in more differences among trees. Therefore, differences between the trees could be best explained by phylogenetic uncertainty unrelated to the MSA method used. Nevertheless, automated alignment approaches did not require human intervention and were therefore considerably less time-intensive than traditional approaches. Because of this, we conclude that fully automated algorithms for MSA are fully compatible with older methods even in extremely difficult-to-align data sets. Additionally, we found that most HBV diagnostic genotypes did not correspond to evolutionarily sound groups, regardless of alignment type and support threshold. This suggests there may be errors in genotype classification in the database or that HBV genotypes may need revision.

    The influence of a priori grouping on inference of genetic clusters: simulation study and literature review of the DAPC method

    Inference of genetic clusters is a key aim of population genetics, sparking development of numerous analytical methods. Within these, there is a conceptual divide between finding de novo structure and assessing a priori groups. The recently developed Discriminant Analysis of Principal Components (DAPC) combines discriminant analysis (DA) with principal component (PC) analysis. When applying DAPC, the groups used in the DA (specified a priori or described de novo) need to be carefully assessed. While DAPC has rapidly become a core technique, the sensitivity of the method to misspecification of groups, and how it is being empirically applied, are unknown. To address this, we conducted a simulation study examining the influence of a priori versus de novo group designations, and a literature review of how DAPC is being applied. We found that with a priori groupings, distance between genetic clusters reflected underlying FST. However, when migration rates were high and groups were described de novo, there was considerable inaccuracy, both in terms of the number of genetic clusters suggested and placement of individuals into those clusters. Nearly all (90.1%) of 224 studies surveyed used DAPC to find de novo clusters, and for the majority (62.5%) the stated goal matched the results. However, most studies (52.3%) omit key run parameters, preventing repeatability and transparency. Therefore, we present recommendations for standard reporting of parameters used in DAPC analyses. The influence of groupings in genetic clustering is not unique to DAPC, and researchers need to consider their goal and which methods will be most appropriate.

    Predicting the spread-risk potential of chronic wasting disease to sympatric ungulate species

    Wildlife disease incidence is increasing, resulting in negative impacts on the economy, biodiversity, and potentially human health. Chronic wasting disease (CWD) is a fatal, transmissible spongiform encephalopathy of cervids (wild and captive) that continues to spread geographically, resulting in exposure of potential new host species. The disease agent (PrPCWD) is a misfolded conformer of the cellular prion protein (PrPC). In Canada, the disease is endemic in Alberta and Saskatchewan, affecting mule and white-tailed deer, with lesser impact on elk and moose. As the disease continues to expand, additional wild ungulate species including bison, bighorn sheep, mountain goat, and pronghorn antelope may be exposed. To better understand the species barrier, we reviewed the current literature on taxa naturally or experimentally exposed to CWD to identify susceptible and resistant species. We created a phylogeny of these taxa using cytochrome b and found that CWD susceptibility followed the species phylogeny. Using this phylogeny we estimated the probability of CWD susceptibility for wild ungulate species. We then compared PrPC amino acid polymorphisms among these species to identify which sites segregated between susceptible and resistant species. We identified sites that were significantly associated with susceptibility, but they were not fully discriminating. Finally, we sequenced Prnp from 578 wild ungulates to further evaluate their potential susceptibility. Together, these data suggest the host range for CWD will potentially include pronghorn, mountain goat and bighorn sheep, but bison are likely to be more resistant. These findings highlight the need for monitoring potentially susceptible species as CWD continues to expand.

    Genotype_trees.zip

    Tree files used for genotype occupancy tests in hepatitis B viruses. Trees estimated from manual or PASTA genome alignments. Files include .tre and .xml formats.

    Total_alignments.zip

    Sequence alignments of the total (genomes + fragmentary sequences) hepatitis B virus data set. Files include the manual alignment and both UPP alignments (manual genome alignment backbone, PASTA genome alignment backbone).

    Genome_alignments.zip

    Sequence alignments of hepatitis B virus genomes and the S-region. Files include the manual genome alignment, de-gapped manual alignments, MUSCLE genome alignment, linearized and unlinearized PASTA alignments, and the S-region alignment.

    Genome_trees.zip

    Tree files estimated from sequence alignments of hepatitis B virus genomes. Trees are best maximum likelihood (ML) trees with bootstrap support values. Includes trees based on MUSCLE, manual, and PASTA genome alignments.

    S-region_trees.zip

    Trees estimated from S-region alignments of hepatitis B viruses. Includes the best maximum likelihood tree with bootstrap support values, and a file with the bootstrap replicates.