19 research outputs found

    Genome-Wide Influence of Indel Substitutions on Evolution of Bacteria of the PVC Superphylum, Revealed Using a Novel Computational Method

    Whole-genome scans for positive Darwinian selection are widely used to detect the evolution of genome novelty. Most approaches evaluate the ratio of nonsynonymous to synonymous substitution rates across evolutionary lineages. These methods are sensitive to saturation of synonymous sites and thus cannot be used to study the evolution of distantly related organisms. In contrast, indels occur less frequently than amino acid replacements, accumulate more slowly, and can be employed to characterize the evolution of diverged organisms. As indels are also subject to the forces of natural selection, they can generate functional changes through positive selection. Here, we present a new computational approach to detect selective constraints on indel substitutions at the whole-genome level for distantly related organisms. Our method is based on ancestral sequence reconstruction, takes into account the varying susceptibility of different types of secondary structure to indels, and, according to simulation studies, is conservative. We applied this newly developed framework to characterize the evolution of organisms of the Planctomycetes, Verrucomicrobia, Chlamydiae (PVC) bacterial superphylum. The superphylum contains organisms with unique cell biology, physiology, and diverse lifestyles. It includes bacteria with simple cell organization and others with more complex, eukaryote-like compartmentalization. Lifestyles range from free-living organisms to obligate pathogens. In this study, we conducted a whole-genome analysis of indel substitutions specific to evolutionary lineages of the PVC superphylum and found that indels evolved under positive selection on up to 12% of gene tree branches. We also analyzed possible functional consequences for several case studies of predicted indel events.

    Relationship between co-occurrence and new metrics.

    Genome content similarity index (A) and (C) and microbe-microbe functional association index (B) and (D) in two different ecological datasets, shown in rows. In each plot, both response and independent variables are adjusted for phylogenetic distance between organisms. Pearson correlations are shown for every plot; the “*” and “**” symbols denote an associated Mantel p-value < 0.05 and < 0.01, respectively.
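
    The Pearson correlations and Mantel p-values mentioned in this caption can be illustrated with a minimal sketch of a simple permutation-based Mantel test. It assumes two symmetric pairwise matrices over the same organisms (for example, co-occurrence and one of the indices). The function names and the toy matrices are hypothetical, and the adjustment for phylogenetic distance described in the caption is not reproduced here.

```python
import math
import random


def upper_triangle(matrix, order=None):
    """Flatten the strict upper triangle, optionally after relabelling rows/columns."""
    n = len(matrix)
    idx = list(range(n)) if order is None else order
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mantel(m1, m2, permutations=999, seed=0):
    """Return (observed r, two-sided permutation p-value) for two symmetric matrices."""
    rng = random.Random(seed)
    y = upper_triangle(m2)
    r_obs = pearson(upper_triangle(m1), y)
    n = len(m1)
    hits = 0
    for _ in range(permutations):
        # Permute rows and columns of m1 together by relabelling the organisms.
        order = list(range(n))
        rng.shuffle(order)
        r_perm = pearson(upper_triangle(m1, order), y)
        if abs(r_perm) >= abs(r_obs):
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return r_obs, (hits + 1) / (permutations + 1)


# Toy, made-up matrices for five organisms (not data from the study).
cooccurrence = [
    [0.0, 0.8, 0.6, 0.1, 0.2],
    [0.8, 0.0, 0.7, 0.2, 0.1],
    [0.6, 0.7, 0.0, 0.3, 0.2],
    [0.1, 0.2, 0.3, 0.0, 0.9],
    [0.2, 0.1, 0.2, 0.9, 0.0],
]
similarity_index = [
    [0.0, 0.7, 0.5, 0.2, 0.1],
    [0.7, 0.0, 0.6, 0.1, 0.2],
    [0.5, 0.6, 0.0, 0.2, 0.3],
    [0.2, 0.1, 0.2, 0.0, 0.8],
    [0.1, 0.2, 0.3, 0.8, 0.0],
]
r, p = mantel(cooccurrence, similarity_index)
print(f"Pearson r = {r:.3f}, Mantel p = {p:.3f}")
```

    Permuting organism labels, rather than individual matrix cells, keeps the dependence among pairwise values intact, which is the reason a Mantel test is used for distance-like data.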

    Genome composition and phylogeny of microbes predict their co-occurrence in the environment

    The genomic information of microbes is a major determinant of their phenotypic properties, yet it is largely unknown to what extent ecological associations between different species can be explained by their genome composition. To bridge this gap, this study introduces two new genome-wide pairwise measures of microbe-microbe interaction. The first (genome content similarity index) quantifies similarity in genome composition between two microbes, while the second (microbe-microbe functional association index) summarizes the topology of a protein functional association network built for a given pair of microbes and quantifies the fraction of network edges crossing organismal boundaries. These new indices are then used to predict co-occurrence between reference genomes from two 16S-based ecological datasets, accounting for phylogenetic relatedness of the taxa. Phylogenetic relatedness was found to be a strong predictor of ecological associations between microbes, explaining about 10% of the variance in co-occurrence data. Genome composition was found to be a strong predictor as well, explaining up to 4% of the variance in co-occurrence when all genome-based indices are used in combination, even after accounting for evolutionary relationships between the species. On their own, the metrics proposed here explain a larger proportion of variance than previously reported, more complex methods that rely on metabolic network comparisons. In summary, the results of this study indicate that microbial genomes do indeed contain a detectable signal of organismal ecology, and the methods described in the paper can be used to improve mechanistic understanding of microbe-microbe interactions.

    An illustration of how new genomics-based indices are computed.

    (A) Genome content similarity index. In gene set 1, four gene families are either present in both genome A and genome B or absent from both, giving a similarity value of 4 for this gene set. Gene set 1 contains 8 gene families in total, so on average 0.5 of them have the same presence/absence state. This per-gene similarity was calculated for each gene set; the current illustration shows 7 of them. To produce a genome-wide summary, scores are then averaged across gene sets of appropriate size that are represented in at least one of the genomes (see text for details). (B) Microbe-microbe functional association index. The genomes of two species (A and B) encode genes from 6 and 5 gene families, respectively: three gene families are encoded exclusively in genome A (1, 4 and 5), two exclusively in genome B (2 and 3), and three in both genomes (6, 7 and 8). These three categories label the nodes of the protein functional association network. Edges connecting gene families are classified into 6 classes, as shown in the figure. An edge connecting a gene family encoded only in genome A to a gene family encoded only in genome B would have to cross the organismal boundary in order to exist within the network of the two-species (A and B) community.
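
    The two calculations described in this caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the family labels in the similarity example, and the edge list are hypothetical, and the gene-set size filter and genome-wide averaging mentioned in the caption are omitted. The genomes used for panel (B) follow the family numbers given in the caption.

```python
def gene_set_similarity(gene_set, genome_a, genome_b):
    """Return (matching count, per-gene similarity) for one gene set.

    A family "matches" when it is present in both genomes or absent from both.
    """
    same = sum((g in genome_a) == (g in genome_b) for g in gene_set)
    return same, same / len(gene_set)


def cross_boundary_fraction(edges, genome_a, genome_b):
    """Fraction of edges linking an A-only family to a B-only family."""
    only_a = genome_a - genome_b
    only_b = genome_b - genome_a
    crossing = sum(
        (u in only_a and v in only_b) or (u in only_b and v in only_a)
        for u, v in edges
    )
    return crossing / len(edges)


# Panel (A): a hypothetical gene set of 8 families, 4 of which share
# the same presence/absence state in the two genomes.
gene_set_1 = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"]
genome_a_families = {"f1", "f2", "f3", "f5"}
genome_b_families = {"f1", "f2", "f4", "f6"}
count, per_gene = gene_set_similarity(gene_set_1, genome_a_families, genome_b_families)
print(count, per_gene)  # 4 0.5

# Panel (B): family numbers as in the caption; the edges are made up.
genome_a = {1, 4, 5, 6, 7, 8}  # 1, 4, 5 only in A; 6, 7, 8 shared
genome_b = {2, 3, 6, 7, 8}     # 2, 3 only in B; 6, 7, 8 shared
edges = [(1, 2), (1, 6), (4, 5), (2, 3), (6, 7), (3, 8), (5, 3), (7, 8)]
fraction = cross_boundary_fraction(edges, genome_a, genome_b)
print(fraction)  # 0.25
```

    In this toy network only the edges (1, 2) and (5, 3) connect a family found exclusively in genome A to one found exclusively in genome B, so 2 of the 8 edges cross the organismal boundary.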

    Relationship between new metrics and phylogenetic distances between the organisms.

    (A) Genome content similarity and (B) microbe-microbe functional association indices plotted against phylogenetic distances between species for three microbial taxa (shown on the top right), each compared with other core genomes from STRING and with microbes related to the query genomes. The distribution of phylogenetic distances (in substitutions per site in 16S rRNA) is shown as a histogram on top; distributions of the indices are shown on the right of the corresponding plots.

    Putative pathways exhibiting similarity of gene content in two co-occurring Lachnospiraceae species.

    (A) Pattern of gene presence/absence in two interacting species, C. comes and E. rectale, and their related species, E. ventriosum and R. intestinalis, in four identified gene sets of interest. Gene family annotations from STRING are shown on the right, and gene set IDs on the left of the heatmap. (B) Patterns of co-occurrence of the four species under consideration in human stool samples (ecological dataset 2), shown as a heatmap. Official names of the organisms are shown on the right. Phylogenetic relationships between the species, as detected using 16S rRNA, are displayed as dendrograms on top and on the left in panels (A) and (B). Species name abbreviations are shown at the bottom and top in panels (A) and (B). Names of co-occurring taxa are shown in bold. Color keys for both panels are on the right.

    Regression analysis of co-occurrence using various genomics-based indices and phylogenetic distance.

    Regression analysis of co-occurrence using various genomics-based indices and phylogenetic distance.

    Simulation-Based Evaluation of Hybridization Network Reconstruction Methods in the Presence of Incomplete Lineage Sorting

    Hybridization events generate reticulate species relationships, giving rise to species networks rather than species trees. We report a comparative study of consensus, maximum parsimony, and maximum likelihood methods of species network reconstruction using gene trees simulated assuming a known species history. We evaluate the role of the divergence time between species involved in a hybridization event, the relative contributions of the hybridizing species, and the error in gene tree estimation. When gene tree discordance is mostly due to hybridization and not due to incomplete lineage sorting (ILS), most of the methods can detect even highly skewed hybridization events between highly divergent species. For recent divergences between hybridizing species, when the influence of ILS is sufficiently high, likelihood methods outperform parsimony and consensus methods, which erroneously identify extra hybridizations. The more sophisticated likelihood methods, however, are affected by gene tree errors to a greater extent than are consensus and parsimony.

    Evaluating allopolyploid origins in strawberries (Fragaria) using haplotypes generated from target capture sequencing

    Background: Hybridization is observed in many eukaryotic lineages and can lead to the formation of polyploid species. The study of hybridization and polyploidization faces challenges both in data generation and in accounting for population-level phenomena such as coalescence processes in phylogenetic analysis. Genus Fragaria is one example of a set of plant taxa in which a range of ploidy levels is observed across species, but phylogenetic origins are unknown. Results: Here, using 20 diploid and polyploid Fragaria species, we combine approaches from NGS data analysis and phylogenetics to infer evolutionary origins of polyploid strawberries, taking into account coalescence processes. We generate haplotype sequences for 257 low-copy nuclear markers assembled from Illumina target capture sequence data. We then identify putative hybridization events by analyzing gene tree topologies, and further test predicted hybridizations in a coalescence framework. This approach confirms the allopolyploid ancestry of F. chiloensis and F. virginiana, and provides new allopolyploid ancestry hypotheses for F. iturupensis, F. moschata, and F. orientalis. Evidence of gene flow between diploids F. bucharica and F. vesca is also detected, suggesting that it might be appropriate to consider these groups as conspecifics. Conclusions: This study is one of the first in which target capture sequencing followed by computational deconvolution of individual haplotypes is used for tracing origins of polyploid taxa. The study also provides new perspectives on the evolutionary history of Fragaria.

    Phylogenetic relationship between SecA_DEAD domain proteins from PVC and other genomes.

    The clade of the SecA phylogeny containing additional SecA_DEAD domain proteins is shown. The phylogeny was recovered as described in the Methods section. Bootstrap support values are shown if higher than 0.5. The domain composition of every protein is shown on the right, except for three proteins belonging to the clade at the very top, which are omitted because of their large size, to maintain visual clarity. The lineage leading to a clade containing planctomycete, verrucomicrobial and proteobacterial sequences is marked with an asterisk (*).