    Power of the γ Test in Detecting HGT

    <p>Random SPR operations were applied to each COG tree to assess the power of the γ test. The figures show the power values plotted against the taxon numbers in the COG entries for 1, 2, and 3 SPR changes.</p>

    Distribution of Transferred Genes in Different Functional Categories

    <p>Functional category abbreviations can be found in <a href="http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.0030316#pbio-0030316-t002" target="_blank">Table 2</a>. The percentage of transferred genes in coenzyme metabolism (H) is significantly high, based on Fisher's exact test.</p>
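As a rough illustration of the kind of test named in this caption, the sketch below computes a one-sided Fisher's exact test on a 2x2 table using only the standard library. The function name, the table layout, and the counts in the usage lines are assumptions made for illustration; they are not taken from the paper, and the caption does not say whether the authors used a one-sided or two-sided test.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical):
        a = transferred genes in the category of interest
        b = non-transferred genes in that category
        c = transferred genes in all other categories
        d = non-transferred genes in all other categories

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b          # genes in the category of interest
    col1 = a + c          # transferred genes overall
    n = a + b + c + d     # all genes
    upper = min(row1, col1)
    # Sum the hypergeometric probabilities of tables at least as extreme as observed.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical counts, for illustration only.
p_value = fisher_exact_greater(12, 68, 100, 1900)
print(f"one-sided p-value: {p_value:.4g}")
```

A small p-value here would indicate that the category holds more transferred genes than expected if transfer were independent of functional category.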

    The W-G Tree Based on the Median Tree Algorithm

    <p>A subset of high-quality COG entries, which covers at least seven genomes, was used to build the W-G tree (see <a href="http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.0030316#s4" target="_blank">Materials and Methods</a>). Branches with bootstrap scores less than 50% were collapsed into the polytomous form. Three domains of life are shown as (A) Archaea, (B–J) Bacteria, and (K) Eukaryote. Species are labeled with different colors based on their inferred HGT rates: red, >4%; yellow, 3%–4%; pink, 2%–3%; blue, 1%–2%; green, <1%. Taxonomy labels are (A) Euryarchaea, (B) Proteobacteria, (C) Chlamydiae, (D) Spirochaetes, (E) Thermotogae, (F) Aquificae, (G) Actinobacteria, (H) Deinococcus, (I) Cyanobacteria, (J) Firmicutes, and (K) Fungi.</p>

    HGT Inference via Tree Comparison

    <p>The raw difference between the SD and MAST metrics for a given pair of trees tends to increase when HGT is involved in one tree. For example, the raw SD and MAST scores for Gene Tree 1 and the W-G tree are 2 and 2, respectively, while the SD and MAST scores for Gene Tree 2 and the W-G tree are 8 and 2, respectively. This difference between the SD and MAST scores indicates possible HGT in Gene Tree 2: the (c and d) clade is transferred to the g lineage (dotted arrow). In Gene Tree 1, the (c and d) clade cannot be inferred as a transfer because many other factors could have caused the local uncertainty in branching, which should be presented in polytomous form.</p>

    The Relationship between Detecting COG Entries with HGT and the <i>p</i>-Values

    <p>Dotted curve: the number of COG entries detected to contain HGT at given <i>p</i>-value cutoffs. Straight line: the number of COG entries identified to contain HGT merely by chance at the same <i>p</i>-value cutoffs. As the <i>p</i>-value cutoff increases, the number of COG entries that might contain HGT increases, as one would expect. However, the small slope of this curve compared with the null-hypothesis line suggests that the frequency of HGT does not change dramatically, even over a relatively flexible <i>p</i>-value range.</p>
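The comparison between the two curves can be sketched as follows. This is a minimal sketch that assumes the null-hypothesis line is the usual expectation for uniformly distributed p-values, that is, the number of entries tested multiplied by the cutoff; the caption does not state how the line was computed. The function names and the p-values in the usage lines are hypothetical.

```python
def expected_by_chance(n_tested, cutoff):
    """Expected number of entries passing the cutoff if every p-value were
    drawn uniformly from [0, 1] (assumed form of the null-hypothesis line)."""
    return n_tested * cutoff


def detected_at(p_values, cutoff):
    """Number of entries whose p-value is at or below the cutoff."""
    return sum(p <= cutoff for p in p_values)


# Hypothetical p-values, for illustration only.
p_values = [0.001, 0.004, 0.02, 0.03, 0.2, 0.5, 0.7, 0.9]
for cutoff in (0.01, 0.05, 0.10):
    observed = detected_at(p_values, cutoff)
    chance = expected_by_chance(len(p_values), cutoff)
    print(f"cutoff={cutoff:.2f} observed={observed} expected_by_chance={chance:.2f}")
```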

    Flowchart of the HGT Inference Procedure

    <p>The National Center for Biotechnology Information COG database is preprocessed for a high-quality COG set. This set is used to construct individual gene trees and the W-G tree, using the median tree algorithm. The gene trees are compared against the W-G tree to detect changes in tree topology that are best explained by a branch transfer. The same comparison is done among all gene trees.</p>

    Contingency table for probeset <i>g<sub>ij</sub></i> at age <i>a = δ<sub>j</sub></i> where <i>j≤m</i> samples (<i>m = 1, 2, …n-1</i>).


    Functional annotation analysis summary.


    Best number of genes (<i>N</i>) used in age estimation. The difference of median age is the absolute difference between the median estimated age and the median chronological age.

    <p>The significance of the error was determined by obtaining 1,000 randomized cross-validation errors with age information randomly shuffled; the significance of the prediction error is the fraction of the 1,000 randomized errors lower than the actual cross-validation error. The best <i>N</i> varies across regions.</p>
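The shuffling procedure described in this caption can be sketched as below. The sketch assumes a caller-supplied `cv_error` function that returns a cross-validation error for a given assignment of ages; that function, the other names, and the toy data in the usage lines are hypothetical and stand in for whatever age-estimation model the authors used.

```python
import random


def fraction_lower(actual_error, randomized_errors):
    """Fraction of randomized errors strictly lower than the actual error."""
    return sum(e < actual_error for e in randomized_errors) / len(randomized_errors)


def permutation_significance(ages, cv_error, n_perm=1000, seed=0):
    """Empirical significance of a cross-validation error.

    `cv_error` is a hypothetical callable mapping a list of ages to an error.
    Ages are shuffled `n_perm` times and the error is recomputed each time.
    """
    rng = random.Random(seed)
    actual = cv_error(ages)
    randomized = []
    for _ in range(n_perm):
        shuffled = list(ages)
        rng.shuffle(shuffled)  # break the link between samples and ages
        randomized.append(cv_error(shuffled))
    return fraction_lower(actual, randomized)


# Toy stand-in for a real model: error against fixed predictions.
predicted = [20, 35, 50, 65, 80]


def toy_cv_error(ages):
    return sum(abs(a - p) for a, p in zip(ages, predicted)) / len(ages)


print(permutation_significance([22, 33, 52, 61, 79], toy_cv_error))
```

A value near zero means that almost no shuffled data set predicts age as well as the real one.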