39 research outputs found

    Additional file 1: of Taxonomy-aware feature engineering for microbiome classification

    No full text
    Figure S1. The PCoA plot of The Human Microbiome Project Consortium (2012) dataset, generated with the beta_diversity_through_plots.py script available in QIIME.
    Figure S2. The PCoA plot provided in the meta-analysis of environmental microbiomes conducted by Henschel et al. (2015).
    Figure S3. The PCoA plot of the combined CRC dataset.
    Figure S4. Comparison between the baseline and HFE confusion matrices when applied on the CRC1 dataset (Zeller et al., 2014) for Cancer vs. Normal classification.
    Figure S5. Comparison between the baseline and HFE confusion matrices when applied on the CRC2 dataset (Zackular et al., 2014) for Cancer vs. Normal classification.
    Figure S6. Comparison between the baseline and HFE confusion matrices when applied on the CRC1 + 2 dataset for Cancer vs. Normal classification.
    Figure S7. Comparison between the baseline and HFE confusion matrices when applied on the CRC1 + 2 dataset for Cancer vs. Normal vs. Adenoma classification.
    Figure S8. The taxonomic tree of all the informative features extracted by the HFE method for Cancer vs. Normal classification with respect to the dataset provided by Kostic et al. (2012).
    Figure S9. The taxonomic tree of all the informative features extracted by the HFE method for Cancer vs. Normal classification with respect to the CRC1 dataset (Zeller et al., 2014).
    Figure S10. The taxonomic tree of all the informative features extracted by the HFE method for Cancer vs. Normal classification with respect to the CRC2 dataset (Zackular et al., 2014).
    Figure S11. The taxonomic tree of all the informative features extracted by the HFE method for Cancer vs. Normal classification with respect to the CRC1 + 2 dataset.
    Table S1. The cross-validation results of the proposed pipeline when applied for human body site prediction and environment prediction, in terms of AUC.
    (PDF 27886 kb)

    PCoA plot (principal components 1 and 2) for the same samples as in Fig 4.

    No full text
    The scatter plot shows relatively cohesive and distinct ecosystems. While large studies often constitute the bulk of ecosystem clusters, detailed inspection shows support from further, smaller studies. Data points for certain ecosystems have been separated in the subgraphs b) to e).
    a) PCoA scatter plot including all samples from all environments. The first component largely separates human and environmental samples, while the second component helps to identify clusters for soil, marine, freshwater and plant-associated samples. Misannotations of insect-associated samples (wrongly annotated as Soil) are shown in the red shape.
    b) The two main marine clusters, “Marine 1” and “Marine 2” (corresponding to the clusters of the same name in Fig 4, http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004468#pcbi.1004468.g004), are identifiable through the composite ecosystem coloring: marine sediments, shown in cyan/yellow, mostly form “Marine 2” due to their dual membership in soil and marine environments; in contrast, “Marine 1” samples are colored solely cyan. Hypersaline samples (red) appear widespread and non-cohesive.
    c) Freshwater samples, colored by EnvO-ID. Several environments (freshwater biome, aquarium, freshwater lake) appear strongly related, while samples from permafrost and sinkholes are outliers.
    d) Plant samples, split according to the two main contributing studies, QiimeDB 1792 and 2019. Each cluster receives further support from small and medium-sized studies.
    e) Soil samples. Composite environments form sub-clusters.

    Illustration for Algorithm 1.

    No full text
    Given a hierarchical clustering of samples that are annotated with ontology terms (colored boxes; ancestry relations are shown with black lines), the algorithm detects enriched ontological categories at various levels of abstraction in each possible cluster. While analyzing the indicated cluster (black box, emphasized triangle), all categories present in it (and their ancestral categories) are characterized by their F-measure. E03 and especially E20 (parent of E02 and E03) are relatively specific for this cluster, as evidenced by a relatively high F-measure, whereas E01, E10 and E02 are mostly present outside the cluster, reflected by a small number of True Positives. Abbreviations: TP = True Positives, FP = False Positives, TN = True Negatives, FN = False Negatives.
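The scoring step described in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that a sample inside the cluster carrying a category counts as a True Positive, a sample inside the cluster without it as a False Positive, and a sample carrying it outside the cluster as a False Negative. The `PARENTS` map, the sample ids and the helper names are hypothetical; only "E20 is the parent of E02 and E03" comes from the caption.

```python
from collections import defaultdict

# Hypothetical toy ontology (child -> parents). Only "E20 is parent of E02 and
# E03" is taken from the caption; E10 as parent of E01 is an assumption.
PARENTS = {"E02": ["E20"], "E03": ["E20"], "E01": ["E10"]}


def with_ancestors(term, parents=PARENTS):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def f_measure(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 when there are no true positives."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def cluster_f_measures(annotations, cluster):
    """Score every category (including ancestors) present in `cluster`.

    annotations: dict mapping sample id -> annotated ontology term
    cluster: set of sample ids inside the cluster under analysis
    """
    members = defaultdict(set)  # category -> all samples carrying it
    for sample, term in annotations.items():
        for category in with_ancestors(term):
            members[category].add(sample)

    scores = {}
    for category, samples in members.items():
        tp = len(samples & cluster)
        if tp == 0:
            continue  # category does not occur in this cluster
        fp = len(cluster - samples)  # in the cluster, lacking the category
        fn = len(samples - cluster)  # carrying the category, outside the cluster
        scores[category] = f_measure(tp, fp, fn)
    return scores


# Invented toy data: six samples, three of them in the analyzed cluster.
annotations = {"s1": "E03", "s2": "E03", "s3": "E02",
               "s4": "E02", "s5": "E01", "s6": "E01"}
cluster = {"s1", "s2", "s3"}
for category, score in sorted(cluster_f_measures(annotations, cluster).items()):
    print(category, round(score, 3))
```

On this toy data the parent term E20 scores higher than either of its children, which mirrors the situation the caption describes: an abstraction can be more specific to a cluster than any single leaf term.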

    Cluster coefficients for homogeneity (cluster compactness) and separation for selected ecosystems and -subsystems (including all samples).

    No full text

    Comprehensive Meta-analysis of Ontology Annotated 16S rRNA Profiles Identifies Beta Diversity Clusters of Environmental Bacterial Communities

    No full text
    Comprehensive mapping of environmental microbiomes in terms of their compositional features remains a great challenge in understanding the microbial biosphere of the Earth. It promises to identify the driving forces behind the observed community patterns and to show whether community assembly happens deterministically. Advances in Next Generation Sequencing allow large community profiling studies, exceeding the sequencing data output of conventional methods by orders of magnitude. However, appropriate collection systems are still in a nascent state. We here present a database of 20,427 diverse environmental 16S rRNA profiles from 2,426 independent studies, which forms the foundation of our meta-analysis. We conducted a sample-size-adaptive all-against-all beta diversity comparison while also respecting phylogenetic relationships of Operational Taxonomic Units (OTUs). After conventional hierarchical clustering, we systematically test for enrichment of Environmental Ontology terms and their abstractions in all possible clusters. This post-hoc algorithm provides a novel formalism that quantifies to what extent compositional and semantic similarity of microbial community samples coincide. We automatically visualize significantly enriched subclusters on a comprehensive dendrogram of microbial communities. As a result, we obtain the hitherto most differentiated and comprehensive view on global patterns of microbial community diversity. We observe strong clusterability of microbial communities in ecosystems such as human/mammal-associated, geothermal, freshwater, plant-associated, soil and rhizosphere microbiomes, whereas hypersaline and anthropogenic samples are less homogeneous. Moreover, saline samples appear less cohesive in terms of compositional properties than previously reported.

    Clusters from enriched in Environmental Ontology terms (as determined by).

    No full text

    Alpha diversity box plots for different ecosystems.

    No full text
    Based on our dataset, we observe that soil, marine and plant-associated environments in general host more diverse communities. Thanks to the applied sub-categorization, we can further break down ecosystems to inspect diversity in different soil types (shown in supplementary S1 Fig, http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004468#pcbi.1004468.s001). We calculate Phylogenetic Distance Alpha diversity from samples rarefied to 1140 sequences.
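Rarefaction to a fixed depth, as mentioned in this caption, can be illustrated with a small sketch. It assumes rarefaction means drawing reads at random without replacement and discarding samples that are too shallow. The function name, the seed and the toy counts are hypothetical. The phylogenetic diversity calculation, which needs a tree, is not shown.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from one sample.

    counts: dict mapping OTU id -> read count
    Returns a dict of subsampled counts, or None if the sample has fewer
    than `depth` reads (such samples would be dropped).
    """
    if sum(counts.values()) < depth:
        return None
    # Expand the count vector into one entry per read, then draw at random.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(reads, depth)
    rarefied = {}
    for otu in picked:
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied


# Invented toy sample with 1600 reads, rarefied to the depth named in the caption.
sample = {"otu_a": 1000, "otu_b": 500, "otu_c": 100}
rarefied = rarefy(sample, depth=1140)
print(sum(rarefied.values()))
```

Fixing the depth makes diversity values comparable across samples with very different sequencing effort, at the cost of discarding reads from deeper samples.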

    Comprehensive clustering of 10,313 samples with at least 2000 sequences.

    No full text
    Clusters enriched in EnvO-terms are identified and color-coded automatically if F1-score > 0.5. Note that in the dendrogram, the entire clade is colored by the color of the enriched EnvO-term. The human/animal-associated and soil clusters are supported by many independent studies, whereas freshwater and geothermal clusters are largely driven by findings of a single study. Study color, ecosystem colors and EnvO associations are visualized in the colorbars below the dendrogram. EnvO-annotation colors are shades of the associated ecosystem color (see legend).

    EnvO subgraph for environmental Material.

    No full text
    Node size reflects the number of samples assigned to the EnvO-term (logarithmic scale, see size legend, right). Node colors are shades of the overarching ecosystem color (see left legend). Multiple inheritance of EnvO-terms is reflected by several colors arranged in concentric rings.

    Comparison of Fisher’s exact test and F-measure.

    No full text
    We perform a grid search over various significance thresholds for both tests. The blue mesh shows disagreement of the tests (in %), and the stacked bars in green and red indicate, respectively, to what extent disagreement stems from Fisher’s exact test claiming significance but not F-measure, and vice versa. Under the most commonly used thresholds (−log10(p) score for Fisher’s exact test being 2, 3, or 5), F-measure is the stricter test (completely green bars), as its significant cases are a strict subset of those of Fisher’s exact test.
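To make the comparison concrete, the sketch below evaluates both criteria on a single 2x2 contingency table. It is a sketch under stated assumptions: the Fisher test is taken as one-sided (over-representation), the default thresholds (−log10(p) ≥ 2 and F-measure > 0.5) are illustrative choices, and the function names and counts are hypothetical.

```python
from math import comb, log10


def fisher_right_tail(tp, fp, fn, tn):
    """One-sided Fisher's exact test p-value for over-representation.

    Sums hypergeometric probabilities of seeing `tp` or more term-carrying
    samples inside the cluster, with all table margins held fixed.
    """
    in_cluster = tp + fp
    with_term = tp + fn
    total = tp + fp + fn + tn
    favourable = 0
    for k in range(tp, min(in_cluster, with_term) + 1):
        favourable += comb(with_term, k) * comb(total - with_term, in_cluster - k)
    return favourable / comb(total, in_cluster)


def f_measure(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 when there are no true positives."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def compare(tp, fp, fn, tn, min_neg_log_p=2.0, min_f=0.5):
    """Return (Fisher significant?, F-measure significant?) for one table."""
    p_value = fisher_right_tail(tp, fp, fn, tn)
    return -log10(p_value) >= min_neg_log_p, f_measure(tp, fp, fn) > min_f


# Invented table: a cluster of 100 samples holds 20 of the 25 samples that
# carry the term, out of 1000 samples overall.
print(compare(tp=20, fp=80, fn=5, tn=895))
```

In this invented table the term is strongly over-represented in the cluster, so Fisher's exact test flags it, yet 80 of the 100 cluster members lack the term and the F-measure stays below 0.5. This is the kind of disagreement the green bars count.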