22 research outputs found

    Efficient computation of Faith's phylogenetic diversity with applications in characterizing microbiomes

    The number of publicly available microbiome samples is continually growing. As data set size increases, bottlenecks arise in standard analytical pipelines. Faith's phylogenetic diversity (Faith's PD) is a highly utilized phylogenetic alpha diversity metric that has thus far failed to scale effectively to trees with millions of vertices. Stacked Faith's phylogenetic diversity (SFPhD) enables calculation of this widely adopted diversity metric at a much larger scale by implementing a computationally efficient algorithm. The algorithm reduces the amount of computational resources required, resulting in more accessible software with a reduced carbon footprint compared to previous approaches. The new algorithm produces results identical to those of the previous method. We further demonstrate that the phylogenetic aspect of Faith's PD provides increased power in detecting diversity differences between younger and older populations in the FINRISK study's metagenomic data. Peer reviewed.
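
    For readers unfamiliar with the metric, the following is a minimal sketch of what Faith's PD measures: the total branch length of the subtree connecting a sample's observed taxa to the root. It is a naive illustration, not the SFPhD algorithm described in the abstract. The toy tree, its branch lengths, and the names `TREE` and `faith_pd` are hypothetical, and the sketch assumes the convention that paths are counted all the way up to the root.

```python
# Toy rooted tree stored as child -> (parent, branch_length).
# Node names and branch lengths are made up for illustration.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("root", 3.0),
}


def faith_pd(observed_tips, tree=TREE, root="root"):
    """Sum the lengths of all branches on paths from observed tips to the root.

    Each branch is counted once, even when several observed tips share it.
    """
    visited = set()
    total = 0.0
    for tip in observed_tips:
        node = tip
        # Walk upward until the root or an already counted branch is reached.
        # If a node was visited, all of its ancestors were counted too.
        while node != root and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


if __name__ == "__main__":
    print(faith_pd(["A", "B"]))  # branches A, B, and the shared n1 branch
    print(faith_pd(["A", "C"]))  # branches A, n1, and C
```

    The walk above visits every ancestor of every observed tip once per sample, which is the kind of per-sample tree traversal that becomes costly on trees with millions of vertices.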

    Multi-level analysis of the gut-brain axis shows autism spectrum disorder-associated molecular and microbial profiles

    Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by heterogeneous cognitive, behavioral and communication impairments. Disruption of the gut-brain axis (GBA) has been implicated in ASD, although with limited reproducibility across studies. In this study, we developed a Bayesian differential ranking algorithm to identify ASD-associated molecular and taxa profiles across 10 cross-sectional microbiome datasets and 15 other datasets, including dietary patterns, metabolomics, cytokine profiles and human brain gene expression profiles. We found a functional architecture along the GBA that correlates with heterogeneity of ASD phenotypes. It is characterized by ASD-associated amino acid, carbohydrate and lipid profiles predominantly encoded by microbial species in the genera Prevotella, Bifidobacterium, Desulfovibrio and Bacteroides, and it correlates with brain gene expression changes, restrictive dietary patterns and pro-inflammatory cytokine profiles. The functional architecture revealed in age-matched and sex-matched cohorts is not present in sibling-matched cohorts. We also show a strong association between temporal changes in microbiome composition and ASD phenotypes. In summary, we propose a framework to leverage multi-omic datasets from well-defined cohorts and investigate how the GBA influences ASD.

    Representing Diet in a Tree-Based Format for Interactive and Exploratory Assessment of Dietary Intake Data

    Objectives: We assessed the utility of representing dietary intake data in hierarchical tree structures that consider relationships among foods. Methods: Dietary intake was collected from 1909 adults (≥18 years) in the American Gut Project using a food frequency questionnaire (FFQ; VioScreen). FFQ food items were formatted into hierarchical tree structures based on 1) USDA's Food and Nutrient Database for Dietary Studies (FNDDS) classifications, 2) nutrient content, and 3) molecular compound information detected via mass spectrometry to capture the non-nutrient composition of foods. Next, we compared how well diet-based dissimilarities (or distances) between individuals corresponded with indices such as the Healthy Eating Index (HEI-2015) when those distances were calculated using tree-based versus non-tree-based metrics. We performed an Adonis test (PERMANOVA) to measure the amount of variation (R²) in these diet-based distances explained by HEI-2015. Results: We observed that dietary ordinations generated using tree-based relationships between foods agree better with HEI than ordinations generated without considering relatedness between foods. Relative to a non-tree-based quantitative metric (Bray-Curtis, R² = 0.02931), the variation explained by HEI-2015 increased by 35% when using the FNDDS tree (Weighted UniFrac, R² = 0.03969), by >20% when using the nutrient tree (Weighted UniFrac, R² = 0.03627), and only marginally (6%) when using the molecular compound tree (Weighted UniFrac, R² = 0.03116). Conclusions: We show that tree-based measurements of dietary similarity lead to better agreement with diet indices (e.g., HEI) than when relationships among foods are not considered. We also show that representing dietary intake in a tree-like structure can offer interactive visualizations of data that can be used to inform hypotheses regarding dietary characteristics. Funding Sources: Danone Nutricia Research.
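
    The percentage increases quoted in the Results can be checked directly from the reported R² values. The short sketch below recomputes them as relative increases over the Bray-Curtis baseline; the variable and function names are illustrative only, and the R² values are copied from the abstract.

```python
# R² values as reported in the abstract above.
R2_BRAY_CURTIS = 0.02931  # non-tree-based baseline
R2_WEIGHTED_UNIFRAC = {
    "FNDDS tree": 0.03969,
    "nutrient tree": 0.03627,
    "molecular compound tree": 0.03116,
}


def relative_increase_pct(baseline, value):
    """Relative increase of value over baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


# Relative increase in explained variation for each tree, rounded to 0.1%.
increases = {
    name: round(relative_increase_pct(R2_BRAY_CURTIS, r2), 1)
    for name, r2 in R2_WEIGHTED_UNIFRAC.items()
}

if __name__ == "__main__":
    for name, pct in increases.items():
        print(f"{name}: +{pct}% relative to Bray-Curtis")
```

    The computed increases come out at roughly 35%, 24% and 6%, consistent with the figures stated in the abstract. Note that these are relative changes; in absolute terms every R² here remains below 0.04.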

    Compositionally Aware Phylogenetic Beta-Diversity Measures Better Resolve Microbiomes Associated with Phenotype

    Microbiome data have several specific characteristics (sparsity and compositionality) that introduce challenges in data analysis. Integrating prior information about the data structure, such as phylogenetic structure and repeated-measure study designs, into the analysis is an effective approach for revealing robust patterns in microbiome data. Past methods have addressed some but not all of these challenges and features: for example, robust principal-component analysis (RPCA) addresses sparsity and compositionality; compositional tensor factorization (CTF) addresses sparsity, compositionality, and repeated-measure study designs; and UniFrac incorporates phylogenetic information. Here we introduce a strategy for incorporating phylogenetic information into RPCA and CTF. The resulting methods, phylo-RPCA and phylo-CTF, provide substantial improvements over state-of-the-art methods in the discriminatory power of the underlying clustering, in settings ranging from the mode of delivery to adult human lifestyle. We demonstrate quantitatively that the addition of phylogenetic information improves effect size and classification accuracy in both data-driven simulated data and real microbiome data. Importance: Microbiome data analysis can be difficult because of particular data features, some unavoidable and some due to technical limitations of DNA sequencing instruments. The first step in many analyses that ultimately reveal patterns of similarities and differences among sets of samples (e.g., separating samples from sick and healthy people or samples from seawater versus soil) is calculating the difference between each pair of samples. We introduce two new methods to calculate these differences that combine features of past methods, specifically being able to take into account the principles that most types of microbes are not in most samples (sparsity), that abundances are relative rather than absolute (compositionality), and that all microbes have a shared evolutionary history (phylogeny). We show using simulated and real data that our new methods provide improved classification accuracy of ordinal sample clusters and increased effect size between sample groups on beta-diversity distances.
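
    To make the sparsity and compositionality points concrete, here is a minimal sketch of one common way to handle both at once: a centered log-ratio computed over a sample's nonzero entries only. The abstract does not state which transform the listed methods use, so this is an assumption for illustration, and the function name `rclr` and the example counts are hypothetical.

```python
import math


def rclr(counts):
    """Centered log-ratio over nonzero entries only.

    Zeros are treated as missing (returned as None) rather than replaced
    with a pseudocount, which is one way to respect sparsity.
    """
    logs = [math.log(c) for c in counts if c > 0]
    if not logs:
        return [None] * len(counts)
    # Subtracting the mean log centers each sample on its own geometric
    # mean, so the result does not depend on total sequencing depth.
    mean_log = sum(logs) / len(logs)
    return [math.log(c) - mean_log if c > 0 else None for c in counts]


if __name__ == "__main__":
    shallow = rclr([1, 0, 100])    # same composition, fewer reads
    deep = rclr([10, 0, 1000])     # same composition, ten times more reads
    print(shallow)
    print(deep)
```

    Because only ratios between observed features enter the result, the two example samples map to the same transformed values despite a tenfold difference in depth, which is the sense in which abundances are "relative rather than absolute".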