    Species abundance information improves sequence taxonomy classification accuracy.

    Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always violated. By incorporating environment-specific taxonomic abundance information, we demonstrate a significant increase in the species-level classification accuracy across common sample types. At the species level, overall average error rates decline from 25% to 14%, which is favourably comparable to the error rates that existing classifiers achieve at the genus level (16%). Our findings indicate that for most practical purposes, the assumption that reference species are equally likely to be observed is untenable. q2-clawback provides a straightforward alternative for samples from common environments.
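
The effect of the uniform-prior assumption can be illustrated with a toy naive Bayes classifier over k-mers. This is a minimal sketch, not the q2-clawback implementation: the function names, the toy reference sequences, the k-mer length, and the abundance weights are all hypothetical. It shows only that when two species have indistinguishable reference sequences, the class prior alone decides the assignment.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Return the list of overlapping k-mers in a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(references, k=3):
    """Count k-mers per taxon; `references` maps taxon -> list of sequences."""
    counts = {taxon: Counter(km for s in seqs for km in kmers(s, k))
              for taxon, seqs in references.items()}
    vocab = set().union(*counts.values())
    return counts, vocab


def classify(seq, counts, vocab, priors=None, k=3):
    """Return the taxon with the highest posterior log-probability.

    priors=None reproduces the uniform assumption; otherwise `priors`
    maps taxon -> expected relative abundance (must be > 0 for every taxon).
    """
    taxa = list(counts)
    if priors is None:
        priors = {t: 1.0 / len(taxa) for t in taxa}
    scores = {}
    for t in taxa:
        # Laplace smoothing: add one pseudo-count per k-mer in the vocabulary.
        total = sum(counts[t].values()) + len(vocab)
        log_lik = sum(math.log((counts[t][km] + 1) / total)
                      for km in kmers(seq, k) if km in vocab)
        scores[t] = math.log(priors[t]) + log_lik
    return max(scores, key=scores.get)


# Hypothetical toy reference: species_a and species_b share one sequence.
references = {
    "species_a": ["ACGTACGTAC"],
    "species_b": ["ACGTACGTAC"],
    "species_c": ["TTGGCCAATT"],
}
counts, vocab = train(references)
query = "ACGTACG"

# Uniform priors: species_a and species_b tie, so the choice is arbitrary.
uniform_call = classify(query, counts, vocab)

# Hypothetical abundance weights for an environment dominated by species_b.
weights = {"species_a": 0.05, "species_b": 0.90, "species_c": 0.05}
weighted_call = classify(query, counts, vocab, priors=weights)
print(uniform_call, weighted_call)
```

With uniform priors the tie between the two identical references is broken arbitrarily (here by dictionary order), whereas the weighted call returns `species_b`, the species assumed to be more abundant in that environment.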

    Associations among Wine Grape Microbiome, Metabolome, and Fermentation Behavior Suggest Microbial Contribution to Regional Wine Characteristics.

    Regionally distinct wine characteristics (terroir) are an important aspect of wine production and consumer appreciation. Microbial activity is an integral part of wine production, and grape and wine microbiota present regionally defined patterns associated with vineyard and climatic conditions, but the degree to which these microbial patterns associate with the chemical composition of wine is unclear. Through a longitudinal survey of over 200 commercial wine fermentations, we demonstrate that both grape microbiota and wine metabolite profiles distinguish viticultural area designations and individual vineyards within Napa and Sonoma Counties, California. Associations among wine microbiota and fermentation characteristics suggest new links between microbiota, fermentation performance, and wine properties. The bacterial and fungal consortia of wine fermentations, composed from vineyard and winery sources, correlate with the chemical composition of the finished wines and predict metabolite abundances in finished wines using machine learning models. The use of postharvest microbiota as an early predictor of wine chemical composition is unprecedented and potentially poses a new paradigm for quality control of agricultural products. These findings add further evidence that microbial activity is associated with wine terroir. IMPORTANCE Wine production is a multi-billion-dollar global industry for which microbial control and wine chemical composition are crucial aspects of quality. Terroir is an important feature of consumer appreciation and wine culture, but the many factors that contribute to terroir are nebulous. We show that grape and wine microbiota exhibit regional patterns that correlate with wine chemical composition, suggesting that the grape microbiome may influence terroir. In addition to enriching our understanding of how growing region and wine properties interact, this may provide further economic incentive for agricultural and enological practices that maintain regional microbial biodiversity.

    mockrobiota: a Public Resource for Microbiome Bioinformatics Benchmarking.

    Mock communities are an important tool for validating, optimizing, and comparing bioinformatics methods for microbial community analysis. We present mockrobiota, a public resource for sharing, validating, and documenting mock community data resources, available at http://caporaso-lab.github.io/mockrobiota/. The materials contained in mockrobiota include data set and sample metadata, expected composition data (taxonomy or gene annotations or reference sequences for mock community members), and links to raw data (e.g., raw sequence data) for each mock community data set. mockrobiota does not supply physical sample materials directly, but the data set metadata included for each mock community indicate whether physical sample materials are available. At the time of this writing, mockrobiota contains 11 mock community data sets with known species compositions, including bacterial, archaeal, and eukaryotic mock communities, analyzed by high-throughput marker gene sequencing. IMPORTANCE The availability of standard and public mock community data will facilitate ongoing method optimizations, comparisons across studies that share source data, and greater transparency and access and eliminate redundancy. These are also valuable resources for bioinformatics teaching and training. This dynamic resource is intended to expand and evolve to meet the changing needs of the omics community.

    Species abundance information improves sequence taxonomy classification accuracy

    Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always violated. By incorporating environment-specific taxonomic abundance information, we demonstrate a significant increase in the species-level classification accuracy across common sample types. At the species level, overall average error rates decline from 25% to 14%, which is favourably comparable to the error rates that existing classifiers achieve at the genus level (16%). Our findings indicate that for most practical purposes, the assumption that reference species are equally likely to be observed is untenable. q2-clawback provides a straightforward alternative for samples from common environments. QIIME 2 development was primarily funded by NSF Awards 1565100 to J.G.C. and 1565057 to R.K. This work was supported by an NHMRC project grant APP1085372, awarded to G.A.H., J.G.C., and R.K.

    Beating Naive Bayes at Taxonomic Classification of 16S rRNA Gene Sequences

    Naive Bayes classifiers (NBC) have dominated the field of taxonomic classification of amplicon sequences for over a decade. Apart from having runtime requirements that allow them to be trained and used on modest laptops, they have persistently provided class-topping classification accuracy. In this work we compare NBC with random forest classifiers, neural network classifiers, and a perfect classifier that can only fail when different species have identical sequences, and find that in some practical scenarios there is little scope for improving on NBC for taxonomic classification of 16S rRNA gene sequences. Further improvements in taxonomy classification are unlikely to come from novel algorithms alone, and will need to leverage other technological innovations, such as ecological frequency information.
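
One way to read the "perfect classifier" mentioned above is as an accuracy ceiling set by sequence ambiguity alone. The sketch below is an assumption about how such a bound could be computed, not the authors' procedure: the function name and the toy records are hypothetical, and every record is weighted equally. Each distinct sequence is assigned its most frequent species label, so errors remain only where one sequence carries more than one species label.

```python
from collections import Counter, defaultdict


def perfect_classifier_accuracy(records):
    """Upper bound on accuracy for a list of (sequence, species) records.

    Each distinct sequence is assigned its most frequent species label;
    the only remaining errors come from sequences shared between species.
    """
    by_seq = defaultdict(Counter)
    for seq, species in records:
        by_seq[seq][species] += 1
    correct = sum(max(labels.values()) for labels in by_seq.values())
    return correct / len(records)


# Hypothetical toy data: species a and b share an identical sequence.
records = [
    ("ACGT", "a"),
    ("ACGT", "b"),
    ("TTTT", "c"),
    ("GGGG", "d"),
]
ceiling = perfect_classifier_accuracy(records)
print(ceiling)  # 0.75: one of the two ACGT records must be misassigned
```

Under this reading, the gap between a real classifier's accuracy and the ceiling is the only room left for a better algorithm, which is consistent with the abstract's point that further gains need additional information such as ecological frequencies.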