384 research outputs found

    Does a barcoding gap exist in Prokaryotes? Evidences from species delimitation in Cyanobacteria

    The amount of information available on 16S rRNA sequences for prokaryotes, thanks to high-throughput sequencing, could allow a better understanding of diversity. Nevertheless, applying a predetermined threshold in genetic distances to identify units of diversity (Operational Taxonomic Units, OTUs) may provide biased results. Here we test for the existence of a barcoding gap in several groups of Cyanobacteria, defining units of diversity according to clear differences between within-species and among-species genetic distances in 16S rRNA. The application of a tool developed for animal DNA taxonomy, the Automatic Barcode Gap Discovery (ABGD), revealed that a barcoding gap could actually be found in almost half of the datasets that we tested. The units of diversity identified through this method were not compatible with the OTUs obtained with similarity thresholds of 97% or 99%. The main message of our results is a call for caution in estimating diversity from 16S sequences only, given that different subjective choices in the method used to delimit units could provide different results.

    Standardisation and optimisation techniques in gut microbiome community analysis

    With the emergence of high-throughput next-generation sequencing, the importance of the human gut microbiota as regulators, modulators and maintainers of human health and disease became increasingly evident. Advances in sequencing in the last two decades enabled the analysis of the composition and dynamics of the gut microbiome in unprecedented resolution and complexity. Investigations of this complex community by marker gene studies allowed assertions on presence, absence and ecological dynamics of gut bacteria. Several studies discovered strong relationships between the gut microbiota and human health. Some of these bacteria are shown to be essential for daily life processes like digestion, nutrient uptake, pathogen resistance and immune maturation. Likewise, disturbances of this close relationship, called dysbiosis, have been found to be associated with diseases like diabetes, obesity, colon cancer and inflammatory bowel disease. All this makes the gut microbiome a highly relevant target of research in medical diagnostics, and microbiome community analysis a valid hypothesis-building tool. Nevertheless, the vast number of different methodologies and the lack of broadly accepted standards to create and handle gut microbiome abundance data complicate reproducible or replicable findings across studies. The analysis of the microbiome is hampered especially in settings where samples diverge significantly in their total biomass or microbial load. Several efforts to allow accurate inter-sample comparisons have been undertaken, including the use of relative abundances or random feature sub-sampling (rarefaction). While these methodologies are the most frequently used, they cannot fully correct for these sample-wide differences. To increase comparability between samples, the use of exogenous spike-in bacteria is proposed to correct for sample-specific differences in microbial load. 
The methodology is tested on a dilution experiment with known differences between samples and successfully applied to a clinical microbiome data set. These experiments suggest that current analysis methods lack a pivotal angle on the data, namely comparability between samples differing in microbial load. Meanwhile, the proposed spike-in based calibration to microbial load (SCML) allows for accurate estimation of ratios of absolute endogenous bacteria abundances. Furthermore, microbiome community analysis is heavily dependent on the resolution of the underlying read count data. While resolutions such as operational taxonomic units (OTUs) generally overestimate diversity and create highly redundant and sparse datasets, agglomerations to common taxonomy can obfuscate distinct read count patterns of possible sub-populations inside the given taxonomy. Even though the latter agglomeration strategy might be valid for taxa with low phenotypic divergence, many taxonomic lineages in fact contain highly diverse sub-species. Thus, a more appropriate taxonomic unit would adapt its resolution for those densely populated branches, allowing for different count resolutions inside the same community. Here the concept of adaptive taxonomic units (ATUs) is introduced and applied to a perturbation experiment including mice receiving antibiotics. For this data set the different classical count resolutions (i.e. collapsed to order, family or genus etc.) produce highly contradictory results. Meanwhile, adaptive taxonomic units (ATUs) derived by hierarchical affinity merging (HAM) adapt the granularity of taxonomy to the underlying sequencing data. Branches of bacterial phylogeny that are highly covered in the data set receive a higher resolution than those that were infrequently observed. The algorithm hereby merges operational taxonomic units (OTUs) guided not only by sequence dissimilarity, but also by count distribution and OTU size. 
Due to the agglomeration, the number of features is reduced significantly, lowering the complexity of the data while preserving distributional patterns only observable at OTU level. Consequently, the sparsity of the count data is reduced significantly, such that every ATU accumulates a reasonable number of counts and can thus be reliably analysed. The algorithm is provided in the form of the R package dOTUClust.
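
The spike-in idea in the abstract above can be illustrated with a minimal sketch. It assumes that the same amount of exogenous spike-in was added to every sample, so that dividing each taxon's reads by the spike-in reads puts samples on a comparable scale. The function name `calibrate_to_spike_in`, the taxon labels and the toy counts are hypothetical, and this is not the SCML implementation from the thesis, which may differ in detail.

```python
from typing import Dict


def calibrate_to_spike_in(counts: Dict[str, int], spike_in_taxon: str) -> Dict[str, float]:
    """Express each endogenous taxon's reads relative to the spike-in reads.

    Assumption: an equal amount of spike-in was added to every sample, so the
    spike-in read count reflects the sample-specific sequencing yield per cell.
    """
    spike_reads = counts.get(spike_in_taxon, 0)
    if spike_reads <= 0:
        raise ValueError("spike-in taxon has no reads; sample cannot be calibrated")
    return {
        taxon: reads / spike_reads
        for taxon, reads in counts.items()
        if taxon != spike_in_taxon
    }


# Hypothetical toy samples (read counts per taxon, including the spike-in).
sample_a = {"taxon_x": 400, "taxon_y": 400, "spike": 200}
sample_b = {"taxon_x": 800, "taxon_y": 100, "spike": 100}

calibrated_a = calibrate_to_spike_in(sample_a, "spike")
calibrated_b = calibrate_to_spike_in(sample_b, "spike")

# The ratio of calibrated values estimates the ratio of absolute abundances
# of a taxon between the two samples.
ratio_x = calibrated_b["taxon_x"] / calibrated_a["taxon_x"]
ratio_y = calibrated_b["taxon_y"] / calibrated_a["taxon_y"]
print(ratio_x, ratio_y)
```

In this toy case the raw reads of `taxon_x` only double between the samples, while the calibrated ratio is larger because the spike-in yield halves. That difference in load is what relative abundances alone cannot recover.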

    Towards complete representation of bacterial contents in metagenomic samples

    Background: In the metagenome assembly of a microbiome community, we may think abundant species would be easier to assemble due to their deeper coverage. However, this conjecture is rarely tested. We often do not know how many abundant species we are missing and do not have an approach to recover these species. Results: Here we propose k-mer based and 16S rRNA based methods to measure the completeness of metagenome assembly. We show that even with PacBio High-Fidelity (HiFi) reads, abundant species are often not assembled, as high strain diversity may lead to fragmented contigs. We developed a novel algorithm to recover abundant metagenome-assembled genomes (MAGs) by identifying circular assembly subgraphs. Our algorithm is reference-free and complementary to standard metagenome binning. Evaluated on 14 real datasets, it rescued many abundant species that would be missing with existing methods. Conclusions: Our work stresses the importance of metagenome completeness, which is often overlooked. Our algorithm generates more circular MAGs and moves a step closer to the complete representation of microbiome communities.

    Ranking the biases: The choice of OTUs vs. ASVs in 16S rRNA amplicon data analysis has stronger effects on diversity measures than rarefaction and OTU identity threshold

    Advances in the analysis of amplicon sequence datasets have introduced a methodological shift in how research teams investigate microbial biodiversity, away from sequence identity-based clustering (producing Operational Taxonomic Units, OTUs) to denoising methods (producing amplicon sequence variants, ASVs). While denoising methods have several inherent properties that make them desirable compared to clustering-based methods, questions remain as to the influence that these pipelines have on the ecological patterns being assessed, especially when compared to other methodological choices made when processing data (e.g. rarefaction) and computing diversity indices. We compared the respective influences of two widely used methods, namely DADA2 (a denoising method) vs. Mothur (a clustering method) on 16S rRNA gene amplicon datasets (hypervariable region V4), and compared such effects to the rarefaction of the community table and OTU identity threshold (97% vs. 99%) on the ecological signals detected. We used a dataset comprising freshwater invertebrate (three Unionidae species) gut and environmental (sediment, seston) communities sampled in six rivers in the southeastern USA. We ranked the respective effects of each methodological choice on alpha and beta diversity, and taxonomic composition. The choice of the pipeline significantly influenced alpha and beta diversities and changed the ecological signal detected, especially on presence/absence indices such as the richness index and unweighted UniFrac. Interestingly, the discrepancy between OTU and ASV-based diversity metrics could be attenuated by the use of rarefaction. The identification of major classes and genera also revealed significant discrepancies across pipelines. Compared to the pipeline’s effect, OTU threshold and rarefaction had a minimal impact on all measurements.
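
Rarefaction, one of the processing choices compared in the abstract above, is random subsampling of each sample's reads without replacement down to a common depth. The following is a minimal sketch of that general technique, not the procedure from the paper; the function name `rarefy`, the feature labels and the counts are hypothetical.

```python
import random
from collections import Counter
from typing import Dict


def rarefy(counts: Dict[str, int], depth: int, seed: int = 0) -> Dict[str, int]:
    """Subsample reads without replacement so the sample totals `depth` reads."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds the {total} reads available")
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    # Expand the count table into one entry per read, then draw `depth` reads.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    sampled = Counter(rng.sample(pool, depth))
    # Keep every original feature, reporting 0 for those that dropped out.
    return {feature: sampled.get(feature, 0) for feature in counts}


# Hypothetical sample with two common features and one singleton.
sample = {"asv_1": 50, "asv_2": 30, "asv_3": 1}
rarefied = rarefy(sample, depth=20)

# Richness (number of features present) is a presence/absence index, so it is
# sensitive to rare features being lost during subsampling.
richness = sum(1 for n in rarefied.values() if n > 0)
print(rarefied, richness)
```

Expanding the table into a per-read list is simple but memory-hungry. For real data sets a multivariate hypergeometric draw would be the usual choice.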

    Evaluating Established Methods for Rumen 16S rRNA Amplicon Sequencing With Mock Microbial Populations

    The rumen microbiome scientific community has utilized amplicon sequencing as an aid in identifying potential community compositional trends that could be used as an estimation of various production and performance traits including methane emission, animal protein production efficiency, and ruminant health status. In order to translate rumen microbiome studies into executable application, there is a need for experimental and analytical concordance within the community. The objective of this study was to assess these factors in relation to selected currently established methods for 16S phylogenetic community analysis on a microbial community standard (MC) and a DNA standard (DS; ZymoBIOMICS™). DNA was extracted from MC using the repeated bead beating and column (RBBC) method commonly used for microbial DNA extraction from rumen digesta samples. 16S rRNA amplicon libraries were generated for the MC and DS using primers routinely used for rumen bacterial and archaeal community analysis. The primers targeted the V4 and V3–V4 regions of the 16S rRNA gene, and samples were subjected to both 20 and 28 polymerase chain reaction (PCR) cycles under identical cycle conditions. Sequencing was conducted using the Illumina MiSeq platform. As the bacteria contained in the microbial mock community were well-classified species, and for ease of explanation, we used the results of the Basic Local Alignment Search Tool classification to assess the DNA, PCR cycle number, and primer type. Sequence classification methodology was assessed independently. Spearman’s correlation analysis indicated that utilizing the RBBC method for DNA extraction in combination with primers targeting the 16S rRNA gene using 20 first-round PCR cycles was sufficient for amplicon sequencing to generate a relatively accurate depiction of the bacterial communities present in rumen samples. 
These results also emphasize the need to develop and utilize positive mock community controls for all rumen microbiome studies in order to discern errors which may arise at any step during a next-generation sequencing protocol.

    Developing the MAR databases – Augmenting Genomic Versatility of Sequenced Marine Microbiota

    This thesis introduces the MAR databases as marine-specific resources in the genomic landscape. Paper 1 describes the curation effort and development leading to the creation of the MAR databases. It results in the highly valued reference database MarRef, the broader MarDB, and the marine gene catalog MarCat. The definition of a marine environment, the curation process, and the Marine Metagenomics Portal as a public web service are described. The portal helps scientists find marine sequence data for prokaryotes, explore rich contextual information, secondary metabolites, and updated taxonomy, and evaluate genome quality. Many of these database advancements are covered in Paper 2. This includes new entries and the development of specific databases on marine fungi (MarFun) and salmon-related prokaryotes (SalDB). The implementation of metagenome-assembled and single amplified genomes leads up to the database quality evaluation discussed in Paper 3. There, the lack of quality control in primary databases is discussed based on estimated completeness and contamination in the genomes of the MAR databases. Paper 4 explores the microbiota of skin and gut mucosa of Atlantic salmon. In a database-dependent amplicon analysis, the full-length 16S rRNA gene proved accurate, but not a game-changer in taxonomic classification for this environmental niche. The proportion of dataset sequences lacking clear taxonomic classification suggests a lack of diversity in current-day databases and inadequate phylogenetic resolution. Advancing phylogenetic resolution was the subject of Paper 5. Here the highly similar species of the genus Aliivibrio were delineated using six genes in a multilocus sequence analysis. Five potentially novel species could be delineated in this way, which coincided with recent genome-wide taxonomy listings. 
Thus, Papers 4 and 5 parallel the work on the MAR databases by providing insight into the inter-relational framework of bioinformatic analysis and marine database sources.

    DMSC: A Dynamic Multi-Seeds Method for Clustering 16S rRNA Sequences Into OTUs

    Next-generation sequencing (NGS)-based 16S rRNA sequencing, which combines PCR amplification with NGS technology, is a cost-effective technique that has been successfully used to study the phylogeny and taxonomy of samples from complex microbiomes or environments. Clustering 16S rRNA sequences into operational taxonomic units (OTUs) is often the first step for many downstream analyses. Heuristic clustering is one of the most widely employed approaches for generating OTUs. However, most heuristic OTU clustering methods select just one seed sequence to represent each cluster, so their results suffer from either overestimation of the number of OTUs or sensitivity to sequencing errors. In this paper, we present a novel dynamic multi-seeds clustering method (namely DMSC) to pick OTUs. DMSC first heuristically generates clusters according to the distance threshold. When the size of a cluster reaches the pre-defined minimum size, DMSC selects the multi-core sequences (MCS) as the seeds, defined as the n-core sequences (n ≥ 3) in which the distance between any two sequences is less than the distance threshold. A new sequence is assigned to the corresponding cluster depending on the average distance to the MCS and the distance standard deviation within the MCS. Whenever a new sequence is added to the cluster, the MCS is dynamically updated, until no more sequences are merged into the cluster. DMSC was tested on several simulated and real-life sequence datasets and compared with traditional heuristic methods such as CD-HIT, UCLUST, and DBH. Experimental results in terms of the inferred number of OTUs, normalized mutual information (NMI) and Matthews correlation coefficient (MCC) metrics demonstrate that DMSC can produce higher quality clusters with low memory usage and reduce OTU overestimation. Additionally, DMSC is robust to sequencing errors. The DMSC software can be freely downloaded from https://github.com/NWPU-903PR/DMSC
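
The multi-seed idea described in the abstract above can be sketched in a heavily simplified form. This is not the DMSC implementation. As labeled assumptions, it uses a Hamming fraction on equal-length strings in place of an alignment-based distance, picks seeds greedily, and assigns a sequence when its mean distance to the seeds is within the threshold, omitting the standard-deviation term the abstract mentions. All function names and toy sequences are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of mismatching positions (stand-in for a real sequence distance)."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pick_seeds(members: List[str], threshold: float, n_core: int) -> List[str]:
    """Greedily collect up to n_core members that are mutually closer than threshold."""
    seeds: List[str] = []
    for seq in members:
        if all(hamming_fraction(seq, s) < threshold for s in seeds):
            seeds.append(seq)
        if len(seeds) == n_core:
            break
    return seeds


def multi_seed_cluster(seqs: List[str], threshold: float = 0.25,
                       min_size: int = 3, n_core: int = 3) -> List[List[str]]:
    """Greedy clustering in which grown clusters are represented by several seeds."""
    clusters: List[Dict[str, List[str]]] = []
    for seq in seqs:
        for cluster in clusters:
            # Compare against the mean distance to all current seeds.
            if mean(hamming_fraction(seq, s) for s in cluster["seeds"]) <= threshold:
                cluster["members"].append(seq)
                # Once the cluster is large enough, refresh its seed set.
                if len(cluster["members"]) >= min_size:
                    cluster["seeds"] = pick_seeds(cluster["members"], threshold, n_core)
                break
        else:
            # No cluster was close enough: start a new one with a single seed.
            clusters.append({"members": [seq], "seeds": [seq]})
    return [c["members"] for c in clusters]


# Hypothetical toy reads: three near-identical "A" reads and two "C" reads.
reads = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT", "CCCCCCCCCC", "CCCCCCCCCG"]
otus = multi_seed_cluster(reads)
print([len(otu) for otu in otus])
```

Averaging over several seeds makes the assignment less dependent on whichever single sequence happened to open the cluster, which is the motivation the abstract gives for moving away from single-seed heuristics.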

    Evaluation of 16S rRNA gene sequencing for species and strain-level microbiome analysis.

    The 16S rRNA gene has been a mainstay of sequence-based bacterial analysis for decades. However, high-throughput sequencing of the full gene has only recently become a realistic prospect. Here, we use in silico and sequence-based experiments to critically re-evaluate the potential of the 16S gene to provide taxonomic resolution at species and strain level. We demonstrate that targeting of 16S variable regions with short-read sequencing platforms cannot achieve the taxonomic resolution afforded by sequencing the entire (~1500 bp) gene. We further demonstrate that full-length sequencing platforms are sufficiently accurate to resolve subtle nucleotide substitutions (but not insertions/deletions) that exist between intragenomic copies of the 16S gene. In consequence, we argue that modern analysis approaches must necessarily account for intragenomic variation between 16S gene copies. In particular, we demonstrate that appropriate treatment of full-length 16S intragenomic copy variants has the potential to provide taxonomic resolution of bacterial communities at species and strain level

    16S rRNA phylogeny and clustering is not a reliable proxy for genome-based taxonomy in Streptomyces

    Although Streptomyces is one of the most extensively studied genera of bacteria, its taxonomy remains contested and is suspected to contain significant species-level misclassification. Resolving the classification of Streptomyces would benefit many areas of study and applied microbiology that rely heavily on having an accurate ground truth classification of similar and dissimilar organisms, including comparative genomics-based searches for novel antimicrobials in the fight against the ongoing antimicrobial resistance (AMR) crisis. To attempt a resolution, we investigate taxonomic conflicts between 16S rRNA and whole genome classifications using all available 48,981 full-length 16S rRNA Streptomyces sequences from the combined SILVA, Greengenes, Ribosomal Database Project (RDP) and NCBI (National Center for Biotechnology Information) databases, and 2,276 publicly available Streptomyces genome assemblies. We construct a 16S gene tree for 14,239 distinct Streptomyces 16S rRNA sequences, identifying three major lineages of Streptomyces, and find that existing taxonomic classifications are inconsistent with the tree topology. We also use these data to delineate 16S and whole genome landscapes for Streptomyces, finding that 16S and whole-genome classifications of Streptomyces strains are frequently in disagreement, and in particular that 16S zero-radius Operational Taxonomic Units (zOTUs) are often inconsistent with Average Nucleotide Identity (ANI)-based taxonomy. Our results strongly imply that 16S rRNA sequence data do not map to taxonomy sufficiently well to delineate Streptomyces species reliably, and we propose that alternative markers should instead be adopted by the community for classification and metabarcoding. As much of current Streptomyces taxonomy has been determined or supported by historical 16S sequence data and may in parts be in error, we also propose that reclassification of the genus by alternative approaches is required.