2,533 research outputs found

    Taxonomy of anaerobic digestion microbiome reveals biases associated with the applied high throughput sequencing strategies

    In the past few years, many studies have investigated the anaerobic digestion microbiome by means of 16S rRNA amplicon sequencing. Results from these studies were compared with each other without taking into account the procedures followed for amplicon preparation and data analysis. This oversight was mainly due to a lack of knowledge about the biases affecting specific steps of the microbiome investigation process. In the present study, the main technical aspects of 16S rRNA analysis were examined, with special attention to the approach used for high-throughput sequencing. More specifically, the microbial compositions of three laboratory-scale biogas reactors were analyzed before and after the addition of sodium oleate by sequencing the microbiome with three different approaches: 16S rRNA amplicon sequencing, shotgun DNA and shotgun RNA. This comparative analysis revealed that, in amplicon sequencing, the abundance of some taxa (Euryarchaeota and Spirochaetes) was biased by the inability of universal primers to hybridize to all templates. The reliability of the results was also influenced by the number of hypervariable regions under investigation. Finally, amplicon sequencing and shotgun DNA underestimated the genus Methanoculleus, probably because of the low 16S rRNA gene copy number encoded in this taxon.
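    Because a taxon's share of 16S reads scales with how many 16S gene copies each cell carries, a taxon with few copies looks rarer than it is. The following is a minimal sketch of the common copy-number correction (divide read counts by copy number, then renormalise); the taxon names, counts and copy numbers are hypothetical placeholders, not values from the study.

```python
# Minimal sketch of 16S rRNA gene copy-number correction.
# Assumption: all taxon names, read counts and copy numbers used here are
# hypothetical illustrations, not data from the study above.


def relative(counts: dict[str, int]) -> dict[str, float]:
    """Raw relative abundance: each taxon's share of all reads."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def copy_number_corrected(counts: dict[str, int], copies: dict[str, float]) -> dict[str, float]:
    """Divide each taxon's reads by its 16S copy number, then renormalise.

    The division converts "gene copies observed" into approximate
    "cell equivalents", so taxa with few 16S copies are no longer
    underestimated relative to taxa with many copies.
    """
    cell_equivalents = {taxon: n / copies[taxon] for taxon, n in counts.items()}
    total = sum(cell_equivalents.values())
    return {taxon: v / total for taxon, v in cell_equivalents.items()}


if __name__ == "__main__":
    counts = {"high_copy_taxon": 600, "low_copy_taxon": 400}  # hypothetical reads
    copies = {"high_copy_taxon": 6.0, "low_copy_taxon": 1.0}  # hypothetical copy numbers
    print(relative(counts))
    print(copy_number_corrected(counts, copies))
```

    In this toy case the low-copy taxon rises from 40% of reads to 80% of estimated cells, which illustrates the direction of the bias described for Methanoculleus; real corrections depend on reliable copy-number estimates for each taxon.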

    Species abundance information improves sequence taxonomy classification accuracy.

    Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always violated. By incorporating environment-specific taxonomic abundance information, we demonstrate a significant increase in species-level classification accuracy across common sample types. At the species level, overall average error rates decline from 25% to 14%, which compares favourably with the error rates that existing classifiers achieve at the genus level (16%). Our findings indicate that, for most practical purposes, the assumption that reference species are equally likely to be observed is untenable. q2-clawback provides a straightforward alternative for samples from common environments.
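    The equal-likelihood assumption is the prior term in Bayes' rule, P(species | read) ∝ P(read | species) · P(species). A minimal sketch, assuming hypothetical likelihoods and environment weights, shows how replacing a uniform prior with environment-specific abundances can flip the species call. This is a toy illustration of the idea, not the q2-clawback or q2-feature-classifier API.

```python
# Minimal sketch: effect of the prior on a naive Bayes species assignment.
# Assumption: the likelihoods and environment weights below are hypothetical;
# this does not reproduce q2-clawback or any QIIME 2 plugin interface.


def posterior(likelihood: dict[str, float], prior: dict[str, float]) -> dict[str, float]:
    """Bayes' rule: P(s | read) is proportional to P(read | s) * P(s), normalised."""
    joint = {s: likelihood[s] * prior[s] for s in likelihood}
    z = sum(joint.values())
    return {s: p / z for s, p in joint.items()}


def best(post: dict[str, float]) -> str:
    """Species with the highest posterior probability."""
    return max(post, key=post.get)


# P(read | species): the read fits species_x slightly better (hypothetical).
likelihood = {"species_x": 0.6, "species_y": 0.4}
# Uniform prior: every reference species is assumed equally likely.
uniform = {"species_x": 0.5, "species_y": 0.5}
# Environment-specific prior: species_y is far more common in this habitat (hypothetical).
environment = {"species_x": 0.1, "species_y": 0.9}

if __name__ == "__main__":
    print(best(posterior(likelihood, uniform)))      # call under the uniform prior
    print(best(posterior(likelihood, environment)))  # call under the weighted prior
```

    Under the uniform prior the posterior is just the normalised likelihood (species_x wins with 0.6), while the weighted prior gives species_y a posterior of about 0.86.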

    Genomic Signal Processing Techniques for Taxonomy Prediction

    To analyze complex biodiversity in microbial communities, 16S rRNA marker gene sequences are often assigned to operational taxonomic units (OTUs). The many methods that have been used to assign 16S rRNA marker gene sequences to OTUs have prompted debate over which is best. Some researchers argue that clustering methods should be stable, so that the generated OTU assignments do not change as additional sequences are added to the dataset, whereas others contend that faithfully representing the distances between sequences is more important. We add one more de novo clustering algorithm, Rolling Snowball, to the existing ones, including single linkage, complete linkage, average linkage, abundance-based greedy clustering, distance-based greedy clustering and Swarm, as well as the open- and closed-reference methods. We use the GreenGenes, RDP and SILVA 16S rRNA gene databases to show the success of the method. The highest accuracy is obtained with the SILVA library.
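    To make the stability debate concrete, here is a minimal sketch of abundance-based greedy clustering, one of the baseline methods listed above (it is not Rolling Snowball, whose details are not given here). It assumes the sequences are already aligned to equal length and uses plain positional identity; the 0.97 default threshold is the conventional value, not a parameter from the paper.

```python
# Minimal sketch of abundance-based greedy clustering (AGC).
# Assumptions: sequences are pre-aligned to equal length, identity is the
# fraction of matching positions, and the threshold is illustrative.


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def agc(abundances: dict[str, int], threshold: float = 0.97) -> dict[str, str]:
    """Map each unique sequence to the centroid of its OTU.

    Sequences are visited in decreasing abundance (ties broken
    alphabetically); each joins the first existing centroid it matches at
    >= threshold identity, otherwise it becomes a new centroid.
    """
    centroids: list[str] = []
    assignment: dict[str, str] = {}
    for seq in sorted(abundances, key=lambda s: (-abundances[s], s)):
        for centroid in centroids:
            if identity(seq, centroid) >= threshold:
                assignment[seq] = centroid
                break
        else:  # no centroid was close enough: open a new OTU
            centroids.append(seq)
            assignment[seq] = seq
    return assignment


if __name__ == "__main__":
    toy = {"AAAAAAAAAA": 50, "AAAAAAAAAT": 10, "TTTTTTTTTT": 5}  # hypothetical reads
    print(agc(toy, threshold=0.9))
```

    Because centroids depend on the order in which sequences are visited, adding a new, more abundant sequence can change which centroids exist and therefore earlier assignments, which is the stability concern raised above.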

    Reconciliation between operational taxonomic units and species boundaries

    The development of high-throughput sequencing technologies has revolutionised the field of microbial ecology via 16S rRNA gene amplicon sequencing approaches. Clustering these amplicon sequencing reads into operational taxonomic units (OTUs) using a fixed cut-off is a common approach to estimating microbial diversity. A 97% threshold was chosen so that the resulting OTUs could be interpreted as a proxy for bacterial species. Our results show that the robustness of such a generalised cut-off is questionable when it is applied to short amplicons covering only one or two variable regions of the 16S rRNA gene. Such a cut-off leads to biases in diversity metrics and makes it hard to compare results obtained from amplicons generated with different primer sets. The method introduced in this work takes into account the different evolutionary rates of taxonomic lineages in order to define a dynamic, taxonomy-dependent OTU clustering cut-off. For a taxonomic family whose species show high evolutionary conservation in the amplified variable regions, the cut-off will be more stringent than 97%. By taking the amplified variable regions and the taxonomic family into account when defining this cut-off, such a threshold will lead to more robust results and closer correspondence between OTUs and species. This approach has been implemented in a publicly available software package called DynamiC.
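    One plausible way to derive a family-specific cut-off is to choose the identity value that best separates same-species pairs from different-species pairs within that family for the amplified region. The sketch below does exactly that; it is a hypothetical illustration and not necessarily the procedure DynamiC implements, and it assumes the pairwise identities have already been computed.

```python
# Hypothetical sketch: choose a family-specific OTU cut-off from pairwise
# identities in the amplified region. Not DynamiC's documented algorithm.
# Assumption: intra = identities between sequences of the same species,
#             inter = identities between different species of the same family.


def best_cutoff(intra: list[float], inter: list[float]) -> float:
    """Return the candidate cut-off that best separates intra- from inter-species pairs.

    A pair counts as correctly handled if an intra-species pair is at or
    above the cut-off (it would be merged into one OTU) or an inter-species
    pair is below it (the two species stay apart). Candidates are the
    observed identity values; ties resolve to the lowest candidate because
    max() returns the first maximal element.
    """
    candidates = sorted(set(intra) | set(inter))

    def correct(t: float) -> int:
        return sum(x >= t for x in intra) + sum(x < t for x in inter)

    return max(candidates, key=correct)


if __name__ == "__main__":
    # Hypothetical identities for a family that is highly conserved in this region.
    intra = [0.995, 0.99, 1.0]
    inter = [0.985, 0.97, 0.98]
    print(best_cutoff(intra, inter))
```

    For this hypothetical, highly conserved family the selected cut-off is 0.99, i.e. more stringent than 97%, matching the behaviour the abstract describes for such families.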

    Clustering 16S rRNA for OTU prediction: A similarity based method

    Next-generation sequencing (NGS)-based 16S rRNA sequencing, which combines PCR amplification with NGS technology, has been used successfully to study the phylogeny and taxonomy of samples from complex environments. The first step of many downstream analyses is clustering 16S rRNA sequences into operational taxonomic units (OTUs). Heuristic clustering, in which one or more seed sequences are selected to represent each cluster, is one of the most widely used approaches for generating OTUs. In this work, we choose five random seeds for each cluster from a gene library and present a novel distance measure to cluster the bacteria in a sample. Artificially created sets of 16S rRNA genes selected from databases are successfully clustered with more than 98% accuracy, sensitivity and specificity.
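    The multi-seed idea can be sketched as follows: draw up to five random seeds per cluster, then assign each query to the cluster whose seeds are closest on average. The paper's novel distance measure is not described here, so this sketch swaps in a k-mer Jaccard distance as a stand-in; the k value, the mean-distance rule and the toy sequences are all assumptions.

```python
# Minimal sketch of multi-seed OTU assignment.
# Assumptions: k-mer Jaccard distance stands in for the paper's novel measure,
# a query joins the cluster with the smallest mean distance to its seeds,
# and all sequences and labels are hypothetical.
import random


def pick_seeds(members: list[str], n: int = 5, seed: int = 0) -> list[str]:
    """Randomly choose up to n seed sequences from a cluster's reference members."""
    rng = random.Random(seed)
    return rng.sample(members, min(n, len(members)))


def kmers(seq: str, k: int = 4) -> set[str]:
    """Set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: str, b: str, k: int = 4) -> float:
    """1 - |shared k-mers| / |all k-mers| (0 = identical k-mer content)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def assign(query: str, seeds: dict[str, list[str]], k: int = 4) -> str:
    """Return the cluster label whose seeds are, on average, closest to the query."""
    def mean_dist(label: str) -> float:
        ds = [jaccard_distance(query, s, k) for s in seeds[label]]
        return sum(ds) / len(ds)

    return min(seeds, key=mean_dist)


if __name__ == "__main__":
    library = {  # hypothetical reference members per cluster
        "otu1": ["ACGTACGTAC", "ACGTACGTAA"],
        "otu2": ["TTGCATTGCA", "TTGCATTGCC"],
    }
    seeds = {label: pick_seeds(members) for label, members in library.items()}
    print(assign("ACGTACGTAG", seeds))
```

    Averaging over several seeds makes the assignment less sensitive to one atypical representative than single-seed heuristics, at the cost of more distance computations per query.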

    Analytical Tools and Databases for Metagenomics in the Next-Generation Sequencing Era

    Over the last few decades, metagenomics has become an indispensable tool in microbial ecology, and recent advances in sequencing techniques are about to bring a new revolution in metagenomic studies. The massive data production and substantial cost reduction of next-generation sequencing have led to rapid growth of metagenomic research, both quantitatively and qualitatively. Metagenomics is clearly set to become a standard tool for studying the diversity and function of microbes in the near future, as fingerprinting methods were before it. As data accumulate ever faster, bioinformatic tools and associated databases for handling these datasets have become increasingly necessary. To facilitate the bioinformatic analysis of metagenomic data, we review some recent tools and databases that are widely used in this field and give insights into the current challenges and future of metagenomics from a bioinformatics perspective.

    Taxonomic classification of metagenomic sequences

    Gerlach W. Taxonomic classification of metagenomic sequences. Bielefeld: Universität; 2012.
    Bacteria, archaea and microeukaryotes can be found in almost every habitat in nature, in particular in soil, sediments and sea water. They typically live in complex communities with different kinds of symbiotic associations, including relationships with larger organisms such as animals or plants. Examples are microbial communities in the gut or on the skin of animals and humans, or bacteria that live in symbiosis with plants. The vast majority of such microbes are unculturable and thus cannot be sequenced by traditional methods. The emerging discipline of metagenomics provides various in vivo and in silico tools to overcome this limitation. In particular, high-throughput sequencing techniques such as 454 or Solexa-Illumina make it possible to explore these microbes by studying whole natural microbial communities and analysing their biological diversity as well as the underlying metabolic pathways. A current limitation of these technologies is that they can sequence only DNA fragments of limited length, so it is usually not possible to recover complete microbial genomes. In addition, the DNA fragments are drawn randomly from the microbial community, and the exact species of origin is unknown. Over the past few years, different methods have been developed for the taxonomic and functional characterization of metagenomic shotgun sequences. However, the taxonomic classification of metagenomic sequences from novel species without close homologues in the biological sequence databases remains a challenge because of the high number of wrong taxonomic predictions at lower taxonomic ranks. In this thesis we present CARMA3, a novel method for the taxonomic classification of assembled and unassembled metagenomic sequences that has been adapted to work with both BLAST and HMMER3 homology searches.
    CARMA3 accepts protein-encoding DNA sequences, protein sequences and 16S rDNA sequences as input. In addition, we present WebCARMA, a web application for the analysis of protein-encoding DNA sequences with CARMA3 without the need for a local installation. We evaluate our novel method in different experiments using simulated and real shotgun metagenomes and show that CARMA3 makes fewer wrong taxonomic predictions (at the same sensitivity) than other BLAST-based methods. In the last experiment we show that even very short reads can, in principle, be used to describe the taxonomic content of a metagenome.
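    A common way to limit wrong predictions at lower ranks is to report a read at the lowest common ancestor (LCA) of its best homology hits, so that disagreeing hits push the assignment to a higher, safer rank. The sketch below shows this general LCA technique; it is not CARMA3's own scoring scheme, and the hit format (lineage, bitscore), the 10% score window and the example lineages are assumptions.

```python
# Minimal sketch of lowest-common-ancestor (LCA) assignment from homology hits.
# Assumptions: generic LCA technique, not CARMA3's method; hits are
# (lineage, bitscore) pairs; the 10% bitscore window is illustrative.

Lineage = tuple[str, ...]


def lca(lineages: list[Lineage]) -> Lineage:
    """Longest shared root-to-leaf prefix of the given lineages."""
    prefix: list[str] = []
    for ranks in zip(*lineages):  # walk rank by rank from the root
        if all(r == ranks[0] for r in ranks):
            prefix.append(ranks[0])
        else:
            break
    return tuple(prefix)


def classify(hits: list[tuple[Lineage, float]], window: float = 0.10) -> Lineage:
    """Assign a read to the LCA of all hits scoring within `window` of the best bitscore."""
    if not hits:
        return ()
    best = max(score for _, score in hits)
    kept = [lineage for lineage, score in hits if score >= best * (1 - window)]
    return lca(kept)


if __name__ == "__main__":
    hits = [  # hypothetical homology hits for one read
        (("Bacteria", "Firmicutes", "Bacilli", "Bacillus"), 200.0),
        (("Bacteria", "Firmicutes", "Bacilli", "Listeria"), 190.0),
        (("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Escherichia"), 120.0),
    ]
    print(classify(hits))
```

    Here the two near-best hits disagree at genus level, so the read is reported only down to class (Bacilli) instead of being forced into a possibly wrong genus; the weak Proteobacteria hit falls outside the window and is ignored.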