
    Analyzing the differences between reads and contigs when performing a taxonomic assignment comparison in metagenomics

    Metagenomics is an inherently complex field in which one of the primary goals is to determine which organisms are present in an environmental sample. To this end, diverse tools have been developed that are based on the similarity search results obtained from comparing a set of sequences against a database. However, achieving this goal still requires solving issues such as dealing with genomic variants and detecting repeated sequences that could belong to different species in a sample in which organisms are represented unevenly and in unknown proportions. Hence, the question arises of whether analyzing a sample with reads provides a better understanding of the metagenome than analyzing it with contigs. Assembly yields larger genomic fragments but bears the risk of producing chimeric contigs. Reads, on the other hand, are shorter, so their statistical significance is harder to assess, but there are many more of them. Consequently, we have developed a workflow to assess and compare the quality of each of these alternatives. Synthetic read datasets belonging to previously identified organisms are generated in order to validate the results. Afterwards, we assemble these into a set of contigs and perform a taxonomic analysis on both datasets. The tools we have developed demonstrate that analyzing reads provides a more trustworthy representation of the species in a sample than analyzing contigs, especially in cases of high genomic variability. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.

    The impact of sequence database choice on metaproteomic results in gut microbiota studies

    Background: Elucidating the role of gut microbiota in physiological and pathological processes has recently emerged as a key research aim in life sciences. In this respect, metaproteomics, the study of the whole protein complement of a microbial community, can provide a unique contribution by revealing which functions are actually being expressed by specific microbial taxa. However, its wide application to gut microbiota research has been hindered by challenges in data analysis, especially related to the choice of the proper sequence databases for protein identification. Results: Here, we present a systematic investigation of variables concerning database construction and annotation and evaluate their impact on human and mouse gut metaproteomic results. We found that both publicly available and experimental metagenomic databases lead to the identification of unique peptide assortments, suggesting parallel database searches as a means to gain more complete information. In particular, the contribution of experimental metagenomic databases proved to be essential when dealing with mouse samples. Moreover, the use of a "merged" database, containing all metagenomic sequences from the population under study, was found to be generally preferable to the use of sample-matched databases. We also observed that taxonomic and functional results are strongly database-dependent, in particular when analyzing the mouse gut microbiota. As a striking example, the Firmicutes/Bacteroidetes ratio varied up to tenfold depending on the database used. Finally, assembling reads into longer contigs provided significant advantages in terms of functional annotation yields. Conclusions: This study helps to identify host- and database-specific biases that need to be taken into account in a metaproteomic experiment, providing meaningful insights into how to design gut microbiota studies and perform metaproteomic data analysis. In particular, the use of multiple databases and annotation tools should be encouraged, even though this requires appropriate bioinformatic resources.

    Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies

    Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of “metagenomics”, often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many statistical and computational tools and databases for metagenomics have been developed to exploit the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also outline future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards.

    Identification of Candidate Cellulose Utilizing Bacteria from the Rumen of Beef Cattle, Using Bacterial Community Profiling and Metagenomics

    The ruminal microbiome allows ruminant animals to convert cellulosic biomass into food products. A majority of ruminal microorganisms remain uncharacterized due, in part, to the complexity of ruminal microbial communities. In order to gain further insight, selection-based batch culturing from bovine rumen fluid, in combination with metagenomics, was used to identify and characterize previously uncharacterized rumen bacteria capable of metabolizing cellulose, which was supplemented as a purified substrate. 16S rRNA-based population analysis was used to identify rumen bacteria enriched within 14 days of culturing. Across 4 independent experiments, seven different candidate cellulose-utilizing species-level operational taxonomic units (OTUs) were identified. Six of the enriched OTUs showed increased levels ranging from 46- to 445-fold compared to their respective rumen inocula, representing 14.1% to 41.3% of reads in samples supplemented with cellulose. One OTU corresponded to a known species (Ruminococcus flavefaciens), four OTUs were predicted to be uncultured species of known genera (Ethanoligenens sp., two Prevotella sp., and Rummeliibacillus sp.), and two were assigned to the family Ruminococcaceae. One enriched culture consisting of an uncultured Rummeliibacillus and Prevotella was used for metagenome analysis. Analysis revealed genes with predicted cellulolytic capabilities in the Rummeliibacillus-related organism (cellulase, endoglucanase, and beta-glucanase) and in the Prevotella-related organism (cellulase). Additionally, genes predicted to function in cellulose binding, as well as proteases and glutamate synthases needed for amino acid acquisition, were found in both OTUs. The identification and characterization of novel cellulolytic species of ruminal bacteria will contribute to a better understanding of ruminal cellulose metabolism.

    Metagenomic Data Utilization and Analysis (MEDUSA) and Construction of a Global Gut Microbial Gene Catalogue

    Metagenomic sequencing has contributed important new knowledge about the microbes that live in a symbiotic relationship with humans. With modern sequencing technology it is possible to generate large numbers of sequencing reads from a metagenome, but analysis of the data is challenging. Here we present the bioinformatics pipeline MEDUSA, which facilitates analysis of metagenomic reads at the gene and taxonomic level. We also constructed a global human gut microbial gene catalogue by combining data from 4 studies spanning 3 continents. Using MEDUSA we mapped 782 gut metagenomes to the global gene catalogue and a catalogue of sequenced microbial species. We find that all studies share about half a million genes and that on average 300 000 genes are shared by half the studied subjects. Gene richness is higher in the European studies than in the Chinese and American studies, and this is also reflected in the species richness. Even though it is possible to identify common species and a core set of genes, we find that there are large variations in the abundance of species and genes.

    PhylOTU: a high-throughput procedure quantifies microbial community diversity and resolves novel taxa from metagenomic data.

    Microbial diversity is typically characterized by clustering small-subunit ribosomal RNA (SSU-rRNA) sequences into operational taxonomic units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in priming and amplification. Analysis of shotgun sequenced environmental DNA, known as metagenomics, avoids amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles. Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In addition, we discovered novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the biosphere currently hidden from PCR-based surveys of diversity.

    SqueezeMeta, A Highly Portable, Fully Automatic Metagenomic Analysis Pipeline

    The improvement of sequencing technologies has facilitated the widespread adoption of metagenomic sequencing, which has become a standard procedure for analyzing the structure and functionality of microbiomes. Bioinformatic analysis of sequencing results poses a challenge because it involves many different complex steps. SqueezeMeta is a fully automatic pipeline for metagenomics/metatranscriptomics, covering all steps of the analysis. SqueezeMeta includes multi-metagenome support that enables co-assembly of related metagenomes and retrieval of individual genomes via binning procedures. SqueezeMeta features several unique characteristics: a co-assembly procedure, or co-assembly of an unlimited number of metagenomes via merging of individually assembled metagenomes, both with read mapping for estimation of the abundances of genes in each metagenome. It also includes binning and bin checking for retrieving individual genomes. Internal checks for the assembly and binning steps provide information about the consistency of contigs and bins. Moreover, results are stored in a MySQL database, where they can be easily exported and shared, and can be inspected anywhere using a flexible web interface that allows simple creation of complex queries. We illustrate the potential of SqueezeMeta by analyzing 32 gut metagenomes in a fully automatic way, enabling retrieval of several million genes and several hundred genomic bins. One of the motivations in the development of SqueezeMeta was to produce software capable of running on small desktop computers and thus accessible to all users and settings. We were also able to co-assemble two of these metagenomes and complete the full analysis in less than one day using a simple laptop computer. This reveals the capacity of SqueezeMeta to run without high-performance computing infrastructure and in the absence of any network connectivity. It is therefore suitable for in situ, real-time analysis of metagenomes produced by nanopore sequencing. SqueezeMeta can be downloaded from https://github.com/jtamames/SqueezeMeta