
    Multiple Comparative Metagenomics using Multiset k-mer Counting

    Background. Large-scale metagenomic projects aim to extract biodiversity knowledge by comparing different environmental conditions. Current methods for comparing microbial communities face important limitations. Those based on taxonomic or functional assignment rely on a small subset of the sequences that can be associated with known organisms. On the other hand, de novo methods, which compare the whole sets of sequences, either do not scale up to ambitious metagenomic projects or do not provide precise and exhaustive results. Methods. These limitations motivated the development of a new de novo metagenomic comparative method, called Simka. This method computes a large collection of standard ecological distances by replacing species counts with k-mer counts. Simka scales up to today's metagenomic projects thanks to a new parallel k-mer counting strategy on multiple datasets. Results. Experiments on public Human Microbiome Project datasets demonstrate that Simka captures the essential underlying biological structure. Simka was able to compute in a few hours both qualitative and quantitative ecological distances on hundreds of metagenomic samples (690 samples, 32 billion reads). We also demonstrate that analyzing metagenomes at the k-mer level is highly correlated with extremely precise de novo comparison techniques that rely on an all-versus-all sequence alignment strategy or that are based on taxonomic profiling.
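The idea of replacing species counts with k-mer counts can be illustrated with a small in-memory sketch. This is not Simka's implementation (the abstract describes a parallel counting strategy across many datasets, which is not reproduced here); the function names `kmer_counts`, `bray_curtis`, and `jaccard_distance` are hypothetical, and the two distances are simply common examples of a quantitative and a qualitative ecological distance.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer (substring of length k) across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def bray_curtis(a, b):
    """Quantitative distance: uses k-mer abundances in place of species counts."""
    shared = sum(min(a[kmer], b[kmer]) for kmer in a.keys() & b.keys())
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


def jaccard_distance(a, b):
    """Qualitative distance: uses only presence or absence of each k-mer."""
    union = a.keys() | b.keys()
    if not union:
        return 0.0
    return 1.0 - len(a.keys() & b.keys()) / len(union)


# Toy samples (made up for illustration, not real metagenomic reads).
sample_a = kmer_counts(["ACGTAC"], k=3)
sample_b = kmer_counts(["ACGTTT"], k=3)
print(bray_curtis(sample_a, sample_b))
print(jaccard_distance(sample_a, sample_b))
```

Holding all counts in one dictionary per sample is only workable for toy inputs; at the scale quoted in the abstract, counting across samples is the bottleneck, which is why the abstract emphasizes the counting strategy instead of the distance formulas.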

    Methods for comparative metagenomics

    Background. Metagenomics is a rapidly growing field of research that aims at studying uncultured organisms to understand the true diversity of microbes, their functions, cooperation and evolution, in environments such as soil, water, ancient remains of animals, or the digestive system of animals and humans. The recent development of ultra-high throughput sequencing technologies, which do not require cloning or PCR amplification, and can produce huge numbers of DNA reads at an affordable cost, has boosted the number and scope of metagenomic sequencing projects. Increasingly, there is a need for new ways of comparing multiple metagenomics datasets, and for fast and user-friendly implementations of such approaches. Results. This paper introduces a number of new methods for interactively exploring, analyzing and comparing multiple metagenomic datasets, which will be made freely available in a new, comparative version 2.0 of the stand-alone metagenome analysis tool MEGAN. Conclusion. There is a great need for powerful and user-friendly tools for comparative analysis of metagenomic data and MEGAN 2.0 will help to fill this gap.

    Comparative metagenomics of Daphnia symbionts

    BACKGROUND: Shotgun sequences of DNA extracts from whole organisms allow a comprehensive assessment of possible symbionts. The current project makes use of four shotgun datasets from three species of the planktonic freshwater crustaceans Daphnia: one dataset from clones of D. pulex and D. pulicaria and two datasets from one clone of D. magna. We analyzed these datasets with three aims: First, we search for bacterial symbionts that are present in all three species. Second, we search for evidence for Cyanobacteria and plastids, which had been suggested to occur as symbionts in a related Daphnia species. Third, we compare the metacommunities revealed by two different 454 pyrosequencing methods (GS 20 and GS FLX). RESULTS: In all datasets we found evidence for a large number of bacteria belonging to diverse taxa. The vast majority of these were Proteobacteria. Of those, most sequences were assigned to different genera of the Betaproteobacteria family Comamonadaceae. Other taxa represented in all datasets included the genera Flavobacterium, Rhodobacter, Chromobacterium, Methylibium, Bordetella, Burkholderia and Cupriavidus. A few taxa matched sequences only from the D. pulex and the D. pulicaria datasets: Aeromonas, Pseudomonas and Delftia. Taxa with many hits specific to a single dataset were rare. For most of the identified taxa, earlier studies reported the finding of related taxa in aquatic environmental samples. We found no clear evidence for the presence of symbiotic Cyanobacteria or plastids. The apparent similarity of the symbiont communities of the three Daphnia species breaks down on a species and strain level. Communities have a similar composition at a higher taxonomic level, but the actual sequences found are divergent. The two Daphnia magna datasets obtained from two different pyrosequencing platforms revealed rather similar results. CONCLUSION: Three clones from three species of the genus Daphnia were found to harbor a rich community of symbionts. These communities are similar at the genus and higher taxonomic level, but are composed of different species. The similarity of these three symbiont communities hints that some of these associations may be stable in the long term.

    CoMet—a web server for comparative functional profiling of metagenomes

    Analyzing the functional potential of newly sequenced genomes and metagenomes has become a common task in biomedical and biological research. With the advent of high-throughput sequencing technologies, comparative metagenomics opens the way to elucidate the genetically determined similarities and differences of complex microbial communities. We developed the web server ‘CoMet’ (http://comet.gobics.de), which provides an easy-to-use comparative metagenomics platform that is well suited for the analysis of large collections of metagenomic short read data. CoMet combines the ORF finding and subsequent assignment of protein sequences to Pfam domain families with a comparative statistical analysis. Besides comprehensive tabular data files, the CoMet server also provides visually interpretable output in terms of hierarchical clustering and multi-dimensional scaling plots and thus allows a quick overview of a given set of metagenomic samples.

    An application of statistics to comparative metagenomics

    BACKGROUND: Metagenomics, the sequence analysis of genomic DNA isolated directly from the environment, can be used to identify organisms and model community dynamics of a particular ecosystem. Metagenomics also has the potential to identify significantly different metabolic potential in different environments. RESULTS: Here we use a statistical method to compare curated subsystems, to predict the physiology, metabolism, and ecology from metagenomes. This approach can be used to identify those subsystems that are significantly different between metagenome sequences. Subsystems that were overrepresented in the Sargasso Sea and Acid Mine Drainage metagenomes when compared to non-redundant databases were identified. CONCLUSION: The methodology described herein applies statistics to the comparisons of metabolic potential in metagenomes. This analysis reveals those subsystems that are more, or less, represented in the different environments that are compared. These differences in metabolic potential lead to several testable hypotheses about physiology and metabolism of microbes from these ecosystems.

    Accelerating exhaustive pairwise metagenomic comparisons

    In this manuscript, we present an optimized and parallel version of our previous work IMSAME, an exhaustive gapped aligner for the pairwise and accurate comparison of metagenomes. Parallelization strategies are applied to take advantage of modern multiprocessor architectures. In addition, sequential optimizations in CPU time and memory consumption are provided. These algorithmic and computational enhancements enable IMSAME to calculate near-optimal alignments, which are used to directly assess similarity between metagenomes without requiring reference databases. We show that the overall efficiency of the parallel implementation is above 80% while retaining scalability as the number of parallel cores used increases. Moreover, we also show that sequential optimizations yield up to 8x speedup for scenarios with larger data. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tec
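The efficiency figure can be read against the conventional definition of parallel efficiency, namely speedup divided by the number of cores. The abstract does not say which definition the authors used, so that reading is an assumption; the function name `parallel_efficiency` is hypothetical and the timings below are invented for illustration, not taken from the paper.

```python
def parallel_efficiency(serial_seconds, parallel_seconds, cores):
    """Return speedup / cores, the conventional parallel efficiency."""
    if cores < 1 or serial_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("timings must be positive and cores must be at least 1")
    speedup = serial_seconds / parallel_seconds
    return speedup / cores


# Invented timings: 100 s on one core, 15 s on eight cores.
efficiency = parallel_efficiency(100.0, 15.0, 8)
print(f"{efficiency:.1%}")
```

Under this definition, an efficiency above 80% on eight cores corresponds to a speedup of more than 6.4x over the single-core run.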

    Comparative metagenomics of PHA synthase genes in soil

    Polyhydroxyalkanoates (PHAs) are biopolymers produced naturally by bacteria. They are of considerable scientific interest as fundamental components of bacterial carbon metabolism and have biotechnological applications as potential bioplastics. To date, studies of PHA metabolism have focused on a restricted set of PHA-producing bacterial species. Therefore, the diversity of PHA-producing taxa and gene sequences, and the efficiency of existing primers to recognize PHA marker genes, are unclear. In this thesis, I report the first large-scale metagenomic analysis of PHA-producing taxa through taxonomic and functional profiling of 45 soil metagenomes from a broad range of soil types (bulk and rhizosphere). From a total of 229,070 detected class I-III PHA synthase (phaC) genes, PHA-producing microbial communities were inferred and compared between soil environments, and the sequence diversity and primer efficiency for different classes of phaC genes were analyzed. Analysis revealed several main findings: 1) both known and novel PHA-producing taxa were inferred to contribute high proportions of phaC genes in environmental samples; 2) distinct shifts in the PHA-producer communities were observed both between soil types and between phaC classes; 3) phaC-containing species were detected at relatively higher abundance in rhizosphere soils, implying a significant role for PHA storage in rhizobacteria; 4) existing primers did not adequately cover the sequence diversity of environmental homologs, and metagenomic diversity can be used to suggest modifications that improve primer efficiency.

    Comparative Metagenomic Analysis of Two Hot Springs From Ourense (Northwestern Spain) and Others Worldwide

    With their circumneutral pH and their moderate temperature (66 and 68°C, respectively), As Burgas and Muiño da Veiga are two important human-use hot springs, previously studied with traditional culture methods, but never explored with a metagenomic approach. In the present study, we have performed metagenomic sequence-based analyses to compare the taxonomic composition and functional potential of these hot springs. Proteobacteria, Deinococcus-Thermus, Firmicutes, Nitrospirae, and Aquificae are the dominant phyla in both geothermal springs, but there is a significant difference in the abundance of these phyla between As Burgas and Muiño da Veiga. Phylum Proteobacteria dominates the As Burgas ecosystem while Aquificae is the most abundant phylum in Muiño da Veiga. Taxonomic and functional analyses reveal that the variability in water geochemistry might be shaping the differences in the microbial communities inhabiting these geothermal springs. The content in organic compounds of As Burgas water promotes the presence of heterotrophic populations of the genera Acidovorax and Thermus, whereas the sulfate-rich water of Muiño da Veiga favors the co-dominance of genera Sulfurihydrogenibium and Thermodesulfovibrio. Differences in ammonia concentration exert a selective pressure toward the growth of nitrogen-fixing bacteria such as Thermodesulfovibrio in Muiño da Veiga. Temperature and pH are two important factors shaping hot springs microbial communities, as was determined by comparative analysis with other thermal springs. This study received financial support from the following organizations: Xunta de Galicia (Consolidación GRC) co-financed by FEDER (Grant Number ED431C 2020/08) and Ministerio de Ciencia, Innovación y Universidades (MICINN) (Grant Number RTI2018-099249-B-I00). The work of M-ED was supported by a FPU fellowship (Ministerio de Educación Cultura y Deporte) FPU12/05050. The metagenome sequencing of As Burgas water was performed by M-ED in the Dinsdale Lab (Department of Biology, San Diego State University), as part of a short stay financed by the Short-Term Mobility program of the FPU scholarship.