10 research outputs found

    OGRE: Overlap Graph-based metagenomic Read clustEring

    The microbes that live in an environment can be identified from the genomic material that is present, also referred to as the metagenome. Using Next Generation Sequencing techniques, this genomic material can be obtained from the environment, resulting in a large set of sequencing reads. A proper assembly of these reads into contigs or even full genomes allows one to identify the microbial species and strains that live in the environment. Assembling a metagenome is a challenging task and can benefit from clustering the reads into species-specific bins prior to assembly. In this paper we propose OGRE, an Overlap Graph-based Read clustEring procedure for metagenomic read data. OGRE is the only method that can successfully cluster reads into species-specific bins for large metagenomic datasets without running into computation time or memory issues.

    OGRE: Overlap Graph-based metagenomic Read clustEring

    MOTIVATION: The microbes that live in an environment can be identified from the combined genomic material, also referred to as the metagenome. Sequencing a metagenome can result in large volumes of sequencing reads. A promising approach to reduce the size of metagenomic datasets is to cluster reads into groups based on their overlaps. Clustering reads is valuable to facilitate downstream analyses, including computationally intensive strain-aware assembly. As current read clustering approaches cannot handle the large datasets arising from high-throughput metagenome sequencing, a novel read clustering approach is needed. In this article, we propose OGRE, an Overlap Graph-based Read clustEring procedure for high-throughput sequencing data, with a focus on shotgun metagenomes. RESULTS: We show that for small datasets OGRE outperforms other read binners in terms of the number of species included in a cluster, also referred to as cluster purity, and the fraction of all reads that is placed in one of the clusters. Furthermore, OGRE is able to process metagenomic datasets that are too large for other read binners into clusters with high cluster purity. CONCLUSION: OGRE is the only method that can successfully cluster reads into species-specific clusters for large metagenomic datasets without running into computation time or memory issues. AVAILABILITY AND IMPLEMENTATION: Code is made available on GitHub (https://github.com/Marleen1/OGRE). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    A metagenomic portrait of the microbial community responsible for two decades of bioremediation of poly-contaminated groundwater

    Biodegradation of pollutants is a sustainable and cost-effective solution to groundwater pollution. Here, we investigate microbial populations involved in biodegradation of poly-contaminants in a pipeline for heavily contaminated groundwater. Groundwater moves from a polluted park to a treatment plant, where an aerated bioreactor effectively removes the contaminants. While the biomass does not settle in the reactor, sediment is collected afterwards and used to seed the new polluted groundwater via a backwash cycle. The pipeline has successfully operated since 1999, but the biological components in the reactor and the contaminated park groundwater have never been described. We sampled seven points along the pipeline, representing the entire remediation process, and characterized the changing microbial communities using genome-resolved metagenomic analysis. We assembled 297 medium- and high-quality metagenome-assembled genome sequences representing on average 46.3% of the total DNA per sample. We found that the communities cluster into two distinct groups, separating the anaerobic communities in the park groundwater from the aerobic communities inside the plant. In the park, the community is dominated by members of the genus Sulfuricurvum, while the plant is dominated by generalists from the order Burkholderiales. Known aromatic compound biodegradation pathways are four times more abundant in the plant-side communities than in the park-side communities. Our findings provide a genome-resolved portrait of the microbial community in a highly effective groundwater treatment system that has treated groundwater with a complex contamination profile for two decades.

    OGRE: Overlap Graph-based metagenomic Read clustEring

    The microbes that live in an environment can be identified from the genomic material that is present, also referred to as the metagenome. Using Next Generation Sequencing techniques, this genomic material can be obtained from the environment, resulting in a large set of sequencing reads. A proper assembly of these reads into contigs or even full genomes allows one to identify the microbial species and strains that live in the environment. Assembling a metagenome is a challenging task and can benefit from clustering the reads into species-specific bins prior to assembly. In this paper we propose OGRE, an Overlap Graph-based Read clustEring procedure for metagenomic read data. OGRE is the only method that can successfully cluster reads into species-specific bins for large metagenomic datasets without running into computation time or memory issues.

    OGRE: Overlap Graph-based metagenomic Read clustEring

    Motivation: The microbes that live in an environment can be identified from the combined genomic material, also referred to as the metagenome. Sequencing a metagenome can result in large volumes of sequencing reads. A promising approach to reduce the size of metagenomic datasets is to cluster reads into groups based on their overlaps. Clustering reads is valuable to facilitate downstream analyses, including computationally intensive strain-aware assembly. As current read clustering approaches cannot handle the large datasets arising from high-throughput metagenome sequencing, a novel read clustering approach is needed. In this article, we propose OGRE, an Overlap Graph-based Read clustEring procedure for high-throughput sequencing data, with a focus on shotgun metagenomes. Results: We show that for small datasets OGRE outperforms other read binners in terms of the number of species included in a cluster, also referred to as cluster purity, and the fraction of all reads that is placed in one of the clusters. Furthermore, OGRE is able to process metagenomic datasets that are too large for other read binners into clusters with high cluster purity. Conclusion: OGRE is the only method that can successfully cluster reads into species-specific clusters for large metagenomic datasets without running into computation time or memory issues. Availability and implementation: Code is made available on GitHub (https://github.com/Marleen1/OGRE).

    OGRE: Overlap Graph-based metagenomic Read clustEring

    Motivation: The microbes that live in an environment can be identified from the combined genomic material, also referred to as the metagenome. Sequencing a metagenome can result in large volumes of sequencing reads. A promising approach to reduce the size of metagenomic datasets is to cluster reads into groups based on their overlaps. Clustering reads is valuable to facilitate downstream analyses, including computationally intensive strain-aware assembly. As current read clustering approaches cannot handle the large datasets arising from high-throughput metagenome sequencing, a novel read clustering approach is needed. In this article, we propose OGRE, an Overlap Graph-based Read clustEring procedure for high-throughput sequencing data, with a focus on shotgun metagenomes. Results: We show that for small datasets OGRE outperforms other read binners in terms of the number of species included in a cluster, also referred to as cluster purity, and the fraction of all reads that is placed in one of the clusters. Furthermore, OGRE is able to process metagenomic datasets that are too large for other read binners into clusters with high cluster purity. Conclusion: OGRE is the only method that can successfully cluster reads into species-specific clusters for large metagenomic datasets without running into computation time or memory issues. Availability and implementation: Code is made available on GitHub (https://github.com/Marleen1/OGRE).

    OGRE

    OGRE is a read clustering tool that clusters short reads from a metagenomic dataset using an overlap graph.

    OGRE: Overlap Graph-based metagenomic Read clustEring

    Motivation: The microbes that live in an environment can be identified from the combined genomic material, also referred to as the metagenome. Sequencing a metagenome can result in large volumes of sequencing reads. A promising approach to reduce the size of metagenomic datasets is to cluster reads into groups based on their overlaps. Clustering reads is valuable to facilitate downstream analyses, including computationally intensive strain-aware assembly. As current read clustering approaches cannot handle the large datasets arising from high-throughput metagenome sequencing, a novel read clustering approach is needed. In this article, we propose OGRE, an Overlap Graph-based Read clustEring procedure for high-throughput sequencing data, with a focus on shotgun metagenomes. Results: We show that for small datasets OGRE outperforms other read binners in terms of the number of species included in a cluster, also referred to as cluster purity, and the fraction of all reads that is placed in one of the clusters. Furthermore, OGRE is able to process metagenomic datasets that are too large for other read binners into clusters with high cluster purity. Conclusion: OGRE is the only method that can successfully cluster reads into species-specific clusters for large metagenomic datasets without running into computation time or memory issues. Availability and implementation: Code is made available on GitHub (https://github.com/Marleen1/OGRE).

    Selective pressure on microbial communities in a drinking water aquifer – Geochemical parameters vs. micropollutants

    Groundwater quality is crucial for drinking water production, but groundwater resources are increasingly threatened by contamination with pesticides. As pesticides often occur at micropollutant concentrations, they are unattractive carbon sources for microorganisms and typically remain recalcitrant. Exploring microbial communities in aquifers used for drinking water production is an essential first step towards understanding the fate of micropollutants in groundwater. In this study, we investigated the interaction between groundwater geochemistry, pesticide presence, and microbial communities in an aquifer used for drinking water production. Two groundwater monitoring wells in the Netherlands were sampled in 2014, 2015, and 2016. In both wells, water was sampled from five discrete depths ranging from 13 to 54 m and was analyzed for geochemical parameters, pesticide concentrations, and microbial community composition using 16S rRNA gene sequencing and qPCR. Groundwater geochemistry was stable throughout the study period and pesticides were heterogeneously distributed at low concentrations (μg L−1 range). Microbial community composition was also stable throughout the sampling period. Integration of a unique dataset of chemical and microbial data showed that geochemical parameters and, to a lesser extent, pesticides exerted selective pressure on microbial communities. Microbial communities in both wells showed similar composition in the deeper aquifer, where pumping results in horizontal flow. This study provides insight into groundwater parameters that shape microbial community composition. This information can contribute to the future implementation of remediation technologies to guarantee safe drinking water production.

    A metagenomic portrait of the microbial community responsible for two decades of bioremediation of poly-contaminated groundwater

    Biodegradation of pollutants is a sustainable and cost-effective solution to groundwater pollution. Here, we investigate microbial populations involved in biodegradation of poly-contaminants in a pipeline for heavily contaminated groundwater. Groundwater moves from a polluted park to a treatment plant, where an aerated bioreactor effectively removes the contaminants. While the biomass does not settle in the reactor, sediment is collected afterwards and used to seed the new polluted groundwater via a backwash cycle. The pipeline has successfully operated since 1999, but the biological components in the reactor and the contaminated park groundwater have never been described. We sampled seven points along the pipeline, representing the entire remediation process, and characterized the changing microbial communities using genome-resolved metagenomic analysis. We assembled 297 medium- and high-quality metagenome-assembled genome sequences representing on average 46.3% of the total DNA per sample. We found that the communities cluster into two distinct groups, separating the anaerobic communities in the park groundwater from the aerobic communities inside the plant. In the park, the community is dominated by members of the genus Sulfuricurvum, while the plant is dominated by generalists from the order Burkholderiales. Known aromatic compound biodegradation pathways are four times more abundant in the plant-side communities than in the park-side communities. Our findings provide a genome-resolved portrait of the microbial community in a highly effective groundwater treatment system that has treated groundwater with a complex contamination profile for two decades.